Appropriate solver parameters for large models
#1
Dear All

I thought of sharing some of our experience in solving computationally intensive TIMES models. Here is the background.

Some of you may be aware that we at the Paul Scherrer Institut (PSI) developed a Swiss TIMES electricity model (STEM-E) with an hourly diurnal representation, i.e. 288 annual timeslices. This model runs well, and we have publications based on it.

Starting in January 2013, we extended the electricity model to a whole-energy-system model (by including four end-use sectors, with 9 industrial subsectors). Since then, we have had computational issues in solving the model, despite a PC upgrade (Intel Xeon® CPU E31245 @ 3.30 GHz, 8 cores, 16.0 GB RAM, Windows 7 Professional 64-bit). We were literally unable to solve the model; we often faced 'out of memory' or resource-limit issues. We tried workarounds, but nothing helped. Hence, in May 2014, we reduced the number of timeslices to 144 (from 288) and the number of periods to 14 (from 18 in STEM-E). Then, with primal simplex, the LP was solved, but it took more than 55 hours. We were unable to solve it with the barrier algorithm, which gave an 'out of memory' error. I have put together the model size below for your reference.

MODEL STATISTICS

BLOCKS OF EQUATIONS          84     SINGLE EQUATIONS    1,074,941
BLOCKS OF VARIABLES          11     SINGLE VARIABLES      996,198  12 projected
NON ZERO ELEMENTS     6,066,235

Though the model was solved with primal simplex, the long computation time was not affordable, because we are at the very beginning of the model development. We again thought of reducing the number of timeslices to 72 or so. However, we were not sure we were making the right decision, and I was not personally convinced by the computational limits! Hence we experimented with CPLEX parameters after a few consultations with Amit, Gary, Antti, and my colleague Dr. Panos Evangelos. It took us a while to work through the problem; it was really an iterative process of trying various CPLEX options and combinations. The good news is that eventually we managed to solve our model! Thus I thought of sharing the new CPLEX parameters:
workmem 7000     (7000 refers to the allocated working memory of 7 GB for this problem)
memoryemphasis 1
aggind 4
preind 1
lpmethod 4
barcrossalg 0
THREADS=-1
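For anyone new to option files: these settings go into a plain-text CPLEX.OPT file in the GAMS working folder (in the logs later in this thread it is read from the GAMS_WrkTIMES folder), one option per line. An annotated sketch follows; the comments are my reading of the options, not part of the original post, so check the GAMS/CPLEX option documentation before relying on them:

```text
* CPLEX.OPT -- one "name value" pair per line; lines starting with * are comments
* working-memory limit in MB (7000 = 7 GB here)
workmem 7000
* conserve memory, at some cost in speed
memoryemphasis 1
* limit on how many times the presolve aggregator is applied
aggind 4
* keep presolve enabled (the default)
preind 1
* use the barrier algorithm
lpmethod 4
* let CPLEX choose the crossover algorithm after barrier
barcrossalg 0
* thread count as in the original post (see the GAMS docs for negative values)
THREADS=-1
```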

Clearly, our experience indicates that a smart approach is required in choosing solver parameters. At this stage, I must acknowledge the support and expertise of my colleague Dr. Panos Evangelos.

I hope this post helps those who are running large models. Please let us know if you see any further scope for choosing a better solver or solver parameters.
#2
Thanks for this post Kannan.

I tried *memoryemphasis 1* on a 3.5 million row problem. RAM usage dropped by 50-60%. However, a large file was written to disk (around the same size as the reduction in RAM usage), and CPLEX could not really use more than one core. Looking at the processes in Task Manager, I could see a few spikes, but basically it was solving on one core, probably due to the disk I/O bottleneck.

If you also see low CPU utilization, then you would likely get a significant performance improvement in the barrier phase without this option.
#3
Amit,
Yes, you are right. We noticed the same behaviour in processor usage. Basically, the hard disk is being used as buffer memory for occasional spikes in memory demand, which slows things down a bit.
On the other hand, if the spike exceeds physical RAM, GAMS aborts immediately and reports an out-of-memory error. So there is a trade-off.
#4
Amit
Did you change the 'workmem' parameter as well?

We understood that 'memoryemphasis 1' must be used together with 'workmem xxxx'; otherwise it doesn't help, as CPLEX takes the default workmem!
#5
Dear Kannan and Amit,

I am experiencing a memory problem (Cplex error 1001) while running my model. I would like to try changing the Cplex parameters, but I am not sure how I could do this. Could you please show me the best way of doing so? Any other suggestions on how to overcome this issue?

Thanks,
Regards,
#6
In Case Manager, if you click the button where you select the solver, a solver options form will open. Check the box "Reduce memory usage" on the bottom left.

However, this will not help you much and it is likely that you need to upgrade the RAM on your machine. How much do you have, and how many equations/variables do you have in your model? You can find this information in the .LST file.
#7
Hi Amit,


Thanks for your response.

Yes, I have already checked the "Reduce memory usage" box, but it didn't work. I have an i7 @ 2.40 GHz with 8 GB of RAM (surely this is a bottleneck; I was intending to upgrade to 16 GB).

My model statistics:

MODEL STATISTICS

BLOCKS OF EQUATIONS         100     SINGLE EQUATIONS    1,831,629
BLOCKS OF VARIABLES           8     SINGLE VARIABLES    1,715,530  585 projected
NON ZERO ELEMENTS     7,549,743

Thanks,
#8
Just to double-check, you can observe Task Manager while the run is going on, to confirm that GAMS is using all the memory, and also to identify any other processes that might be using a lot of memory, in case you can get rid of some.

I suggest you use a coarser period definition while waiting for the hardware upgrade.
#9
Thanks, Amit.

A friend, also a Veda user, has provided me with a CPLEX.OPT file with changed Cplex parameters. Now I got rid of "Cplex error 1001" and the solver doesn't crash anymore. But the process is very slow: two lines of solver iterations took around 10 hours to run.

While running the model, memory usage was almost 100% much of the time, but CPU usage was quite low, around 10%. So it does indeed seem to be a RAM bottleneck.
#10
Dear large-model-runners,

We are facing the weird behavior that our model solves faster on a regular desktop computer than on a powerful workstation (on the latter it sometimes doesn't find a solution after several days of running time).

Desktop Hardware: Intel Core i7-4790 w/ 4 cores (8 threads) @ 3.6 GHz, 16 GB RAM, Windows 7 x64
Workstation Hardware: Intel Xeon E5-2667v3 w/ 16 cores (32 threads) @ 3.2 GHz, 256 GB RAM, Windows 8.1 Pro x64

I tried it out with

  • the same model (to be sure, I zipped the entire folder, copied it to the workstation, unzipped and loaded it into VEDA without any import errors)

  • the same VEDA-Version (VFE 4.5.609)

  • the same GAMS Source Code (GAMS_SRCTIMESV414)

  • the same GAMS-Version (24.4.1 r50296)

  • the same CPLEX-Version (12.6.1.0)

  • the same CPLEX.opt file (the only difference is, of course, the THREADS= integer)

  • the same RUNFile_Template

  • the same VEDA Settings (at least, I couldn't find any difference in the User Options and in the Control Panel)

Since I do not face any memory issues on either machine, solver options like workmem or memoryemphasis are not required (yet).
I tried aggind 4 on the workstation, which did not help.
So, while troubleshooting, I observed the following three things:


1. The problem sizes do not match exactly
...even though they are quite close. This already raises the question of where these differences originate.

Desktop

Code:
---   4,321,737 rows  5,760,285 columns  25,952,315 non-zeroes
--- Executing CPLEX: elapsed 0:02:01.956

IBM ILOG CPLEX   24.4.1 r50296 Released Dec 20, 2014 WEI x86 64bit/MS Windows
--- GAMS/Cplex licensed for continuous and discrete problems.
Cplex 12.6.1.0

Reading parameter(s) from "D:\VEDA\Veda_FE\GAMS_WRKTIMES\cplex.opt"
>>  scaind 0
>>  rerun yes
>>  iis yes
>>  lpmethod 4
>>  baralg 1
>>  barcrossalg 0
>>  barorder 2
>>  THREADS=4
Finished reading from "D:\VEDA\Veda_FE\GAMS_WRKTIMES\cplex.opt"
Reading data...
Starting Cplex...
Space for names approximately 541.51 Mb
Use option 'names no' to turn use of names off
Tried aggregator 1 time.
Aggregator has done 915385 substitutions...
LP Presolve eliminated 1121680 rows and 1693544 columns.
Aggregator did 915385 substitutions.
Reduced LP has 2284672 rows, 3151356 columns, and 15283699 nonzeros.
Presolve time = 33.42 sec. (69251.93 ticks)
Parallel mode: using up to 4 threads for barrier.

***NOTE: Found 3328 dense columns.

Number of nonzeros in lower triangle of A*A' = 8136064
Total time for approximate-min-fill ordering = 1.17 sec. (661.63 ticks)
Summary statistics for Cholesky factor:
 Threads                   = 4
 Rows in Factor            = 2288000
 Integer space required    = 21736118
 Total non-zeros in factor = 221953926
 Total FP ops to factor    = 1020198134836


Workstation

Code:
---   4,321,765 rows  5,760,313 columns  25,991,340 non-zeroes
--- Executing CPLEX: elapsed 0:02:27.543

IBM ILOG CPLEX   24.4.1 r50296 Released Dec 20, 2014 WEI x86 64bit/MS Windows
--- GAMS/Cplex licensed for continuous and discrete problems.
Cplex 12.6.1.0

Reading parameter(s) from "C:\VEDA\VEDA_FE\GAMS_WRKTIMES\cplex.opt"
>>  scaind 0
>>  rerun yes
>>  iis yes
>>  lpmethod 4
>>  baralg 1
>>  barcrossalg 0
>>  barorder 2
>>  THREADS=16
Finished reading from "C:\VEDA\VEDA_FE\GAMS_WRKTIMES\cplex.opt"
Reading data...
Starting Cplex...
Space for names approximately 541.51 Mb
Use option 'names no' to turn use of names off
Tried aggregator 1 time.
Aggregator has done 915385 substitutions...
LP Presolve eliminated 1121708 rows and 1771524 columns.
Aggregator did 915385 substitutions.
Reduced LP has 2284672 rows, 3073404 columns, and 15205747 nonzeros.
Presolve time = 37.66 sec. (69455.13 ticks)
Parallel mode: using up to 16 threads for barrier.

***NOTE: Found 3328 dense columns.

Number of nonzeros in lower triangle of A*A' = 8136064
Total time for approximate-min-fill ordering = 1.48 sec. (660.89 ticks)
Summary statistics for Cholesky factor:
 Threads                   = 16
 Rows in Factor            = 2288000
 Integer space required    = 21736118
 Total non-zeros in factor = 221953926
 Total FP ops to factor    = 1020198134836


2. The barrier phase seems to end earlier on the workstation
Or in other words: The desktop machine seems to come much closer to a solution than the workstation does before crossover begins.

Desktop

Code:
[...]
 83  4.3577287e+006  4.3577287e+006 2.38e-004 5.39e-007 8.19e-005 1.61e+009
 84  4.3577287e+006  4.3577287e+006 2.17e-004 7.22e-008 3.15e-005 4.33e+009
Barrier time = 2733.68 sec. (3540167.56 ticks)
Parallel mode: deterministic, using up to 4 threads for concurrent optimization.

Dual crossover.
 Dual:  Fixing 1673253 variables.


Workstation

Code:
[...]
 74  4.3605961e+006  4.3538311e+006 9.42e+003 9.01e-002 1.35e+004 9.14e+002
 75  4.3600505e+006  4.3546846e+006 8.55e+003 7.47e-002 1.03e+004 1.22e+003
 *   4.3841561e+006  4.3344523e+006 5.68e+000 6.58e-001 1.02e+005 1.09e+002
Barrier time = 1023.39 sec. (936226.63 ticks)
Parallel mode: deterministic, using up to 16 threads for concurrent optimization.

Dual crossover.
 Dual:  Fixing 2054489 variables.


3. The behavior is repeatable.
I ran it twice on both machines, and observed the same behavior, so I can exclude any random / stochastic behavior.

Do you have any idea what could cause these differences, i.e. the poor performance on the supposedly better hardware?

Thank you for any hints!
Fabian
#11
I think that, for some reason, you possibly have differences in the TIMES input data files on the two machines.
Could you compare the "Data-only" GDX files VEDA produces on the two machines and check what the differences are (if any)? (Using the GAMS GDXdiff and GDX2XLS utilities.)
#12
Ok, hoping I understood you correctly, I did the following:

1.) I copied the .gdx files from C:\VEDA\Veda_FE\GAMS_WrkTIMES\GamsSave from both machines into one folder.
2.) I ran c:\gams\win64\24.4\gdxdiff.exe and produced a diff_result.gdx file.
3.) I ran 'c:\gams\win64\24.4\gdx2xls.exe diff_result.gdx' and produced a diff_result.xlsx file.

The resulting Excel file is huge (159 MB). It looks like many equations, parameters, variables and sets show differences (see the attached screenshot).

Curiously, the objective value of both runs is exactly the same (to the 6th digit after the decimal separator).


#13
That is not what I meant. It is not at all surprising that you have lots of differences in the full "result" GDX files, because the Cplex solution progressed so differently. That was already known; what is not clear is why the runs progress so differently, and where these differences stem from.

That's why I suggested comparing the Data_only GDX file, which VEDA-FE also produces by default. Do you not see such GDX files (named *Data_Only*.gdx) in the GamsSave folder? Comparing that file (for the model run in question) between the two machines would be an important check, because basically there should be no differences in the input data, but it seems you may have some.
#14
Ahhh, no, in fact there was no *DataOnly*.gdx file in the GamsSave folder, since in my case it was not activated in the VFE User Options.
Well, I activated it and repeated the aforementioned steps with the generated DataOnly files.

This second diff showed differences in the StartYear: the desktop interpreted default values from the BY Templates as StartYear 2017, whereas the workstation interpreted them as 2015 (which is the desired behavior). So I started the desktop 'from scratch' again, repeated the whole process, and, finally, in the third diff there were no differences any more.

For further explanation: Initially, I had defined in the SysSettings
~StartYear=2015
~TimePeriods=[1, 2, 5, 5, 5, ...]
so that the periods [2015, 2016, 2020, 2025, 2030, ...] resulted.

Because, given my current research question, I'm not much interested in this two-year period 2016/2017 at the beginning, I changed the SysSettings in such a way that:
~StartYear=2013
~TimePeriods=[5, 5, 5, 5, ...]
so that the periods [2015, 2020, 2025, 2030, ...] resulted.

I made this change in an attempt to accelerate the solving. Can I do it that way, or would this be wrong?

So there is both good and bad news for me: I found the source of the differences between the two machines, but the solving time on the desktop shot up as well.
Do you have any suggestions for other solver options I could use to speed up the process?
#15
Thanks for the enlightening follow-up.

To me it seems that you might have numerical problems caused by extending the StartYear backwards from 2015 to 2013. There is nothing wrong with that as such, but there could be problems caused by e.g. cumulative constraints when changing the StartYear like that. I would suggest trying the following period definition and seeing whether it improves the numerical stability:

~StartYear=2015
~TimePeriods=[2, 7, 4, 5, 5, 5, ...]

That would still give you the Milestone years [2015, 2020, 2025, 2030, ...].
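To double-check period definitions like this yourself: TIMES takes the milestone year of each period to be its middle year (rounding down for even durations), a convention consistent with all three period definitions in this thread. A small Python sketch (the function name is mine, not a TIMES/VEDA API):

```python
def milestone_years(start_year, durations):
    """Milestone of each period = begin year + (duration - 1) // 2."""
    milestones, begin = [], start_year
    for d in durations:
        milestones.append(begin + (d - 1) // 2)
        begin += d  # the next period starts right after this one ends
    return milestones

# Suggested definition: ~StartYear=2015, ~TimePeriods=[2, 7, 4, 5, 5]
print(milestone_years(2015, [2, 7, 4, 5, 5]))  # [2015, 2020, 2025, 2030, 2035]
```

Under this rule, ~StartYear=2013 with [5, 5, ...] and the suggested ~StartYear=2015 with [2, 7, 4, 5, ...] both yield the milestones [2015, 2020, 2025, ...], so only the numerical stability, not the milestone set, is at stake.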
I also suggest using OBLONG if you don't have it already activated, or the MOD objective variant (in the Control Panel).

But if that does not help, I hope the large-model-runners can suggest other solver options.

