[Users] ET_2013_11 run performance
Ian Hinder
ian.hinder at aei.mpg.de
Mon Jan 20 07:26:09 CST 2014
On 20 Jan 2014, at 14:23, Yosef Zlochower <yosef at astro.rit.edu> wrote:
> On 01/20/2014 08:06 AM, Ian Hinder wrote:
>> On 20 Jan 2014, at 06:14, James Healy <jchsma at rit.edu> wrote:
>>
>>> Hello all,
>>>
>>> On Thursday morning, I pulled a fresh checkout of the newest version of
>>> the Einstein Toolkit (ET_2013_11) to use with RIT's LazEv code. I
>>> compiled it on stampede using the current stampede.cfg located in
>>> simfactory/mdb/optionlists which uses Intel MPI version 4.1.0.030 and
>>> the intel compilers version 13.1.1.163 (enabled through a module load).
>>> I submitted a short job that I had previously run with ET_2013_05.
>>> The results come out the same. However, the run speed as reported in
>>> Carpet::physical_time_per_hour is poor. It starts off well, at
>>> approximately the same speed as the previous build, but over 24 hours
>>> of evolution drops to as low as half that speed. On recovery from
>>> checkpoint, the speed is even worse, dropping to below 1/4 of the
>>> original run speed.
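>>>
>>> For reference, the speed figure is the grid scalar that CarpetIOBasic
>>> prints to the screen output; a minimal sketch of the relevant
>>> parameter-file lines (the output interval here is illustrative):
>>>
>>>   IOBasic::outInfo_every = 128
>>>   IOBasic::outInfo_vars  = "Carpet::physical_time_per_hour"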
>>>
>>> So, I tried using the previous stampede.cfg included in the ET_2013_05
>>> branch of simfactory, the same one I used to compile my ET_2013_05
>>> build. This config file uses the same version of Intel MPI but
>>> different Intel compilers (version 13.0.2.146). The run speed shows
>>> the same trends as with the newer config file.
>> Hi Jim,
>>
>> I'm quite confused by this problem report. I take it you mean the following:
>>
>> - You get the slowdown with the current ET_2013_11 release
>> - You don't get the slowdown with the ET_2013_05 release
>> - You do get the slowdown if you use the current ET_2013_11 release with the ET_2013_05 stampede.cfg
>>
>> Is that correct?
>>
>> I consider Intel MPI to be unusable on Stampede, and it always has been. I used to get random crashes, hangs and slowdowns, and I experienced similar problems with Intel MPI on SuperMUC. For any serious work, I have always used MVAPICH2 on Stampede. In the current ET trunk, Intel MPI has been replaced with MVAPICH2. I would try the current trunk and see whether that fixes your problems. You can also use just the Stampede files from the current trunk with the ET_2013_11 release (make sure you use all the files listed in stampede.ini).
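>>
>> In case it is useful, copying those files over is something like this (a sketch, assuming both trees are checked out side by side and the usual simfactory mdb layout; the file names are the ones stampede.ini points at):
>>
>>   cd ET_2013_11
>>   cp ../trunk/simfactory/mdb/machines/stampede.ini      simfactory/mdb/machines/
>>   cp ../trunk/simfactory/mdb/optionlists/stampede.cfg   simfactory/mdb/optionlists/
>>   cp ../trunk/simfactory/mdb/runscripts/stampede.run    simfactory/mdb/runscripts/
>>   cp ../trunk/simfactory/mdb/submitscripts/stampede.sub simfactory/mdb/submitscripts/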
> Interesting. I haven't been able to get a run to work with MVAPICH2 because of an issue with the runs
> dying during checkpoint. Which config file are you using (modules loaded, etc.)? How much RAM per node
> do your production runs typically use?
I'm using exactly the default simfactory configuration from the current trunk, so you can see the modules etc. there. Checkpointing (and recovery) work fine. I usually aim for something like 75% memory usage for production runs.
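
For concreteness, the workflow is just the stock simfactory one; a rough sketch (the simulation and parameter file names are illustrative, not prescriptive):

  # build; the machine is auto-detected when run on Stampede, and the
  # trunk's default Stampede option list is used
  ./simfactory/bin/sim build

  # create and submit a simulation
  ./simfactory/bin/sim create-submit mysim --parfile par/mysim.par \
      --procs 256 --walltime 24:00:00

On memory: if I remember correctly, Stampede compute nodes have 32 GB each, so 75% works out to roughly 24 GB per node.
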
>
>> We didn't change the MPI version before the release, as that would have been quite an invasive change at that point. However, I would consider backporting this, after suitable discussion.
>>
>> Of course, your problem might be unrelated to the version of MPI. I am running perfectly fine on Stampede with the current trunk (MVAPICH2); runs have a consistent speed and retain this speed after recovery.
>>
>
--
Ian Hinder
http://numrel.aei.mpg.de/people/hinder