[Users] ET_2013_11 run performance

Ian Hinder ian.hinder at aei.mpg.de
Mon Jan 20 07:26:09 CST 2014


On 20 Jan 2014, at 14:23, Yosef Zlochower <yosef at astro.rit.edu> wrote:

> On 01/20/2014 08:06 AM, Ian Hinder wrote:
>> On 20 Jan 2014, at 06:14, James Healy <jchsma at rit.edu> wrote:
>> 
>>> Hello all,
>>> 
>>> On Thursday morning, I pulled a fresh checkout of the newest version of
>>> the Einstein Toolkit (ET_2013_11) to use with RIT's LazEv code. I
>>> compiled it on Stampede using the current stampede.cfg located in
>>> simfactory/mdb/optionlists which uses Intel MPI version 4.1.0.030 and
>>> the intel compilers version 13.1.1.163 (enabled through a module load).
>>> I submitted a short job which I ran previously with ET_2013_05.  The
>>> results come out the same.  However, the run speed as reported in
>>> Carpet::physical_time_per_hour is poor. It starts off well,
>>> approximately the same as with the previous build, but over time drops
>>> to as low as half the speed over 24 hours of evolution. On recovery from
>>> checkpoint, the speed is even worse, dropping to below 1/4 of the
>>> original run speed.
>>> 
>>> So, I tried using the previous stampede.cfg included in the ET_2013_05
>>> branch of simfactory, the same one I used to compile my ET_2013_05
>>> build.  This cfgfile uses the same version of IMPI but different Intel
>>> compilers (version 13.0.2.146). The run speed shows the same trends as
>>> when using the newer config file.
>> Hi Jim,
>> 
>> I'm quite confused by this problem report.  I take it you mean the following:
>> 
>> - You get the slowdown with the current ET_2013_11 release
>> - You don't get the slowdown with the ET_2013_05 release
>> - You do get the slowdown if you use the current ET_2013_11 release with the ET_2013_05 stampede.cfg
>> 
>> Is that correct?
>> 
>> I consider Intel MPI to be unusable on Stampede, and it always has been.  I used to get random crashes, hangs and slowdowns, and I experienced similar problems with Intel MPI on SuperMUC.  For any serious work, I have always used MVAPICH2 on Stampede.  In the current ET trunk, Intel MPI has been replaced with MVAPICH2.  I would try the current trunk and see whether this fixes your problems.  You can also use just the stampede files from the current trunk with the ET_2013_11 release (make sure you use the ones listed in stampede.ini).
> Interesting. I haven't been able to get a run to work with MVAPICH2 because of an issue with the runs
> dying during checkpoint. Which config file are you using (modules loaded, etc.)? How much RAM per node
> do your production runs typically use?

I'm using exactly the default simfactory config from the current trunk, so you can see the modules etc. there.  Checkpointing (and recovery) works fine.  I usually aim for something like 75% memory usage for production runs.
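
In case it's useful for checking what a given build actually linked against, here is a minimal sketch (Python; the default executable path is an assumption, and it only tells you anything if MPI is dynamically linked) that filters the ldd output of the Cactus executable for MPI libraries:

    # which_mpi.py -- quick sketch to confirm which MPI implementation a
    # Cactus executable was linked against (Intel MPI vs MVAPICH2).  The
    # default path below is an assumption; point it at your own
    # configuration's executable.
    import subprocess
    import sys

    exe = sys.argv[1] if len(sys.argv) > 1 else "exe/cactus_sim"
    for line in subprocess.check_output(["ldd", exe]).decode().splitlines():
        if "mpi" in line.lower():
            print(line.strip())

On a build made with the trunk optionlist I would expect this to show MVAPICH2's libraries rather than Intel MPI's.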


> 
>> We didn't change the MPI version before the release, as that would have been quite an invasive change at that point.  However, I would consider backporting it after suitable discussion.
>> 
>> Of course, your problem might be unrelated to the version of MPI.  I am running perfectly fine on Stampede with the current trunk (MVAPICH2); runs have a consistent speed and retain this speed after recovery.
>> 
> 
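
For quantifying the slowdown Jim describes, here is a quick sketch that tracks Carpet::physical_time_per_hour over a run.  It assumes CarpetIOScalar has been asked to output that variable, and that the resulting .asc file has data rows of the form "iteration time value"; the file name and column layout are assumptions, so check them against your own output directory:

    # speed_trend.py -- minimal sketch for tracking run speed over a
    # simulation.  Assumes an .asc file from CarpetIOScalar whose data
    # rows are "iteration time value" (layout is an assumption).
    import sys

    def read_scalar(path):
        """Yield (coordinate time, physical time per hour) pairs."""
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue          # skip blanks and comment headers
                cols = line.split()
                yield float(cols[1]), float(cols[2])

    if __name__ == "__main__":
        # drop initial zero samples so the baseline is meaningful
        data = [(t, v) for (t, v) in read_scalar(sys.argv[1]) if v > 0.0]
        v0 = data[0][1]
        for t, v in data:
            print("t = %8.1f   M/hr = %8.2f   (%5.1f%% of initial)"
                  % (t, v, 100.0 * v / v0))

Running it on the output of the original segment and of the recovered one should make the pre- and post-recovery speeds directly comparable.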

-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder
