[Users] ET_2013_11 run performance
Yosef Zlochower
yosef at astro.rit.edu
Mon Jan 20 07:23:22 CST 2014
On 01/20/2014 08:06 AM, Ian Hinder wrote:
> On 20 Jan 2014, at 06:14, James Healy <jchsma at rit.edu> wrote:
>
>> Hello all,
>>
>> On Thursday morning, I pulled a fresh checkout of the newest version of
>> the Einstein Toolkit (ET_2013_11) to use with RIT's LazEv code. I
>> compiled it on stampede using the current stampede.cfg located in
>> simfactory/mdb/optionlists which uses Intel MPI version 4.1.0.030 and
>> the intel compilers version 13.1.1.163 (enabled through a module load).
>> I submitted a short job which I ran previously with ET_2013_05. The
>> results come out the same. However, the run speed as reported in
>> Carpet::physical_time_per_hour is poor. It starts off well, at
>> approximately the same speed as with the previous build, but over 24
>> hours of evolution it drops to as low as half that speed. On recovery
>> from checkpoint, the speed is worse still, dropping to below 1/4 of
>> the original run speed.
>>
>> So, I tried using the previous stampede.cfg included in the ET_2013_05
>> branch of simfactory, the same one I used to compile my ET_2013_05
>> build. This config file uses the same version of Intel MPI but a
>> different version of the Intel compilers (13.0.2.146). The run speed
>> shows the same trends as
>> when using the newer config file.
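For reference, the build-and-run workflow described above corresponds roughly to a simfactory session like the sketch below; the module name, thornlist, parameter file, and job size are illustrative guesses rather than details taken from this thread:

    # Load the Intel toolchain the optionlist expects (module name guessed
    # from the compiler version quoted above)
    module load intel/13.1.1.163
    # Build with the optionlist shipped in simfactory/mdb/optionlists
    ./simfactory/bin/sim build --optionlist stampede.cfg \
        --thornlist manifest/einsteintoolkit.th
    # Create and submit a short test run (names and sizes are placeholders)
    ./simfactory/bin/sim create-submit bbh_test \
        --parfile par/bbh_test.par --procs 256 --walltime 2:00:00
    # Watch the reported run speed, assuming the parameter file outputs
    # Carpet::physical_time_per_hour
    ./simfactory/bin/sim show-output bbh_test | grep -i physical_time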
> Hi Jim,
>
> I'm quite confused by this problem report. I take it you mean the following:
>
> - You get the slowdown with the current ET_2013_11 release
> - You don't get the slowdown with the ET_2013_05 release
> - You do get the slowdown if you use the current ET_2013_11 release with the ET_2013_05 stampede.cfg
>
> Is that correct?
>
> I consider Intel MPI to be unusable on Stampede, and it always has been. I used to get random crashes, hangs, and slowdowns, and I experienced similar problems with Intel MPI on SuperMUC. For any serious work I have always used MVAPICH2 on Stampede; in the current ET trunk, Intel MPI has been replaced with MVAPICH2. I would try the current trunk and see whether that fixes your problem. You can also use just the Stampede files from the current trunk with the ET_2013_11 release (make sure you use the ones listed in stampede.ini).
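In practice, "use just the Stampede files from the current trunk" could look something like the sketch below. The trunk path is a placeholder, and the file names follow simfactory's usual layout; as noted above, check stampede.ini for the exact set it references:

    # TRUNK points at a current-trunk checkout (placeholder path)
    TRUNK=$HOME/ET_trunk/simfactory/mdb
    cp $TRUNK/machines/stampede.ini      simfactory/mdb/machines/
    cp $TRUNK/optionlists/stampede.cfg   simfactory/mdb/optionlists/
    cp $TRUNK/runscripts/stampede.run    simfactory/mdb/runscripts/
    cp $TRUNK/submitscripts/stampede.sub simfactory/mdb/submitscripts/
    # Reconfigure and rebuild so the MVAPICH2 settings take effect
    ./simfactory/bin/sim build --reconfig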
Interesting. I haven't been able to get a run to work with MVAPICH2
because of an issue with the runs dying during checkpointing. Which
config file are you using (modules loaded, etc.)? How much RAM per node
do your production runs typically use?
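One way to get at the memory question after the fact, assuming Stampede's SLURM accounting is available (the job ID is a placeholder):

    # MaxRSS is the peak resident set size of the largest task; a rough
    # per-node figure is MaxRSS times the number of tasks per node
    sacct -j 1234567 --format=JobID,NNodes,NTasks,Elapsed,MaxRSS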
> We didn't change the MPI version before the release, as that would have been quite an invasive change at that point. However, I would consider backporting this, after suitable discussion.
>
> Of course, your problem might be unrelated to the MPI version. I am running perfectly fine on Stampede with the current trunk (MVAPICH2); runs have a consistent speed and retain it after recovery.
>