[Users] ET_2013_11 run performance
Luca Baiotti
baiotti at ile.osaka-u.ac.jp
Wed Jan 22 19:41:25 CST 2014
On 1/20/14 10:26 PM, Ian Hinder wrote:
>
> On 20 Jan 2014, at 14:23, Yosef Zlochower <yosef at astro.rit.edu
> <mailto:yosef at astro.rit.edu>> wrote:
>
>> On 01/20/2014 08:06 AM, Ian Hinder wrote:
>>> On 20 Jan 2014, at 06:14, James Healy <jchsma at rit.edu
>>> <mailto:jchsma at rit.edu>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> On Thursday morning, I pulled a fresh checkout of the newest version of
>>>> the Einstein Toolkit (ET_2013_11) to use with RIT's LazEv code. I
>>>> compiled it on Stampede using the current stampede.cfg located in
>>>> simfactory/mdb/optionlists, which uses Intel MPI version 4.1.0.030 and
>>>> the Intel compilers version 13.1.1.163 (enabled through a module load).
>>>> I submitted a short job which I ran previously with ET_2013_05. The
>>>> results come out the same. However, the run speed as reported in
>>>> Carpet::physical_time_per_hour is poor. It starts off well,
>>>> approximately the same as with the previous build, but over time drops
>>>> to as low as half the speed over 24 hours of evolution. On recovery from
>>>> checkpoint, the speed is even worse, dropping to below 1/4 of the
>>>> original run speed.
>>>>
>>>> So, I tried using the previous stampede.cfg included in the ET_2013_05
>>>> branch of simfactory, the same one I used to compile my ET_2013_05
>>>> build. This config file uses the same version of Intel MPI but different Intel
>>>> compilers (version 13.0.2.146). The run speed shows the same trends as
>>>> when using the newer config file.
>>> Hi Jim,
>>>
>>> I'm quite confused by this problem report. I take it you mean the
>>> following:
>>>
>>> - You get the slowdown with the current ET_2013_11 release
>>> - You don't get the slowdown with the ET_2013_05 release
>>> - You do get the slowdown if you use the current ET_2013_11 release
>>> with the ET_2013_05 stampede.cfg
>>>
>>> Is that correct?
>>>
>>> I consider Intel MPI to be unusable on Stampede, and it always
>>> has been. I used to get random crashes, hangs and slowdowns. I also
>>> experienced similar problems with Intel MPI on SuperMUC. For any
>>> serious work, I have always used MVAPICH2 on Stampede. In the
>>> current ET trunk Intel MPI has been replaced with MVAPICH2. I would
>>> try the current trunk and see if this fixes your problems. You can
>>> also use just the stampede files from the current trunk with the
>>> ET_2013_11 release (make sure you use the ones listed in stampede.ini).
>> Interesting. I haven't been able to get a run to work with MVAPICH2
>> because of an issue with the runs dying during checkpointing. Which
>> config file are you using (modules loaded, etc.)? How much RAM per node
>> do your production runs typically use?
>
> I'm using exactly the default simfactory config from the current trunk,
> so you can see the modules etc. there. Checkpointing (and recovery)
> works fine. I usually aim for something like 75% memory usage for
> production runs.
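
For concreteness, here is a minimal sketch of what "use just the stampede
files from the current trunk" amounts to, assuming the standard simfactory
mdb layout; the file list and the exact sim invocation are assumptions and
should be checked against the stampede.ini and "sim build --help" in your
own checkout (all paths below are placeholders):

#!/usr/bin/env python
# Sketch: copy the stampede machine files from a trunk simfactory checkout
# into an ET_2013_11 release tree and rebuild. The file list mirrors what
# stampede.ini typically references (optionlist, runscript, submitscript);
# verify it against your copy of stampede.ini.
import shutil
import subprocess
from pathlib import Path

TRUNK = Path("~/Cactus-trunk").expanduser()         # assumed trunk checkout
RELEASE = Path("~/Cactus-ET_2013_11").expanduser()  # assumed release checkout

FILES = [
    "machines/stampede.ini",
    "optionlists/stampede.cfg",
    "runscripts/stampede.run",
    "submitscripts/stampede.sub",
]

for rel in FILES:
    src = TRUNK / "simfactory" / "mdb" / rel
    dst = RELEASE / "simfactory" / "mdb" / rel
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    print("copied", src, "->", dst)

# Reconfigure and rebuild so the new optionlist (with the MVAPICH2 modules)
# actually takes effect.
subprocess.check_call(
    ["./simfactory/bin/sim", "build", "--reconfig",
     "--optionlist", "stampede.cfg"],
    cwd=str(RELEASE),
)
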
Hello, I would like to report a different problem with the simfactory
settings for Stampede: with the ET Noether release or the trunk, the job
start/end emails are not sent (or at least they do not reach the Osaka
University server; I had the system administrators check).
I receive the emails if I use the simfactory from the Gauss release. In
particular, if I copy just the stampede.ini from Gauss to Noether (and no
other files) and recompile, I do receive the emails.
Luca