[ET Trac] [Einstein Toolkit] #1850: Severe performance problem on Stampede

Einstein Toolkit trac-noreply at einsteintoolkit.org
Wed Dec 16 15:59:43 CST 2015


#1850: Severe performance problem on Stampede
-------------------------+--------------------------------------------------
  Reporter:  hinder      |       Owner:                     
      Type:  defect      |      Status:  confirmed          
  Priority:  major       |   Milestone:                     
 Component:  SimFactory  |     Version:  development version
Resolution:              |    Keywords:                     
-------------------------+--------------------------------------------------

Comment (by michael.clark@…):

 Replying to [comment:16 hinder]:
 > Note that Michael Clark reported different results, but he says that
 they were probably not accurate, as he cannot reproduce them now.
 >
 > Michael: is it possible that the run script you were using was not the
 updated one you had modified?  Editing the run script in
 simfactory/mdb/runscripts is not sufficient.  It then needs to be added to
 the Cactus configuration before rerunning.  This requires a "sim build
 <config> --runscript <runscriptname>".

 Short version: I'm aware this is required to change the runscript for a
 configuration.  I double-checked the simulation directories to make sure
 that the runscripts were correct.

 Longer version: I ran a simulation "runA" with executable "exeA", and
 "runB" with executable "exeB". These executables used the same optionlist
 (default), thornlist (containing hwloc and SystemTopology), and
 submitscript (default); the runscripts differed only in that one had an
 additional, commented-out line.  Both simulations used the same parameter
 file, which includes both hwloc and SystemTopology.  Both runscripts also
 had the "export KMP_AFFINITY..." line commented out.  I ran "runA" on
 Monday, and it ran 9-10x slower than baseline, which led to my previous
 comment.  I ran "runB" yesterday, however, and saw baseline performance.

 For good measure, I performed a few other tests: I reconfigured with
 identical runscripts, and I also reran the same simulation, runA, with
 executable exeA, without recovery, to see whether that executable was
 still slow.  When run yesterday, it achieved the same (high) speed as
 baseline.

 Some miscellaneous notes on obstacles to these tests.  I found it
 inconvenient that:
 (a) the configuration has to be rebuilt merely to change the default
 runscript.  In particular, this means the resulting executables differ
 (they have different md5 hashes) even though the optionlist and thornlist
 are the same.  This is why I reran simulation runA with exeA.  I suspect
 the executables differ at least partly because they contain the date of
 compilation, which is printed at the beginning of a run.

 I think a command-line option to specify the runscript would be
 convenient, although it would likely go unused once the performance
 problems have been resolved.  Moreover, simfactory's create-submit
 command accepts --runscript but silently ignores it.  (Thankfully, this
 wasted only 5 minutes of my time.)

 (b) the --norecover option simply did not work for rerunning a simulation
 from the beginning, even though simfactory advertises it as the default.
 I had to delete the checkpoint directories manually to rerun the
 simulation in the same simulation directory.

 As to what could have caused the discrepancy I originally observed, I
 cannot say with any certainty.  Among the factors under my control, I
 considered whether envsetup or module loads could be responsible.  In the
 past I have often run with envsetup set to "sleep 0", but I have never
 observed changes to envsetup affecting the performance of runs using
 mvapich2.  I tested this with the Intel MPI module loaded by default as
 well as with Intel MPI loaded in envsetup; neither affected performance.

 That is all I have at this time.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1850#comment:18>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

