[Users] ET_2013_11 run performance

Luca Baiotti baiotti at ile.osaka-u.ac.jp
Thu Jan 23 07:11:40 CST 2014


Hello.

OK, so the emails not being sent is a known problem.

Here is the diff between my Gauss simfactory (emails working) and my 
Noether simfactory (emails not working). Both are repository versions.



~> diff -ur Cactus_Noether/simfactory Cactus_Gauss/simfactory/
Only in Cactus_Gauss/simfactory/.svn/pristine/00: 00d3f989d1b1a9928c40c7f3d4878c6cd40e0da6.svn-base
Only in Cactus_Gauss/simfactory/.svn/pristine: 05
Only in Cactus_Gauss/simfactory/.svn/pristine/06: 0686da6a0b024c3b7a1c190cb357ce266d80b8de.svn-base
Only in Cactus_Gauss/simfactory/.svn/pristine/24: 24336cbf8816e71d47912c56a5fbb0d9d3e002a9.svn-base
Only in Cactus_Gauss/simfactory/.svn/pristine/2b: 2ba3fdbaef423054e91d12657a045899fcdbcd87.svn-base
Only in Cactus_Gauss/simfactory/.svn/pristine/45: 456b5848c3702e015f203c6ba1dd8e4e044a91c7.svn-base
Only in Cactus_Gauss/simfactory/.svn/pristine/48: 48833b1078909902047c656db2c7d538ace7b84f.svn-base
Only in Cactus_Gauss/simfactory/.svn/pristine: 50
Only in Cactus_Gauss/simfactory/.svn/pristine: 5e
Only in Cactus_Gauss/simfactory/.svn/pristine: 67
Only in Cactus_Gauss/simfactory/.svn/pristine/69: 69616b60ff0814f4ac227c5b37a5c40eeaff688f.svn-base
Only in Cactus_Gauss/simfactory/.svn/pristine/6f: 6f80c656a7e0f597ffdd8d826af35361ab0bf116.svn-base
Only in Cactus_Gauss/simfactory/.svn/pristine/84: 8461ad8dc9d4a64d707fa9bb4cf5f4e544a65f41.svn-base
Only in Cactus_Gauss/simfactory/.svn/pristine: b6
Binary files Cactus_Noether/simfactory/.svn/wc.db and Cactus_Gauss/simfactory/.svn/wc.db differ
diff -ur Cactus_Noether/simfactory/etc/defs.ini Cactus_Gauss/simfactory/etc/defs.ini
--- Cactus_Noether/simfactory/etc/defs.ini	2013-12-08 11:53:15.000000000 +0900
+++ Cactus_Gauss/simfactory/etc/defs.ini	2013-12-08 11:40:25.000000000 +0900
@@ -16,6 +16,7 @@
  simfactory
  src
  utils
+thornlists
  EOT

  # Official Cactus, SimFactory, and GetComponents entries
Only in Cactus_Noether/simfactory/etc: defs.local.ini.noether
Only in Cactus_Gauss/simfactory/etc: defs.local.ini~
Binary files Cactus_Noether/simfactory/lib/archive/__init__.pyc and Cactus_Gauss/simfactory/lib/archive/__init__.pyc differ
Binary files Cactus_Noether/simfactory/lib/archive/petashare.pyc and Cactus_Gauss/simfactory/lib/archive/petashare.pyc differ
Binary files Cactus_Noether/simfactory/lib/archive/uberftp.pyc and Cactus_Gauss/simfactory/lib/archive/uberftp.pyc differ
Binary files Cactus_Noether/simfactory/lib/libutil.pyc and Cactus_Gauss/simfactory/lib/libutil.pyc differ
Binary files Cactus_Noether/simfactory/lib/pyini.pyc and Cactus_Gauss/simfactory/lib/pyini.pyc differ
Binary files Cactus_Noether/simfactory/lib/restartlib.pyc and Cactus_Gauss/simfactory/lib/restartlib.pyc differ
Binary files Cactus_Noether/simfactory/lib/sim-build.pyc and Cactus_Gauss/simfactory/lib/sim-build.pyc differ
Binary files Cactus_Noether/simfactory/lib/sim-info.pyc and Cactus_Gauss/simfactory/lib/sim-info.pyc differ
Binary files Cactus_Noether/simfactory/lib/sim-manage.pyc and Cactus_Gauss/simfactory/lib/sim-manage.pyc differ
Binary files Cactus_Noether/simfactory/lib/sim-sync.pyc and Cactus_Gauss/simfactory/lib/sim-sync.pyc differ
Binary files Cactus_Noether/simfactory/lib/sim-util.pyc and Cactus_Gauss/simfactory/lib/sim-util.pyc differ
Binary files Cactus_Noether/simfactory/lib/simarchive.pyc and Cactus_Gauss/simfactory/lib/simarchive.pyc differ
Binary files Cactus_Noether/simfactory/lib/simdb.pyc and Cactus_Gauss/simfactory/lib/simdb.pyc differ
Binary files Cactus_Noether/simfactory/lib/simdt.pyc and Cactus_Gauss/simfactory/lib/simdt.pyc differ
Binary files Cactus_Noether/simfactory/lib/simenv.pyc and Cactus_Gauss/simfactory/lib/simenv.pyc differ
Binary files Cactus_Noether/simfactory/lib/simlib.pyc and Cactus_Gauss/simfactory/lib/simlib.pyc differ
Binary files Cactus_Noether/simfactory/lib/simopts.pyc and Cactus_Gauss/simfactory/lib/simopts.pyc differ
Binary files Cactus_Noether/simfactory/lib/simproperties.pyc and Cactus_Gauss/simfactory/lib/simproperties.pyc differ
Binary files Cactus_Noether/simfactory/lib/simremote.pyc and Cactus_Gauss/simfactory/lib/simremote.pyc differ
Binary files Cactus_Noether/simfactory/lib/simrestart.pyc and Cactus_Gauss/simfactory/lib/simrestart.pyc differ
Binary files Cactus_Noether/simfactory/lib/simsubs.pyc and Cactus_Gauss/simfactory/lib/simsubs.pyc differ
diff -ur Cactus_Noether/simfactory/mdb/machines/datura.ini Cactus_Gauss/simfactory/mdb/machines/datura.ini
--- Cactus_Noether/simfactory/mdb/machines/datura.ini	2013-12-08 11:53:11.000000000 +0900
+++ Cactus_Gauss/simfactory/mdb/machines/datura.ini	2013-12-08 11:40:25.000000000 +0900
@@ -15,7 +15,7 @@
  hostname        = login-damiana.aei.mpg.de
  rsynccmd        = /home/eschnett/rsync-3.0.9/bin/rsync
  aliaspattern    = ^(((login-)?damiana)|(sl-\d\d))(\.aei\.mpg\.de|\.damiana\.admin)?$
-envsetup        = export INTEL_LICENSE_FILE=/cluster/intel/licenses; export LM_LICENSE_FILE=28518@vlicense.aei.mpg.de
+envsetup        = source /etc/profile && export INTEL_LICENSE_FILE=/cluster/intel/licenses; export LM_LICENSE_FILE=28518@vlicense.aei.mpg.de

  # Source tree management
  sourcebasedir   = /home/@USER@/datura
diff -ur Cactus_Noether/simfactory/mdb/machines/mike.ini Cactus_Gauss/simfactory/mdb/machines/mike.ini
--- Cactus_Noether/simfactory/mdb/machines/mike.ini	2013-12-08 11:53:12.000000000 +0900
+++ Cactus_Gauss/simfactory/mdb/machines/mike.ini	2013-12-08 11:40:25.000000000 +0900
@@ -14,7 +14,7 @@
  # Access to this machine
  hostname        = mike1.hpc.lsu.edu
  rsynccmd        = /usr/bin/rsync
-aliaspattern    = ^mike1(\.hpc\.lsu\.edu)?$
+aliaspattern    = ^mike[0-9]+(\.hpc\.lsu\.edu)?$

  # Source tree management
  sourcebasedir   = /project/@USER@
diff -ur Cactus_Noether/simfactory/mdb/machines/stampede.ini Cactus_Gauss/simfactory/mdb/machines/stampede.ini
--- Cactus_Noether/simfactory/mdb/machines/stampede.ini	2014-01-23 22:06:18.000000000 +0900
+++ Cactus_Gauss/simfactory/mdb/machines/stampede.ini	2014-01-18 23:01:24.000000000 +0900
@@ -17,7 +17,7 @@
  # Access to this machine
  hostname        = stampede.tacc.utexas.edu
  rsynccmd        = /home1/00507/eschnett/rsync-3.0.9/bin/rsync
-envsetup        = module load intel/13.1.1.163 && module unload mvapich2 && module load impi/4.1.1.036 && module load papi
+envsetup        = module load intel/13.1.1.163 && module load papi
  aliaspattern    = ^login[1234](\.stampede\.tacc\.utexas\.edu)?$

  # Source tree management
@@ -50,9 +50,9 @@
  #        McLachlan/ML_BSSN_CL_Helper
  #        McLachlan/ML_WaveToy_CL
  EOT
-optionlist      = stampede.cfg
+optionlist      = stampede-mvapich2.cfg
  submitscript    = stampede.sub
-runscript       = stampede.run
+runscript       = stampede-mvapich2.run
  make            = make -j8

  # Simulation management
@@ -101,3 +101,17 @@
  #stdout          = cat @SIMULATION_NAME@.out
  #stderr          = cat @SIMULATION_NAME@.err
  #stdout-follow   = tail -n 100 -f @SIMULATION_NAME@.out @SIMULATION_NAME@.err
+
+# Intel MPI:
+#
+# Create a configuration using
+#
+#  --optionlist stampede-impi.cfg --runscript stampede-impi.run
+#
+# if you want to use Intel MPI, or set
+#
+#  optionlist      = stampede-impi.cfg
+#  runscript       = stampede-impi.run
+#
+# in your defs.local.ini for this to be the default for all configurations.
+
Only in Cactus_Gauss/simfactory/mdb/optionlists: stampede-impi.cfg
Only in Cactus_Gauss/simfactory/mdb/optionlists: stampede-mvapich2.cfg
Only in Cactus_Noether/simfactory/mdb/optionlists: stampede.cfg
Only in Cactus_Gauss/simfactory/mdb/runscripts: stampede-impi.run
Only in Cactus_Gauss/simfactory/mdb/runscripts: stampede-mvapich2.run
Only in Cactus_Noether/simfactory/mdb/runscripts: stampede.run
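
Most of the output above is noise from Subversion's private .svn directory and from compiled .pyc files, neither of which says anything about the configuration. GNU diff can skip such entries with its -x (--exclude) option, e.g. `diff -ur -x .svn -x '*.pyc' Cactus_Noether/simfactory Cactus_Gauss/simfactory/`. The Python sketch below does a comparable file-level check with only the standard library. The IGNORE list is an assumption to adapt, and unlike diff it reports files only (not empty directories, and not line-level changes).

```python
import filecmp
import fnmatch
import os
import sys

# Names to skip: Subversion metadata directories and compiled Python files.
IGNORE = [".svn", "*.pyc"]


def ignored(name):
    """Return True if a file or directory name matches any IGNORE pattern."""
    return any(fnmatch.fnmatch(name, pat) for pat in IGNORE)


def differing_files(left, right):
    """Compare two directory trees, skipping ignored names.

    Returns (only_left, only_right, changed): sorted lists of file paths
    relative to the tree roots. Empty directories are not reported.
    """
    def walk(root):
        found = set()
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune ignored directories in place so os.walk skips them.
            dirnames[:] = [d for d in dirnames if not ignored(d)]
            for f in filenames:
                if not ignored(f):
                    found.add(os.path.relpath(os.path.join(dirpath, f), root))
        return found

    lfiles, rfiles = walk(left), walk(right)
    # shallow=False compares file contents, not just size and mtime.
    changed = [p for p in sorted(lfiles & rfiles)
               if not filecmp.cmp(os.path.join(left, p),
                                  os.path.join(right, p), shallow=False)]
    return sorted(lfiles - rfiles), sorted(rfiles - lfiles), changed


if __name__ == "__main__" and len(sys.argv) == 3:
    left, right = sys.argv[1], sys.argv[2]
    only_l, only_r, changed = differing_files(left, right)
    for p in only_l:
        print(f"Only in {left}: {p}")
    for p in only_r:
        print(f"Only in {right}: {p}")
    for p in changed:
        print(f"Files {os.path.join(left, p)} and {os.path.join(right, p)} differ")
```

Run it with the two simfactory directories as arguments, then use diff -u on the files it lists.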

***********
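
Among the differences above, the mike.ini hunk changes aliaspattern: the Noether copy accepts only mike1, while the Gauss copy accepts "mike" followed by any digits. A quick way to see what each pattern accepts is to try both with Python's re module. This sketch assumes the pattern is tested against the bare or fully qualified host name, as the ^...$ anchors suggest; the names in HOSTS are made up for illustration.

```python
import re

# aliaspattern values from the mike.ini hunk ("-" side = Noether, "+" side = Gauss).
NOETHER = r"^mike1(\.hpc\.lsu\.edu)?$"
GAUSS = r"^mike[0-9]+(\.hpc\.lsu\.edu)?$"

# Hypothetical host names, chosen only to show where the patterns differ.
HOSTS = ["mike1", "mike2", "mike1.hpc.lsu.edu", "mike12.hpc.lsu.edu", "mike", "smike1"]


def matches(pattern, hosts):
    """Return the host names that the pattern accepts."""
    return [h for h in hosts if re.search(pattern, h)]


if __name__ == "__main__":
    print("noether:", matches(NOETHER, HOSTS))  # noether: ['mike1', 'mike1.hpc.lsu.edu']
    print("gauss:", matches(GAUSS, HOSTS))      # gauss: ['mike1', 'mike2', 'mike1.hpc.lsu.edu', 'mike12.hpc.lsu.edu']
```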



I could not reproduce my original claim that the Noether version with 
the Gauss stampede.ini worked for emails. So please forget it.

But it is indeed strange that Gauss works and Noether does not, given
only differences such as those above! Can you reproduce this?
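
Since Erik points out in the quoted reply below that the emails are controlled by the submit script, one useful check is to compare only the mail-related directives of the two recorded submission scripts. The sketch below assumes the scripts request mail through standard SLURM `#SBATCH --mail-user` / `--mail-type` lines (check stampede.sub for what it actually uses); locating each recorded script inside the simulation's SIMFACTORY directory is left to the caller.

```python
import difflib
import re
import sys
from pathlib import Path

# SLURM mail directives such as "#SBATCH --mail-type=ALL".
# Assumption: the submit script uses the long --mail-* option form.
MAIL_RE = re.compile(r"^#SBATCH\s+--mail-\S+")


def mail_directives(script_text):
    """Return the #SBATCH --mail-* lines of a submit script, whitespace-stripped."""
    return [line.strip() for line in script_text.splitlines()
            if MAIL_RE.match(line.strip())]


def compare(script_a, script_b, name_a="a", name_b="b"):
    """Return a unified diff (list of lines) restricted to the mail directives.

    An empty list means both scripts request the same mail settings.
    """
    return list(difflib.unified_diff(mail_directives(script_a),
                                     mail_directives(script_b),
                                     fromfile=name_a, tofile=name_b,
                                     lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python maildiff.py <submit script 1> <submit script 2>
    a, b = (Path(p).read_text() for p in sys.argv[1:])
    print("\n".join(compare(a, b, sys.argv[1], sys.argv[2]))
          or "no difference in mail directives")
```

If the mail directives turn out identical, the difference has to come from elsewhere, for example the environment set up by the modules, as Ian suggests below.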

Luca




On 1/23/14 7:58 PM, Ian Hinder wrote:
>
> On 23 Jan 2014, at 02:51, Erik Schnetter <schnetter at cct.lsu.edu>
> wrote:
>
>> Luca
>>
>> The emails depend on the settings in the submit script, i.e. the
>> file simfactory/mdb/submitscripts/stampede.sub. The file
>> "stampede.ini" should not matter.
>
> I just checked, and there is nothing different between the
> stampede.ini files for the two versions that should affect whether
> the job emails are sent.  There is also no change at all in the
> submit scripts between the two versions.
>
> Assuming that you have two simulations, one where the emails are
> sent and one where they are not, can you diff the submission
> scripts recorded in the SIMFACTORY directories of the two
> simulations and post the output to the list?
>
> If you are using the versions from the repository, you should get:
>
>> --- a/mdb/machines/stampede.ini
>> +++ b/mdb/machines/stampede.ini
>> @@ -1,6 +1,6 @@
>>  [stampede]
>>
>> -# last-tested-on: 2013-04-29
>> +# last-tested-on: 2013-11-04
>>  # last-tested-by: Erik Schnetter <schnetter at gmail.com>
>>
>>  # NOTE: This machine configuration uses only the regular CPUs of
>> @@ -17,7 +17,7 @@ status          = experimental
>>  # Access to this machine
>>  hostname        = stampede.tacc.utexas.edu
>>  rsynccmd        = /home1/00507/eschnett/rsync-3.0.9/bin/rsync
>> -envsetup        = module unload mvapich2 && module load impi
>> +envsetup        = module load intel/13.1.1.163 && module unload mvapich2 && module load impi/4.1.1.036 && module load papi
>>  aliaspattern    = ^login[1234](\.stampede\.tacc\.utexas\.edu)?$
>>
>>  # Source tree management
>> @@ -40,6 +40,16 @@ disabled-thorns = <<EOT
>>  LSUDevelopment/WaveToyNoGhostsPETSc
>>  TAT/TATPETSc
>>  EOT
>> +enabled-thorns = <<EOT
>> +#    CactusTest/TestAllTypes
>> +#        ExternalLibraries/OpenCL
>> +#        CactusExamples/WaveToyOpenCL
>> +#        CactusUtils/Accelerator
>> +#        CactusUtils/OpenCLRunTime
>> +#        McLachlan/ML_BSSN_CL
>> +#        McLachlan/ML_BSSN_CL_Helper
>> +#        McLachlan/ML_WaveToy_CL
>> +EOT
>>  optionlist      = stampede.cfg
>>  submitscript    = stampede.sub
>>  runscript       = stampede.run
>> @@ -76,15 +86,15 @@ nodes           = 6400
>>  min-ppn         = 16
>>  allocation      = NO_ALLOCATION
>>  queue           = normal        # [normal, large, development]
>> -maxwalltime     = 24:00:00      # development has 4:0:0
>> +maxwalltime     = 48:00:00      # development has 4:0:0
>>  maxqueueslots   = 49
>> -submit          = sbatch @SCRIPTFILE@
>> +submit          = sbatch @SCRIPTFILE@; sleep 60
>>  getstatus       = squeue -j @JOB_ID@
>>  stop            = scancel @JOB_ID@
>>  submitpattern   = Submitted batch job ([0-9]+)
>> -statuspattern   = ' @JOB_ID@ '
>> +statuspattern   = '@JOB_ID@ '
>>  queuedpattern   = ' PD '
>> -runningpattern  = ' R '
>> +runningpattern  = ' (CF|CG|R|TO) '
>>  holdingpattern  = ' S '
>>  #exechost        = head -n 1 SIMFACTORY/NODES
>>  #exechostpattern = ^(\S+)
>
>
> Since you say that changing the stampede.ini file causes the emails
> to appear, please can you also post the diff between the two
> stampede.ini files that you are using?
>
> Maybe there is some weird interaction between the queuing system and
> some environment settings in stampede.ini, e.g. an environment
> variable set by the modules.
>
> I never receive email notifications from stampede, even though I am
> using the default submission script.  I assumed it was just broken.
>
>>
>> -erik
>>
>> On Jan 22, 2014, at 20:41 , Luca Baiotti
>> <baiotti at ile.osaka-u.ac.jp> wrote:
>>
>>> On 1/20/14 10:26 PM, Ian Hinder wrote:
>>>>
>>>> On 20 Jan 2014, at 14:23, Yosef Zlochower <yosef at astro.rit.edu
>>>> <mailto:yosef at astro.rit.edu>> wrote:
>>>>
>>>>> On 01/20/2014 08:06 AM, Ian Hinder wrote:
>>>>>> On 20 Jan 2014, at 06:14, James Healy <jchsma at rit.edu
>>>>>> <mailto:jchsma at rit.edu>> wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> On Thursday morning, I pulled a fresh checkout of the
>>>>>>> newest version of the Einstein Toolkit (ET_2013_11) to
>>>>>>> use with RIT's LazEv code. I compiled it on stampede
>>>>>>> using the current stampede.cfg located in
>>>>>>> simfactory/mdb/optionlists which uses Intel MPI version
>>>>>>> 4.1.0.030 and the intel compilers version 13.1.1.163
>>>>>>> (enabled through a module load). I submitted a short job
>>>>>>> which I ran previously with ET_2013_05.  The results come
>>>>>>> out the same.  However, the run speed as reported in
>>>>>>> Carpet::physical_time_per_hour is poor. It starts off
>>>>>>> good, approximately the same as with the previous build,
>>>>>>> but over time drops to as low as half the speed over 24
>>>>>>> hours of evolution. On recovery from checkpoint, the
>>>>>>> speed is even worse, dropping to below 1/4 of the
>>>>>>> original run speed.
>>>>>>>
>>>>>>> So, I tried using the previous stampede.cfg included in
>>>>>>> the ET_2013_05 branch of simfactory, the same one I used
>>>>>>> to compile my ET_2013_05 build.  This cfgfile uses the
>>>>>>> same version of IMPI but different Intel compilers
>>>>>>> (version 13.0.2.146). The run speed shows the same trends
>>>>>>> as when using the newer config file.
>>>>>> Hi Jim,
>>>>>>
>>>>>> I'm quite confused by this problem report.  I guess that
>>>>>> you mean the following:
>>>>>>
>>>>>> - You get the slowdown with the current ET_2013_11 release
>>>>>> - You don't get the slowdown with the ET_2013_05 release
>>>>>> - You do get the slowdown if you use the current ET_2013_11
>>>>>>   release with the ET_2013_05 stampede.cfg
>>>>>>
>>>>>> Is that correct?
>>>>>>
>>>>>> I consider Intel MPI to be unusable on Stampede, and I think
>>>>>> it always has been.  I used to get random crashes, hangs
>>>>>> and slowdowns.  I also experienced similar problems with
>>>>>> Intel MPI on SuperMUC.  For any serious work, I have always
>>>>>> used MVAPICH2 on Stampede.  In the current ET trunk Intel
>>>>>> MPI has been replaced with MVAPICH2.  I would try the
>>>>>> current trunk and see if this fixes your problems.  You
>>>>>> can also use just the stampede files from the current trunk
>>>>>> with the ET_2013_11 release (make sure you use the ones
>>>>>> listed in stampede.ini).
>>>>> Interesting. I haven't been able to get a run to work with
>>>>> mvapich2 because of an issue with the runs dying during
>>>>> checkpoint. Which config file are you using (modules loaded,
>>>>> etc.)? How much RAM per node do your production runs typically
>>>>> use?
>>>>
>>>> I'm using exactly the default simfactory config from the
>>>> current trunk, so you can see the modules etc there.
>>>> Checkpointing (and recovery) works fine.  I usually aim for
>>>> something like 75% memory usage for production runs.
>>>
>>> Hello, I would like to report a different problem with the
>>> simfactory settings for stampede: with ET Noether or trunk the
>>> job start/end emails are not sent (or at least they do not reach
>>> the Osaka University server; I had the systems administrators
>>> check). I receive the emails if I use the simfactory of Gauss. In
>>> particular, if I copy just the stampede.ini from Gauss to Noether
>>> (and no other files) and recompile, I do receive the emails.
>>>
>>> Luca
>>>
>>>
>>>
>>> _______________________________________________ Users mailing
>>> list Users at einsteintoolkit.org
>>> http://lists.einsteintoolkit.org/mailman/listinfo/users
>>
>> -- Erik Schnetter <schnetter at cct.lsu.edu>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>
>> My email is as private as my paper mail. I therefore support
>> encrypting and signing email messages. Get my PGP key from
>> http://pgp.mit.edu/.
>
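
The stampede.ini diff quoted above also changes two of the squeue patterns: statuspattern loses its leading space, and runningpattern additionally accepts CF, CG and TO (SLURM's compact codes for CONFIGURING, COMPLETING and TIMEOUT). The sketch below shows the effect on two invented squeue-style lines. It assumes the single quotes in the INI values are delimiters rather than part of the regex, and that each pattern is searched within a line of squeue output; neither is confirmed in this thread.

```python
import re

# Invented squeue-style lines (not real output). The first starts with the
# job ID; the second is right-aligned with leading spaces.
LINES = [
    "2632441    normal  mysim  user  CF   0:05  16 c401-[101-116]",
    "  2632442  normal  mysim  user  R    2:03  16 c402-[101-116]",
]

# runningpattern on the "-" (OLD) and "+" (NEW) sides, quotes removed.
OLD_RUNNING = r" R "
NEW_RUNNING = r" (CF|CG|R|TO) "


def status_pattern(template, job_id):
    """Substitute a job ID for @JOB_ID@ in a statuspattern template."""
    return template.replace("@JOB_ID@", re.escape(job_id))


def find(pattern, lines):
    """Return the indices of the lines in which the pattern is found."""
    return [i for i, line in enumerate(lines) if re.search(pattern, line)]


if __name__ == "__main__":
    # The old statuspattern needs a space before the ID, so it misses line 0.
    print(find(status_pattern(" @JOB_ID@ ", "2632441"), LINES))  # []
    print(find(status_pattern("@JOB_ID@ ", "2632441"), LINES))   # [0]
    # The old runningpattern sees only the R line; the new one also sees CF.
    print(find(OLD_RUNNING, LINES))  # [1]
    print(find(NEW_RUNNING, LINES))  # [0, 1]
```

Dropping the leading space has a side effect: '2632441 ' also matches the tail of a longer ID such as 12632441, so the new statuspattern trades one failure mode for another.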






