[Users] simulationfactory issue?

Richard O'Shaughnessy oshaughn at gravity.phys.uwm.edu
Sun Oct 17 10:17:01 CDT 2010


Hi Erik,

Thanks for the breakdown.

With regard to Condor, I know Duncan Brown at Syracuse has a working script 'condor_mpirun' that's GPL'd.  I'm including a copy of a message that circulated on a LIGO software email list describing how to run Condor MPI jobs.

So you'd submit to a cluster with
     condor_submit  <name-of-submit-file-included-below>
which in turn calls the condor_mpirun script.

See http://www.cs.wisc.edu/condor/manual/v7.4/2_9Parallel_Applications.html
for a description of the underlying issues.  I personally haven't used Condor in parallel in a few years, but it was particularly convenient when I did, and it's the submission system I'll likely be using in the future.
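
For concreteness, once the submit description from the message below is saved to a file (Duncan calls his spinspiral.sub), submitting and monitoring it should be just the usual

     condor_submit spinspiral.sub
     condor_q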

-Richard

> From: Duncan Brown <dabrown at physics.syr.edu>
> Date: October 6, 2010 8:35:36 AM CDT
> To: daswg at gravity.phys.uwm.edu, Vivien Raymond <vivien at u.northwestern.edu>
> Cc: Stuart Anderson <anderson at ligo.caltech.edu>, LDAS_ADMIN_ALL <ldas_admin_all at ligo.caltech.edu>
> Subject: Re: [DASWG] MPI jobs on LDG clusters
> Reply-To: daswg at gravity.phys.uwm.edu
> 
> Hi Vivien,
> On Oct 5, 2010, at 8:26 PM, Vivien Raymond wrote:
>> Thanks Stuart, I can now compile and run my jobs locally on
>> ldas-pcdev1.ligo.caltech.edu. As far as I understand from Xavier,  
>> there
>> is no way to condor_submit MPI jobs yet, is there?
> 
> 
> On sugar-dev1.phy.syr.edu, you can use the OpenMPI 1.2.5 installation
> in
> 
> /opt/condor/openmpi-1.2.5
> 
> You should be able to build and link your code against this without  
> any problems. Once you've mpi-compiled, you can use the script below  
> to run your job. Change
> 
> /path/to/your/mpiexecutable
> 
> as appropriate. Note, *don't* change executable; that has to be
> condor_mpirun. Change HOWMANYCORES to the appropriate integer. Dump
> the lines below into a Condor .sub file (e.g. spinspiral.sub) and then
> condor_submit it as normal.
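> 
> For example, compiling against that installation with its wrapper
> compilers should look something like this (the source file and
> executable names here are just placeholders):
> 
>      /opt/condor/openmpi-1.2.5/bin/mpicc -o mpiexecutable yourcode.c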
> 
> Let me know if you have problems. A similar thing should work at CIT  
> (I'll have to point you to my condor_mpirun, as it's not installed  
> system-wide) but let's try sugar first. There may be some SpEC-
> specific stuff we have to iron out.
> 
> BTW, if you modify your code so that it exits gracefully and can
> resume after a SIGUSR2, the code can checkpoint on eviction (SpEC does
> this). If not, I'll just lock you on as many cores as you need and we  
> can deal with this later.
> 
> Cheers,
> Duncan.
> 
> universe = parallel
> executable = /opt/condor/bin/condor_mpirun
> arguments = --verbose --stdout cluster$(CLUSTER).proc$(PROCESS).mpiout --stderr cluster$(CLUSTER).proc$(PROCESS).mpierr /path/to/your/mpiexecutable
> machine_count = HOWMANYCORES
> log = cluster$(CLUSTER).proc$(PROCESS).log
> output = cluster$(CLUSTER).proc$(PROCESS).subproc$(NODE).out
> error = cluster$(CLUSTER).proc$(PROCESS).subproc$(NODE).err
> notification = Always
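> # keep the job queued for restart when it exits with code 143
> # (presumably the graceful checkpoint-on-eviction exit, 128+SIGTERM);
> # remove it on any other exit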
> on_exit_remove = (ExitBySignal == True) || (ExitCode != 143)
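> # prefer lightly loaded machines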
> rank = (40 - (2.0 * TotalCondorLoadAvg))
> queue
> 
> 
> -- 
> 
> Duncan Brown                          Room 263-1, Department of Physics,
> Assistant Professor of Physics        Syracuse University, NY 13244, USA
> Phone: (315) 443 5993             http://www.gravity.phy.syr.edu/~duncan
> 
> 

------


On Oct 16, 2010, at 2:48 PM, Erik Schnetter wrote:

> Richard
> 
> A Cactus configuration is defined by its thorn list, i.e. by the set
> of thorns which it includes. You can compile more thorns into a
> configuration than you activate at run time. I usually have a single,
> large configuration containing many thorns, and activate a few of them
> at run time. I use the default name "sim" for this configuration.
> 
> (I also have a variant sim-debug with debugging enabled, which runs at
> reduced speed but contains more run-time checks to catch coding
> errors.)
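> 
> For example, building the two configurations might look something
> like this (thorn list path as in your build command below; the
> --debug flag selects the debugging build):
> 
>      ./simfactory/sim build sim --thornlist=manifest/einsteintoolkit.th
>      ./simfactory/sim build sim --thornlist=manifest/einsteintoolkit.th --debug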
> 
> Others prefer to have smaller configurations, and create a new
> configuration for each project or simulation. I can see that this is a
> good idea, since re-building my large configuration from scratch can
> take some time.
> 
> No, there is no consistency between configurations on different
> machines. We enforce consistency between source trees, but not between
> configurations. However, I think this would be a good idea. We could e.g.
> derive the configuration name from the thorn list, and replicate thorn
> lists to remote systems -- this would (in a way) ensure consistency.
> Of course, you then still need to ensure that a configuration is
> re-built whenever the thorn list or the source tree changes, which we
> don't yet do automatically.
> 
> We don't have submit scripts for Condor's Parallel Universe. Can you
> give us a pointer to details of this system?
> 
> -erik
> 
> On Fri, Oct 15, 2010 at 4:21 PM, Richard O'Shaughnessy
> <oshaughn at gravity.phys.uwm.edu> wrote:
>> Hi Erik,
>> Thanks -- I didn't realize I was accidentally (and consistently!) adding
>> static_tov to the build command.
>> How does the build system work?  It's obvious now that I can (for example)
>> build wavetoy, static_tov, and ks-mclachlan at once, and run instances of
>> each independently.  But what about maintaining the source trees -- do I
>> need to rebuild each named configuration?  Can I check whether there are
>> updates to a particular configuration's source tree?  Is there any
>> consistency enforced between configurations used on a local host and on a
>> remote build (i.e., if I want to be sure I use the same code tag on each
>> of many target clusters)?
>> -Richard
>> PS: On a related note, are there submission scripts for Condor's parallel
>> universe?
>> On Oct 15, 2010, at 3:58 PM, Erik Schnetter wrote:
>> 
>> Richard
>> 
>> You created a configuration with the non-default name "static_tov".
>> (The default name would be "sim"). Therefore you need to specify this
>> configuration name when you create and submit a simulation:
>> 
>> ./simfactory/sim create-submit static_tov --configuration=static_tov
>> --parfile=...
>> 
>> You will then have a configuration and a simulation with the same
>> name; this does not matter. I usually have a single configuration
>> "sim", and use this configuration for all my simulations.
>> 
>> -erik
>> 
>> On Fri, Oct 15, 2010 at 1:16 PM, Richard O'Shaughnessy
>> <oshaughn at gravity.phys.uwm.edu> wrote:
>> 
>> Hi Erik
>> 
>> After a seemingly successful compile, I tried a simple single-CPU run
>> (using generic.sh submission) on my cluster head node.  I believe this
>> installation is the release version; the only changes have been to the
>> optionlist and udb.pm.  A similar configuration works on other machines
>> (i.e., my laptop, albeit with svn rather than release versions), but for
>> some reason not here.  It's not creating the simulations directory at all.
>> 
>> Thoughts?
>> 
>> --- command-line
>> 
>> [oshaughn at hydra Cactus]$ ./simfactory/sim create-submit static_tov --parfile=par/static_tov.par --procs=1 --walltime=8:0:0
>> Simulation Factory:
>> Configuration name(s) not specified -- using default configuration "sim"
>> Uncaught exception from user code:
>>         Configuration "sim" contains no executable at ./simfactory/sim line 5216.
>>  at ./simfactory/sim line 5216
>>         main::get_executable() called at ./simfactory/sim line 1883
>>         main::command_create('static_tov') called at ./simfactory/sim line 2955
>>         main::command_create_submit('static_tov') called at ./simfactory/sim line 452
>> 
>> Richard
>> 
>> The error message "no executable" indicates that your build didn't
>> complete. There are probably problems with your compiler or linker
>> options.
>> 
>> - All Cactus executables are stored in Cactus's "exe" directories.
>> What does "ls exe" say?
>> 
>> - Did you specify a different name for your configuration while
>> building? If so, you need to use the --configuration=... option when
>> submitting the simulation.
>> 
>> - If you use the --debug or --profile flag while building, you also
>> need to specify it while submitting a simulation, since you'll need to
>> use the debugging or profiling executable.
>> 
>> Ok.  I'm simply trying to follow the Einstein Toolkit new user instructions
>> on a new machine, as a test case, with one CPU.
>> 
>> 1) Executables are made:
>> 
>> [oshaughn at hydra Cactus]$ ls exe
>> cactus_static_tov  static_tov
>> 
>> 2) I just changed udb.pm (see original email).  The build command (which
>> says it completed successfully) is
>> 
>> [oshaughn at hydra Cactus]$ ./simfactory/sim build static_tov --thornlist=manifest/einsteintoolkit.th
>> 
>> 3) I didn't specify any debugging options.
>> 
>> -erik
>> 
>> --
>> Erik Schnetter <schnetter at cct.lsu.edu>   http://www.cct.lsu.edu/~eschnett/
>> 
>> Richard O'Shaughnessy oshaughn at gravity.phys.uwm.edu
>> 462 Physics Building Phone: 414 229 6674
>> Center for Gravitation and Cosmology
>> University of Wisconsin, Milwaukee 53211
>> 
> 
> 
> 
> -- 
> Erik Schnetter <schnetter at cct.lsu.edu>   http://www.cct.lsu.edu/~eschnett/

Richard O'Shaughnessy                        oshaughn at gravity.phys.uwm.edu
462 Physics Building                         Phone: 414 229 6674
Center for Gravitation and Cosmology
University of Wisconsin, Milwaukee 53211

[Attachment: condor_mpirun (application/octet-stream, 21897 bytes)
 http://lists.einsteintoolkit.org/pipermail/users/attachments/20101017/b8dbb578/attachment-0001.obj ]