[Users] Checkpointing with Cactus/Simfactory

dumsani g14n8326 at campus.ru.ac.za
Wed Aug 31 16:02:16 CDT 2016


Hi All,

I have some very long BH simulations to run and I'd like to use
checkpointing for them. I haven't really done checkpointing before, but
I know that checkpointing information can be specified in the parameter
file (for use by Cactus), and that Simfactory also seems to handle
checkpointing in some way ("restart-id", etc.). Of course, the
scheduling systems on HPC clusters (e.g. PBSPro) have their own support
for checkpointing, but I don't want to use that. It is probably best to
use the scheduler only to set the walltime.

So, my main question is: assuming I set a maximum walltime of 12 hours
and set my simulation to write a checkpoint every 3 hours of walltime,
how do I *restart* the job at the end of the 12 hours using Simfactory,
so that the simulation resumes from the last checkpoint it wrote before
terminating? What extra command-line options should I pass to
Simfactory's submit command?

Below is the segment of my parfile that sets up checkpointing. I'm
including it as a sample, or as a starting point for anyone who would
like to advise me on how this information should be specified.

#####  Checkpointing #########
CarpetIOHDF5::checkpoint                = yes
IO::checkpoint_ID                       = yes
IO::recover                             = autoprobe
IO::checkpoint_every                    = 1024
IO::out_proc_every                      = 2
IO::checkpoint_keep                     = 3
IO::checkpoint_dir                      = $parfile
Carpet::regrid_during_recovery          = no
CarpetIOHDF5::use_grid_structure_from_checkpoint = yes
CarpetIOHDF5::open_one_input_file_at_a_time = yes
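As a possible variant (an untested sketch, not a recommended setup), the
3-hour walltime interval from my question could perhaps be expressed
directly, alongside or instead of the iteration-based checkpoint_every.
I'm assuming that IOUtil's checkpoint_every_walltime_hours,
checkpoint_on_terminate and recover_dir, and the TerminationTrigger
thorn's max_walltime and on_remaining_walltime, behave as I understand
them; the specific values (12 hours, 15 minutes) are my own choices.

```
# Sketch (assumption): checkpoint by wall-clock time, not by iteration
ActiveThorns = "TerminationTrigger"

# write a checkpoint every 3 hours of walltime
IO::checkpoint_every_walltime_hours     = 3
# also write a checkpoint when the run terminates
IO::checkpoint_on_terminate             = yes
# look for checkpoints in the same directory they are written to
IO::recover_dir                         = $parfile

# hours; assumed to match the walltime requested from the scheduler
TerminationTrigger::max_walltime          = 12
# minutes before max_walltime at which to stop cleanly
TerminationTrigger::on_remaining_walltime = 15
```

The intent is for TerminationTrigger to stop the run a few minutes
before the 12-hour limit so that a final checkpoint is written before
the scheduler kills the job; the 15-minute margin is arbitrary.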

Your advice and assistance will be highly appreciated.

Best,
Dumsani

