[Users] Checkpointing with Cactus/Simfactory
g14n8326 at campus.ru.ac.za
Wed Aug 31 16:02:16 CDT 2016
I have some very long BH simulations to run and I'd like to use
checkpointing for them. I haven't really done checkpointing before, but
I know that checkpointing information can be specified in the parameter
file (for use by Cactus), and that Simfactory also seems to handle
checkpointing in some way ("restart-id", etc.). Of course, scheduling
systems (e.g. PBSPro) at HPC centres have their own support for
checkpointing, but I don't want to use that. It is probably best to use
the scheduler only to set the walltime.
So, my main question is: assuming I set a maximum walltime of 12 hours
and have my simulation write a checkpoint every 3 hours of walltime,
how do I *restart* my job at the end of the 12 hours using Simfactory so
that the simulation resumes from the last checkpoint it wrote before
terminating? What extra command-line options should I pass to
Simfactory's submit command?
Below is the segment of my parfile where the checkpointing information
is given, provided as a sample and as a basis for anyone who would like
to advise me on how such information should be specified.
##### Checkpointing #########
CarpetIOHDF5::checkpoint = yes
IO::checkpoint_ID = yes
IO::recover = autoprobe
IO::checkpoint_every = 1024
IO::out_proc_every = 2
IO::checkpoint_keep = 3
IO::checkpoint_dir = $parfile
Carpet::regrid_during_recovery = no
CarpetIOHDF5::use_grid_structure_from_checkpoint = yes
CarpetIOHDF5::open_one_input_file_at_a_time = yes
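Note that IO::checkpoint_every above counts iterations, not hours, so
it does not by itself give the 3-hour interval described earlier. A
minimal sketch of a walltime-based variant is shown below. It assumes
the IOUtil parameters checkpoint_every_walltime_hours,
checkpoint_on_terminate and recover_dir behave as described in the
thorn documentation, and it has not been tested for this run.

```
##### Checkpointing by walltime (sketch, untested) #########
# Assumption: -1 disables iteration-based checkpointing
IO::checkpoint_every                = -1
# Assumption: write a checkpoint every 3 hours of walltime
IO::checkpoint_every_walltime_hours = 3.0
# Assumption: also write a checkpoint when the run terminates
IO::checkpoint_on_terminate         = yes
# Assumption: look for checkpoints in the directory they are written to
IO::recover_dir                     = $parfile
```

The remaining settings (IO::recover = autoprobe, IO::checkpoint_keep,
and the CarpetIOHDF5 options) would stay as in the segment above.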
Your advice and assistance will be highly appreciated.