<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On 31 Aug 2016, at 23:02, dumsani &lt;<a href="mailto:g14n8326@campus.ru.ac.za">g14n8326@campus.ru.ac.za</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Hi All,<br><br>I have some very long BH simulations to run and I'd like to checkpoint <br>for these. I haven't really done checkpointing before. But what I<br>know is that checkpointing information can be specified in the parameter <br>file (for use by Cactus), and also that Simfactory<br>does seem to have some stuff to do with or handle checkpointing ( <br>"restart-id", etc.). Of course, scheduling systems (e.g. PBSPro) at<br>HPCs would have support for checkpointing but I don't want to use that. <br>Probably it is best to use that only to set the walltime.<br><br>So, my main question is: Assuming I set a maximum walltime of 12 hours, <br>and I set my simulation to dump checkpoints every 3 hrs (in<br>walltime units), how do I *restart* my job at the end of the 12 hrs <br>using Simfactory in a way that the simulation starts off from the last<br>checkpoint it dropped before terminating? What extra command line <br>options should I pass to the submit command of Simfactory?<br></blockquote><div><br></div><div>You don't need anything extra; just "sim submit &lt;simulationname&gt;".</div><div><br></div><div><span class="Apple-tab-span" style="white-space:pre">        </span>– If the previous job has already completed, the next job will simply be queued.</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>– If the previous job is still queued or running, the next job will be queued with a dependency so that it only starts when the previous one finishes. (The dependency logic is in the submit script of the machine; it's possible that the machine you are using does not have this defined. 
Look for references to "chain" in the other submit scripts in case you need to add this to your own machine.)</div><div><br></div><div>You can also use the following in your parameter file:</div><div><br></div><div><div><div>TerminationTrigger::max_walltime = @WALLTIME_HOURS@</div><div>TerminationTrigger::on_remaining_walltime = 30 # minutes</div><div>TerminationTrigger::output_remtime_every_minutes = 30</div><div><br></div><div><div>This will cause Cactus to terminate cleanly 30 minutes before the end of the job's walltime (as a margin). If you additionally use</div></div><div><br></div><div><div>IO::checkpoint_on_terminate = yes</div><div><br></div></div><div>then a checkpoint will be written when Cactus terminates. Without these settings, your job will be unceremoniously killed by the scheduler at the walltime limit, leaving you with up to 3 hours of wasted computer time, possibly corrupted output files, and duplicate data.</div><div><br></div><div>It is also convenient to use</div></div><div><br></div><div>TerminationTrigger::termination_from_file = yes</div><div>TerminationTrigger::termination_file = "terminate.txt"</div><div>TerminationTrigger::create_termination_file = yes</div><div><br></div><div>This will create a file called "terminate.txt" in the output directory. If you write a "1" into this file, Cactus will terminate immediately (and checkpoint, if you have set checkpoint_on_terminate as above). You can then resubmit the simulation if you like. This allows you to easily stop and start simulations without losing any runtime.</div></div></div><br><div apple-content-edited="true">
<div><div>-- </div><div>Ian Hinder</div><div><a href="http://members.aei.mpg.de/ianhin">http://members.aei.mpg.de/ianhin</a></div></div>
</div>
<br></body></html>