[Users] Restarting simulations using simfactory

Erik Schnetter schnetter at cct.lsu.edu
Sun Aug 16 17:04:57 CDT 2020


On Sun, Aug 16, 2020 at 5:11 PM Atul Kedia <akedia at nd.edu> wrote:
>
> Hi Erik,
>
> Thanks for your suggestion. I was using 'sim run' because, when I start a new simulation with 'submit' on my cluster, I get this message: "Executing submit command: exec nohup <sim directory address>". That made me think 'submit' cannot be used to submit a job to a cluster, and can only be used when running a simulation on the front end (to hide the output from the terminal).
>
> Anyway, now that I am doing a restart using 'submit', I get these messages and then the simulation stops.
> Warning: Simulation nsnstohmns_low_resolution_Marronetti already has an active submission. Chaining this submission onto job id 999999

This warning already indicates that something is wrong.

I never use "sim run" myself, so I might be wrong about this. I think
that "sim run" might not create the necessary wrappers that allow
multiple restarts for a simulation. Thus when you submit a simulation
with "sim run", it's not easily possible to restart the simulation.

If that is true, then I would create a new simulation (i.e. use a
different name for the simulation), and modify the parameter file to
point to the checkpoint files in the old simulations as restart files.
The respective parameters are provided by the "IOUtil" thorn. This
way, you sidestep Simfactory's automatic restart mechanism, and you
are only using the "sim run" command you already know is working for
your machine. The disadvantage is that you need to modify the
parameter file each time you restart to point to the checkpoint files
written by the previous run.
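
The manual restart described above can be sketched as a small Python helper that rewrites the parameter file text. This is only an illustration under stated assumptions: the function name and the example checkpoint path are hypothetical, and the sketch assumes the IOUtil parameters "IO::recover" (here set to "autoprobe") and "IO::recover_dir" are what the new simulation needs.

```python
import re

# Matches existing "IO::recover" / "IO::recover_dir" assignments, also when
# the thorn name "IOUtil" is used as the prefix. Other parameters, such as
# "IO::recover_file", are left untouched.
_RECOVERY_LINE = re.compile(
    r"^\s*(IO|IOUtil)::(recover|recover_dir)\s*=", re.IGNORECASE
)


def add_recovery_settings(par_text, checkpoint_dir):
    """Return par_text with recovery settings pointing at checkpoint_dir.

    Hypothetical helper: it drops any existing recovery settings and
    appends new ones at the end of the parameter file text.
    """
    kept = [
        line for line in par_text.splitlines()
        if not _RECOVERY_LINE.match(line)
    ]
    kept += [
        "",
        "# Recover from the checkpoint files written by the previous simulation",
        'IO::recover = "autoprobe"',
        f'IO::recover_dir = "{checkpoint_dir}"',
    ]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Example parameter file text and checkpoint path (both made up).
    old_par = 'Cactus::cctk_final_time = 100\nIO::recover = "no"\n'
    print(add_recovery_settings(old_par, "/path/to/old_sim/output-0000/checkpoints"))
```

If the old run wrote its checkpoints under a non-default "IO::checkpoint_file" base name, "IO::recover_file" would presumably have to be set to match as well.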

In the future, I recommend using "sim submit" to start simulations,
but here it might be easier to use something that is known to work.
You could instead use "sim submit" to re-run your original
simulations. That might be easiest if possible, but of course you'd
lose the progress you made so far.

I hope others (who are more familiar with "sim run") can also pitch in
and comment.

-erik

> Warning: job status is U
> Warning: Job chaining requested but job id 999999 is not in the queue. Its status is U. Aborting submission.
>
> I guess the issue is with the "status is U" part now.
>
> Best regards,
> Atul.
>
> On Sat, Aug 15, 2020 at 8:57 PM Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>>
>> Atul
>>
>> "sim run" starts a simulation right away. Have you tried "sim submit"
>> instead? This should check whether the simulation is still active, and
>> if so, deactivate it before running the next restart.
>>
>> -erik
>>
>> On Sat, Aug 15, 2020 at 7:54 PM Atul Kedia <akedia at nd.edu> wrote:
>> >
>> > Hello,
>> >
>> > I want to restart a simulation to make it run for longer. It stopped at the final time specified in my par file. Checkpoints are enabled in the par file.
>> >
>> > I have increased the time in the par file at <sim_name>/output-0000/ and at <sim_name>/SIMFACTORY/par, and I tried the commands:
>> >
>> > simfactory/bin/sim cleanup <sim_name>
>> > followed by
>> > simfactory/bin/sim run <sim_name>
>> > and added a line "jobid = 999999" as suggested at: http://lists.einsteintoolkit.org/pipermail/users/2018-September/006528.html
>> >
>> > and I get the error message:
>> > "Error: Internal error: Cannot submit simulation <sim_name> because it is already active"
>> >
>> > Another email thread I used for reference was this one: http://lists.einsteintoolkit.org/pipermail/users/2018-May/006281.html
>> >
>> > I am using ET_Mayer with the default simfactory that it comes with (simfactory 2, I think).
>> >
>> > Any help would be really appreciated.
>> >
>> > Thank you,
>> >
>> > --
>> > Atul Kedia
>> > PhD student,
>> > Physics department,
>> > University of Notre Dame.
>> > _______________________________________________
>> > Users mailing list
>> > Users at einsteintoolkit.org
>> > http://lists.einsteintoolkit.org/mailman/listinfo/users
>>
>>
>>
>> --
>> Erik Schnetter <schnetter at cct.lsu.edu>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>
>
>
> --
> Atul Kedia
> PhD student,
> Physics department,
> University of Notre Dame.



-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/

