[ET Trac] #2725: Walltime being overwritten to 24 hours

Roland Haas trac-noreply at einsteintoolkit.org
Tue May 9 11:16:21 CDT 2023


#2725: Walltime being overwritten to 24 hours

 Reporter: Matthew Cerep
   Status: submitted
Milestone: 
  Version: 
     Type: bug
 Priority: trivial
Component: SimFactory

Comment (by Roland Haas):

Hmm. I am very puzzled. In your log file (thank you for including it) you have the lines:

```text
[LOG:2023-05-07 21:19:25] self.submit(submitScript)::No previous walltime available to be used, using walltime 168:00:00
[LOG:2023-05-07 21:19:25] self.submit(submitScript)::Defined substituion properties for submission
[LOG:2023-05-07 21:19:25] self.submit(submitScript)::{'SOURCEDIR': '/users/mtc00017/scratch/EToolKitNew/Cactus', 'SIMULATION_NAME': 'insp08_final_small_2', 'SHORT_SIMULATION_NAME': 'insp08_final_sm', 'SIMULATION_ID': 'simulation-insp08_final_small_2-thornyflat-tf.hpc.wvu.edu-mtc00017-2023.05.07-21.19.25-2587', 'RESTART_ID': 0, 'RUNDIR': '/users/mtc00017/scratch/simulations/insp08_final_small_2/output-0000', 'SCRIPTFILE': '/users/mtc00017/scratch/simulations/insp08_final_small_2/output-0000/SIMFACTORY/SubmitScript', 'EXECUTABLE': '/users/mtc00017/scratch/simulations/insp08_final_small_2/SIMFACTORY/exe/cactus_sim', 'PARFILE': '/users/mtc00017/scratch/simulations/insp08_final_small_2/output-0000/charged_binary_inspiral_final.rpar', 'HOSTNAME': 'tf.hpc.wvu.edu', 'USER': 'mtc00017', 'NODES': 2, 'PROCS_REQUESTED': 80, 'PPN': 40, 'NUM_PROCS': 80, 'NODE_PROCS': 40, 'PROCS': 80, 'NUM_THREADS': 1, 'PPN_USED': 40, 'NUM_SMT': 1, 'MEMORY': '98304', 'CPUFREQ': '2.10', 'ALLOCATION': 'NOALLOCATION', 'QUEUE': 'comm_small_week', 'EMAIL': 'mtc00017', 'WALLTIME': '24:00:00', 'WALLTIME_HH': '24', 'WALLTIME_MM': '00', 'WALLTIME_SS': '00', 'WALLTIME_SECONDS': 86400, 'WALLTIME_MINUTES': 1440.0, 'WALLTIME_HOURS': 24.0, 'SIMFACTORY': '/gpfs20/scratch/mtc00017/EToolKitNew/Cactus/repos/simfactory2/bin/sim', 'SUBMITSCRIPT': '/users/mtc00017/scratch/simulations/insp08_final_small_2/output-0000/SIMFACTORY/SubmitScript', 'CONFIGURATION': 'sim', 'FROM_RESTART_COMMAND': '', 'CHAINED_JOB_ID': ''}
```
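The derived `WALLTIME_*` entries in that substitution dictionary all agree with a single `HH:MM:SS` string of `24:00:00`. This suggests the value was already 24 hours when those fields were computed. Below is a minimal sketch of that derivation. It assumes the fields are plain arithmetic on the `HH:MM:SS` string. The helper name `walltime_fields` is hypothetical and is not SimFactory's own code.

```python
def walltime_fields(walltime: str) -> dict:
    """Derive WALLTIME_* substitution values from an 'HH:MM:SS' string.

    Hypothetical helper that mirrors the fields seen in the log above;
    it is not SimFactory's own implementation.
    """
    hh, mm, ss = walltime.split(":")
    seconds = int(hh) * 3600 + int(mm) * 60 + int(ss)
    return {
        "WALLTIME": walltime,
        "WALLTIME_HH": hh,
        "WALLTIME_MM": mm,
        "WALLTIME_SS": ss,
        "WALLTIME_SECONDS": seconds,
        "WALLTIME_MINUTES": seconds / 60,
        "WALLTIME_HOURS": seconds / 3600,
    }


if __name__ == "__main__":
    # The 24-hour value from the log, and the 168-hour value that was requested.
    print(walltime_fields("24:00:00"))
    print(walltime_fields("168:00:00")["WALLTIME_SECONDS"])
```

For `24:00:00` this reproduces the logged values (`86400`, `1440.0`, `24.0`). The requested `168:00:00` would instead give `604800` seconds.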

So the walltime was 168 hours at one point (the “`No previous walltime available`” message really just means that this is `output-0000`), but it switched to 24 hours immediately afterwards. Looking at the Python code in `lib/simrestart.py`, I cannot see how it could change to 24 hours.

```python
    # import walltime if no --walltime is specified.
    if existingProperties is not None and not simenv.OptionsManager.HasOption('walltime') and existingProperties.HasProperty('walltime'):
        Walltime = restartlib.WallTime(existingProperties.GetProperty("walltime"))
        self.SimulationLog.Write("Using walltime %s from previous restart %s" % (existingProperties.GetProperty("walltime"), self.MaxRestartID))
    else:
        self.SimulationLog.Write("No previous walltime available to be used, using walltime %s" % Walltime.Walltime)

    # [... code that does not touch Walltime ...]

        walltt = Walltime
        
        # always restrict our walltime to maxwalltime if requested walltime
        # is too large.
        if MaxWalltime.walltime_seconds < Walltime.walltime_seconds:
            walltt = MaxWalltime

            # okay, our walltime requested was too large
            # find out if we should use automatic job chaining.
            if chainedJobId is None:
                UseChaining = True
                # TODO: i don't understand the job chaining logic. a
                # restart should be presubmitted (instead of
                # submitted) if there is a restart currently running.
                # yet there is no check for this.
        
        new_properties['WALLTIME'] = walltt.Walltime
```
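The only branch above that rewrites the walltime is the `MaxWalltime` cap. If the machine's `maxwalltime` were 24 hours, that branch alone would produce the observed `WALLTIME` of `24:00:00`. The sketch below isolates the branch so this hypothesis is easy to reason about. `WallTime` here is a simplified stand-in for `restartlib.WallTime`. The default of `False` for chaining is an assumption. The 24-hour maximum used in the example is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WallTime:
    """Simplified stand-in for restartlib.WallTime (assumption: only these attributes matter)."""
    Walltime: str  # "HH:MM:SS"

    @property
    def walltime_seconds(self) -> int:
        hh, mm, ss = (int(x) for x in self.Walltime.split(":"))
        return hh * 3600 + mm * 60 + ss


def effective_walltime(requested: WallTime, maximum: WallTime,
                       chained_job_id: Optional[str]) -> Tuple[str, bool]:
    """Return (WALLTIME property value, whether job chaining would be enabled)."""
    walltt = requested
    use_chaining = False  # assumed default; not shown in the excerpt
    # Restrict the walltime to maxwalltime if the requested walltime is too large.
    if maximum.walltime_seconds < requested.walltime_seconds:
        walltt = maximum
        if chained_job_id is None:
            use_chaining = True
    return walltt.Walltime, use_chaining


if __name__ == "__main__":
    # Hypothetical maxwalltime of 24 hours for the target machine.
    print(effective_walltime(WallTime("168:00:00"), WallTime("24:00:00"), None))
    # prints ('24:00:00', True)
```

A requested walltime below the maximum passes through unchanged, so the cap only explains the 24 hours if thornyflat's `maxwalltime` really is smaller than 168 hours.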

Worse, when I try to mimic thornyflat (I have no account there) on my workstation using this machine entry

```ini
[thornyflat]
submit = cat @SCRIPTFILE@ >/dev/tty
basedir = /data/rhaas/simulations
sourcebasedir = /data/@USER@
envsetup = true
```

and running:

```shell
./simfactory/bin/sim create-submit --machine thornyflat --parfile par/tov_ET.par --cores 1 foobar
```

I do not see any such change in walltime myself.
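One difference worth ruling out is whether the real thornyflat entry sets `maxwalltime` while my test entry does not. Machine entries use an INI-like `key = value` layout. The sketch below checks a given entry text for `maxwalltime` using Python's `configparser`. It assumes the entry parses as standard INI, and SimFactory's own mdb reader may behave differently. The `maxwalltime = 24:00:00` line in the second call is hypothetical.

```python
import configparser
from typing import Optional

# The test entry used on my workstation.
ENTRY = """\
[thornyflat]
submit = cat @SCRIPTFILE@ >/dev/tty
basedir = /data/rhaas/simulations
sourcebasedir = /data/@USER@
envsetup = true
"""


def entry_maxwalltime(entry_text: str, machine: str) -> Optional[str]:
    """Return the maxwalltime set in an INI-style machine entry, or None if absent."""
    # interpolation=None so characters such as '%' are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(entry_text)
    return parser.get(machine, "maxwalltime", fallback=None)


if __name__ == "__main__":
    print(entry_maxwalltime(ENTRY, "thornyflat"))  # the test entry does not set it
    print(entry_maxwalltime(ENTRY + "maxwalltime = 24:00:00\n", "thornyflat"))
```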

Could you please run this command on thornyflat:

```shell
./simfactory/bin/sim print-mdb-entry thornyflat | grep maxwalltime
```

which should print the maximum walltime that SimFactory thinks is allowed. This value is not written to the log; the 168 hours that you see there come from your command-line option for the walltime.

You could also add debug statements such as `print("maxwalltime is", MaxWalltime.Walltime, "walltime is", Walltime.Walltime)` to the `submit` function in `lib/simrestart.py` to print out the values. As I said, though, I am not sure what is going on.

--
Ticket URL: https://bitbucket.org/einsteintoolkit/tickets/issues/2725/walltime-being-overwritten-to-24-hours