[Users] Number of processors and ppn used
Nisa Amir
nisaamir at math.qau.edu.pk
Tue Mar 29 20:47:10 CDT 2022
Also, I get the same issue when I run and submit the simulation on the
official Einstein Toolkit server using the Jupyter notebook.
On Wed, 30 Mar 2022, 6:44 am Nisa Amir, <nisaamir at math.qau.edu.pk> wrote:
> This happens on the laptop that I auto-configured by following the tutorial.
> After the warning, the job starts. I think the issue is that the job is
> killed for running out of memory, which stops further processing of the
> simulation.
>
> On Wed, 30 Mar 2022, 4:35 am Roland Haas, <rhaas at illinois.edu> wrote:
>
>> Hello Nisa,
>>
>> > When I submit my simulation with
>> > %%bash
>> > # start simulation segment
>> > ./simfactory/bin/sim submit NH --cores=1 --ppn-used=8 --walltime=0:2:00
>> > I get a warning. I also tried
>> > %%bash
>> > ./simfactory/bin/sim submit NH --cores=2 --num-threads=1 --walltime=0:20:00
>> > and it gives the same warning:
>> > Total number of threads and number of cores per node are inconsistent:
>> > procs=1, ppn-used=8 (procs must be an integer multiple of ppn-used)
>> > After that, when I run the parameter file, it does not run to completion
>> > and shows only half, or a bit more than half, of the output.
>> > How can I resolve this issue so that I get the complete output?
>>
>> If there is output missing then most likely the job was killed by the
>> queuing system since it ran out of walltime. Note that the first
>> command requested only 2 minutes of walltime which is almost certainly
>> too short for any "real" run.
>>
>> Usually this will show up at the bottom of the *.err file.
>>
>> You can either let simfactory print both the *.out and the *.err file
>> to screen (or pipe into less) using:
>>
>> ./simfactory/bin/sim show-output NH | less
>>
>> or query where the simulation output directory is:
>>
>> ./simfactory/bin/sim get-output-dir NH
>>
>> then use cd to go there and less to take a look at the err file.
>>
>> The other option is that the job hung. This usually also shows up as
>> the queuing system killing your run for exceeding its walltime, but
>> typically the last output (the timestamp of output files, e.g. *.asc,
>> visible via ls -l) is then much older than the time the job was killed
>> by the queuing system.
>>
>> If there is no queuing system (laptop) then something else could kill
>> the job (e.g. it runs out of memory).
>>
>> The warning about ppn-used is due to inconsistent options. Namely, you
>> are claiming via ppn-used=8 to use 8 cores per node but then are
>> requesting only 1 core. It is just a warning though; if the job started
>> then you do not have to worry. If you would like to avoid the warning
>> you could use --cores 1 --ppn-used 1. Does this happen on a cluster
>> (private? One officially supported by the ET?)? Or on your laptop that you
>> auto-configured via "sim setup-silent", or on the tutorial server?
>>
>> Yours,
>> Roland
>>
>> --
>> My email is as private as my paper mail. I therefore support encrypting
>> and signing email messages. Get my PGP key from http://keys.gnupg.net.
>>
>
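
For reference, the rule stated in the warning quoted above (procs must be
an integer multiple of ppn-used) can be checked by hand before submitting.
The following is only a minimal sketch of that arithmetic. check_ppn is a
hypothetical helper, not a simfactory command, and simfactory's own check
may take more into account than this.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of simfactory): mirrors the rule quoted in
# the warning text, "procs must be an integer multiple of ppn-used".
check_ppn() {
    procs=$1
    ppn_used=$2
    # Guard against division by zero or negative values.
    if [ "$ppn_used" -le 0 ]; then
        echo "invalid"
        return 0
    fi
    # procs is a multiple of ppn-used exactly when the remainder is zero.
    if [ $((procs % ppn_used)) -eq 0 ]; then
        echo "consistent"
    else
        echo "inconsistent"
    fi
}

check_ppn 1 8   # values reported in the warning: procs=1, ppn-used=8
check_ppn 1 1   # values from the suggested --cores 1 --ppn-used 1
```

With the values from the warning the remainder is non-zero, which is why
the options are reported as inconsistent. With --cores 1 --ppn-used 1 the
remainder is zero.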