[Users] Setting up ETK on Discoverer

Jay Vijay Kalinani jayvijay.kalinani at phd.unipd.it
Tue Sep 6 12:40:31 CDT 2022


Hi all,

I am trying to run simulations on the Discoverer cluster
(https://docs.discoverer.bg/index.html) using the latest Einstein Toolkit
(ETK) release (ET_2022_05) together with the Spritz GRMHD code.
I am attaching the simfactory configuration files I newly prepared for
compiling ETK on Discoverer, along with the list of modules that were
loaded.

For example, I submit the simulation with the following simfactory
command:

sim submit BNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49 \
    --parfile=./par/BNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49.par \
    --config=newspritzgnu --machine=discoverer --procs=256 --num-threads=1 \
    --ppn-used=128 --walltime=24:00:00

Unfortunately, my simulations crash after running for some time. My guess
is that some configuration option or compiler flag is set incorrectly at
build time and causes problems at runtime, but I am not certain.
I am attaching the output file, the error file, the parfile, and the
backtrace generated by the 256-process run. I also tried to resolve the
hexadecimal addresses in the backtrace with addr2line, but all of them
return "??:0".

I also noticed that the crash time depends on the number of processes:
with a given process count the simulation always crashes at the same point,
but changing the count moves the crash.
For instance, the run with 256 processes lasted about 2 hours on the
cluster and crashed after about 4600 iterations. A run with 1280 processes
lasted about 12 hours and crashed after about 32000 iterations. A run with
1792 processes, instead, crashed within a few minutes of starting, before
even reaching iteration 0. In all cases I set the number of threads to 1.
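The process-count dependence can be checked systematically by submitting the same parfile once per process count. The following is a minimal sketch, not a tested workflow: it reuses the flags from the submit command above, assumes 128 MPI ranks per node (`--ppn-used=128` with `--num-threads=1`), and gives each run a hypothetical `_p<procs>` name suffix so the runs do not share a simulation directory. It only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print one simfactory submission per MPI process count,
# reusing the flags of the original command. The "_p<procs>" suffix is an
# illustrative naming choice, not a simfactory convention.
# With --num-threads=1 every process is one MPI rank, so the node count is
# procs / ppn_used (assumed to divide evenly).

base=BNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49
ppn_used=128

# Print the submit command for the process count given as $1.
submit_cmd() {
  echo "sim submit ${base}_p$1" \
    "--parfile=./par/${base}.par" \
    "--config=newspritzgnu --machine=discoverer" \
    "--procs=$1 --num-threads=1" \
    "--ppn-used=${ppn_used} --walltime=24:00:00"
}

# Print the number of nodes needed for the process count given as $1.
nodes_for() {
  echo $(( $1 / ${ppn_used} ))
}

for procs in 256 1280 1792; do
  echo "# ${procs} procs -> $(nodes_for "$procs") nodes"
  submit_cmd "$procs"
done
```

For the three counts above this gives 2, 10 and 14 nodes. Since the lines starting with `#` are shell comments, piping the output into `sh` from the Cactus directory would actually submit the runs.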

If you have any suggestions or insights into why the simulations crash, or
notice any incorrect settings in the configuration files, please let me
know. I would greatly appreciate your help, and I am happy to provide any
further information you need.

Thank you very much.
Kind regards,
Jay Kalinani
-------------- next part --------------
Backtrace from rank 95 pid 1642950:
1. CarpetLib::signal_handler(int)   [/discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(_ZN9CarpetLib14signal_handlerEi+0xaf) [0x55dbac64c88f]]
2. /lib64/libc.so.6(+0x37400) [0x7f16d7ae2400]
3. CarpetLib::bbox<int, 2>::is_poison() const   [/discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(_ZNK9CarpetLib4bboxIiLi2EE9is_poisonEv+0) [0x55dbac64f6f0]]
4. CarpetLib::bboxset2::bboxset<int, 2>::bboxset(CarpetLib::bbox<int, 2> const&)   [/discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(_ZN9CarpetLib8bboxset27bboxsetIiLi2EEC1ERKNS_4bboxIiLi2EEE+0x85) [0x55dbac558325]]
5. CarpetLib::bboxset2::bboxset<int, 3>::bboxset(CarpetLib::bbox<int, 3> const&)   [/discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(_ZN9CarpetLib8bboxset27bboxsetIiLi3EEC2ERKNS_4bboxIiLi3EEE+0x168) [0x55dbac5588c8]]
6. CarpetLib::dh::regrid(bool)   [/discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(_ZN9CarpetLib2dh6regridEb+0x3a04) [0x55dbac692724]]
7. CarpetLib::gh::regrid(std::vector<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> >, std::allocator<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> > > > const&, std::vector<std::vector<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> >, std::allocator<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> > > >, std::allocator<std::vector<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> >, std::allocator<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> > > > > > const&, bool)   [/discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(_ZN9CarpetLib2gh6regridERKSt6vectorIS1_INS_8region_tESaIS2_EESaIS4_EERKS1_IS6_SaIS6_EEb+0x826) [0x55dbac6c3276]]
8. Carpet::RegridMap(_cGH const*, int, std::vector<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> >, std::allocator<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> > > > const&, std::vector<std::vector<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> >, std::allocator<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> > > >, std::allocator<std::vector<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> >, std::allocator<std::vector<CarpetLib::region_t, std::allocator<CarpetLib::region_t> > > > > > const&, bool)   [/discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(_ZN6Carpet9RegridMapEPK4_cGHiRKSt6vectorIS3_IN9CarpetLib8region_tESaIS5_EESaIS7_EERKS3_IS9_SaIS9_EEb+0x9d) [0x55dbac54fa4d]]
9. Carpet::Regrid(_cGH const*, bool, bool)   [/discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(_ZN6Carpet6RegridEPK4_cGHbb+0xbd7) [0x55dbac550797]]
10. Carpet::Evolve(tFleshConfig*)   [/discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(_ZN6Carpet6EvolveEP12tFleshConfig+0x1326) [0x55dbac52f846]]
11. /discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(main+0x3e) [0x55dbac1a90fe]
12. /lib64/libc.so.6(__libc_start_main+0xf3) [0x7f16d7ace493]
13. /discofs/jaykalinani/simulations/newBNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49/SIMFACTORY/exe/cactus_newspritzgnu(_start+0x2e) [0x55dbac1af4ce]

The hexadecimal addresses in this backtrace can also be interpreted
with a debugger (e.g. gdb), or with the 'addr2line' (or 'gaddr2line')
command line tool: 'addr2line -e cactus_sim <address>'.
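The bracketed runtime addresses above (e.g. 0x55dbac692724) appear to come from a position-independent executable loaded at a randomized base address; passing such addresses straight to addr2line commonly yields "??:0", as does a build without debug information (-g). A workaround, sketched here under both assumptions (PIE executable, built with -g), is to take the printed symbol+offset (e.g. `_ZN9CarpetLib2dh6regridEb+0x3a04`), add the offset to the symbol's link-time address as reported by nm, and pass that address to addr2line. The helper names `frame_symbol_offset` and `resolve_frame` are hypothetical.

```shell
#!/bin/sh
# Sketch: turn a backtrace frame such as
#   ...cactus_newspritzgnu(_ZN9CarpetLib2dh6regridEb+0x3a04) [0x55dbac692724]
# into a link-time address that addr2line understands.
# Assumes a position-independent executable (so the bracketed runtime address
# includes a random load base) and a build with debug info (-g).

# Print "<mangled symbol> <offset>" for one backtrace line, or nothing if the
# frame has no symbol (e.g. "/lib64/libc.so.6(+0x37400)").
frame_symbol_offset() {
  printf '%s\n' "$1" |
    sed -n 's/.*(\([^()+][^()+]*\)+\(0[x0-9a-fA-F]*\)).*/\1 \2/p'
}

# Usage: resolve_frame <executable> "<one backtrace line>"
# Only meaningful for frames that lie in <executable> itself (not libc).
# Offsets of non-top frames are return addresses, so the reported line may
# be the one just after the call.
resolve_frame() {
  exe=$1
  set -- $(frame_symbol_offset "$2")
  [ $# -eq 2 ] || { echo "no symbol+offset in frame" >&2; return 1; }
  sym=$1
  off=$2
  # nm prints "<address> <type> <name>"; take the link-time address of sym.
  base=$(nm "$exe" | awk -v s="$sym" '$3 == s { print $1; exit }')
  [ -n "$base" ] || { echo "symbol $sym not found by nm" >&2; return 1; }
  addr=$(printf '0x%x' $(( 0x$base + $off )))
  # -f prints the function name, -C demangles it, -e selects the binary.
  addr2line -C -f -e "$exe" "$addr"
}
```

For example, `resolve_frame SIMFACTORY/exe/cactus_newspritzgnu "<frame 6 line>"` would look up `_ZN9CarpetLib2dh6regridEb` with nm and print the function and file:line for offset 0x3a04, provided the executable still carries its symbol table and debug info.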
-------------- next part --------------
Currently Loaded Modulefiles:
 1) gcc/11/latest
 2) sqlite/3/latest
 3) xz/latest
 4) python/3/3.10/latest
 5) openmpi/4/gcc/4.1.1
 6) openblas/latest-gcc
 7) gsl/2/latest-gcc
 8) openmpi/4/gcc/latest
 9) fftw/3/latest-gcc-openmpi
10) gcc/12/latest
11) zlib/1/latest
12) hdf5/1/1.10/1.10.7-gcc_11-zlib_1-api_v110-nojava
13) cmake/3/3.23.2
14) libjpeg-turbo/2/2.1.1
15) boost/1/1.79.0-gcc
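The module list contains two gcc modules (gcc/11/latest and gcc/12/latest) and two openmpi modules (openmpi/4/gcc/4.1.1 and openmpi/4/gcc/latest); whether this affects the build is not established here, but it is easy to check for such duplicates. A minimal sketch, assuming the "N) name/version" output format shown above; `dup_modules` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read `module list` output on stdin and report every
# package family (first path component, e.g. "gcc") loaded more than once.
dup_modules() {
  grep -oE '[0-9]+\) [^ ]+' |
    awk '{ split($2, p, "/"); n[p[1]]++; m[p[1]] = m[p[1]] " " $2 }
         END { for (k in n) if (n[k] > 1) print k ":" m[k] }' |
    sort
}
```

Since `module list` may print to stderr, a typical call would be `module list 2>&1 | dup_modules`; for the list above it would report the gcc and openmpi pairs.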
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49.par
Type: application/octet-stream
Size: 33283 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20220906/afbf36d5/attachment-0007.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49.err
Type: application/octet-stream
Size: 2067 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20220906/afbf36d5/attachment-0008.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BNS_IF_fluxCT_dx018_q10_RPA_RotGas_E8e49.out
Type: application/octet-stream
Size: 378049 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20220906/afbf36d5/attachment-0009.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: discoverer.run
Type: application/octet-stream
Size: 1457 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20220906/afbf36d5/attachment-0010.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: discoverer.cfg
Type: application/octet-stream
Size: 4323 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20220906/afbf36d5/attachment-0011.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: discoverer.ini
Type: application/octet-stream
Size: 1709 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20220906/afbf36d5/attachment-0012.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: discoverer.sub
Type: application/octet-stream
Size: 613 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20220906/afbf36d5/attachment-0013.obj 