[Users] Errors when submitting runs in remote machine

Steven Brandt sbrandt at cct.lsu.edu
Thu Oct 16 10:20:21 CDT 2025


Hi. Sorry for the late response.

On 10/2/2025 2:10 PM, Omar Elías Velasco Castillo wrote:
> Hello,
>
> This is a follow-up to my earlier questions about my failure to 
> submit and run simulations on a remote cluster.
>
>
>     Can you show us what the error message(s) are?
>
>
> Yes, the tail of my err file reads:
>
> + set -
> ERROR: ld.so: object '/lib64/libpapi.so.5.2.0.0' from LD_PRELOAD 
> cannot be preloaded: ignored.
> ERROR: ld.so: object '/lib64/libpapi.so.5.2.0.0' from LD_PRELOAD 
> cannot be preloaded: ignored.
> --------------------------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> ERROR: ld.so: object '/lib64/libpapi.so.5.2.0.0' from LD_PRELOAD 
> cannot be preloaded: ignored.
> /home/ia/ovelasco/simulations/tov_ET_decisiva/SIMFACTORY/exe/cactus_sim: 
> error while loading shared libraries: libpapi.so.5.2.0.0: cannot open 
> shared object file: No such file or directory

Is it possible this library is installed on the head node and not the 
compute nodes? That could be the problem.
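
One quick way to check is to run the lookup from inside a batch job, so
it executes on a compute node. A sketch, assuming PBS (the resource
syntax varies by cluster):

    echo 'hostname; ls -l /lib64/libpapi.so.5.2.0.0; ldconfig -p | grep papi' \
        | qsub -l nodes=1:ppn=1 -l walltime=00:05:00
    # under SLURM, something like:
    #   srun -N1 -n1 --time=5 ls -l /lib64/libpapi.so.5.2.0.0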

Do you need PAPI? If not, maybe build without it. And why is there an 
LD_PRELOAD in the first place?
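
A sketch of how you might take it out of the picture (assuming the
preload is exported by something in your login environment, which is
where these wrappers usually come from):

    # In the optionlist: stop pointing the build at the head node's PAPI
    #PAPI_DIR = /usr

    # In the runscript, before mpirun: drop the inherited preload
    unset LD_PRELOAD

    # And widen the runscript's ldd check to the libraries that are failing
    ldd @EXECUTABLE@ | grep -E "papi|hdf5|gsl|lapack|blas|openssl|stdc++|gfortran"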

> =>> PBS: job killed: walltime 864033 exceeded limit 864000
> mpirun: abort is already in progress...hit ctrl-c again to forcibly 
> terminate
>
>
>
>     Maybe. I'm not 100% sure what you are doing. Can you be clearer
>     about how you are running the ET?
>
>
> Sure, as I mentioned in a previous email, my intention is to submit 
> and run the ET in a queue on a remote machine using Simfactory, with 
> either PBS or SLURM, since I work on different remote machines from 
> time to time. The key point is that while the sim and cactus_sim 
> build runs, some lines printed to the shell indicate that when a 
> library is not found on the machine, the ET build falls back to a 
> bundled copy for the thorns whose libraries weren't found (examples 
> below show the bundled GSL and HDF5 being selected):
That is correct.
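
A quick way to see which copies the executable actually linked against
(the bundled builds are linked statically, so only system libraries show
up in ldd; the path below is the one from your error log):

    ldd /home/ia/ovelasco/simulations/tov_ET_decisiva/SIMFACTORY/exe/cactus_sim \
        | grep -E 'hdf5|gsl|papi'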
>
>
> ********************************************************************************
> Running configuration script for thorn GSL:
> GSL selected, but GSL_DIR not set. Checking pkg-config ...
> GSL not found. Checking standard paths ...
> GSL not found.
> Using bundled GSL...
> Finished running configuration script for thorn GSL.
> ********************************************************************************
> Running configuration script for thorn HDF5:
> Additional requested language support:  Fortran
> HDF5 selected, but HDF5_DIR not set. Checking pkg-config ...
> HDF5 not found. Checking standard paths ...
> HDF5 not found.
> Using bundled HDF5...
> Finished running configuration script for thorn HDF5.
>
> First, these messages about GSL and HDF5 not being found surprise me 
> because both libraries are located under /usr and I pointed to them 
> in the configuration file.
It may be that HDF5 isn't built with the correct options. You need 
hdf5+hl+fortran+cxx+mpi (i.e. you need fortran, cxx, mpi, and hl enabled).
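
You can usually check what the system copies provide. A sketch, assuming
the stock h5cc and gsl-config helpers are installed next to the /usr
libraries:

    # HDF5: look for Fortran / C++ / parallel / high-level support
    h5cc -showconfig | grep -iE 'fortran|c\+\+|parallel|high-level'
    ls /usr/lib64 | grep -E 'libhdf5(_hl|_fortran|_cpp)'

    # GSL: see what pkg-config and gsl-config actually report
    pkg-config --exists gsl && pkg-config --modversion gsl
    gsl-config --version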
>
> Second, if this build completed successfully in the home or login 
> shell of that machine, then I assume I can run and submit simulations 
> with my configuration and runscript files through PBS or SLURM 
> queues, am I right? If so, why have I still not been able to do so? 
> My simulations die within seconds of being started.
That should be the case, but every cluster is a special animal.
>
> I include here the optionlist and runscript that I use for a machine 
> that uses PBS as an example.
>
> Optionlist:
>
> VERSION = 2025 #For Einstein Toolkit 2022_11
>
> CPP = cpp
> #CC  = gcc
> #CXX = g++
> CC  = mpicc
> CXX = mpic++
You should use gcc and g++ here.
>
> FPP = cpp
> #F90 = gfortran
> F90 = mpif90
Again, gfortran, not mpif90.
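Once MPI_DIR is set, the ExternalLibraries/MPI thorn supplies the MPI
include and link flags itself, so an optionlist normally names the plain
serial compilers, e.g.:

    CPP = cpp
    FPP = cpp
    CC  = gcc
    CXX = g++
    F90 = gfortran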
>
> CPPFLAGS  = -I/usr/include
> CPPFLAGS += -DCCTK_VECTOR_DISABLE_TESTS
> CPPFLAGS += -D__USE_ISOC99
> CPPFLAGS += -D_GLIBCXX_USE_C99_MATH
>
> LDFLAGS = -L/usr/lib64 -lssl -lcrypto -rdynamic
>
>
> FPPFLAGS = -traditional
>
> CFLAGS   = -g -std=gnu99
> CXXFLAGS = -g -std=gnu++11 -fpermissive
> F90FLAGS = -g -fcray-pointer -ffixed-line-length-none
>
> DEBUG = no
> CPP_DEBUG_FLAGS =
> C_DEBUG_FLAGS   =
> CXX_DEBUG_FLAGS =
>
> OPTIMISE = yes
> CPP_OPTIMISE_FLAGS =
> C_OPTIMISE_FLAGS   = -O2
> CXX_OPTIMISE_FLAGS = -O2
> F90_OPTIMISE_FLAGS = -O2
>
> PROFILE = no
> CPP_PROFILE_FLAGS =
> C_PROFILE_FLAGS   = -pg
> CXX_PROFILE_FLAGS = -pg
> F90_PROFILE_FLAGS = -pg
>
> WARN           = yes
> CPP_WARN_FLAGS = -Wall
> C_WARN_FLAGS   = -Wall
> CXX_WARN_FLAGS = -Wall
> F90_WARN_FLAGS = -Wall
>
> OPENMP           = yes
> CPP_OPENMP_FLAGS = -fopenmp
> FPP_OPENMP_FLAGS = -D_OPENMP
> C_OPENMP_FLAGS   = -fopenmp
> CXX_OPENMP_FLAGS = -fopenmp
> F90_OPENMP_FLAGS = -fopenmp
>
> VECTORISE                = no
> VECTORISE_ALIGNED_ARRAYS = no
> VECTORISE_INLINE         = yes
>
> PTHREADS_DIR = NO_BUILD
>
> # To find all these paths, run: ldconfig -p | grep 
> packagename
> MPI_DIR = /software/TEST/local
> #
>
> LAPACK_DIR = /usr
> #/usr/lib64/liblapack.so.3
> # lapack-3.4.2-8.el7.x86_64
>
> BLAS_DIR = /usr
> #/usr/lib64/libblas.so.3
> # blas-3.4.2-8.el7.x86_64
>
> HDF5_DIR = /usr
> #/usr/lib64/libhdf5.so.8
> # hdf5-1.8.12-13.el7.x86_64
>
> HWLOC_DIR = /usr
> #/usr/lib64/libhwloc.so.5
> # hwloc-1.11.8-4.el7.x86_64
>
> JPEG_DIR = /usr
> #/usr/lib64/libjpeg.so.62
> #
>
> YAML_DIR = /usr
> #/usr/lib64/libyaml-0.so.2
> # /usr/lib64/libyaml-0.so.2.0.
>
> ZLIB_DIR = /usr
> #/usr/lib64/imlib2/loaders/zlib.so
> # zlib-1.2.7-21.el7_9.x86_64
>
> GSL_DIR     = /usr
> #/usr/lib64/libgsl.so
> # gsl-1.15-13.el7.x86_64
>
> FFTW3_DIR   = /usr
> #/usr/lib64/libfftw3.so
> # fftw-3.3.3-8.el7.x86_64
>
> PAPI_DIR    = /usr
> #/usr/lib64/libpapi.so.5.2.0.0
> # papi-5.2.0-26.el7.x86_64
>
> XML2_DIR = /usr
> #/usr/lib64/libxml2.so.2
> # xml2-0.5-7.el7.x86_64
>
> NUMA_DIR = /usr
> #/usr/lib64/libnuma.so.1
>
> OPENSSL_DIR = /usr
> #/usr/lib64/libssl3.so
> # openssl-1.0.2k-26.el7_9.x86_64
>
>
> Runscript:
>
> #!/bin/bash
>
> set -x
> set -e
>
> cd @SIMULATION_DIR@
>
> # Environment setup
> source /opt/rh/devtoolset-8/enable
>
> module purge
> module load lamod/cmake/3.17
> module load lamod/fftw/gnu/3.3.8
> module load lamod/openmpi/gnu/4.1.0
> module load libraries/gsl/2.6_gnu
> module load libraries/hdf5/1.10.5_gnu

It's hard for me to tell whether this is a good environment.

Are you using simfactory to build and run? It's important to make sure 
you have the same env for both of those tasks.
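
One way to keep them identical is to put the module setup in the
machine's simfactory entry, so both "sim build" and the generated
runscript source the same thing. A minimal sketch (the file name is a
placeholder for your machine's .ini):

    # simfactory/mdb/machines/mycluster.ini
    envsetup = <<EOT
        source /opt/rh/devtoolset-8/enable
        module purge
        module load lamod/cmake/3.17
        module load lamod/fftw/gnu/3.3.8
        module load lamod/openmpi/gnu/4.1.0
        module load libraries/gsl/2.6_gnu
        module load libraries/hdf5/1.10.5_gnu
    EOT

Note, too, that the runscript loads hdf5/1.10.5 and gsl/2.6 while the
optionlist points HDF5_DIR and GSL_DIR at the /usr copies (1.8.12 and
1.15, per your comments); building against one version and running
against another is exactly the kind of drift to rule out.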

>
> echo "Environment diagnostics:"
> date
> hostname
> env
> ldd @EXECUTABLE@ | grep -E "lapack|blas|openssl|stdc++|gfortran"
>
> # Set runtime parameters
> export CACTUS_NUM_PROCS=64
> export CACTUS_NUM_THREADS=1
> export OMP_NUM_THREADS=1
> export GMON_OUT_PREFIX=gmon.out
> env | sort > SIMFACTORY/ENVIRONMENT
>
> echo "Starting simulation at $(date)"
> export CACTUS_STARTTIME=$(date +%s)
>
This looks right.
> mpirun -np $CACTUS_NUM_PROCS -x LD_LIBRARY_PATH @EXECUTABLE@ -L 3 
> @PARFILE@
>
> echo "Simulation finished at $(date)"
> touch segment.done
>
>
>
>
>
>
> On Thu, 18 Sep 2025 at 14:04, Steven Brandt 
> (<sbrandt at cct.lsu.edu>) wrote:
>
>
>     On 9/17/2025 12:11 PM, Omar Elías Velasco Castillo wrote:
>>     Dear Einstein Toolkit team,
>>
>>     I hope this message finds you well. I am a beginner with the
>>     Einstein Toolkit. On personal workstations I have been able to
>>     compile and run tutorial simulations at low resolution, but I am
>>     facing problems on two different clusters. I would like to ask
>>     two questions:
>>
>>     1. *Are there ET versions prior to 2022_05 (e.g. 2019–2020
>>     releases) that can still be downloaded and compiled
>>     successfully?* When I try to fetch them from the website using
>>     ./GetComponents, the process fails (CactusSourceJar.git is not
>>     created and some components do not download). Since some of the
>>     nodes I use have older GCC versions (8 or 10) and limited
>>     modules, a stable older release might be more practical.
>>
>>     2. During compilation, I notice that thorns (such as GSL and
>>     HDF5, for example) fall back to using the bundled versions
>>     because system modules are not found. The build completes
>>     successfully, but jobs fail immediately after submission to PBS
>>     or SLURM queues.
>           Can you show us what the error message(s) are?
>>
>>     *What is the role of the bundled versions in this case?* *If the
>>     build uses bundled GSL/HDF5, do I still need to load
>>     corresponding, compatible modules in the runscript?*
>>
>>     Could this mismatch explain why jobs die right after submission?
>
>     Maybe. I'm not 100% sure what you are doing. Can you be clearer
>     about how you are running the ET?
>
>     --Steve
>
>>
>>     Any advice on handling these issues would be very helpful. Thank
>>     you very much for your time and support.
>>
>>     Greetings,
>>
>>     O.V.
>>
>>
>>
>>
>>
>>     _______________________________________________
>>     Users mailing list
>>     Users at einsteintoolkit.org
>>     http://lists.einsteintoolkit.org/mailman/listinfo/users
>