[ET Trac] #2892: SpacetimeX: GPU error in PunctureTracker::PunctureContainer::interpolate
Miren Radia
trac-noreply at einsteintoolkit.org
Fri Oct 10 08:28:21 CDT 2025
Reporter: Miren Radia
Status: submitted
Milestone:
Version: development version
Type: bug
Priority: major
Component: CarpetX
This issue is related to #2890, but I have now rerun almost the same parameter file with the main versions of CarpetX/SpacetimeX.
I am trying to run a binary black hole simulation with CarpetX on [tursa](https://epcced.github.io/dirac-docs/tursa-user-guide/hardware/) (Nvidia A100 GPUs), but the run aborts with a GPU error when the `PunctureTracker` thorn is enabled.
Please find attached the following:
* My thornlist: `spacetimex.th`
* My parameter file: `test-bbh.par` (small grid configuration to make it easier to reproduce the issue). It is essentially the same as the one in #2890, with the following changes:
  * `CarpetX::poison_undefined_values = no`: I had to disable this because, with it set to `yes`, I got the following error (which looks likely to be related):
```
ERROR from host tu-c0r0n87 process 0
in thorn CarpetX, file /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/arrangements/CarpetX/CarpetX/src/valid.cxx:552:
-> CallFunction iteration 0 PunctureTracker_SetupGroup: PunctureTracker::PunctureTracker_Setup checking output: Grid array "BOXINBOX::position_x[2]" has 1 nans on time level 0; expected valid
The interior is valid because: CallFunction iteration 0 PunctureTracker_SetupGroup: PunctureTracker::PunctureTracker_Setup: Mark output variables as valid.
The outer boundary is valid because: CallFunction iteration 0 PunctureTracker_SetupGroup: PunctureTracker::PunctureTracker_Setup: Mark output variables as valid.
The ghost zones are valid because: CallFunction iteration 0 PunctureTracker_SetupGroup: PunctureTracker::PunctureTracker_Setup: Mark output variables as valid.
```
  * `CarpetX::use_subcycling_wip = yes` removed
  * `CarpetX::restrict_during_sync = yes` (previously `no`)
* My OptionList: `tursa.cfg`
* The full output from running the code through `cuda-gdb` including a backtrace at the error: `test-bbh-debug.out`
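As a note on the `poison_undefined_values` error quoted above: the message says that one element of the grid array `BOXINBOX::position_x` was still NaN after `PunctureTracker_Setup` had marked it as valid. The following is only a minimal CPU-side sketch of that kind of check, not the actual code in CarpetX's `valid.cxx`; the helper name `count_nans` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CarpetX): count the NaN entries in an
// array. A "poison" check of this kind fails when the count is non-zero
// for data that a scheduled function has declared valid.
std::size_t count_nans(const std::vector<double> &values) {
  std::size_t n = 0;
  for (double v : values) {
    if (std::isnan(v)) {
      ++n;
    }
  }
  return n;
}
```

A non-zero count for an array that was supposedly written would be consistent with one element never having been set, which is what the error message suggests happened here.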
Here’s the end of the output printed from the simulation:
```
INFO (CarpetX): Starting evolution...
INFO (CarpetX): Regridding...
INFO (CarpetX): Setting max_grid_size values for all levels before regridding
INFO (CarpetX): ErrorEst patch 0 level 0
INFO (CarpetX): ErrorEst patch 0 level 0 done. Set/clear/total=288/3808/4096=7%/93%/100%
INFO (CarpetX): old levels 2, new levels 2
INFO (CarpetX): level 0: 1 boxes, 4096 cells (100%)
INFO (CarpetX): level 1: 1 boxes, 8192 cells (25%, 25%)
INFO (CarpetX): ScheduleTraverseGH iteration 1 CCTK_PRESTEP
INFO (CarpetX): ScheduleTraverseGH iteration 1 CCTK_EVOL
INFO (CarpetX): CallFunction iteration 1 CCTK_EVOL: ODESolvers::ODESolvers_Solve
INFO (ODESolvers): Integrator is RK4
INFO (ODESolvers): Integrating 22 variables
INFO (ODESolvers): Calculating RHS #1 at t=0
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_RHS
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_apply_newradx_boundary_conditions
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_Enforce
INFO (CarpetX): SyncGroups Z4C::CHI, Z4C::GAMMA_TILDE, Z4C::K_HAT, Z4C::A_TILDE, Z4C::GAM_TILDE, Z4C::THETA, Z4C::ALPHAG, Z4C::BETAG
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_ADM
INFO (CarpetX): CallFunction iteration 1 TmunuBaseX_SetTmunuVars: TmunuBaseX::TmunuBaseX_ZeroTmunu
INFO (ODESolvers): Calculated new state #1 at t=0.5
INFO (ODESolvers): Calculating RHS #2 at t=0.5
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_RHS
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_apply_newradx_boundary_conditions
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_Enforce
INFO (CarpetX): SyncGroups Z4C::CHI, Z4C::GAMMA_TILDE, Z4C::K_HAT, Z4C::A_TILDE, Z4C::GAM_TILDE, Z4C::THETA, Z4C::ALPHAG, Z4C::BETAG
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_ADM
INFO (CarpetX): CallFunction iteration 1 TmunuBaseX_SetTmunuVars: TmunuBaseX::TmunuBaseX_ZeroTmunu
INFO (ODESolvers): Calculated new state #2 at t=0.5
INFO (ODESolvers): Calculating RHS #3 at t=0.5
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_RHS
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_apply_newradx_boundary_conditions
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_Enforce
INFO (CarpetX): SyncGroups Z4C::CHI, Z4C::GAMMA_TILDE, Z4C::K_HAT, Z4C::A_TILDE, Z4C::GAM_TILDE, Z4C::THETA, Z4C::ALPHAG, Z4C::BETAG
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_ADM
INFO (CarpetX): CallFunction iteration 1 TmunuBaseX_SetTmunuVars: TmunuBaseX::TmunuBaseX_ZeroTmunu
INFO (ODESolvers): Calculated new state #3 at t=1
INFO (ODESolvers): Calculating RHS #4 at t=1
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_RHS
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_apply_newradx_boundary_conditions
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_Enforce
INFO (CarpetX): SyncGroups Z4C::CHI, Z4C::GAMMA_TILDE, Z4C::K_HAT, Z4C::A_TILDE, Z4C::GAM_TILDE, Z4C::THETA, Z4C::ALPHAG, Z4C::BETAG
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_ADM
INFO (CarpetX): CallFunction iteration 1 TmunuBaseX_SetTmunuVars: TmunuBaseX::TmunuBaseX_ZeroTmunu
INFO (ODESolvers): Calculated new state #4 at t=1
INFO (CarpetX): CallFunction iteration 1 CCTK_EVOL: PunctureTracker::PunctureTracker_Track
INFO (PunctureTracker): Tracking punctures...
INFO (PunctureTracker): Puncture #0 is at (4.46154,0,0)
INFO (PunctureTracker): Puncture #1 is at (-5.53846,0,0)
warning: Cuda API error detected: cudaLaunchKernel returned (0x1)
warning: Cuda API error detected: cudaPeekAtLastError returned (0x1)
warning: Cuda API error detected: cudaPeekAtLastError returned (0x1)
warning: Cuda API error detected: cudaGetLastError returned (0x1)
terminate called after throwing an instance of 'std::runtime_error'
what(): GPU error in file /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_Scan.H line 1242 invalid argument
```
Here’s the relevant part of the backtrace:
```
#0 0x000015554c8de52f in raise () from /lib64/libc.so.6
#1 0x000015554c8b1e65 in abort () from /lib64/libc.so.6
#2 0x000015554d176bd9 in __gnu_cxx::__verbose_terminate_handler ()
at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x000015554d18225a in __cxxabiv1::__terminate (handler=<optimized out>)
at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#4 0x000015554d1822c5 in std::terminate ()
at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#5 0x000015554d182517 in __cxa_throw (obj=<optimized out>,
tinfo=0x337e8dc0 <typeinfo for std::runtime_error@@GLIBCXX_3.4>,
dest=0x41da90 <std::runtime_error::~runtime_error()@plt>)
at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:98
#6 0x00000000004b290b in amrex::Error_host (
msg=0x4256d8e0 "GPU error in file /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactu
s-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_Scan.H line 1242 invalid argum
ent", type=0x2d4fba9 "Abort")
at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scra
tch/build/AMReX/amrex-24.10/Src/Base/AMReX.cpp:242
#7 amrex::Abort (
msg=0x4256d8e0 "GPU error in file /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactu
s-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_Scan.H line 1242 invalid argum
ent")
at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scra
tch/build/AMReX/amrex-24.10/Src/Base/AMReX.H:159
#8 amrex::Abort (msg=...)
at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scra
tch/build/AMReX/amrex-24.10/Src/Base/AMReX.cpp:223
#9 0x00000000010de45f in amrex::Scan::ExclusiveSum<long, int, void> (a_ret_sum=...,
out=<optimized out>, in=0x15461cd98000, n=<optimized out>)
at /home/y07/shared/utils/core/gcc/12.2.0/include/c++/12.2.0/bits/new_allocator.h:90
#10 amrex::Gpu::exclusive_scan<int*, int*> (result=<optimized out>, end=<optimized out>,
begin=0x15461cd98000)
at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scra
tch/external/AMReX/include/AMReX_Scan.H:1381
#11 amrex::DenseBins<amrex::BoxND<3> >::build<int, __nv_dl_wrapper_t<__nv_dl_tag<void (amrex::
DenseBins<amrex::BoxND<3> >::*)(amrex::BinPolicy::GPUBinPolicy, int, amrex::BoxND<3> const*, a
mrex::BoxND<3> const&, __nv_dl_wrapper_t<__nv_dl_trailing_return_tag<void (amrex::ParticleLoca
tor<amrex::DenseBins<amrex::BoxND<3> > >::*)(amrex::BoxArray const&, amrex::Geometry const&),
&amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build, amrex::IntVectND<3>, 2u>,
amrex::IntVectND<3>, amrex::IntVectND<3> > const&), &amrex::DenseBins<amrex::BoxND<3> >::buil
d<int, __nv_dl_wrapper_t<__nv_dl_trailing_return_tag<void (amrex::ParticleLocator<amrex::Dense
Bins<amrex::BoxND<3> > >::*)(amrex::BoxArray const&, amrex::Geometry const&), &amrex::Particle
Locator<amrex::DenseBins<amrex::BoxND<3> > >::build, amrex::IntVectND<3>, 2u>, amrex::IntVectN
D<3>, amrex::IntVectND<3> > >, 1u>, __nv_dl_wrapper_t<__nv_dl_trailing_return_tag<void (amrex:
:ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::*)(amrex::BoxArray const&, amrex::Geome
try const&), &amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build, amrex::IntVe
ctND<3>, 2u>, amrex::IntVectND<3>, amrex::IntVectND<3> > const, amrex::Dim3, amrex::Dim3 const> > (f=..., nbins=<optimized out>, v=<optimized out>, nitems=1, this=0x4337
d478) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/sc
ratch/external/AMReX/include/AMReX_DenseBins.H:252
#12 amrex::DenseBins<amrex::BoxND<3> >::build<int, __nv_dl_wrapper_t<__nv_dl_trailing_return_t
ag<void (amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::*)(amrex::BoxArray const
&, amrex::Geometry const&), &amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::buil
d, amrex::IntVectND<3>, 2u>, amrex::IntVectND<3>, amrex::IntVectND<3> > > (f=..., bx=<syntheti
c pointer>..., v=<optimized out>, nitems=1, this=0x4337d478) at /mnt/lustre/tursafs1/home/dp41
5/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_Dens
eBins.H:189
#13 amrex::DenseBins<amrex::BoxND<3> >::build<int, __nv_dl_wrapper_t<__nv_dl_trailing_return_t
ag<void (amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::*)(amrex::BoxArray const
&, amrex::Geometry const&), &amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::buil
d, amrex::IntVectND<3>, 2u>, amrex::IntVectND<3>, amrex::IntVectND<3> > > (f=..., bx=<syntheti
c pointer>..., v=<optimized out>, nitems=1, this=0x4337d478) at /mnt/lustre/tursafs1/home/dp41
5/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_Dens
eBins.H:132
#14 amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build (this=0x4337d300, ba=..
., geom=...) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs
/sim/scratch/external/AMReX/include/AMReX_ParticleLocator.H:170
#15 0x00000000010e9982 in amrex::AmrParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::buil
d (this=this@entry=0x4337cc58, a_ba=..., a_geom=...) at /mnt/lustre/tursafs1/home/dp415/dp415/
dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_ParticleLoca
tor.H:285
#16 0x00000000010f5953 in amrex::AmrParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::buil
d (a_gdb=<optimized out>, this=0x4337cc58) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/E
TK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_ParticleLocator.H:310
#17 amrex::ParticleContainer_impl<amrex::Particle<3, 2>, 0, 0, amrex::ArenaAllocator, amrex::D
efaultAssignor>::RedistributeGPU (this=0x4337cc50, lev_min=0, lev_max=1, nGrow=0, local=0, rem
ove_negative=true) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/c
onfigs/sim/scratch/external/AMReX/include/AMReX_ParticleContainerI.H:1287
#18 0x00000000010c13f5 in amrex::ParticleContainer_impl<amrex::Particle<3, 2>, 0, 0, amrex::Ar
enaAllocator, amrex::DefaultAssignor>::Redistribute (remove_negative=true, local=0, nGrow=0, l
ev_max=-1, lev_min=0, this=0x4337cc50) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/C
actus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_ParticleContainerI.H:1073
#19 CarpetX_Interpolate (cctkGH_=<optimized out>, npoints=2, globalsx=<optimized out>, globals
y=<optimized out>, globalsz=<optimized out>, nvars=3, varinds=0x42b6c0c0, operations=0x42baf43
0, allow_boundaries=1, resultptrs_=0x7fffffffaa80) at /mnt/lustre/tursafs1/home/dp415/dp415/dc
-radi1/ETK/Cactus-CarpetX-main/arrangements/CarpetX/CarpetX/src/interpolate.cxx:581
#20 0x00000000010bd35b in CarpetX_DriverInterpolate (cctkGH=0x33e8ec20, N_dims=<optimized out>
, local_interp_handle=<optimized out>, param_table_handle=200, coord_system_handle=<optimized
out>, N_interp_points=2, interp_coords_type_code=0, coords=0x7fffffffaa60, N_input_arrays=3, i
nput_array_variable_indices=0x7fffffffaa54, N_output_arrays=3, output_array_type_codes=0x7ffff
fffaa50, output_arrays=0x7fffffffaa80) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/C
actus-CarpetX-main/arrangements/CarpetX/CarpetX/src/interpolate.cxx:404
#21 0x000000000146c5b4 in PunctureTracker::PunctureContainer::interpolate (this=<optimized out
>, cctkGH=cctkGH@entry=0x33e8ec20) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactu
s-CarpetX-main/arrangements/SpacetimeX/PunctureTracker/src/puncture.cxx:61
#22 0x000000000146dba0 in PunctureTracker_Track (cctkGH=0x33e8ec20, cctkGH@entry=<error readin
g variable: value has been optimized out>) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/E
TK/Cactus-CarpetX-main/arrangements/SpacetimeX/PunctureTracker/src/puncture_tracker.cxx:132
```
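For context on frames #9 to #11: the abort is raised from `amrex::Scan::ExclusiveSum`, reached via `amrex::DenseBins::build` with `nitems=1`. An exclusive prefix sum converts per-bin item counts into per-bin start offsets. The sketch below is a plain CPU illustration of that operation using `std::exclusive_scan` (C++17); it is not AMReX's GPU implementation, and the helper name `bin_offsets` is hypothetical.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper (not part of AMReX): given the number of items in
// each bin, return the start offset of each bin, i.e.
// offsets[i] = counts[0] + ... + counts[i-1], with offsets[0] = 0.
std::vector<int> bin_offsets(const std::vector<int> &counts) {
  std::vector<int> offsets(counts.size());
  std::exclusive_scan(counts.begin(), counts.end(), offsets.begin(), 0);
  return offsets;
}
```

On the CPU this is well defined for any input size, including a single item spread over a few bins, so the "invalid argument" reported by `cudaLaunchKernel` presumably relates to how the GPU kernel is launched rather than to the scan itself.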
The software versions I am using are:
* GCC 12.2.0
* CUDA 12.3
* Open MPI 4.1.5
I tried changing `CarpetX::interpolation_order` and `PunctureTracker::interp_order` (keeping the two equal), but that did not help.
Disabling the `PunctureTracker` thorn resolves the problem for me (although I then get no puncture tracking, of course).
Please let me know if any further information would be helpful to debug this.
attachment: test-bbh.par (https://api.bitbucket.org/2.0/repositories/einsteintoolkit/tickets/issues/2892/attachments/test-bbh.par)
attachment: tursa.cfg (https://api.bitbucket.org/2.0/repositories/einsteintoolkit/tickets/issues/2892/attachments/tursa.cfg)
attachment: spacetimex.th (https://api.bitbucket.org/2.0/repositories/einsteintoolkit/tickets/issues/2892/attachments/spacetimex.th)
attachment: test-bbh-debug.out (https://api.bitbucket.org/2.0/repositories/einsteintoolkit/tickets/issues/2892/attachments/test-bbh-debug.out)
--
Ticket URL: https://bitbucket.org/einsteintoolkit/tickets/issues/2892/spacetimex-gpu-error-in-puncturetracker