[ET Trac] #2892: SpacetimeX: GPU error in PunctureTracker::PunctureContainer::interpolate

Miren Radia trac-noreply at einsteintoolkit.org
Fri Oct 10 08:31:23 CDT 2025


#2892: SpacetimeX: GPU error in PunctureTracker::PunctureContainer::interpolate

 Reporter: Miren Radia
   Status: submitted
Milestone: 
  Version: development version
     Type: bug
 Priority: major
Component: CarpetX

Changes (by Miren Radia):
This issue is related to #2890. I have now rerun almost the same parameter file, this time with the main versions of CarpetX/SpacetimeX rather than Liwei’s development branches.

I am trying to run a BH binary simulation with CarpetX on [tursa](https://epcced.github.io/dirac-docs/tursa-user-guide/hardware/) (Nvidia A100 GPUs), but I run into a GPU error when the `PunctureTracker` thorn is enabled.

Please find attached the following:

* My thornlist: `spacetimex.th`
* My parameter file: `test-bbh.par` (small grid configuration to make it easier to reproduce the issue). This is more or less the same as the one in #2890 but with the following changes:

    * `CarpetX::poison_undefined_values = no`: I had to disable this because, with it set to `yes`, I got the following error (although it looks likely to be related):

        `ERROR from host tu-c0r0n87 process 0 in thorn CarpetX, file /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/arrangements/CarpetX/CarpetX/src/valid.cxx:552: -> CallFunction iteration 0 PunctureTracker_SetupGroup: PunctureTracker::PunctureTracker_Setup checking output: Grid array "BOXINBOX::position_x[2]" has 1 nans on time level 0; expected valid The interior is valid because: CallFunction iteration 0 PunctureTracker_SetupGroup: PunctureTracker::PunctureTracker_Setup: Mark output variables as valid. The outer boundary is valid because: CallFunction iteration 0 PunctureTracker_SetupGroup: PunctureTracker::PunctureTracker_Setup: Mark output variables as valid. The ghost zones are valid because: CallFunction iteration 0 PunctureTracker_SetupGroup: PunctureTracker::PunctureTracker_Setup: Mark output variables as valid.`

    * `CarpetX::use_subcycling_wip = yes` removed
    * `CarpetX::restrict_during_sync = yes` (previously `no`)

* My OptionList: `tursa.cfg`
* The full output from running the code through `cuda-gdb` including a backtrace at the error: `test-bbh-debug.out`
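The poison error quoted above reports one NaN in `BOXINBOX::position_x[2]`, while the log only ever lists punctures #0 and #1. One possible reading, which is an assumption and not something the ticket confirms, is that a slot beyond the tracked punctures is never written and so still holds the poison value. The sketch below only illustrates that kind of check. It assumes the poison value is a NaN, as the message suggests; `count_nans` and `make_positions` are hypothetical names and not CarpetX code.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: count NaN entries in a flat array of values.
std::size_t count_nans(const std::vector<double>& values) {
    std::size_t n = 0;
    for (double v : values) {
        if (std::isnan(v)) ++n;
    }
    return n;
}

// Hypothetical setup: fill `capacity` slots with NaN "poison", then
// overwrite only the first `num_set` slots with real values.
std::vector<double> make_positions(std::size_t capacity, std::size_t num_set) {
    std::vector<double> pos(capacity, std::numeric_limits<double>::quiet_NaN());
    for (std::size_t i = 0; i < num_set && i < capacity; ++i) {
        pos[i] = 0.0;
    }
    return pos;
}
```

With three slots and two of them set, `count_nans(make_positions(3, 2))` finds exactly one NaN left over, which is the shape of the complaint in the error message.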

Here’s the end of the output printed from the simulation:

```
INFO (CarpetX): Starting evolution...
INFO (CarpetX): Regridding...
INFO (CarpetX): Setting max_grid_size values for all levels before regridding
INFO (CarpetX): ErrorEst patch 0 level 0
INFO (CarpetX): ErrorEst patch 0 level 0 done. Set/clear/total=288/3808/4096=7%/93%/100%
INFO (CarpetX):   old levels 2, new levels 2
INFO (CarpetX):   level 0: 1 boxes, 4096 cells (100%)
INFO (CarpetX):   level 1: 1 boxes, 8192 cells (25%, 25%)
INFO (CarpetX): ScheduleTraverseGH iteration 1 CCTK_PRESTEP
INFO (CarpetX): ScheduleTraverseGH iteration 1 CCTK_EVOL
INFO (CarpetX): CallFunction iteration 1 CCTK_EVOL: ODESolvers::ODESolvers_Solve
INFO (ODESolvers): Integrator is RK4
INFO (ODESolvers):   Integrating 22 variables
INFO (ODESolvers): Calculating RHS #1 at t=0
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_RHS
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_apply_newradx_boundary_conditions
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_Enforce
INFO (CarpetX): SyncGroups Z4C::CHI, Z4C::GAMMA_TILDE, Z4C::K_HAT, Z4C::A_TILDE, Z4C::GAM_TILDE, Z4C::THETA, Z4C::ALPHAG, Z4C::BETAG
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_ADM
INFO (CarpetX): CallFunction iteration 1 TmunuBaseX_SetTmunuVars: TmunuBaseX::TmunuBaseX_ZeroTmunu
INFO (ODESolvers): Calculated new state #1 at t=0.5
INFO (ODESolvers): Calculating RHS #2 at t=0.5
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_RHS
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_apply_newradx_boundary_conditions
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_Enforce
INFO (CarpetX): SyncGroups Z4C::CHI, Z4C::GAMMA_TILDE, Z4C::K_HAT, Z4C::A_TILDE, Z4C::GAM_TILDE, Z4C::THETA, Z4C::ALPHAG, Z4C::BETAG
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_ADM
INFO (CarpetX): CallFunction iteration 1 TmunuBaseX_SetTmunuVars: TmunuBaseX::TmunuBaseX_ZeroTmunu
INFO (ODESolvers): Calculated new state #2 at t=0.5
INFO (ODESolvers): Calculating RHS #3 at t=0.5
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_RHS
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_apply_newradx_boundary_conditions
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_Enforce
INFO (CarpetX): SyncGroups Z4C::CHI, Z4C::GAMMA_TILDE, Z4C::K_HAT, Z4C::A_TILDE, Z4C::GAM_TILDE, Z4C::THETA, Z4C::ALPHAG, Z4C::BETAG
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_ADM
INFO (CarpetX): CallFunction iteration 1 TmunuBaseX_SetTmunuVars: TmunuBaseX::TmunuBaseX_ZeroTmunu
INFO (ODESolvers): Calculated new state #3 at t=1
INFO (ODESolvers): Calculating RHS #4 at t=1
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_RHS
INFO (CarpetX): CallFunction iteration 1 Z4c_RHSGroup: Z4c::Z4c_apply_newradx_boundary_conditions
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_Enforce
INFO (CarpetX): SyncGroups Z4C::CHI, Z4C::GAMMA_TILDE, Z4C::K_HAT, Z4C::A_TILDE, Z4C::GAM_TILDE, Z4C::THETA, Z4C::ALPHAG, Z4C::BETAG
INFO (CarpetX): CallFunction iteration 1 Z4c_PostStepGroup: Z4c::Z4c_ADM
INFO (CarpetX): CallFunction iteration 1 TmunuBaseX_SetTmunuVars: TmunuBaseX::TmunuBaseX_ZeroTmunu
INFO (ODESolvers): Calculated new state #4 at t=1
INFO (CarpetX): CallFunction iteration 1 CCTK_EVOL: PunctureTracker::PunctureTracker_Track
INFO (PunctureTracker): Tracking punctures...
INFO (PunctureTracker): Puncture #0 is at (4.46154,0,0)
INFO (PunctureTracker): Puncture #1 is at (-5.53846,0,0)
warning: Cuda API error detected: cudaLaunchKernel returned (0x1)

warning: Cuda API error detected: cudaPeekAtLastError returned (0x1)

warning: Cuda API error detected: cudaPeekAtLastError returned (0x1)

warning: Cuda API error detected: cudaGetLastError returned (0x1)

terminate called after throwing an instance of 'std::runtime_error'
  what():  GPU error in file /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_Scan.H line 1242 invalid argument
```
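The `0x1` that `cuda-gdb` prints for each failing call is the numeric CUDA runtime error code. Code 1 is `cudaErrorInvalidValue`, whose message string is "invalid argument", which matches the text in the AMReX abort. A minimal lookup sketch covering only the first few codes is shown here for orientation; `describe_cuda_code` is a hypothetical helper written in plain C++, not part of the CUDA API, and on a machine with CUDA one would call `cudaGetErrorName` and `cudaGetErrorString` instead.

```cpp
#include <string>

// Hypothetical lookup covering only the first few CUDA runtime error codes.
// The enum names and message strings follow the CUDA runtime's cudaError_t.
std::string describe_cuda_code(int code) {
    switch (code) {
    case 0:
        return "cudaSuccess: no error";
    case 1:
        return "cudaErrorInvalidValue: invalid argument";
    case 2:
        return "cudaErrorMemoryAllocation: out of memory";
    default:
        return "not covered by this sketch";
    }
}
```

So the failure is the kernel launch itself being rejected, and the later `cudaPeekAtLastError` and `cudaGetLastError` calls report that same error again.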

Here’s the relevant part of the backtrace:

```
#0  0x000015554c8de52f in raise () from /lib64/libc.so.6
#1  0x000015554c8b1e65 in abort () from /lib64/libc.so.6
#2  0x000015554d176bd9 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x000015554d18225a in __cxxabiv1::__terminate (handler=<optimized out>)
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x000015554d1822c5 in std::terminate ()
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#5  0x000015554d182517 in __cxa_throw (obj=<optimized out>,
    tinfo=0x337e8dc0 <typeinfo for std::runtime_error@@GLIBCXX_3.4>,
    dest=0x41da90 <std::runtime_error::~runtime_error()@plt>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:98
#6  0x00000000004b290b in amrex::Error_host (
    msg=0x4256d8e0 "GPU error in file /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_Scan.H line 1242 invalid argument", type=0x2d4fba9 "Abort")
    at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/build/AMReX/amrex-24.10/Src/Base/AMReX.cpp:242
#7  amrex::Abort (
    msg=0x4256d8e0 "GPU error in file /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_Scan.H line 1242 invalid argument")
    at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/build/AMReX/amrex-24.10/Src/Base/AMReX.H:159
#8  amrex::Abort (msg=...)
    at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/build/AMReX/amrex-24.10/Src/Base/AMReX.cpp:223
#9  0x00000000010de45f in amrex::Scan::ExclusiveSum<long, int, void> (a_ret_sum=...,
    out=<optimized out>, in=0x15461cd98000, n=<optimized out>)
    at /home/y07/shared/utils/core/gcc/12.2.0/include/c++/12.2.0/bits/new_allocator.h:90
#10 amrex::Gpu::exclusive_scan<int*, int*> (result=<optimized out>, end=<optimized out>,
    begin=0x15461cd98000)
    at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_Scan.H:1381
#11 amrex::DenseBins<amrex::BoxND<3> >::build<int, __nv_dl_wrapper_t<__nv_dl_tag<void (amrex::DenseBins<amrex::BoxND<3> >::*)(amrex::BinPolicy::GPUBinPolicy, int, amrex::BoxND<3> const*, amrex::BoxND<3> const&, __nv_dl_wrapper_t<__nv_dl_trailing_return_tag<void (amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::*)(amrex::BoxArray const&, amrex::Geometry const&), &amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build, amrex::IntVectND<3>, 2u>, amrex::IntVectND<3>, amrex::IntVectND<3> > const&), &amrex::DenseBins<amrex::BoxND<3> >::build<int, __nv_dl_wrapper_t<__nv_dl_trailing_return_tag<void (amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::*)(amrex::BoxArray const&, amrex::Geometry const&), &amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build, amrex::IntVectND<3>, 2u>, amrex::IntVectND<3>, amrex::IntVectND<3> > >, 1u>, __nv_dl_wrapper_t<__nv_dl_trailing_return_tag<void (amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::*)(amrex::BoxArray const&, amrex::Geometry const&), &amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build, amrex::IntVectND<3>, 2u>, amrex::IntVectND<3>, amrex::IntVectND<3> > const, amrex::Dim3, amrex::Dim3 const> > (f=..., nbins=<optimized out>, v=<optimized out>, nitems=1, this=0x4337d478) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_DenseBins.H:252
#12 amrex::DenseBins<amrex::BoxND<3> >::build<int, __nv_dl_wrapper_t<__nv_dl_trailing_return_tag<void (amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::*)(amrex::BoxArray const&, amrex::Geometry const&), &amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build, amrex::IntVectND<3>, 2u>, amrex::IntVectND<3>, amrex::IntVectND<3> > > (f=..., bx=<synthetic pointer>..., v=<optimized out>, nitems=1, this=0x4337d478) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_DenseBins.H:189
#13 amrex::DenseBins<amrex::BoxND<3> >::build<int, __nv_dl_wrapper_t<__nv_dl_trailing_return_tag<void (amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::*)(amrex::BoxArray const&, amrex::Geometry const&), &amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build, amrex::IntVectND<3>, 2u>, amrex::IntVectND<3>, amrex::IntVectND<3> > > (f=..., bx=<synthetic pointer>..., v=<optimized out>, nitems=1, this=0x4337d478) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_DenseBins.H:132
#14 amrex::ParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build (this=0x4337d300, ba=..., geom=...) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_ParticleLocator.H:170
#15 0x00000000010e9982 in amrex::AmrParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build (this=this@entry=0x4337cc58, a_ba=..., a_geom=...) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_ParticleLocator.H:285
#16 0x00000000010f5953 in amrex::AmrParticleLocator<amrex::DenseBins<amrex::BoxND<3> > >::build (a_gdb=<optimized out>, this=0x4337cc58) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_ParticleLocator.H:310
#17 amrex::ParticleContainer_impl<amrex::Particle<3, 2>, 0, 0, amrex::ArenaAllocator, amrex::DefaultAssignor>::RedistributeGPU (this=0x4337cc50, lev_min=0, lev_max=1, nGrow=0, local=0, remove_negative=true) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_ParticleContainerI.H:1287
#18 0x00000000010c13f5 in amrex::ParticleContainer_impl<amrex::Particle<3, 2>, 0, 0, amrex::ArenaAllocator, amrex::DefaultAssignor>::Redistribute (remove_negative=true, local=0, nGrow=0, lev_max=-1, lev_min=0, this=0x4337cc50) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/configs/sim/scratch/external/AMReX/include/AMReX_ParticleContainerI.H:1073
#19 CarpetX_Interpolate (cctkGH_=<optimized out>, npoints=2, globalsx=<optimized out>, globalsy=<optimized out>, globalsz=<optimized out>, nvars=3, varinds=0x42b6c0c0, operations=0x42baf430, allow_boundaries=1, resultptrs_=0x7fffffffaa80) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/arrangements/CarpetX/CarpetX/src/interpolate.cxx:581
#20 0x00000000010bd35b in CarpetX_DriverInterpolate (cctkGH=0x33e8ec20, N_dims=<optimized out>, local_interp_handle=<optimized out>, param_table_handle=200, coord_system_handle=<optimized out>, N_interp_points=2, interp_coords_type_code=0, coords=0x7fffffffaa60, N_input_arrays=3, input_array_variable_indices=0x7fffffffaa54, N_output_arrays=3, output_array_type_codes=0x7fffffffaa50, output_arrays=0x7fffffffaa80) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/arrangements/CarpetX/CarpetX/src/interpolate.cxx:404
#21 0x000000000146c5b4 in PunctureTracker::PunctureContainer::interpolate (this=<optimized out>, cctkGH=cctkGH@entry=0x33e8ec20) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/arrangements/SpacetimeX/PunctureTracker/src/puncture.cxx:61
#22 0x000000000146dba0 in PunctureTracker_Track (cctkGH=0x33e8ec20, cctkGH@entry=<error reading variable: value has been optimized out>) at /mnt/lustre/tursafs1/home/dp415/dp415/dc-radi1/ETK/Cactus-CarpetX-main/arrangements/SpacetimeX/PunctureTracker/src/puncture_tracker.cxx:132
```
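Frames #9 to #11 place the abort inside an exclusive prefix sum (`amrex::Scan::ExclusiveSum`, reached through `amrex::Gpu::exclusive_scan`) that `amrex::DenseBins::build` calls while the particle locator is rebuilt. For readers unfamiliar with the operation, a host-only sketch of what an exclusive scan computes is given below. It presumably serves to turn per-bin item counts into starting offsets, but that is an assumption about AMReX's internals. This is not AMReX's GPU implementation and does not reproduce the failure; `bin_offsets` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: exclusive prefix sum over per-bin counts.
// offsets[i] is the sum of counts[0..i-1], so offsets[0] is always 0.
std::vector<int> bin_offsets(const std::vector<int>& counts) {
    std::vector<int> offsets(counts.size());
    int running = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        offsets[i] = running;   // value before adding this bin's count
        running += counts[i];
    }
    return offsets;
}
```

For counts `{2, 0, 3, 1}` the offsets are `{0, 2, 2, 5}`. On the GPU the same result comes from a kernel launch, and that launch is the call that returned `0x1` here.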

The software versions I am using are:

* GCC 12.2.0
* CUDA 12.3
* Open MPI 4.1.5

I also tried changing `CarpetX::interpolation_order` and `PunctureTracker::interp_order` (keeping the two equal to each other), but that did not help.

Disabling the `PunctureTracker` thorn resolves the problem for me (although then I get no puncture tracking, of course).

Please let me know if any further information would be helpful to debug this.

--
Ticket URL: https://bitbucket.org/einsteintoolkit/tickets/issues/2892/spacetimex-gpu-error-in-puncturetracker