[Users] Issues with Carpet run stalling on RIT cluster
Yosef Zlochower
yosef at astro.rit.edu
Mon Jun 10 06:40:32 CDT 2013
I tried a run without the ParitySymmetry thorn and saw the same behavior.
Also, the runs do not always output an assertion failure. Sometimes they
silently stall.
On 05/31/2013 07:55 AM, Yosef Zlochower wrote:
> Hi,
> I was wondering if anyone else saw this issue, or if it's specific to
> the particular cluster I am using.
> The problem is that the runs stall after a few hours. Two processes
> (of 10-12) get killed due to a failed assert within the MPI library
> (I'm using Open MPI on a QLogic InfiniPath IB network). From the
> backtrace, it looks like the recompose step is the one initiating the
> MPI call. Despite the name of the executable, this was actually run on
> a Sandy Bridge processor. The OS is CentOS 6, and I'm using icc version
> 13.1.1. The ET version is Orsted.
>
> I saw this issue with many different runs. However, the backtrace and
> assert failure below came from runs that used the "ParitySymmetry"
> thorn and the associated changes to CarpetRegrid2. I included the
> patch to CarpetRegrid2 at the bottom.
>
> n015:3.0.Assertion failure at ptl.c:200: nbytes == msglen
> n018:3.0.Assertion failure at ptl.c:200: nbytes == msglen
>
> Backtrace from rank 2 pid 27135:
> 1. /lib64/libc.so.6() [0x32a4832920]
> 2. /lib64/libc.so.6(gsignal+0x35) [0x32a48328a5]
> 3. /lib64/libc.so.6(abort+0x175) [0x32a4834085]
> 4. /usr/lib64/libpsm_infinipath.so.1(+0x17b6d) [0x7f577e6c6b6d]
> 5. /usr/lib64/libpsm_infinipath.so.1(psmi_handle_error+0x261)
> [0x7f577e6c6dd1]
> 6. /usr/lib64/libpsm_infinipath.so.1(psmi_am_mq_handler_rtsmatch+0x17a)
> [0x7f577e6c1f6a]
> 7. /usr/lib64/libpsm_infinipath.so.1(+0xa832) [0x7f577e6b9832]
> 8. /usr/lib64/libpsm_infinipath.so.1(+0xd90f) [0x7f577e6bc90f]
> 9. /usr/lib64/libpsm_infinipath.so.1(psmi_poll_internal+0x29)
> [0x7f577e6df309]
> a. /usr/lib64/libpsm_infinipath.so.1(psm_mq_ipeek+0xa5) [0x7f577e6dde05]
> b. /usr/mpi/gcc/openmpi-1.4.3-qlc/lib64/openmpi/mca_mtl_psm.so(+0x15f4)
> [0x7f577e9025f4]
> c. /usr/mpi/gcc/openmpi-1.4.3-qlc/lib64/libopen-pal.so.0(opal_progress+0x5a)
> [0x7f57815f30fa]
> d. /usr/mpi/gcc/openmpi-1.4.3-qlc/lib64/libmpi.so.0(+0x35685)
> [0x7f5782f8a685]
> e. /usr/mpi/gcc/openmpi-1.4.3-qlc/lib64/libmpi.so.0(PMPI_Waitall+0xa3)
> [0x7f5782fb6c73]
> f. comm_state::step()
> [./cactus_lazevnehalem(_ZN10comm_state4stepEv+0x4a2) [0x8826c2]]
> 10. dh::recompose(int, bool)
> [./cactus_lazevnehalem(_ZN2dh9recomposeEib+0x218) [0x89d8a8]]
> 11. gh::recompose(int, bool)
> [./cactus_lazevnehalem(_ZN2gh9recomposeEib+0x52) [0x8dc722]]
> 12. Carpet::Recompose(_cGH const*, int, bool)
> [./cactus_lazevnehalem(_ZN6Carpet9RecomposeEPK4_cGHib+0xea) [0x7db6ea]]
>
> diff --git a/Carpet/CarpetRegrid2/param.ccl b/Carpet/CarpetRegrid2/param.ccl
> index c7327d2..4843abe 100644
> --- a/Carpet/CarpetRegrid2/param.ccl
> +++ b/Carpet/CarpetRegrid2/param.ccl
> @@ -62,6 +62,11 @@ BOOLEAN symmetry_rotating180 "Ensure a 180 degree rotating symmetry about the z
> {
> } no
>
> +BOOLEAN symmetry_parity "parity "
> +{
> +} no
> +
> +
> BOOLEAN symmetry_periodic_x "Ensure a periodicity symmetry in the x direction"
> {
> } no
> diff --git a/Carpet/CarpetRegrid2/src/paramcheck.cc b/Carpet/CarpetRegrid2/src/paramcheck.cc
> index 5cc8978..679a562 100644
> --- a/Carpet/CarpetRegrid2/src/paramcheck.cc
> +++ b/Carpet/CarpetRegrid2/src/paramcheck.cc
> @@ -25,7 +25,7 @@ namespace CarpetRegrid2 {
> DECLARE_CCTK_ARGUMENTS;
> DECLARE_CCTK_PARAMETERS;
>
> - enum sym_t { sym_unknown, sym_90, sym_180 };
> + enum sym_t { sym_unknown, sym_90, sym_180, sym_parity };
>
> int num_params = 0;
> sym_t params = sym_unknown;
> @@ -40,7 +40,13 @@ namespace CarpetRegrid2 {
> params = sym_180;
> param = "symmetry_rotating180";
> }
> -
> +
> + if (symmetry_parity) {
> + ++num_params;
> + params = sym_parity;
> + param = "symmetry_parity";
> + }
> +
> int num_thorns = 0;
> sym_t thorns = sym_unknown;
> char const* thorn = "";
> @@ -59,13 +65,18 @@ namespace CarpetRegrid2 {
> thorns = sym_180;
> thorn = "RotatingSymmetry180";
> }
> -
> + if (CCTK_IsThornActive ("ParitySymmetry")) {
> + ++num_thorns;
> + thorns = sym_parity;
> + thorn = "ParitySymmetry";
> + }
> +
> if (num_params > 1) {
> - CCTK_PARAMWARN ("Too many of the symmetry parameters symmetry_rotating90 and symmetry_rotating180 are specified. (At most one of these can be specified.)");
> + CCTK_PARAMWARN ("Too many of the symmetry parameters symmetry_rotating90, symmetry_rotating180, and symmetry_parity are specified. (At most one of these can be specified.)");
> }
>
> if (num_thorns > 1) {
> - CCTK_PARAMWARN ("Too many of the symmetry thorns RotatingSymmetry90, RotatingSymmetry90r, and RotatingSymmetry180 are active. (At most one of these can be active.)");
> + CCTK_PARAMWARN ("Too many of the symmetry thorns RotatingSymmetry90, RotatingSymmetry90r, RotatingSymmetry180, and ParitySymmetry are active. (At most one of these can be active.)");
> }
>
> if (params != sym_unknown and thorns != sym_unknown and params != thorns) {
> diff --git a/Carpet/CarpetRegrid2/src/property.cc b/Carpet/CarpetRegrid2/src/property.cc
> index a568e82..c2a31a1 100644
> --- a/Carpet/CarpetRegrid2/src/property.cc
> +++ b/Carpet/CarpetRegrid2/src/property.cc
> @@ -577,7 +577,121 @@ namespace CarpetRegrid2 {
> }
> }
>
> + //////////////////////////////////////////////////////////////////////////////
> + // Make the boxes parity symmetric
> + //////////////////////////////////////////////////////////////////////////////
> +
> + ibset parsym::
> + symmetrised_regions (gh const& hh, dh const& dd,
> + level_boundary const& bnd,
> + vector<ibset> const& regions, int const rl)
> + {
> + ibbox const& baseextent = hh.baseextent(0,rl);
> +
> + ibset symmetrised = regions.at(rl);
> + for (ibset::const_iterator
> + ibb = regions.at(rl).begin(); ibb != regions.at(rl).end(); ++ ibb)
> + {
> + ibbox const& bb = *ibb;
> +
> + bvect const lower_is_outside_lower =
> + bb.lower() - bnd.min_bnd_dist_away[0] * bb.stride() <=
> + bnd.level_physical_ilower;
> +
> + // Treat z direction
> + int const dir = 2;
> + if (lower_is_outside_lower[dir]) {
> + ivect const ilo = bb.lower();
> + ivect const iup = bb.upper();
> + ivect const istr = bb.stride();
> + assert (istr[0] == istr[1]);
> +
> + // Origin
> + assert (hh.refcent == vertex_centered or all (istr % 2 == 0));
> + rvect const axis ( (bnd.physical_lower[0] + bnd.physical_upper[0]) / 2,
> + (bnd.physical_lower[1] + bnd.physical_upper[1]) / 2,
> + bnd.physical_lower[2]);
> + ivect const iaxis0 = rpos2ipos (axis, bnd.origin, bnd.scale, hh, rl);
> + assert (all ((iaxis0 - baseextent.lower()) % istr == 0));
> + ivect const iaxis1 = rpos2ipos1 (axis, bnd.origin, bnd.scale, hh, rl);
> + assert (all ((iaxis1 - baseextent.lower()) % istr == 0));
> + ivect const offset = iaxis1 - iaxis0;
> + assert (all (offset % istr == 0));
> + if (hh.refcent == vertex_centered) {
> + assert (all (offset >= 0 and offset < 2*istr));
> + assert (all ((iaxis0 + iaxis1 - offset) % (2*istr) == 0));
> + } else {
> + // The offset may be negative because both boundaries are
> + // shifted inwards by 1/2 grid spacing, and therefore iaxis0
> + // < iaxis1 + istr
> + assert (all (offset >= -istr and offset < istr));
> + assert (all ((iaxis0 + iaxis1 - offset) % (2*istr) == istr));
> + assert (all (istr % 2 == 0));
> + }
> + ivect const iaxis = (iaxis0 + iaxis1 - offset) / 2;
> + ivect const neg_ilo = (2*iaxis+offset) - ilo;
> + ivect const neg_iup = (2*iaxis+offset) - iup;
> +
> + // Point-reflect through the origin (parity): all three components are negated
> + ivect const new_ilo (neg_iup[0], neg_iup[1], neg_iup[2]);
> + ivect const new_iup (neg_ilo[0], neg_ilo[1], neg_ilo[2]);
> + ivect const new_istr (istr);
> +
> + ibbox const new_bb (new_ilo, new_iup, new_istr);
> + // Will be clipped later
> + // assert (new_bb.is_contained_in (baseextent));
> +
> + // symmetrised |= new_bb & baseextent;
> + symmetrised |= new_bb;
> + }
> + }
> +
> + return symmetrised;
> + }
> +
> + bool parsym::
> + test_impl (gh const& hh, dh const& dd,
> + level_boundary const& bnd,
> + vector<ibset> const& regions, int const rl)
> + {
> + DECLARE_CCTK_PARAMETERS;
> +
> + if (not symmetry_parity) return true;
> +
> + ibset const symmetrised = symmetrised_regions (hh, dd, bnd, regions, rl);
> +
> + // We cannot test for equality, since the difference may be
> + // outside of the domain (and hence irrelevant)
> + // return regions.AT(rl) == symmetrised;
> +
> + // Test whether any part of the difference (i.e. that part of the
> + // level that would be added by symmetrising) is inside the
> + // domain. If the difference is outside, we can safely ignore it.
> + ibbox const& baseextent = hh.baseextent(0,rl);
> + ibset const difference = symmetrised - regions.AT(rl);
> + return (difference & baseextent).empty();
> + }
>
> + void parsym::
> + enforce_impl (gh const& hh, dh const& dd,
> + level_boundary const& bnd,
> + vector<ibset>& regions, int const rl)
> + {
> + DECLARE_CCTK_PARAMETERS;
> +
> + assert (symmetry_parity);
> +
> + if (veryverbose) {
> + cout << "Refinement level " << rl << ": making regions parity symmetric...\n";
> + }
> +
> + regions.AT(rl) = symmetrised_regions (hh, dd, bnd, regions, rl);
> +
> + if (veryverbose) {
> + cout << " New regions are " << regions.at(rl) << "\n";
> + }
> + }
> +
>
> //////////////////////////////////////////////////////////////////////////////
> // Make the boxes periodic in one direction
> diff --git a/Carpet/CarpetRegrid2/src/property.hh b/Carpet/CarpetRegrid2/src/property.hh
> index d5540d6..b0080c7 100644
> --- a/Carpet/CarpetRegrid2/src/property.hh
> +++ b/Carpet/CarpetRegrid2/src/property.hh
> @@ -112,6 +112,18 @@ namespace CarpetRegrid2 {
> vector<ibset>& regions, int rl);
> };
>
> + // Make the boxes parity symmetric
> + class parsym: public property {
> + ibset symmetrised_regions (gh const& hh, dh const& dd,
> + level_boundary const& bnd,
> + vector<ibset> const& regions, int rl);
> + bool test_impl (gh const& hh, dh const& dd,
> + level_boundary const& bnd,
> + vector<ibset> const& regions, int rl);
> + void enforce_impl (gh const& hh, dh const& dd,
> + level_boundary const& bnd,
> + vector<ibset>& regions, int rl);
> + };
>
>
> // Make the boxes rotating-180 symmetric
> diff --git a/Carpet/CarpetRegrid2/src/regrid.cc b/Carpet/CarpetRegrid2/src/regrid.cc
> index 427d8b0..5b32a32 100644
> --- a/Carpet/CarpetRegrid2/src/regrid.cc
> +++ b/Carpet/CarpetRegrid2/src/regrid.cc
> @@ -329,6 +329,7 @@ namespace CarpetRegrid2 {
> properties.push_back (new snap_coarse());
> properties.push_back (new rotsym90());
> properties.push_back (new rotsym180());
> + properties.push_back (new parsym());
> properties.push_back (new periodic<0>());
> properties.push_back (new periodic<1>());
> properties.push_back (new periodic<2>());
>
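
For readers who want to check the index arithmetic in parsym::symmetrised_regions without building Carpet, here is a minimal standalone sketch of the reflection step. It is not Carpet code: Box and reflect are hypothetical stand-ins for ibbox and the neg_ilo/neg_iup computation in the patch, and strides and clipping to the base extent are ignored.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for Carpet's ibbox: an inclusive integer index box.
// (Assumption: strides are ignored here; the real ibbox also carries a stride.)
struct Box {
  std::array<int, 3> lo;
  std::array<int, 3> up;
};

// Point-reflect a box through the index-space point described by `axis`
// and `offset`, following the patch's mapping i -> (2*axis + offset) - i.
// Because the mapping reverses the ordering, the reflected upper bound
// becomes the new lower bound and vice versa.
Box reflect(const Box& bb,
            const std::array<int, 3>& axis,
            const std::array<int, 3>& offset) {
  Box r;
  for (int d = 0; d < 3; ++d) {
    assert(bb.lo[d] <= bb.up[d]);  // the input box must be well-formed
    int const pivot = 2 * axis[d] + offset[d];
    r.lo[d] = pivot - bb.up[d];
    r.up[d] = pivot - bb.lo[d];
  }
  return r;
}
```

Calling reflect on a box with axis and offset both zero negates all three coordinates, and applying it twice returns the original box, which is what one would expect of a parity transformation.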