[ET Trac] [Einstein Toolkit] #2023: CarpetInterp hangs with openmpi

Einstein Toolkit trac-noreply at einsteintoolkit.org
Fri Mar 17 08:17:40 CDT 2017


#2023: CarpetInterp hangs with openmpi
------------------------+---------------------------------------------------
  Reporter:  anonymous  |       Owner:                     
      Type:  defect     |      Status:  new                
  Priority:  unset      |   Milestone:                     
 Component:  Other      |     Version:  development version
Resolution:             |    Keywords:  OpenMPI, Carpet    
------------------------+---------------------------------------------------

Comment (by anonymous):

 Replying to [comment:1 hinder]:
 > This looks very similar to what we encountered with OmniPath on Minerva,
 which held up the acceptance for a few months.  In that case, it was a bug
 in the OmniPath driver, which Intel eventually found and fixed, using our
 reproducible example.  This was with SpEC, not Cactus.  It looks the same
 on surface (a hang in MPI_Alltoallv), but you are using OpenMPI and
 infiniband, whereas we were using Intel MPI and OmniPath, so unfortunately
 this doesn't seem to be the same problem.  You could also try compiling
 without optimisation to see if that gives better backtraces.

 Is that reproducible example a full SpEC run or a small example code I
 could test as well?
 Did you use hyperthreading and/or OpenMP?

 > Is your problem reproducible?

 Not really. I ran several short benchmarks (<20 min) with the same
 executable, and the hang occurred in 2 out of 18 of them. It also occurred
 once in a longer run, after around 30 hours.

 > What cluster is this?

 The new cluster "holodeck" for NR at AEI Hanover. It's a 640-core Intel
 Xeon system.

 > Have you tried different OpenMPI versions, or a different MPI
 implementation?

 Not yet; we need to install them first.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/2023#comment:2>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

