[Users] ET test failures on Stampede

Ian Hinder ian.hinder at aei.mpg.de
Fri Oct 31 03:52:45 CDT 2014


On 30 Oct 2014, at 21:55, Erik Schnetter <schnetter at cct.lsu.edu> wrote:

> Ian
> 
> This new MPI version leads to problems running the benchmarks, and
> runs at half the speed. (This test was on a single node.)

Ouch. I didn't notice that with my runs; either I wasn't paying attention or it didn't happen there.  I will check the next time I run on Stampede.

> 
> -erik
> 
> On Thu, Oct 30, 2014 at 11:34 AM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>> 
>> On 30 Oct 2014, at 15:03, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>> 
>>> I've begun to run the automated tests for the ET on our production
>>> machines. Things look very good almost everywhere, except on Stampede,
>>> one of the machines that is most important to us. It seems that there
>>> are many test failures for GRHydro, and these seem to be caused by
>>> segfaults. Does anybody volunteer to investigate?
>> 
>> I don't know anything about the problems with GRHydro.
>> 
>> I was having problems a while back with the current simfactory default version of mvapich2: MPI reductions would hang.  TACC support suggested I try the mvapich2-x version, and I have been using mvapich2-x for a few months with no problems.  The required change to the optionlist is:
>> 
>> 
>>   < MPI_DIR  = /opt/apps/intel13/mvapich2/1.9
>>   ---
>>   > MPI_DIR  = /home1/apps/intel13/mvapich2-x/2.0b
>>   > MPI_LIB_DIRS = /home1/apps/intel13/mvapich2-x/2.0b/lib64
>> 
>> 
>> At least, this was sufficient before the changes to the MPI thorn.  It's possible that it is no longer enough.
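
For anyone who wants to try the optionlist change quoted above without editing the file by hand, here is a minimal sketch in Python.  It only rewrites "KEY = VALUE" lines in the text of an option list.  The two paths are the ones from the diff; the function name switch_mpi and the rule of appending a missing key at the end of the file are assumptions made for illustration, not anything simfactory provides.  As noted above, the change itself may no longer be enough after the MPI thorn changes.

```python
import re

# Paths taken from the diff quoted above; all names below are illustrative.
NEW_MPI_DIR = "/home1/apps/intel13/mvapich2-x/2.0b"
NEW_SETTINGS = {
    "MPI_DIR": NEW_MPI_DIR,
    "MPI_LIB_DIRS": NEW_MPI_DIR + "/lib64",
}

# Matches an uncommented "KEY =" assignment at the start of a line.
ASSIGNMENT = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")


def switch_mpi(optionlist_text, settings=NEW_SETTINGS):
    """Return the option list text with the given settings applied.

    The first assignment to each key is replaced in place, later
    assignments to the same key are dropped, and keys that never
    appear are appended at the end.  Commented-out lines are kept.
    """
    seen = set()
    out = []
    for line in optionlist_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group(1) in settings:
            key = match.group(1)
            if key not in seen:
                out.append(f"{key} = {settings[key]}")
                seen.add(key)
            continue  # skip the old value and any duplicate assignment
        out.append(line)
    for key, value in settings.items():
        if key not in seen:
            out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"
```

Typical use would be to read the Stampede option list into a string, pass it through switch_mpi, and write the result to a copy, so that the original file stays available for comparison.
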
>> 
>> Should we change to this version in simfactory for the release?
>> 
>> --
>> Ian Hinder
>> http://numrel.aei.mpg.de/people/hinder
>> 
> 
> 
> 
> -- 
> Erik Schnetter <schnetter at cct.lsu.edu>
> http://www.perimeterinstitute.ca/personal/eschnetter/

-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder


