[Users] Carpet with Intel MPI on Stampede

Ian Hinder ian.hinder at aei.mpg.de
Thu Sep 19 09:49:06 CDT 2013


Hi,

I've noticed that a number of my jobs on Stampede fail with the following error:

[59:c453-703][../../dapl_poll_rc.c:1360] Intel MPI fatal error: ofa-v2-mlx4_0-1 DTO operation posted for [1:c411-103] completed with error. status=0x8. cookie=0x0
Assertion failed in file ../../dapl_poll_rc.c at line 1362: 0
internal ABORT - process 59
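
To check whether the same nodes keep showing up across failed jobs, the fatal-error line can be picked apart with a small script. This is only a minimal sketch in Python: the regular expression is inferred from the single error line quoted above and is an assumption, not a documented Intel MPI log format, and the name parse_dto_error is made up for illustration.

```python
import re

# Pattern inferred from the one error line quoted in this message.
# It is an assumption, not an official Intel MPI format.
DTO_ERROR = re.compile(
    r"\[(?P<rank>\d+):(?P<host>[^\]]+)\]"
    r"\[(?P<source>[^\]]+)\] Intel MPI fatal error: "
    r"(?P<provider>\S+) DTO operation posted for "
    r"\[(?P<peer_rank>\d+):(?P<peer_host>[^\]]+)\] completed with error\. "
    r"status=(?P<status>0x[0-9a-fA-F]+)"
)


def parse_dto_error(line):
    """Return the fields of a DTO error line as a dict, or None if no match."""
    match = DTO_ERROR.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Ranks are numeric; everything else stays a string.
    info["rank"] = int(info["rank"])
    info["peer_rank"] = int(info["peer_rank"])
    return info


line = (
    "[59:c453-703][../../dapl_poll_rc.c:1360] Intel MPI fatal error: "
    "ofa-v2-mlx4_0-1 DTO operation posted for [1:c411-103] completed "
    "with error. status=0x8. cookie=0x0"
)
info = parse_dto_error(line)
print(info["rank"], info["host"], info["peer_rank"], info["peer_host"], info["status"])
```

Running this over the stderr of several failed jobs and counting the host and peer_host values would show whether particular nodes are involved more often than others.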

The failure appears to be intermittent: if I restart from the last checkpoint, the simulation gets a bit further. Switching to MVAPICH appears to improve things (time will tell), but it seems to be significantly slower than Intel MPI. I also remember having problems with Intel MPI on SuperMUC, which were solved by switching to IBM MPI there.

Has anyone else seen these problems, and if so, do you have a workaround?  I'm using the standard simfactory optionlist with 512 cores and num-threads = 8.
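
For reference, the process layout implied by those numbers can be worked out as below. This is a sketch in Python that assumes the 512 cores are split evenly into MPI processes of 8 threads each; the function name mpi_processes is hypothetical and not part of simfactory.

```python
def mpi_processes(total_cores, threads_per_process):
    """Number of MPI processes when cores are split evenly among processes."""
    if total_cores % threads_per_process != 0:
        raise ValueError("total_cores must be a multiple of threads_per_process")
    return total_cores // threads_per_process


# Values from this message: 512 cores, num-threads = 8.
nprocs = mpi_processes(512, 8)
print(nprocs)

# Under this assumption, ranks run from 0 to nprocs - 1, so the
# "process 59" in the error message is a valid rank in this layout.
print(0 <= 59 < nprocs)
```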

-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder
