[Users] MPI problems

Ian Hinder ian.hinder at aei.mpg.de
Mon May 15 14:10:58 CDT 2017


On 15 May 2017, at 16:49, Chris Stevens <c.stevens at ru.ac.za> wrote:

> Hi there,
> 
> I am new to Cactus, and have been having trouble getting the qc0-mclachlan.par test file to run. I have compiled the latest version of Cactus successfully on the CHPC cluster in South Africa.
> 
> I have attached .out and .err files for the run, along with my machine file, optionlist file, run and submit scripts. The submit command was:
> ./simfactory/bin/sim submit mctest --configuration=mclachlantest_mpidebug --parfile=par/qc0-mclachlan.par --procs=240 --num-threads=12 --walltime=10:0:0 --queue=normal --machine=lengau-intel
> 
> Using --mca orte_base_help_aggregate 0 in the mpirun command in the runscript, the error is:
> 
> [cnode0823:136405] *** An error occurred in MPI_Comm_create_keyval
> [cnode0823:136405] *** reported by process [476512257,3]
> [cnode0823:136405] *** on communicator MPI_COMM_WORLD
> [cnode0823:136405] *** MPI_ERR_ARG: invalid argument of some other kind
> [cnode0823:136405] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [cnode0823:136405] ***    and potentially your MPI job)
> 
> I unfortunately have no idea where to go from here, and some help would be greatly appreciated! I hope I have attached enough information.
> 
Hi Chris,

Welcome to Cactus!  (meant in a friendly sense, not sarcastic!)

I cannot see anything wrong, and I've never seen this error before.  It's a mystery.  Have you tried running on fewer MPI processes?  I wonder if something is going wrong because the problem size is too small for the number of processes.  This *shouldn't* cause a problem, but it's something to try.  Is there another MPI implementation, or another version of OpenMPI, available on the machine?  Maybe it's a bug in OpenMPI 1.8.8.  

When faced with such strange behaviour, it's always worth wiping the configuration and rebuilding it, just in case it was not built cleanly. For example, while developing the optionlist you may have partially built the configuration, then corrected something, leaving a configuration that mixes the two versions. You can wipe it with rm -rf configs/mclachlantest_mpidebug and then rebuild, though I suspect that the debug version was built cleanly, so this is unlikely to be the problem.

-- 
Ian Hinder
http://members.aei.mpg.de/ianhin


