[Users] MPI problems

Chris Stevens c.stevens at ru.ac.za
Tue May 16 01:52:39 CDT 2017


Hi Ian,

thanks for your quick reply!

A couple of other people here have successfully used the toolkit with this 
version of OpenMPI, so I'm not sure it is at fault. No other version of 
OpenMPI is currently available on the cluster, but I could ask for one to 
be installed.

I have tried using --procs=48 and --num-threads=2, but I see the same problem.

As you suggested, I wiped the configuration and recompiled from a clean 
state to double-check. I have also tried the May 16 version, but to no avail.

Could something have gone wrong during compilation, even though the build 
finishes successfully with "done"?
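
In case it helps to narrow down whether OpenMPI 1.8.8 itself is at fault, I 
could also try a minimal standalone test of MPI_Comm_create_keyval outside 
of Cactus, built and run with the same OpenMPI on the same nodes. Something 
along these lines (just a sketch; the exact arguments that Cactus/Carpet 
passes may well differ, and the file name and process count below are only 
placeholders):

/* keyval_test.c -- minimal check of MPI_Comm_create_keyval.
   Build and run with the same OpenMPI as the Cactus configuration, e.g.
     mpicc keyval_test.c -o keyval_test
     mpirun -np 48 ./keyval_test                                        */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int rank, keyval;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* Return error codes instead of aborting, so a failure (if any) can be
     reported per rank rather than killing the whole job. */
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

  /* Create a communicator attribute keyval with the predefined no-op
     copy/delete callbacks. */
  int ierr = MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,
                                    MPI_COMM_NULL_DELETE_FN,
                                    &keyval, NULL);
  if (ierr != MPI_SUCCESS) {
    fprintf(stderr, "rank %d: MPI_Comm_create_keyval failed, ierr=%d\n",
            rank, ierr);
  } else {
    MPI_Comm_free_keyval(&keyval);
    if (rank == 0) {
      printf("MPI_Comm_create_keyval succeeded\n");
    }
  }

  MPI_Finalize();
  return 0;
}

If that fails on its own, it would point at the MPI installation rather 
than at the Cactus build.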

Cheers,

Chris


On 05/15/2017 09:10 PM, Ian Hinder wrote:
>
> On 15 May 2017, at 16:49, Chris Stevens <c.stevens at ru.ac.za 
> <mailto:c.stevens at ru.ac.za>> wrote:
>
>> Hi there,
>>
>> I am new to Cactus, and have been having trouble getting the 
>> qc0-mclachlan.par test file to run. I have compiled the latest 
>> version of Cactus successfully on the CHPC cluster in South Africa.
>>
>> I have attached .out and .err files for the run, along with my 
>> machine file, optionlist file, run and submit scripts. The submit 
>> command was
>>
>> ./simfactory/bin/sim submit mctest 
>> --configuration=mclachlantest_mpidebug 
>> --parfile=par/qc0-mclachlan.par --procs=240 --num-threads=12 
>> --walltime=10:0:0 --queue=normal --machine=lengau-intel
>>
>> Using --mca orte_base_help_aggregate 0 in the mpirun command in the 
>> runscript, the error is:
>>
>> [cnode0823:136405] *** An error occurred in MPI_Comm_create_keyval
>> [cnode0823:136405] *** reported by process [476512257,3]
>> [cnode0823:136405] *** on communicator MPI_COMM_WORLD
>> [cnode0823:136405] *** MPI_ERR_ARG: invalid argument of some other kind
>> [cnode0823:136405] *** MPI_ERRORS_ARE_FATAL (processes in this 
>> communicator will now abort,
>> [cnode0823:136405] ***    and potentially your MPI job)
>>
>> I unfortunately have no idea where to go from here, and some help 
>> would be greatly appreciated! I hope I have attached enough information.
>>
> Hi Chris,
>
> Welcome to Cactus!  (meant in a friendly sense, not sarcastic!)
>
> I cannot see anything wrong, and I've never seen this error before. 
>  It's a mystery.  Have you tried running on fewer MPI processes?  I 
> wonder if something is going wrong because the problem size is too 
> small for the number of processes.  This *shouldn't* cause a problem, 
> but it's something to try.  Is there another MPI implementation, or 
> another version of OpenMPI, available on the machine?  Maybe it's a 
> bug in OpenMPI 1.8.8.
>
> When faced with such strange behaviour, it's always worth wiping the 
> configuration and rebuilding it, just in case it was not built 
> cleanly.  e.g. when developing the optionlist, maybe you partially 
> built the configuration, then corrected something, and the resulting 
> configuration has a mixture of the two versions.  You can do this with 
> rm -rf configs/mclachlantest_mpidebug, though I suspect that the debug 
> version was built cleanly, so this is unlikely to be the problem.
>
> -- 
> Ian Hinder
> http://members.aei.mpg.de/ianhin
>

-- 
Dr Chris Stevens

Department of Mathematics

Rhodes University

Room 5

Ph: +27 46 603 8932
