[Users] HDF5 Error, Hilbert

Ian Hinder ian.hinder at aei.mpg.de
Tue Oct 20 06:50:31 CDT 2015


On 20 Oct 2015, at 13:44, Geraint Pratten <g.pratten at sussex.ac.uk> wrote:

> Hi,
> 
> So I'm having a problem passing the correct number of MPI processes through to Carpet. I'm not sure what I've messed up in the configuration; the error I am getting is:
> 
>> The environment variable CACTUS_NUM_PROCS is set to 4, but there are 1 MPI processes. This may indicate a severe problem with the MPI startup mechanism.
> 
> I've attached the .run and .sub scripts that I use. Can anyone see anything obviously wrong? The local cluster uses the UNIVA Grid Engine for submission scripts.

Can you also send the optionlist?  Maybe there is a mismatch between the MPI installation used at compile time and the one used at run time.
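
As a quick way to rule that out, here is a minimal standalone sketch (not part of Cactus; it assumes you compile with the same mpicc and launch with the same mpirun that your Cactus run uses, and the file name mpi_check.c is just an example). It compares the launched MPI size against CACTUS_NUM_PROCS, which is essentially the check Carpet performs in SetupGH.cc:

  /* mpi_check.c -- compare launched MPI size against CACTUS_NUM_PROCS.
     Build:  mpicc mpi_check.c -o mpi_check
     Run:    CACTUS_NUM_PROCS=4 mpirun -np 4 ./mpi_check */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
      /* CACTUS_NUM_PROCS is the process count the submit script requested */
      const char *env = getenv("CACTUS_NUM_PROCS");
      int expected = env ? atoi(env) : -1;
      printf("CACTUS_NUM_PROCS=%d, MPI reports %d process(es)\n",
             expected, size);
    }
    MPI_Finalize();
    return 0;
  }

If this reports 1 process when launched with -np 4, the mpirun on your path does not belong to the MPI library the executable was linked against.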


> 
> Thanks in advance!
> Geraint
> 
> 
> On 5 October 2015 at 12:34, Geraint Pratten <g.pratten at sussex.ac.uk> wrote:
> Ah, I think you could be right! It seems that I've messed up the submit/run scripts somewhere along the line! The relevant output is attached below. I'll try fixing this now.
> 
> Thanks! Much appreciated!
> Geraint
> 
> INFO (Carpet): MPI is enabled
> INFO (Carpet): Carpet is running on 1 processes
> WARNING level 1 from host node207.cm.cluster process 0
>   while executing schedule bin (none), routine (no thorn)::(no routine)
>   in thorn Carpet, file /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:226:
>   -> The environment variable CACTUS_NUM_PROCS is set to 6, but there are 1 MPI processes. This may indicate a severe problem with the MPI startup mechanism.
> INFO (Carpet): This is process 0
> INFO (Carpet): OpenMP is enabled
> INFO (Carpet): This process contains 6 threads, this is thread 0
> WARNING level 2 from host node207.cm.cluster process 0
>   while executing schedule bin (none), routine (no thorn)::(no routine)
>   in thorn Carpet, file /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:256:
>   -> Although OpenMP is enabled, the environment variable CACTUS_NUM_THREADS is not set.
> INFO (Carpet): There are 6 threads in total
> INFO (Carpet): There are 6 threads per process
> INFO (Carpet): This process runs on host node207, pid=29183
> INFO (Carpet): This process runs on 24 cores: 0-23
> INFO (Carpet): Thread 0 runs on 24 cores: 0-23
> INFO (Carpet): Thread 1 runs on 24 cores: 0-23
> INFO (Carpet): Thread 2 runs on 24 cores: 0-23
> INFO (Carpet): Thread 3 runs on 24 cores: 0-23
> INFO (Carpet): Thread 4 runs on 24 cores: 0-23
> INFO (Carpet): Thread 5 runs on 24 cores: 0-23
> 
> On 5 October 2015 at 12:27, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
> 
> On 5 Oct 2015, at 12:43, Geraint Pratten <g.pratten at sussex.ac.uk> wrote:
> 
>> Hi,
>> 
>> I am running the simulation with the same output directory, though I've been removing the old simulations after each run while trying to fix the above issue. I should have write access to the file, and there should be sufficient storage.
> 
> Can you check the standard output of the run and see whether the numbers of processes and threads are what you expect?  You should see something like "Carpet is running on N processes with M threads".  Maybe there is a problem in MPI initialisation, and multiple processes are trying to write to the same HDF5 file at the same time.
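> 
> (To see why that corrupts the file, here is a hypothetical sketch, not actual Cactus code, with "output.h5" as a placeholder name: with serial HDF5, output is typically guarded by a rank check, so if MPI startup fails and every process believes it is rank 0, they all open and extend the same file concurrently.)
> 
>   #include <hdf5.h>
>   #include <mpi.h>
> 
>   void write_output(void)
>   {
>     int rank;
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     if (rank == 0) {  /* safe only if MPI assigned ranks correctly */
>       /* "output.h5" is a placeholder file name */
>       hid_t file = H5Fopen("output.h5", H5F_ACC_RDWR, H5P_DEFAULT);
>       /* ... H5Dcreate/H5Dwrite the datasets ... */
>       H5Fclose(file);
>     }
>   }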
> 
> 
>> 
>> Geraint
>> 
>> On 5 October 2015 at 11:34, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>> 
>> On 1 Oct 2015, at 15:49, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>> 
>>> Geraint
>>> 
>>> Some very fundamental HDF5 operations are failing. This could be due to a file-system problem: maybe you don't have write access to the file, or your disk is full. It could also be that a corrupted HDF5 file already exists and you are trying to write to (extend) it.
>>> 
>>> An HDF5 file can become corrupted if it is open for writing when the application is interrupted or crashes, or when you run out of disk space. Even if you free up space later, a corrupted file remains corrupted.
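>>> 
>>> (A minimal way to test for this; the file name below is just a placeholder for one of your output files, and this assumes the h5cc compiler wrapper shipped with HDF5 is available:)
>>> 
>>>   /* check_h5.c -- report whether an existing file is still readable HDF5.
>>>      Build:  h5cc check_h5.c -o check_h5 */
>>>   #include <hdf5.h>
>>>   #include <stdio.h>
>>> 
>>>   int main(void)
>>>   {
>>>     const char *name = "phi.file_0.h5";   /* placeholder file name */
>>>     if (H5Fis_hdf5(name) <= 0) {
>>>       printf("%s is missing or not an HDF5 file\n", name);
>>>       return 1;
>>>     }
>>>     hid_t file = H5Fopen(name, H5F_ACC_RDONLY, H5P_DEFAULT);
>>>     if (file < 0) {
>>>       printf("%s cannot be opened; it is probably corrupted\n", name);
>>>       return 1;
>>>     }
>>>     printf("%s opened cleanly\n", name);
>>>     H5Fclose(file);
>>>     return 0;
>>>   }
>>> 
>>> (Running h5ls on the file is an even quicker check, if the HDF5 command-line tools are installed.)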
>> 
>> Hi Geraint,
>> 
>> Are you trying to run a simulation with output in the same directory as one that was run before?  
>> 
>>> 
>>> -erik
>>> 
>>> 
>>> On Thu, Oct 1, 2015 at 9:30 AM, Geraint Pratten <g.pratten at sussex.ac.uk> wrote:
>>> Hi all,
>>> 
>>> I recently built the latest release (Hilbert) on the local cluster and I've been getting an HDF5 error. I've tried building Cactus both with the cluster's local HDF5 package and by letting Cactus build HDF5 from scratch. Does anyone have any insight into the error I've attached below (an extract from a long chain of errors of the same form)?
>>> 
>>> I can dump the full output to a file or provide the configuration files if needed.
>>> 
>>> Thanks in advance!
>>> Geraint
>>> 
>>> ----------
>>> 
>>> HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 0:
>>>   #000: H5Ddeprec.c line 193 in H5Dcreate1(): unable to create dataset
>>>     major: Dataset
>>>     minor: Unable to initialize object
>>>   #001: H5Dint.c line 453 in H5D__create_named(): unable to create and link to dataset
>>>     major: Dataset
>>>     minor: Unable to initialize object
>>>   #002: H5L.c line 1638 in H5L_link_object(): unable to create new link to object
>>>     major: Links
>>>     minor: Unable to initialize object
>>>   #003: H5L.c line 1882 in H5L_create_real(): can't insert link
>>>     major: Symbol table
>>>     minor: Unable to insert object
>>>   #004: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
>>>     major: Symbol table
>>>     minor: Object not found
>>>   #005: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
>>>     major: Symbol table
>>>     minor: Callback failed
>>>   #006: H5L.c line 1674 in H5L_link_cb(): name already exists
>>>     major: Symbol table
>>>     minor: Object already exists
>>> WARNING level 1 from host node207.cm.cluster process 0
>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>   in thorn CarpetIOHDF5, file /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:677:
>>>   -> HDF5 call 'dataset = H5Dcreate (outfile, datasetname.str().c_str(), filedatatype, dataspace, plist)' returned error code -1
>>> WARNING level 1 from host node207.cm.cluster process 0
>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>   in thorn CarpetIOHDF5, file /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:689:
>>>   -> HDF5 call 'H5Dwrite (dataset, memdatatype, H5S_ALL, H5S_ALL, H5P_DEFAULT, data)' returned error code -1
>>> WARNING level 1 from host node207.cm.cluster process 0
>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>   in thorn CarpetIOHDF5, file /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:725:
>>>   -> HDF5 call 'attr = H5Acreate (dataset, "level", H5T_NATIVE_INT, dataspace, H5P_DEFAULT)' returned error code -1
>>> WARNING level 1 from host node207.cm.cluster process 0
>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>   in thorn CarpetIOHDF5, file /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:726:
>>>   -> HDF5 call 'H5Awrite (attr, H5T_NATIVE_INT, &refinementlevel)' returned error code -1
>>> WARNING level 1 from host node207.cm.cluster process 0
>>> 
>>> -- 
>>> Geraint Pratten
>>> Postdoctoral Research Associate
>>> 
>>> Mobile: +44(0) 7581709282
>>> E-mail: G.Pratten at sussex.ac.uk
>>> Skype: geraint.pratten
>>> 
>>> School of Mathematical and Physical Sciences
>>> Pevensey 3 Building
>>> University of Sussex
>>> Falmer Campus
>>> Brighton
>>> BN1 9QH
>>> United Kingdom
>>> 
>>> 
>>> -- 
>>> Erik Schnetter <schnetter at cct.lsu.edu>
>>> http://www.perimeterinstitute.ca/personal/eschnetter/
> 
> [Attachments: apollo.run, apollo.sub]

-- 
Ian Hinder
http://members.aei.mpg.de/ianhin
