[Users] HDF5 Error, Hilbert

Geraint Pratten g.pratten at sussex.ac.uk
Tue Oct 20 06:44:45 CDT 2015


Hi,

I'm having a problem passing the correct number of MPI processes through to
Carpet. I'm not sure what I've messed up in the configuration; the error I
am getting is:

The environment variable CACTUS_NUM_PROCS is set to 4, but there are 1 MPI
processes. This may indicate a severe problem with the MPI startup
mechanism.

I've attached the .run and .sub scripts that I use. Can anyone see anything
obviously wrong? The local cluster uses Univa Grid Engine for job
submission.
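
For anyone reading without the attachments, here is a minimal sketch of the
structure I am aiming for. The parallel environment name "openmpi", the slot
count, and the executable and parameter file names are placeholders rather
than my actual settings; the point is that the slots requested from the
scheduler, CACTUS_NUM_PROCS, and the -np passed to mpirun all have to agree:

  #!/bin/bash
  # apollo.sub (sketch)
  #$ -S /bin/bash
  #$ -cwd
  #$ -pe openmpi 4   # request 4 slots; Grid Engine exports the count as $NSLOTS
  ./apollo.run

  #!/bin/bash
  # apollo.run (sketch)
  export CACTUS_NUM_PROCS=${NSLOTS}
  # If this mpirun does not come from the MPI stack Cactus was built against,
  # every rank can start as an independent 1-process job, which matches the
  # symptom above.
  mpirun -np ${CACTUS_NUM_PROCS} ./cactus_sim simulation.par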

Thanks in advance!
Geraint


On 5 October 2015 at 12:34, Geraint Pratten <g.pratten at sussex.ac.uk> wrote:

> Ah, I think you could be right! It seems I've messed up the submit/run
> scripts somewhere along the line! The relevant output is attached below.
> I'll try fixing this now.
>
> Thanks! Much appreciated!
> Geraint
>
> INFO (Carpet): MPI is enabled
>
> INFO (Carpet): Carpet is running on 1 processes
>
> WARNING level 1 from host node207.cm.cluster process 0
>
>   while executing schedule bin (none), routine (no thorn)::(no routine)
>
>   in thorn Carpet, file
> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:226:
>
>   -> The environment variable CACTUS_NUM_PROCS is set to 6, but there are
> 1 MPI processes. This may indicate a severe problem with the MPI startup
> mechanism.
>
> INFO (Carpet): This is process 0
>
> INFO (Carpet): OpenMP is enabled
>
> INFO (Carpet): This process contains 6 threads, this is thread 0
>
> WARNING level 2 from host node207.cm.cluster process 0
>
>   while executing schedule bin (none), routine (no thorn)::(no routine)
>
>   in thorn Carpet, file
> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:256:
>
>   -> Although OpenMP is enabled, the environment variable
> CACTUS_NUM_THREADS is not set.
>
> INFO (Carpet): There are 6 threads in total
>
> INFO (Carpet): There are 6 threads per process
>
> INFO (Carpet): This process runs on host node207, pid=29183
>
> INFO (Carpet): This process runs on 24 cores: 0-23
>
> INFO (Carpet): Thread 0 runs on 24 cores: 0-23
>
> INFO (Carpet): Thread 1 runs on 24 cores: 0-23
>
> INFO (Carpet): Thread 2 runs on 24 cores: 0-23
>
> INFO (Carpet): Thread 3 runs on 24 cores: 0-23
>
> INFO (Carpet): Thread 4 runs on 24 cores: 0-23
>
> INFO (Carpet): Thread 5 runs on 24 cores: 0-23
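>
> For the record, the CACTUS_NUM_THREADS warning above suggests the run
> script should export the thread count explicitly as well. A sketch, with
> values matching the log (the executable and parameter file names are
> placeholders):
>
>   export CACTUS_NUM_PROCS=1                     # MPI processes
>   export CACTUS_NUM_THREADS=6                   # OpenMP threads per process
>   export OMP_NUM_THREADS=${CACTUS_NUM_THREADS}
>   mpirun -np ${CACTUS_NUM_PROCS} ./cactus_sim simulation.par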
>
> On 5 October 2015 at 12:27, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>
>>
>> On 5 Oct 2015, at 12:43, Geraint Pratten <g.pratten at sussex.ac.uk> wrote:
>>
>> Hi,
>>
>> I am running the simulation with the same output directory, though I've
>> been removing the old output after each run while trying to fix the above
>> issue. I should have write access to the file, and there should be
>> sufficient storage.
>>
>>
>> Can you check the standard output of the run and see whether the numbers
>> of processes and threads are what you expect? You should see something
>> like "Carpet is running on N processes with M threads". Maybe there is a
>> problem in MPI initialisation, and multiple processes are trying to write
>> to the same HDF5 file at the same time.
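>>
>> For example, something along these lines against the job's stdout file
>> (the file name is a placeholder for whatever your submission system
>> writes):
>>
>>   grep -E "Carpet is running|threads in total" simulation.out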
>>
>>
>>
>> Geraint
>>
>> On 5 October 2015 at 11:34, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>>
>>>
>>> On 1 Oct 2015, at 15:49, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>>>
>>> Geraint
>>>
>>> Some very fundamental HDF5 operations are failing. This could be due to
>>> a file system problem; maybe you don't have write access to the file, or
>>> your disk is full. It could also be that a corrupted HDF5 file already
>>> exists and you are trying to write to (extend) it.
>>>
>>> An HDF5 file can become corrupted if it is open for writing when the
>>> application is interrupted or crashes, or when you run out of disk space.
>>> Even if you free up space later, a corrupted file remains corrupted.
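>>>
>>> A quick, if not foolproof, way to spot a damaged file is to try listing
>>> it, since h5ls fails loudly on a file it cannot open (the directory and
>>> glob here are placeholders):
>>>
>>>   for f in output_dir/*.h5; do
>>>     h5ls "$f" > /dev/null || echo "possibly corrupted: $f"
>>>   done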
>>>
>>>
>>> Hi Geraint,
>>>
>>> Are you trying to run a simulation with output in the same directory as
>>> one that was run before?
>>>
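>>> That would be consistent with the "#006: ... name already exists" error
>>> in your log: CarpetIOHDF5 appears to be creating a dataset whose name is
>>> already present in a file left over from the earlier run. Listing the
>>> file would show the leftover datasets (the file name here is a
>>> placeholder):
>>>
>>>   h5ls -r output_dir/phi.xy.h5 | head
>>>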
>>>
>>> -erik
>>>
>>>
>>> On Thu, Oct 1, 2015 at 9:30 AM, Geraint Pratten <g.pratten at sussex.ac.uk>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I recently built the latest release (Hilbert) on the local cluster, and
>>>> I've been getting an HDF5 error. I've tried building Cactus both with
>>>> the cluster's local HDF5 package and with HDF5 built from scratch by
>>>> Cactus. Does anyone have any insight into the error attached below (an
>>>> extract from a long chain of errors of the same form)?
>>>>
>>>> I can dump the full output to a file or provide configuration files if
>>>> needed.
>>>>
>>>> Thanks in advance!
>>>> Geraint
>>>>
>>>> ----------
>>>>
>>>> HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 0:
>>>>   #000: H5Ddeprec.c line 193 in H5Dcreate1(): unable to create dataset
>>>>     major: Dataset
>>>>     minor: Unable to initialize object
>>>>   #001: H5Dint.c line 453 in H5D__create_named(): unable to create and
>>>> link to dataset
>>>>     major: Dataset
>>>>     minor: Unable to initialize object
>>>>   #002: H5L.c line 1638 in H5L_link_object(): unable to create new link
>>>> to object
>>>>     major: Links
>>>>     minor: Unable to initialize object
>>>>   #003: H5L.c line 1882 in H5L_create_real(): can't insert link
>>>>     major: Symbol table
>>>>     minor: Unable to insert object
>>>>   #004: H5Gtraverse.c line 861 in H5G_traverse(): internal path
>>>> traversal failed
>>>>     major: Symbol table
>>>>     minor: Object not found
>>>>   #005: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal
>>>> operator failed
>>>>     major: Symbol table
>>>>     minor: Callback failed
>>>>   #006: H5L.c line 1674 in H5L_link_cb(): name already exists
>>>>     major: Symbol table
>>>>     minor: Object already exists
>>>> WARNING level 1 from host node207.cm.cluster process 0
>>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>>   in thorn CarpetIOHDF5, file
>>>> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:677:
>>>>   -> HDF5 call 'dataset = H5Dcreate (outfile,
>>>> datasetname.str().c_str(), filedatatype, dataspace, plist)' returned error
>>>> code -1
>>>> WARNING level 1 from host node207.cm.cluster process 0
>>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>>   in thorn CarpetIOHDF5, file
>>>> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:689:
>>>>   -> HDF5 call 'H5Dwrite (dataset, memdatatype, H5S_ALL, H5S_ALL,
>>>> H5P_DEFAULT, data)' returned error code -1
>>>> WARNING level 1 from host node207.cm.cluster process 0
>>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>>   in thorn CarpetIOHDF5, file
>>>> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:725:
>>>>   -> HDF5 call 'attr = H5Acreate (dataset, "level", H5T_NATIVE_INT,
>>>> dataspace, H5P_DEFAULT)' returned error code -1
>>>> WARNING level 1 from host node207.cm.cluster process 0
>>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>>   in thorn CarpetIOHDF5, file
>>>> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:726:
>>>>   -> HDF5 call 'H5Awrite (attr, H5T_NATIVE_INT, &refinementlevel)'
>>>> returned error code -1
>>>> WARNING level 1 from host node207.cm.cluster process 0
>>>>
>>>>
>>>
>>>
>>> --
>>> Erik Schnetter <schnetter at cct.lsu.edu>
>>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>>
>>> --
>>> Ian Hinder
>>> http://members.aei.mpg.de/ianhin
>>>
>>>
>>
>>
>>
>>
>
>



-- 
Geraint Pratten
Postdoctoral Research Associate

Mobile: +44(0) 7581709282
E-mail: G.Pratten at sussex.ac.uk
Skype: geraint.pratten

School of Mathematical and Physical Sciences
Pevensey 3 Building
University of Sussex
Falmer Campus
Brighton
BN1 9QH
United Kingdom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: apollo.run
Type: application/octet-stream
Size: 1309 bytes
Desc: not available
URL: http://lists.einsteintoolkit.org/pipermail/users/attachments/20151020/0953d958/attachment-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: apollo.sub
Type: application/octet-stream
Size: 555 bytes
Desc: not available
URL: http://lists.einsteintoolkit.org/pipermail/users/attachments/20151020/0953d958/attachment-0003.obj

