[Users] HDF5 Error, Hilbert

Geraint Pratten g.pratten at sussex.ac.uk
Tue Oct 20 06:53:39 CDT 2015


The option list is attached.

On 20 October 2015 at 12:50, Ian Hinder <ian.hinder at aei.mpg.de> wrote:

>
> On 20 Oct 2015, at 13:44, Geraint Pratten <g.pratten at sussex.ac.uk> wrote:
>
> Hi,
>
> So I'm having a problem passing the correct number of MPI processes
> through to Carpet. I'm not sure what I've messed up in the configurations;
> the error I am getting is:
>
> The environment variable CACTUS_NUM_PROCS is set to 4, but there are 1 MPI
> processes. This may indicate a severe problem with the MPI startup
> mechanism.
>
> I've attached the .run and .sub scripts that I use. Can anyone see
> anything obviously wrong? The local cluster uses the UNIVA Grid Engine for
> submission scripts.
>
>
> Can you also send the optionlist?  Maybe it's a mismatch between the MPI
> installation used at compile and run time?
>
>
>
> Thanks in advance!
> Geraint
>
>
> On 5 October 2015 at 12:34, Geraint Pratten <g.pratten at sussex.ac.uk>
> wrote:
>
>> Ah, I think you could be right! It seems that I've messed up the
>> submit/run scripts somewhere along the line! The relevant output is
>> attached below. I'll try fixing this now.
>>
>> Thanks! Much appreciated!
>> Geraint
>>
>> INFO (Carpet): MPI is enabled
>> INFO (Carpet): Carpet is running on 1 processes
>> WARNING level 1 from host node207.cm.cluster process 0
>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>   in thorn Carpet, file
>> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:226:
>>   -> The environment variable CACTUS_NUM_PROCS is set to 6, but there are
>> 1 MPI processes. This may indicate a severe problem with the MPI startup
>> mechanism.
>> INFO (Carpet): This is process 0
>> INFO (Carpet): OpenMP is enabled
>> INFO (Carpet): This process contains 6 threads, this is thread 0
>> WARNING level 2 from host node207.cm.cluster process 0
>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>   in thorn Carpet, file
>> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:256:
>>   -> Although OpenMP is enabled, the environment variable
>> CACTUS_NUM_THREADS is not set.
>> INFO (Carpet): There are 6 threads in total
>> INFO (Carpet): There are 6 threads per process
>> INFO (Carpet): This process runs on host node207, pid=29183
>> INFO (Carpet): This process runs on 24 cores: 0-23
>> INFO (Carpet): Thread 0 runs on 24 cores: 0-23
>> INFO (Carpet): Thread 1 runs on 24 cores: 0-23
>> INFO (Carpet): Thread 2 runs on 24 cores: 0-23
>> INFO (Carpet): Thread 3 runs on 24 cores: 0-23
>> INFO (Carpet): Thread 4 runs on 24 cores: 0-23
>> INFO (Carpet): Thread 5 runs on 24 cores: 0-23
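
The warning above is Carpet comparing the CACTUS_NUM_PROCS environment variable (set by the run script) against the MPI world size it actually finds at start-up. A minimal standalone C sketch of the same check, useful for testing a submit/run script outside Cactus; it assumes nothing beyond the standard MPI C API and is not Carpet's actual code from SetupGH.cc:

    /* check_procs.c: compare CACTUS_NUM_PROCS with the actual MPI world size.
     * Build with the same mpicc that was used for Cactus, then launch it
     * through the same .sub/.run machinery, e.g. "mpirun -np 4 ./check_procs". */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
      MPI_Init(&argc, &argv);

      int size, rank;
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      const char *env = getenv("CACTUS_NUM_PROCS");
      int expected = env ? atoi(env) : -1;   /* -1 means the variable is not set */

      if (rank == 0) {
        printf("CACTUS_NUM_PROCS=%d, MPI world size=%d\n", expected, size);
        if (expected > 0 && expected != size)
          printf("Mismatch: the launcher did not start the expected number of ranks.\n");
      }

      MPI_Finalize();
      return 0;
    }

If this test reports a world size of 1 even though mpirun was asked for more ranks, the mpirun found on the PATH at run time usually belongs to a different MPI installation than the one the executable was linked against, which is the compile-time/run-time mismatch suggested earlier in the thread. CACTUS_NUM_THREADS plays the analogous role for the OpenMP warning.
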
>>
>> On 5 October 2015 at 12:27, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>>
>>>
>>> On 5 Oct 2015, at 12:43, Geraint Pratten <g.pratten at sussex.ac.uk> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to run a simulation with the same output directory - though
>>> I've been removing the simulations after each run while trying to fix the
>>> above issue. I should have write access to the file and there should be
>>> sufficient storage.
>>>
>>>
>>> Can you check the standard output of the run and see if the number of
>>> processes and threads are what you expect?  You should see something like
>>> "Carpet is running on N processes with M threads".  Maybe there is a
>>> problem in MPI initialisation, and multiple processes are trying to write
>>> to the same HDF5 file at the same time.
>>>
>>>
>>>
>>> Geraint
>>>
>>> On 5 October 2015 at 11:34, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>>>
>>>>
>>>> On 1 Oct 2015, at 15:49, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>>>>
>>>> Geraint
>>>>
>>>> Some very fundamental HDF5 operations fail. This could be due to a file
>>>> system problem; maybe you don't have write access to the file, or your disk
>>>> is full. It could also be that there exists a corrupted HDF5 file, and you
>>>> are trying to write to it (extend it).
>>>>
>>>> An HDF5 file can get corrupted if it has been opened for writing and
>>>> then the application is interrupted or crashes, or you run out of disk
>>>> space. Even if you free up space later, a corrupted file remains
>>>> corrupted.
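
One way to test whether an existing output or checkpoint file is still usable is to open it read-only with the plain HDF5 C API, outside Cactus. A small illustrative sketch (the file name "output.h5" is a placeholder, not anything CarpetIOHDF5 writes by default):

    /* h5check.c: rough sanity check of an existing HDF5 file. */
    #include <hdf5.h>
    #include <stdio.h>

    int main(void)
    {
      const char *fn = "output.h5";             /* placeholder file name */

      htri_t looks_like_hdf5 = H5Fis_hdf5(fn);  /* checks the HDF5 file signature */
      if (looks_like_hdf5 <= 0) {
        printf("%s is missing, truncated, or not an HDF5 file\n", fn);
        return 1;
      }

      hid_t file = H5Fopen(fn, H5F_ACC_RDONLY, H5P_DEFAULT);
      if (file < 0) {
        printf("%s has an HDF5 signature but cannot be opened; likely corrupted\n", fn);
        return 1;
      }

      printf("%s opens cleanly\n", fn);
      H5Fclose(file);
      return 0;
    }

Compile it against the same HDF5 installation that Cactus was built with; a file that fails either check should be deleted (or the run moved to a fresh output directory) rather than extended.
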
>>>>
>>>>
>>>> Hi Geraint,
>>>>
>>>> Are you trying to run a simulation with output in the same directory as
>>>> one that was run before?
>>>>
>>>>
>>>> -erik
>>>>
>>>>
>>>> On Thu, Oct 1, 2015 at 9:30 AM, Geraint Pratten <g.pratten at sussex.ac.uk
>>>> > wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I recently built the latest release (Hilbert) on the local cluster and
>>>>> I've been getting an HDF5 error. I've tried building Cactus both with the
>>>>> local HDF5 package on the cluster and by letting Cactus build HDF5 from
>>>>> scratch. Does anyone have any insight into the error I've attached below
>>>>> (an extract from a long chain of errors of the same form)?
>>>>>
>>>>> I can dump the full output to a file or provide the configuration files if needed.
>>>>>
>>>>> Thanks in advance!
>>>>> Geraint
>>>>>
>>>>> ----------
>>>>>
>>>>> HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 0:
>>>>>   #000: H5Ddeprec.c line 193 in H5Dcreate1(): unable to create dataset
>>>>>     major: Dataset
>>>>>     minor: Unable to initialize object
>>>>>   #001: H5Dint.c line 453 in H5D__create_named(): unable to create and
>>>>> link to dataset
>>>>>     major: Dataset
>>>>>     minor: Unable to initialize object
>>>>>   #002: H5L.c line 1638 in H5L_link_object(): unable to create new
>>>>> link to object
>>>>>     major: Links
>>>>>     minor: Unable to initialize object
>>>>>   #003: H5L.c line 1882 in H5L_create_real(): can't insert link
>>>>>     major: Symbol table
>>>>>     minor: Unable to insert object
>>>>>   #004: H5Gtraverse.c line 861 in H5G_traverse(): internal path
>>>>> traversal failed
>>>>>     major: Symbol table
>>>>>     minor: Object not found
>>>>>   #005: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal
>>>>> operator failed
>>>>>     major: Symbol table
>>>>>     minor: Callback failed
>>>>>   #006: H5L.c line 1674 in H5L_link_cb(): name already exists
>>>>>     major: Symbol table
>>>>>     minor: Object already exists
>>>>> WARNING level 1 from host node207.cm.cluster process 0
>>>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>>>   in thorn CarpetIOHDF5, file
>>>>> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:677:
>>>>>   -> HDF5 call 'dataset = H5Dcreate (outfile,
>>>>> datasetname.str().c_str(), filedatatype, dataspace, plist)' returned error
>>>>> code -1
>>>>> WARNING level 1 from host node207.cm.cluster process 0
>>>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>>>   in thorn CarpetIOHDF5, file
>>>>> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:689:
>>>>>   -> HDF5 call 'H5Dwrite (dataset, memdatatype, H5S_ALL, H5S_ALL,
>>>>> H5P_DEFAULT, data)' returned error code -1
>>>>> WARNING level 1 from host node207.cm.cluster process 0
>>>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>>>   in thorn CarpetIOHDF5, file
>>>>> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:725:
>>>>>   -> HDF5 call 'attr = H5Acreate (dataset, "level", H5T_NATIVE_INT,
>>>>> dataspace, H5P_DEFAULT)' returned error code -1
>>>>> WARNING level 1 from host node207.cm.cluster process 0
>>>>>   while executing schedule bin (none), routine (no thorn)::(no routine)
>>>>>   in thorn CarpetIOHDF5, file
>>>>> /mnt/pact/gp234/NR/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:726:
>>>>>   -> HDF5 call 'H5Awrite (attr, H5T_NATIVE_INT, &refinementlevel)'
>>>>> returned error code -1
>>>>> WARNING level 1 from host node207.cm.cluster process 0
>>>>>
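
The innermost entry of the error stack above ("name already exists" in H5L_link_cb) means H5Dcreate is being asked to create a dataset under a name that is already present in the file. That is what happens when a run appends to output left over from an earlier run at the same iteration, or when several processes that do not share an MPI world each write the same dataset to one file. A minimal C sketch of the failure mode and of the H5Lexists guard against it (the file and dataset names here are made up for illustration; this is not CarpetIOHDF5 code):

    /* already_exists.c: show how re-creating an existing dataset fails. */
    #include <hdf5.h>
    #include <stdio.h>

    int main(void)
    {
      hid_t file = H5Fcreate("demo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
      hsize_t dims[1] = {8};
      hid_t space = H5Screate_simple(1, dims, NULL);

      /* The first creation succeeds. */
      hid_t ds = H5Dcreate2(file, "/phi it=0", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
      H5Dclose(ds);

      /* A second H5Dcreate2 with the same name would fail with the same
       * "Object already exists" chain as in the log; check first instead. */
      if (H5Lexists(file, "/phi it=0", H5P_DEFAULT) > 0)
        printf("dataset already present; skipping (or open it with H5Dopen2)\n");

      H5Sclose(space);
      H5Fclose(file);
      return 0;
    }

In this thread the suspected root cause is the broken MPI start-up rather than the HDF5 installation: independent size-1 processes would each write to the same file as if they were process 0, colliding on dataset names exactly as above.
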
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Erik Schnetter <schnetter at cct.lsu.edu>
>>>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>>>
>>>>
>>>> --
>>>> Ian Hinder
>>>> http://members.aei.mpg.de/ianhin
>>>>
>>>>
>>>
>>>
>>>
>>>
>>> --
>>> Ian Hinder
>>> http://members.aei.mpg.de/ianhin
>>>
>>>
>>
>>
>>
>
>
>
> <apollo.run><apollo.sub>
>
>
> --
> Ian Hinder
> http://members.aei.mpg.de/ianhin
>
>


-- 
Geraint Pratten
Postdoctoral Research Associate

Mobile: +44(0) 7581709282
E-mail: G.Pratten at sussex.ac.uk
Skype: geraint.pratten

School of Mathematical and Physical Sciences
Pevensey 3 Building
University of Sussex
Falmer Campus
Brighton
BN1 9QH
United Kingdom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: apollo.cfg
Type: application/octet-stream
Size: 3863 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20151020/9ba1fe2a/attachment-0001.obj 

