[Users] Err logs like "[83bdcfc58a2e:19025] Read -1, expected 4096, errno = 1" when running ET.

Haas, Roland rhaas at illinois.edu
Fri Sep 20 10:10:16 CDT 2019


Hello Zhichao,

I just noticed that the file

/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so

also contains the error string.
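
For anyone who wants to repeat that check on their own system, here is a minimal sketch of such a search in Python. The helper name file_contains and the needle b", expected " are my own assumptions (the needle is just the fixed text between the two numbers in the message); nothing here is part of Open MPI or Cactus.

```python
from pathlib import Path


def file_contains(path, needle, chunk_size=1 << 20):
    """Return True if the bytes `needle` occur anywhere in the file at `path`."""
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            # Keep the last few bytes so a match that straddles two
            # chunks is still found on the next iteration.
            tail = window[-overlap:] if overlap > 0 else b""


if __name__ == "__main__":
    # Path taken from the email; it may differ or be absent on other systems.
    library = Path("/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so")
    # Assumed needle: the literal text between the two numbers in the message.
    needle = b", expected "
    if library.exists():
        print(library, "contains needle:", file_contains(library, needle))
    else:
        print(library, "not found on this system")
```

The same function can be pointed at libgfortran.so or the Cactus executable to compare results.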

That file is the vader component, which produces the warnings that setting

export OMPI_MCA_btl_vader_single_copy_mechanism="none"

avoids, so setting that environment variable should indeed fix the issue for you.
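
If the run is launched from a script rather than an interactive shell, the same setting can be passed through the child's environment. The following is only a sketch: the variable name and value are the ones from the export line above, while the helper names env_without_single_copy and run_with_workaround are hypothetical.

```python
import os
import shutil
import subprocess

# Variable name and value are taken verbatim from the export line above.
VADER_VAR = "OMPI_MCA_btl_vader_single_copy_mechanism"


def env_without_single_copy(base_env=None):
    """Return a copy of the environment with the workaround variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env[VADER_VAR] = "none"
    return env


def run_with_workaround(command):
    """Run `command` (a list of strings) with the workaround in its environment.

    Returns None without running anything if the program is not on PATH.
    """
    if shutil.which(command[0]) is None:
        print(f"{command[0]} not found; would have run: {' '.join(command)}")
        return None
    return subprocess.run(command, env=env_without_single_copy(), check=False)
```

A call such as run_with_workaround(["mpirun", "-np", "2", "hostname"]) would be one way to check that the messages disappear; that command line is an illustration only, not the simfactory invocation.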

See https://github.com/open-mpi/ompi/issues/4948 for a description of the issue.
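
To go through a long err file, the messages can be taken apart mechanically. The sketch below assumes every line has the shape "[host:pid] Read N, expected M, errno = E" seen in this thread; the regular expression and the function name explain are my own and not part of any tool mentioned here.

```python
import errno
import os
import re

# Matches lines such as "[83bdcfc58a2e:19025] Read -1, expected 4096, errno = 1".
LINE_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\] Read (?P<got>-?\d+), "
    r"expected (?P<expected>\d+), errno = (?P<errno>\d+)"
)


def explain(line):
    """Split one message into its fields and decode errno; None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    code = int(match["errno"])
    return {
        "host": match["host"],
        "pid": int(match["pid"]),
        "read": int(match["got"]),
        "expected": int(match["expected"]),
        "errno": code,
        # Symbolic name, e.g. EPERM, and the strerror() text for the code.
        "name": errno.errorcode.get(code, "unknown"),
        "text": os.strerror(code),
    }


if __name__ == "__main__":
    print(explain("[83bdcfc58a2e:19025] Read -1, expected 4096, errno = 1"))
```

Here the part before the colon is the host name of the container and the part after it is the process ID of the MPI rank.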

Yours,
Roland


> Hello Zhichao,
> 
> sorry for the long delay, I got started on other work and forgot about
> this email.
> 
> You should not need to install Ubuntu from scratch as Cactus should run
> in a docker container (Steve Brandt tests this before each release).
> You may want to try and set a couple of extra environment variables
> before you run the code:
> 
> export OMPI_MCA_btl_vader_single_copy_mechanism="none"
> 
> which at one point prevented (harmless) warnings for me when running in
> a (modern) Ubuntu container (old ones were fine). 
> 
> Can you tell me what image you are running inside of the container,
> please? I would like to try it out and see if this is what is causing
> problems for you.
> 
> It also turns out I could have already given you a better (though
> not really helpful) answer before.
> 
> Your errors are:
> 
> [83bdcfc58a2e:55616] Read -1, expected 31752, errno = 1
> 
> meaning a read failed (31752 bytes were supposed to be read but none
> were read) and the error number (errno) was 1.
> 
> Running this through the strerror function (part of C, but also
> available in Python as posix.strerror):
> 
> python -c 'import posix; print (posix.strerror(1))'
> 
> returns "Operation not permitted".
> 
> Normally this would be a file that cannot be read from, though the only
> file your code reads is the LORENE data file for the initial data,
> which results in a different error when it cannot be opened, so I am
> still stumped.
> 
> Yours,
> Roland
> 
> > Thank you Dr. Roland,
> > 
> > In fact the parfile I used is NsNsToHMNS.par
> > <https://einsteintoolkit.org/gallery/bns/nsnstohmns.par>. I just
> > changed the Initial Data file.
> > However, when I try other parfiles, for example
> > BBHHighRes.par, the error also occurs.
> > 
> > 83bdcfc58a2e is the identifier of this machine (this node), so I guess
> > something is wrong with Docker (I run ET in a Docker container)?
> > 
> > I have uploaded an err file generated by simfactory (located at
> > ~/simfactory/bbh/output-0000/bbh.err).
> > 
> > Maybe I should reinstall my OS as Ubuntu and then run ET without Docker?
> > 
> > Thanks again for your help.
> > 
> > Yours,
> > Zhichao.
> > 
> > 
> > 
> > Haas, Roland <rhaas at illinois.edu> wrote on Tue, Sep 10, 2019 at 8:14 PM:
> >   
> > > Hello ZhiChao,
> > >    
> > > > > + date +%s
> > > > > + export CACTUS_STARTTIME=1567761511
> > > > > + [ 14 = 1 ]
> > > > > + mpirun -np 14
> > > > > /home/zhaozc/simulations/bnstest/SIMFACTORY/exe/cactus_sim
> > > > > -L 3 /home/zhaozc/simulations/bnstest  /output-0000/bnstest.par
> > > > > [83bdcfc58a2e:19025] Read -1, expected 4096, errno = 1
> > > > > [83bdcfc58a2e:19027] Read -1, expected 4096, errno = 1
> > > > > [83bdcfc58a2e:19029] Read -1, expected 4096, errno = 1
> > > > > [83bdcfc58a2e:19024] Read -1, expected 4096, errno = 1
> > > > > [83bdcfc58a2e:19026] Read -1, expected 4096, errno = 1
> > > > > [83bdcfc58a2e:19028] Read -1, expected 4096, errno = 1
> > > > > [83bdcfc58a2e:19032] Read -1, expected 4096, errno = 1
> > > > > [83bdcfc58a2e:19034] Read -1, expected 4096, errno = 1
> > > > > [83bdcfc58a2e:19036] Read -1, expected 4096, errno = 1
> > > > > [83bdcfc58a2e:19030] Read -1, expected 4096, errno = 1
> > > > > [83bdcfc58a2e:19023] Read -1, expected 4096, errno = 1    
> > > >
> > > > ......
> > > >
> > > >
> > > > But those errors do not terminate the task.
> > > > Is it safe to ignore those errors?
> > >
> > > Would you mind providing the parfile bnstest.par that you are running,
> > > please? It does not seem to exist in the Einstein Toolkit repos and
> > > without it all we can do is guess what could be happening.
> > >
> > > The string "Read.*expected" also does not seem to show up anywhere in
> > > my Cactus tree (or any of my executables for that matter), so that I am
> > > not sure at all which thorn or module is even generating this error.
> > >
> > > The string does however show up in libgfortran.so so that I would guess
> > > that it is some Fortran code not checking if it could open a file
> > > correctly.
> > >
> > > You can try running what looks like an address: 83bdcfc58a2e through
> > > the addr2line command:
> > >
> > > addr2line -e /home/zhaozc/simulations/bnstest/SIMFACTORY/exe/cactus_sim
> > > 0x83bdcfc58a2e
> > >
> > > or use gdb:
> > >
> > > gdb /home/zhaozc/simulations/bnstest/SIMFACTORY/exe/cactus_sim
> > > info line *0x83bdcfc58a2e
> > >
> > > to try and find out which thorn was responsible, though I cannot
> > > guarantee that this will work.
> > >
> > > A blind guess would be that some thorn fails to open a file, does not
> > > properly check for this and then finds that it cannot read anything
> > > from that (unopened) file, in which case e.g. the read() system call
> > > returns -1.
> > >
> > > Yours,
> > > Roland
> > >
> > > --
> > > My email is as private as my paper mail. I therefore support encrypting
> > > and signing email messages. Get my PGP key from http://pgp.mit.edu .
> > >    
> > 
> >   
> 
> 
> 



-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .