[Users] meeting minutes for 2017-03-06

Ian Hinder ian.hinder at aei.mpg.de
Tue Mar 7 08:03:58 CST 2017


On 6 Mar 2017, at 16:45, Roland Haas <rhaas at illinois.edu> wrote:

> Present: Erik, Roland, Cory Chu, Steve, Bhavesh, 彭兆宏
> 
> We quickly discussed the failing tests, and given that the option list
> used
> (https://bitbucket.org/ianhinder/cactusjenkins/raw/d7021a52bd83448db589b2346c43441682eecabb/build.cfg)
> is not using -ffast-math or -march=native, the working assumption is
> that the failures are due to an updated OS and
> newer compiler.

I seem to remember that this problem started when we moved the system from UCD to NCSA.  At that time, there was no change to the OS or the compiler.  The only thing that should have been visible was a change in CPU.  I might be misremembering.

Do we have tickets for these failures?  It is hard to keep track of what has been discussed without them.  It would be good if someone familiar with the failing codes would take "ownership" of this issue, create some tickets, and try to come up with a plan for tracking down the cause of the failures.  Having constantly-failing tests in Jenkins desensitises us to problems, and is not in general a good situation to be in.

We have failures in three thorns:

	CT_MultiLevel
	SphericalHarmonicReconGen
	GRHydro

> Erik asked if we had a docker container for the Jenkins
> test slave, which we were not sure about (there is a docker image at
> https://bitbucket.org/ianhinder/et-jenkins-slave though it is not clear if this is the one used).

I would first see if it is easy to reproduce the failure on a system which you already have set up, since that is the easiest.

The docker container used is:

ianhinder/et-jenkins-slave:ubuntu-16.04

A new build slave can be created, assuming an existing installation of Ubuntu 16.04, by using the (almost trivial) scripts in the repository at

	https://bitbucket.org/ianhinder/ncsajenkins

The README gives the commands to run.  If you want to just run the container on an existing docker system, a simple

	docker run --name etslave ianhinder/et-jenkins-slave:ubuntu-16.04

should be sufficient.

If the problem is CPU-specific, then it will matter what system you run on.  The build machine reports:

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel Core Processor (Haswell)
stepping	: 1
microcode	: 0x1
cpu MHz		: 2499.996
cache size	: 4096 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs		:
bogomips	: 4999.99
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
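
To check whether two machines differ in the instruction sets they advertise (for example fma or avx2), one could compare the "flags" lines of their /proc/cpuinfo output.  The following is only a minimal Python sketch; the function names and the file names in the usage comment are made up for illustration and are not part of any Einstein Toolkit tooling.

```python
import sys


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line.

    Expects text in the format printed by `cat /proc/cpuinfo` on x86 Linux.
    Returns an empty set if no 'flags' line is found.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def compare_flags(text_a, text_b):
    """Return (only_in_a, only_in_b) as sorted lists of flag names."""
    flags_a = parse_cpu_flags(text_a)
    flags_b = parse_cpu_flags(text_b)
    return sorted(flags_a - flags_b), sorted(flags_b - flags_a)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python compare_cpuflags.py cpuinfo_build.txt cpuinfo_local.txt
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        only_a, only_b = compare_flags(fa.read(), fb.read())
    print("only in first: ", " ".join(only_a))
    print("only in second:", " ".join(only_b))
```

A difference in the flags does not by itself show that the compiled code uses those instructions; it only narrows down which machines are worth comparing.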


I have just tried to reproduce the failure at

https://build-test.barrywardell.net/job/EinsteinToolkit/lastCompletedBuild/testReport/(root)/GRHydro/GRHydro_test_shock_weno_1procs/

on an Ubuntu machine with a Kaby Lake processor, and it gives exactly the same "diffs" output, and the same failure, as on the build machine.  This is with Ubuntu 16.10 with ubuntu.cfg from simfactory.

--
Ian Hinder
http://members.aei.mpg.de/ianhin
