[Users] logic of scheduling SelectBoundConds in McLachlan?

Tue Feb 19 13:16:48 CST 2013

On Tue, Feb 19, 2013 at 1:24 PM, Kelly, Bernard J. (GSFC-660.0)[UNIVERSITY
OF MARYLAND BALTIMORE COUNTY] <bernard.j.kelly at nasa.gov> wrote:

> Hi Ian (and Frank and Erik). Thanks for the further insight on the
> profiling.
>
>
> [Please ignore the new mail that just came through with the 400KB
> attachment. That was my first attempt that was held for moderation because
> of the attachment size. Then I sent the slimmed-down attachments, but this
> was still in the pipeline.]
>
> I was looking at *all* the processor outputs (that is, all the
> TimerReport_XXXXXX files), but not necessarily at all fields in all of
> them. I concentrated on the CCTK_EVOL section of the report, and then only
> looked closely at discrepancies between a sample "longer SelectBoundConds"
> processor and each of the five or six "shorter SelectBoundConds"
> processors. I suppose to do a more complete job, I'd have to start
> scripting ...
>
> Anyway, I *hadn't* been using those profiling parameters before, so my
> conclusions were probably dodgy as you say. After your reply I re-enabled
> them and restarted the run. Since it's so slow, I'm now looking at the
> TimerReports from earlier in the new run, and no longer see any
> discrepancies between different processors (that is, there don't seem to
> be any "shorter SelectBoundConds" processors any more).
>
> So if *all* the processors are showing essentially the same information,
> and the "schedule_barriers" and "sync_barriers" are in place, then there's
> no significant load imbalance? And yet it is slow as hell ...
>

With schedule barriers, load imbalance is hidden in these barriers. That
is, you would need to measure how much time each process spends in these
barriers. I expect that some processes will spend 0s there, while others
will spend 50,000s there. That would be your load imbalance.
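
To illustrate the point, here is a minimal Python sketch. It assumes you
have already extracted, by whatever means, one number per process. It does
not parse TimerReport files, and the function names and the sample numbers
are made up for illustration. The first function models what a barrier
does: every process waits until the slowest one arrives, so its wait time
is the gap to the slowest process. The second summarises the per-process
barrier times; a large spread between minimum and maximum is the load
imbalance.

```python
def simulate_barrier_waits(work_seconds):
    """Model a barrier: each process waits until the slowest one arrives.

    work_seconds holds one (hypothetical) compute time per process.
    Returns the time each process would spend waiting in the barrier.
    """
    slowest = max(work_seconds)
    return [slowest - w for w in work_seconds]


def barrier_imbalance(barrier_seconds):
    """Summarise per-process barrier wait times (one entry per process)."""
    if not barrier_seconds:
        raise ValueError("need at least one process")
    shortest = min(barrier_seconds)
    longest = max(barrier_seconds)
    mean = sum(barrier_seconds) / len(barrier_seconds)
    return {
        "min": shortest,
        "max": longest,
        "mean": mean,
        # Processes with ~0 wait are the slow ones everybody else waits for.
        "spread": longest - shortest,
    }


if __name__ == "__main__":
    # Made-up compute times for four processes, in seconds.
    waits = simulate_barrier_waits([100.0, 60.0, 100.0, 80.0])
    print(waits)
    print(barrier_imbalance(waits))
```

In this toy model the processes with zero barrier time are the ones doing
the most work, and every other process's barrier time is the time it sat
idle waiting for them.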

-erik

>
> I'm now testing with the actual repository McLachlan instead.
>
> Bernard
>
> On 2/18/13 3:40 PM, "Ian Hinder" <ian.hinder at aei.mpg.de> wrote:
>
> >
> >On 18 Feb 2013, at 21:11, "Kelly, Bernard J. (GSFC-660.0)[UNIVERSITY OF
> >MARYLAND BALTIMORE COUNTY]" <bernard.j.kelly at nasa.gov> wrote:
> >
> >> [re-sent, with smaller attachment]
> >>
> >> Hi Roland, and thanks for your reply. I'm still a bit confused, I confess
> >> (see below) ...
> >
> >>
> >>
> >>>
> >>>> I wouldn't mind, but while trying to understand why ML_BSSN was evolving
> >>>> so slowly on one of our machines, I looked at the TimerReport files, and
> >>>> saw that SelectBoundConds was taking *much* more time (like 20 times as
> >>>> long) than the actual RHS calculation routines.
> >>> The long time is most likely caused by the fact that the boundary
> >>> selection routine tends to be the one calling SYNC which means it is the
> >>> one that does an MPI wait (if there is load imbalance) and communicates
> >>> data for buffer zone prolongation etc.
> >>
> >> So it might be spending most of the time waiting for other cores to catch
> >> up?
> >
> >If you look at timer output just for one process, you will almost
> >certainly reach erroneous conclusions due to things like this.  I
> >recommend looking at the output on all processes (yes, performance
> >profiling is hard).
> >
> >> But if it's really waiting for prior routines to finish on other
> >> processors, then on the handful of cores where SBC appears significantly
> >> *quicker* than usual (e.g. ~50,000 seconds instead of ~100,000) I should
> >> see earlier routines taking correspondingly *longer*, right? But I don't.
> >
> >It may also be that timings change significantly from one iteration to
> >the next.  Have you set your CPU affinity settings correctly?
> >
> >I recommend setting the parameters
> >
> >Carpet::schedule_barriers = yes
> >Carpet::sync_barriers = yes
> >
> >This will insert an MPI barrier before and after each scheduled function
> >call and sync.  Then you can rely on the timings of the individual
> >functions, and also see how much time is spent waiting to catch up (i.e.
> >in load imbalance).  At the moment, the function timers for functions
> >which do communication will include time spent waiting for the other
> >process to catch up.
> >
> >> I'm attaching TimerReport files for two cores on the same (128-core)
> >> evolution. Core 000 is typical. Line 184 (the most up-to-date instance of
> >> "large" SBC behaviour) shows about 100K seconds spent cumulatively over
> >> the simulation so far. Core 052 shows only about half as much time used in
> >> the same routine, but I can't see what other EVOL routines might be taking
> >> up the slack.
> >>
> >> (Note, BTW, that what I'm running isn't vanilla ML_BSSN, but a locally
> >> modified version called MH_BSSN. The scheduling and most routines are
> >> almost identical to McLachlan)
> >>
> >> Bernard
> >>
> >>
> >>
> >>>
> >>> Yours,
> >>> Roland
> >>>
> >>> --
> >>> My email is as private as my paper mail. I therefore support encrypting
> >>> and signing email messages. Get my PGP key from http://keys.gnupg.net.
> >>>
> >>
> >>
> >> <TimerReports_LATEST_BJK.tgz>
> >> _______________________________________________
> >> Users mailing list
> >> Users at einsteintoolkit.org
> >> http://lists.einsteintoolkit.org/mailman/listinfo/users
> >
> >--
> >Ian Hinder
> >http://numrel.aei.mpg.de/people/hinder
> >
>

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/