<div dir="ltr">On Tue, Feb 19, 2013 at 1:24 PM, Kelly, Bernard J. (GSFC-660.0)[UNIVERSITY OF MARYLAND BALTIMORE COUNTY] <span dir="ltr"><<a href="mailto:bernard.j.kelly@nasa.gov" target="_blank">bernard.j.kelly@nasa.gov</a>></span> wrote:<br>
<div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Ian (and Frank and Erik). Thanks for the further insight on the<br>
profiling.<br>
<br>
<br>
[Please ignore the new mail that just came through with the 400KB<br>
attachment. That was my first attempt that was held for moderation because<br>
of the attachment size. Then I sent the slimmed-down attachments, but this<br>
was still in the pipeline.]<br>
<br>
I was looking at *all* the processor outputs (that is, all the<br>
TimerReport_XXXXXX files), but not necessarily at all fields in all of<br>
them. I concentrated on the CCTK_EVOL section of the report, and then only<br>
looked closely at discrepancies between a sample "longer SelectBoundConds"<br>
processor and each of the five or six "shorter SelectBoundConds"<br>
processors. I suppose to do a more complete job, I'd have to start<br>
scripting ...<br>
<br>
Anyway, I *hadn't* been using those profiling parameters before, so my<br>
conclusions were probably dodgy as you say. After your reply I re-enabled<br>
them and restarted the run. Since it's so slow, I'm now looking at the<br>
TimerReports from earlier in the new run, and no longer see any<br>
discrepancies between different processors (that is, there don't seem to<br>
be any "shorter SelectBoundConds" processors any more).<br>
<br>
So if *all* the processors are showing essentially the same information,<br>
and the "schedule_barriers" and "sync_barriers" are in place, then there's<br>
no significant load imbalance? And yet it is slow as hell ...<br></blockquote><div><br></div><div style>With schedule barriers, load imbalance is hidden in these barriers. That is, you would need to measure how much time each process spends in these barriers. I expect that some processes will spend 0s there, while others will spend 50,000s there. That would be your load imbalance.</div>
<div style><br></div><div style>-erik</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
I'm now testing with the actual repository McLachlan instead.<br>
<span class="HOEnZb"><font color="#888888"><br>
Bernard<br>
</font></span><div class="im HOEnZb"><br>
On 2/18/13 3:40 PM, "Ian Hinder" <<a href="mailto:ian.hinder@aei.mpg.de">ian.hinder@aei.mpg.de</a>> wrote:<br>
<br>
><br>
>On 18 Feb 2013, at 21:11, "Kelly, Bernard J. (GSFC-660.0)[UNIVERSITY OF<br>
>MARYLAND BALTIMORE COUNTY]" <<a href="mailto:bernard.j.kelly@nasa.gov">bernard.j.kelly@nasa.gov</a>> wrote:<br>
><br>
>> [re-sent, with smaller attachment]<br>
>><br>
>> Hi Roland, and thanks for your reply. I'm still a bit confused, I<br>
>>confess<br>
>> (see below) ...<br>
><br>
>><br>
>><br>
>>><br>
</div><div class="HOEnZb"><div class="h5">>>>> I wouldn't mind, but while trying to understand why ML_BSSN was<br>
>>>>evolving<br>
>>>> so slowly on one of our machines, I looked at the TimerReport files,<br>
>>>>and<br>
>>>> saw that SelectBoundConds was taking *much* more time (like 20 times<br>
>>>>as<br>
>>>> long) than the actual RHS calculation routines.<br>
>>> The long time is most likely caused by the fact that the boundary<br>
>>> selection routine tends to be the one calling SYNC, which means it is<br>
>>> the one that does an MPI wait (if there is load imbalance) and<br>
>>> communicates data for buffer zone prolongation etc.<br>
>><br>
>> So it might be spending most of the time waiting for other cores to<br>
>>catch<br>
>> up?<br>
><br>
>If you look at timer output just for one process, you will almost<br>
>certainly reach erroneous conclusions due to things like this. I<br>
>recommend looking at the output on all processes (yes, performance<br>
>profiling is hard).<br>
><br>
>> But if it's really waiting for prior routines to finish on other<br>
>> processors, then on the handful of cores where SBC appears significantly<br>
>> *quicker* than usual (e.g. ~50,000 seconds instead of ~100,000) I should<br>
>> see earlier routines taking correspondingly *longer*, right? But I<br>
>>don't.<br>
><br>
>It may also be that timings change significantly from one iteration to<br>
>the next. Have you set your CPU affinity settings correctly?<br>
><br>
>I recommend setting the parameters<br>
><br>
>Carpet::schedule_barriers = yes<br>
>Carpet::sync_barriers = yes<br>
><br>
>This will insert an MPI barrier before and after each scheduled function<br>
>call and sync. Then you can rely on the timings of the individual<br>
>functions, and also see how much time is spent waiting to catch up (i.e.<br>
>in load imbalance). At the moment, the function timers for functions<br>
>which do communication will include time spent waiting for the other<br>
>process to catch up.<br>
><br>
>> I'm attaching TimerReport files for two cores on the same (128-core)<br>
>> evolution. Core 000 is typical. Line 184 (the most up-to-date instance<br>
>>of<br>
>> "large" SBC behaviour) shows about 100K seconds spent cumulatively over<br>
>> the simulation so far. Core 052 shows only about half as much time used<br>
>>in<br>
>> the same routine, but I can't see what other EVOL routines might be<br>
>>taking<br>
>> up the slack.<br>
>><br>
>> (Note, BTW, that what I'm running isn't vanilla ML_BSSN, but a locally<br>
>> modified version called MH_BSSN. The scheduling and most routines are<br>
>> almost identical to McLachlan.)<br>
>><br>
>> Bernard<br>
>><br>
>><br>
>><br>
>>><br>
>>> Yours,<br>
>>> Roland<br>
>>><br>
>>> --<br>
>>> My email is as private as my paper mail. I therefore support encrypting<br>
>>> and signing email messages. Get my PGP key from <a href="http://keys.gnupg.net" target="_blank">http://keys.gnupg.net</a>.<br>
>>><br>
>><br>
>><br>
>><TimerReports_LATEST_BJK.tgz><br>
><br>
>--<br>
>Ian Hinder<br>
><a href="http://numrel.aei.mpg.de/people/hinder" target="_blank">http://numrel.aei.mpg.de/people/hinder</a><br>
><br>
<br>
_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@einsteintoolkit.org">Users@einsteintoolkit.org</a><br>
<a href="http://lists.einsteintoolkit.org/mailman/listinfo/users" target="_blank">http://lists.einsteintoolkit.org/mailman/listinfo/users</a><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Erik Schnetter <<a href="mailto:schnetter@cct.lsu.edu" target="_blank">schnetter@cct.lsu.edu</a>><br><a href="http://www.perimeterinstitute.ca/personal/eschnetter/" target="_blank">http://www.perimeterinstitute.ca/personal/eschnetter/</a>
</div></div>