[Users] memory leak in Carpet?

Miguel Zilhão miguel.zilhao.nogueira at tecnico.ulisboa.pt
Thu Jul 19 05:58:06 CDT 2018


hi all,

i've noticed that my runs with CarpetRegrid2 (using the latest ET release) show a significant
increase in memory usage during runtime. it seems to happen immediately after a non-trivial
regridding operation. the increase is steady, and at some point the run exhausts the available
memory and the simulation crashes. this happens both on my workstation (Ubuntu 18.04) and on our
local cluster (Debian 9). has anyone seen something like this?
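A note on the maxrss_mb column in the stdout excerpts that follow. It is most likely a peak resident set size (high-water mark), not current usage. That would be the case if SystemStatistics takes it from the POSIX getrusage() call, but this is an assumption, since the post does not say. A high-water mark can only go up, which fits the flat-then-stepped pattern in the excerpts. The Python sketch below illustrates this behaviour for a single process. The names peak_rss_mb and high_water_demo are illustrative only and are not part of Cactus. The `resource` module is Unix-only.

```python
import resource  # Unix-only standard-library wrapper around getrusage(2)
import sys


def peak_rss_mb():
    """Peak resident set size of the calling process, in MB.

    This is a high-water mark: it records the largest RSS seen so far and
    never decreases, even after memory is freed.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def high_water_demo(size_mb=100):
    """Allocate and free size_mb of memory, sampling the peak RSS around it."""
    before = peak_rss_mb()
    # Repeating a one-byte bytes object writes every byte, so the pages are
    # actually touched and counted in the resident set.
    buf = b"\x01" * (size_mb * 1024 * 1024)
    during = peak_rss_mb()
    del buf  # the buffer is released, but the recorded peak stays where it is
    after = peak_rss_mb()
    return before, during, after
```

In a fresh interpreter, `high_water_demo()` should return a `during` value roughly 100 MB above `before`, and an `after` value equal to `during`. A rising maxrss_mb therefore means the peak keeps growing. It cannot show memory being given back.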

i have not seen this happen in simulations without CarpetRegrid2. below are some relevant
portions of the stdout file from a standard BH inspiral run (note the last column, maxrss_mb):

------------------------------------------------------------------------------------
Iteration      Time | *me_per_hour |     LEANBSSNMOL::conf_fac | *TISTICS::maxrss_mb
                     |              |      minimum      maximum |   minimum   maximum
------------------------------------------------------------------------------------
         0     0.000 |    0.0000000 |    0.2213400    0.9977828 |      1057      1359
         4     0.025 |    1.8911444 |    0.2213388    0.9977828 |      1060      1361
         8     0.050 |    2.8049414 |    0.2213330    0.9977828 |      1060      1361
        12     0.075 |    3.2859229 |    0.2213195    0.9977828 |      1060      1361
        16     0.100 |    3.6219375 |    0.2212959    0.9977828 |      1061      1361
        20     0.125 |    3.7521230 |    0.2212596    0.9977828 |      1064      1361
        24     0.150 |    3.9448186 |    0.2212081    0.9977828 |      1064      1361
        28     0.175 |    4.0652624 |    0.2211358    0.9977828 |      1064      1361
        32     0.200 |    4.1492619 |    0.2210378    0.9977828 |      1064      1361
        36     0.225 |    4.1272708 |    0.2209119    0.9977828 |      1064      1361
        40     0.250 |    4.2206476 |    0.2207447    0.9977828 |      1064      1361
        44     0.275 |    4.2801401 |    0.2205343    0.9977828 |      1064      1361
        48     0.300 |    4.3460198 |    0.2202766    0.9977828 |      1064      1361
        52     0.325 |    4.3485616 |    0.2199651    0.9977828 |      1064      1361
        56     0.350 |    4.4054911 |    0.2195938    0.9977828 |      1065      1361
        60     0.375 |    4.4398502 |    0.2191577    0.9977828 |      1065      1361
        64     0.400 |    4.3724216 |    0.2186524    0.9977828 |      1065      1361

(...)

INFO (QuasiLocalMeasures):    Weinberg angular momentum z:     0.405200
       384     2.400 |    4.2719649 |    0.3525426    0.9977828 |      1068      1379
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (Carpet): Grid structure (superregions, grid points):
    [0][0][0]   exterior: [0,0,0] : [165,324,165]   ([166,325,166] + PADDING) 8955700
    [1][0][0]   exterior: [3,155,3] : [179,493,175]   ([177,339,173] + PADDING) 10380519
    [2][0][0]   exterior: [9,558,9] : [108,738,101]   ([100,181,93] + PADDING) 1683300
    [3][0][0]   exterior: [21,1206,21] : [128,1386,113]   ([108,181,93] + PADDING) 1817964
    [4][0][0]   exterior: [45,2521,45] : [147,2663,117]   ([103,143,73] + PADDING) 1075217
    [5][0][0]   exterior: [93,5111,93] : [225,5257,165]   ([133,147,73] + PADDING) 1427223
    [6][0][0]   exterior: [242,10307,189] : [380,10445,261]   ([139,139,73] + PADDING) 1410433
    [7][0][0]   exterior: [554,20684,381] : [692,20822,453]   ([139,139,73] + PADDING) 1410433
INFO (Carpet): Grid structure (superregions, coordinates):
    [0][0][0]   exterior: [-4.800000000000001,-259.199999999999989,-4.800000000000001] : [259.199999999999989,259.199999999999989,259.199999999999989] : [1.600000000000000,1.600000000000000,1.600000000000000]
    [1][0][0]   exterior: [-2.400000000000000,-135.199999999999989,-2.400000000000000] : [138.400000000000006,135.199999999999989,135.199999999999989] : [0.800000000000000,0.800000000000000,0.800000000000000]
    [2][0][0]   exterior: [-1.200000000000001,-36.000000000000000,-1.200000000000001] : [38.400000000000006,36.000000000000000,35.600000000000009] : [0.400000000000000,0.400000000000000,0.400000000000000]
    [3][0][0]   exterior: [-0.600000000000001,-18.000000000000000,-0.600000000000001] : [20.800000000000001,18.000000000000000,17.800000000000001] : [0.200000000000000,0.200000000000000,0.200000000000000]
    [4][0][0]   exterior: [-0.300000000000001,-7.100000000000023,-0.300000000000001] : [9.900000000000000,7.099999999999966,6.900000000000000] : [0.100000000000000,0.100000000000000,0.100000000000000]
    [5][0][0]   exterior: [-0.150000000000000,-3.650000000000006,-0.150000000000000] : [6.449999999999999,3.649999999999977,3.449999999999999] : [0.050000000000000,0.050000000000000,0.050000000000000]
    [6][0][0]   exterior: [1.250000000000000,-1.525000000000034,-0.075000000000000] : [4.699999999999999,1.925000000000011,1.725000000000000] : [0.025000000000000,0.025000000000000,0.025000000000000]
    [7][0][0]   exterior: [2.125000000000000,-0.650000000000034,-0.037500000000001] : [3.850000000000000,1.074999999999989,0.862500000000000] : [0.012500000000000,0.012500000000000,0.012500000000000]
INFO (Carpet): Global grid structure statistics:
INFO (Carpet): GF: rhs: 2551k active, 3661k owned (+44%), 6190k total (+69%), 320 steps/time
INFO (Carpet): GF: vars: 161, pts: 3631M active, 4271M owned (+18%), 6261M total (+47%), 1.0 comp/proc
INFO (Carpet): GA: vars: 1044, pts: 82M active, 82M total (+0%)
INFO (Carpet): Total required memory: 50.821 GByte (for GAs and currently active GFs)
INFO (Carpet): Load balance:  min     avg     max     sdv     max/avg-1
INFO (Carpet): Level  0:      25M     27M     31M      1M owned     12%
INFO (Carpet): Level  1:      31M     34M     36M      1M owned      6%
INFO (Carpet): Level  2:       5M      5M      6M      0M owned      8%
INFO (Carpet): Level  3:       5M      6M      6M      0M owned      8%
INFO (Carpet): Level  4:       3M      3M      4M      0M owned      8%
INFO (Carpet): Level  5:       4M      4M      5M      0M owned      9%
INFO (Carpet): Level  6:       4M      5M      5M      0M owned      4%
INFO (Carpet): Level  7:       4M      5M      5M      0M owned      4%
       388     2.425 |    4.1930426 |    0.3508349    0.9977828 |      1149      1450
       392     2.450 |    4.2022566 |    0.3491189    0.9977828 |      1149      1450
       396     2.475 |    4.2091891 |    0.3471720    0.9977828 |      1149      1450
------------------------------------------------------------------------------------
Iteration      Time | *me_per_hour |     LEANBSSNMOL::conf_fac | *TISTICS::maxrss_mb
                     |              |      minimum      maximum |   minimum   maximum
------------------------------------------------------------------------------------
       400     2.500 |    4.2177082 |    0.3450791    0.9977828 |      1149      1450
       404     2.525 |    4.2195973 |    0.3429667    0.9977828 |      1149      1450


(...)

      1344     8.400 |    4.4943523 |    0.3648129    0.9977828 |      1156      1455
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 0
INFO (CarpetRegrid2): Enforcing grid structure properties, iteration 1
INFO (Carpet): Grid structure (superregions, grid points):
    [0][0][0]   exterior: [0,0,0] : [165,324,165]   ([166,325,166] + PADDING) 8955700
    [1][0][0]   exterior: [3,154,3] : [178,494,175]   ([176,341,173] + PADDING) 10382768
    [2][0][0]   exterior: [9,556,9] : [108,740,101]   ([100,185,93] + PADDING) 1720500
    [3][0][0]   exterior: [21,1202,21] : [127,1390,113]   ([107,189,93] + PADDING) 1880739
    [4][0][0]   exterior: [45,2512,45] : [145,2672,117]   ([101,161,73] + PADDING) 1187053
    [5][0][0]   exterior: [93,5094,93] : [110,5135,165]   ([18,42,73] + PADDING) 55188
    [5][0][1]   exterior: [93,5136,93] : [220,5274,165]   ([128,139,73] + PADDING) 1298816
    [6][0][0]   exterior: [234,10342,189] : [372,10480,261]   ([139,139,73] + PADDING) 1410433
    [7][0][0]   exterior: [537,20752,381] : [675,20890,453]   ([139,139,73] + PADDING) 1410433
INFO (Carpet): Grid structure (superregions, coordinates):
    [0][0][0]   exterior: [-4.800000000000001,-259.199999999999989,-4.800000000000001] : [259.199999999999989,259.199999999999989,259.199999999999989] : [1.600000000000000,1.600000000000000,1.600000000000000]
    [1][0][0]   exterior: [-2.400000000000000,-136.000000000000000,-2.400000000000000] : [137.599999999999994,136.000000000000000,135.199999999999989] : [0.800000000000000,0.800000000000000,0.800000000000000]
    [2][0][0]   exterior: [-1.200000000000001,-36.800000000000011,-1.200000000000001] : [38.400000000000006,36.800000000000011,35.600000000000009] : [0.400000000000000,0.400000000000000,0.400000000000000]
    [3][0][0]   exterior: [-0.600000000000001,-18.800000000000011,-0.600000000000001] : [20.600000000000001,18.800000000000011,17.800000000000001] : [0.200000000000000,0.200000000000000,0.200000000000000]
    [4][0][0]   exterior: [-0.300000000000001,-8.000000000000000,-0.300000000000001] : [9.699999999999999,8.000000000000000,6.900000000000000] : [0.100000000000000,0.100000000000000,0.100000000000000]
    [5][0][0]   exterior: [-0.150000000000000,-4.500000000000000,-0.150000000000000] : [0.699999999999999,-2.449999999999989,3.449999999999999] : [0.050000000000000,0.050000000000000,0.050000000000000]
    [5][0][1]   exterior: [-0.150000000000000,-2.400000000000034,-0.150000000000000] : [6.199999999999999,4.500000000000000,3.449999999999999] : [0.050000000000000,0.050000000000000,0.050000000000000]
    [6][0][0]   exterior: [1.050000000000000,-0.650000000000034,-0.075000000000000] : [4.500000000000000,2.800000000000011,1.725000000000000] : [0.025000000000000,0.025000000000000,0.025000000000000]
    [7][0][0]   exterior: [1.912500000000000,0.199999999999989,-0.037500000000001] : [3.637499999999999,1.925000000000011,0.862500000000000] : [0.012500000000000,0.012500000000000,0.012500000000000]
INFO (Carpet): Global grid structure statistics:
INFO (Carpet): GF: rhs: 2544k active, 3659k owned (+44%), 6187k total (+69%), 320 steps/time
INFO (Carpet): GF: vars: 161, pts: 3644M active, 4290M owned (+18%), 6288M total (+47%), 1.0 comp/proc
INFO (Carpet): GA: vars: 1044, pts: 82M active, 82M total (+0%)
INFO (Carpet): Total required memory: 51.040 GByte (for GAs and currently active GFs)
INFO (Carpet): Load balance:  min     avg     max     sdv     max/avg-1
INFO (Carpet): Level  0:      25M     27M     31M      1M owned     12%
INFO (Carpet): Level  1:      31M     34M     35M      1M owned      4%
INFO (Carpet): Level  2:       5M      5M      6M      0M owned      6%
INFO (Carpet): Level  3:       5M      6M      6M      0M owned      8%
INFO (Carpet): Level  4:       3M      4M      4M      0M owned     10%
INFO (Carpet): Level  5:       3M      4M      5M      0M owned     11%
INFO (Carpet): Level  6:       4M      5M      5M      0M owned      4%
INFO (Carpet): Level  7:       4M      5M      5M      0M owned      4%
      1348     8.425 |    4.4749366 |    0.3620783    0.9977828 |      1417      1726
      1352     8.450 |    4.4793418 |    0.3593239    0.9977828 |      1417      1727
      1356     8.475 |    4.4831132 |    0.3565493    0.9977828 |      1417      1727
------------------------------------------------------------------------------------
Iteration      Time | *me_per_hour |     LEANBSSNMOL::conf_fac | *TISTICS::maxrss_mb
                     |              |      minimum      maximum |   minimum   maximum
------------------------------------------------------------------------------------
      1360     8.500 |    4.4873776 |    0.3537546    0.9977828 |      1417      1727
      1364     8.525 |    4.4896835 |    0.3509394    0.9977828 |      1417      1727


the pattern continues, so that by iteration 9000 the maximum of maxrss_mb has reached 2722 MB. is this
normal, or expected? it is very inconvenient, since for some high-resolution runs i
inevitably run out of memory.
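Parsing the status rows gives a way to quantify this growth across a whole stdout file instead of reading it by eye. The Python sketch below is a minimal example. It assumes that every status row keeps the layout shown in the excerpts. It also assumes that the last two columns are the minimum and maximum of maxrss_mb, presumably taken over MPI processes; this is an assumption. The names parse_maxrss and find_jumps, and the 50 MB threshold, are illustrative choices. They are not part of Carpet or SystemStatistics. On the rows copied from the excerpts, the sketch flags a 71 MB jump across the regrid between iterations 384 and 388. It also flags a 271 MB jump across the regrid between iterations 1344 and 1348.

```python
import re

# Matches a status row such as
#   "  384     2.400 |    4.2719649 |    0.3525426    0.9977828 |      1068      1379"
# Assumption: the last two columns are maxrss_mb (minimum, maximum).
STATUS_ROW = re.compile(r"^\s*(\d+)\s+[\d.]+\s*\|.*\|\s*(\d+)\s+(\d+)\s*$")


def parse_maxrss(lines):
    """Return (iteration, maxrss_min_mb, maxrss_max_mb) for every status row."""
    rows = []
    for line in lines:
        match = STATUS_ROW.match(line)
        if match:  # header, INFO and grid-structure lines do not match
            rows.append(tuple(int(g) for g in match.groups()))
    return rows


def find_jumps(rows, threshold_mb=50):
    """List (iteration_before, iteration_after, growth_mb) wherever the maximum
    maxrss_mb grows by more than threshold_mb between consecutive status rows."""
    jumps = []
    for (it0, _, max0), (it1, _, max1) in zip(rows, rows[1:]):
        if max1 - max0 > threshold_mb:
            jumps.append((it0, it1, max1 - max0))
    return jumps


# A few rows copied from the excerpts (the regrid output between them is omitted).
SAMPLE = """\
       384     2.400 |    4.2719649 |    0.3525426    0.9977828 |      1068      1379
       388     2.425 |    4.1930426 |    0.3508349    0.9977828 |      1149      1450
       404     2.525 |    4.2195973 |    0.3429667    0.9977828 |      1149      1450
      1344     8.400 |    4.4943523 |    0.3648129    0.9977828 |      1156      1455
      1348     8.425 |    4.4749366 |    0.3620783    0.9977828 |      1417      1726
"""

if __name__ == "__main__":
    for before, after, growth in find_jumps(parse_maxrss(SAMPLE.splitlines())):
        print(f"maxrss max grew by {growth} MB between iterations {before} and {after}")
```

An open file object works the same way, since iterating a file yields its lines. For example, use `parse_maxrss(open(path))`, where `path` is a hypothetical path to the run's stdout file.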

thanks,
Miguel

