Curious case of 20GB address space reservation

Recently I have been tracking a case where code from a third-party library was fiddling with the address space of my application. Address space was very important back in the Win32 days, when you had only 2GB (3GB for large-address-aware applications), but x64 Windows 7 lets a process address up to 8TB (that's right, 8192GB). No one needs that much address space today, but it is often the first thing you look at when your application is running out of memory.

This happened to me: due to a bug in a very thick algorithm, a while loop never terminated and kept allocating memory internally. It allocated so much memory that Windows hit a low-memory condition and finally had to terminate the application. While investigating, I noticed that the application reported its address space as ~82GB while Task Manager reported its working set as ~48GB. What was puzzling was that my application had reserved ~34GB of address space during the run for nothing.

The way you fiddle with address space in Windows is by calls to VirtualAlloc and VirtualAllocEx. After checking the source base for all callers of VirtualAlloc, I could not find a single one that only reserved; in fact, all callers both reserved and committed the requested address space. This meant some other library was reserving the memory. How do you debug such cases? You set a breakpoint on the VirtualAlloc function and check who else is allocating memory. The easiest way to do this is to fire up windbg (you can get it here) and attach it to your application with the symbol path configured (_NT_SYMBOL_PATH or .sympath). Once that is set up, symbols for kernel32.dll are required in order to set a breakpoint on VirtualAlloc. You can get the symbols for Windows here; prefer retail, as checked symbols are mainly useful when debugging the kernel.
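The source audit mentioned above can be roughly automated. The following is a minimal Python sketch, not the method actually used for this investigation: it scans C/C++ source text for VirtualAlloc/VirtualAllocEx calls and labels each one by which MEM_ flags appear in its argument text. The function name classify_calls and the sample snippet are hypothetical, and the regex deliberately skips calls whose arguments contain nested parentheses or whose flags are hidden behind variables.

```python
import re

# Matches a VirtualAlloc / VirtualAllocEx call and captures its argument text.
# Simplification (assumption): the argument list has no nested parentheses,
# so a call like VirtualAlloc(NULL, sizeof(x), ...) is silently skipped.
CALL_RE = re.compile(r"\bVirtualAlloc(?:Ex)?\s*\(([^()]*)\)")


def classify_calls(source_text):
    """Return a list of (argument_text, kind) for each call found."""
    results = []
    for match in CALL_RE.finditer(source_text):
        args = match.group(1)
        reserves = "MEM_RESERVE" in args
        commits = "MEM_COMMIT" in args
        if reserves and commits:
            kind = "reserve+commit"
        elif reserves:
            kind = "reserve-only"
        elif commits:
            kind = "commit-only"
        else:
            kind = "unknown"
        # Collapse whitespace so multi-line calls print on one line.
        results.append((" ".join(args.split()), kind))
    return results


# Hypothetical sample input; in practice you would read each source file.
sample = """
void *a = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
void *b = VirtualAlloc(base, 0x500000000, MEM_RESERVE, PAGE_NOACCESS);
"""

for args, kind in classify_calls(sample):
    print(kind, "<-", args)
```

Any reserve-only hit in your own code would be a suspect; if there are none, as in this case, the reservation has to come from a library you do not have source for.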

Once you have the setup running, let's load symbols for kernel32.dll in windbg with the following command:

.reload /f kernel32.dll

VirtualAlloc takes 4 parameters. On Windows x64 the first four integer or pointer parameters are passed in registers and the rest are passed on the stack. The register order is rcx, rdx, r8 and r9. The second parameter is the allocation size, so it is passed in rdx. The easiest approach is to set a conditional breakpoint:

bp kernel32!VirtualAlloc ".if (@rdx>0x100000) {r; KB 25} .else {gc}"

Here we check whether the value in rdx is greater than 0x100000 (1MB). If it is, the debugger prints the register contents with the r command and a stack trace up to 25 frames deep with KB 25, and then breaks. If the requested size is 1MB or less, execution simply continues via the gc command.
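To make the filter's behaviour concrete, here is a tiny Python sketch that mimics the .if/.else decision of the breakpoint. It is only an illustration of the condition, not something windbg runs; the function name breakpoint_action and the list of request sizes are made up.

```python
THRESHOLD = 0x100000  # 1MB, the same literal used in the breakpoint condition


def breakpoint_action(rdx):
    """Return the windbg commands the conditional breakpoint would execute."""
    # Strictly greater-than: a request of exactly 1MB does not break.
    return "r; KB 25" if rdx > THRESHOLD else "gc"


# Hypothetical request sizes, the last one being the 20GB reservation.
for size in (0x1000, 0x100000, 0x100001, 0x500000000):
    print(hex(size), "->", breakpoint_action(size))
```

Raising or lowering the threshold is the main knob here: too low and the debugger stops on every routine heap growth, too high and a smaller rogue reservation slips through.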

Run your application with the g command and watch the fun.

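For me this is what it turned out to be: initializing CUDA launched the CUDA profiler, which reserved 20GB of address space for itself. I am not sure what it is going to do with it, but it was fun to track down. One caveat: windbg only had export symbols for nvcuda.dll, so the cuProfilerStop frames with large offsets in the trace are most likely just the nearest exported name; what the trace does show is that the reservation is made inside nvcuda.dll on the cuInit path.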

The debugging output below translates to:
VirtualAlloc(0x0000000200000000, 20GB, MEM_RESERVE, PAGE_NOACCESS)
VirtualAlloc(rcx, rdx, r8, r9)
----------------------------------------------------------------------

kernel32!VirtualAlloc:

00000000`776c5c68 ff25ba760800    jmp     qword ptr [kernel32!_imp_VirtualAlloc (00000000`7774d328)] ds:00000000`7774d328={KERNELBASE!VirtualAlloc (000007fe`fd721900)}

ModLoad:000007fe`d1680000 000007fe`d23cc000 C:\windows\system32\nvcuda.dll
ModLoad:000007fe`ec760000 000007fe`eca93000 C:\windows\system32\nvapi64.dll

*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\windows\system32\nvcuda.dll -

rax=0000000700000000 rbx=0000000200000000 rcx=0000000200000000
rdx=0000000500000000 rsi=000000fff8000000 rdi=0000000500000000
rip=00000000776c5c68 rsp=000000000022c108 rbp=0000000100000000
r8=0000000000002000  r9=0000000000000001 r10=0000000000000000
r11=0000000000000246 r12=0000000000000000 r13=0000000000000002
r14=0000000100000000 r15=000000fff8000000

iopl=0         nv up ei ng nz na po cy
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000287

kernel32!VirtualAlloc:

00000000`776c5c68 ff25ba760800    jmp     qword ptr [kernel32!_imp_VirtualAlloc (00000000`7774d328)] ds:00000000`7774d328={KERNELBASE!VirtualAlloc (000007fe`fd721900)}

Child-SP          RetAddr           Call Site
00000000`0022c108 000007fe`d1ecd538 kernel32!VirtualAlloc
00000000`0022c110 000007fe`d16c7a33 nvcuda!cuProfilerStop+0x82fc18
00000000`0022c140 000007fe`d16c7c0d nvcuda!cuProfilerStop+0x2a113
00000000`0022c1a0 000007fe`d16c8595 nvcuda!cuProfilerStop+0x2a2ed
00000000`0022c1f0 000007fe`d169fc09 nvcuda!cuProfilerStop+0x2ac75
00000000`0022c230 000007fe`d169fd0a nvcuda!cuProfilerStop+0x22e9
00000000`0022c270 000007fe`d1681417 nvcuda!cuProfilerStop+0x23ea
00000000`0022c2a0 00000000`2e22337a nvcuda!cuInit+0x137
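
As a cross-check on the translation given before the dump, here is a minimal Python sketch that decodes the four argument registers into VirtualAlloc parameters. The register values are copied from the dump above; the flag tables list only a handful of constants from the Windows headers, and the helper names decode_flags and describe_virtualalloc are made up for this illustration.

```python
# Register values copied from the windbg dump above.
regs = {
    "rcx": 0x0000000200000000,  # 1st parameter: lpAddress
    "rdx": 0x0000000500000000,  # 2nd parameter: dwSize
    "r8": 0x0000000000002000,   # 3rd parameter: flAllocationType
    "r9": 0x0000000000000001,   # 4th parameter: flProtect
}

# Partial tables of flag values from the Windows headers (not exhaustive).
ALLOCATION_TYPES = {
    0x1000: "MEM_COMMIT",
    0x2000: "MEM_RESERVE",
    0x80000: "MEM_RESET",
    0x100000: "MEM_TOP_DOWN",
}
PAGE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
}


def decode_flags(value, table):
    """Name every known bit set in value; fall back to hex if none match."""
    names = [name for bit, name in table.items() if value & bit]
    return " | ".join(names) if names else hex(value)


def describe_virtualalloc(regs):
    """Format the call the way it would look in source code."""
    size_gb = regs["rdx"] / 2**30  # bytes -> GB (binary gigabytes)
    return "VirtualAlloc(0x{:016x}, {:g}GB, {}, {})".format(
        regs["rcx"],
        size_gb,
        decode_flags(regs["r8"], ALLOCATION_TYPES),
        decode_flags(regs["r9"], PAGE_PROTECTIONS),
    )


print(describe_virtualalloc(regs))
```

The key detail is that r8 contains MEM_RESERVE without MEM_COMMIT, which is why the 20GB shows up in the address space but not in the working set.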

Reference links: