Before we can delve into the inner workings of growing allocators, I would like to explain the concept of virtual memory: what it is, why it is needed, and what we can use it for.
What is virtual memory?
Speaking in simple terms, virtual memory provides an extra indirection when accessing memory. It provides an abstraction of the virtual address space of a process, which lets each process think that it's alone in the system. It lets us write programs without having to worry about which other programs allocated memory, and without having to worry about which physical memory we need to access. Even though virtual memory is often also mentioned in conjunction with paging to hard disk (=swapping), this is not what we are interested in!
Consider a system having 4 GB of RAM that consists of four physical 1 GB RAM units. We are able to allocate more than 1 GB of contiguous memory, even though there’s no actual physical memory unit that is larger than 1 GB. This works thanks to virtual memory.
Similarly, the allocations we make inside an application return virtual addresses which are valid inside our process. Such an address could be 0x80000000, and another process can also have allocations residing at 0x80000000, and yet everything works thanks to virtual memory.
Before touching actual physical memory, the virtual address needs to be translated. This virtual-to-physical address translation is taken care of by the CPU's MMU (memory management unit). Modern CPUs also have a TLB (translation lookaside buffer) which is used to speed up this translation.
Traditionally, this translation is done on a page-by-page basis. This means that on the OS-level, memory can only be allocated in so-called pages of a certain size. As an example, Windows 7 has a default page-size of 4 KB. Consequently, this also means that whenever you allocate memory directly from the OS, you can only allocate it with page-size granularity.
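Page-size granularity means that every OS-level allocation size gets rounded up to a multiple of the page size. A minimal sketch of this rounding, using POSIX's `sysconf` to query the page size (on Windows you would use `GetSystemInfo` instead); `RoundUpToPageSize` is a hypothetical helper name, not an OS API:

```c
#include <stddef.h>
#include <unistd.h>

/* Round an allocation size up to page-size granularity, mirroring what the
   OS does internally: even a 10-byte request occupies a full page. Relies on
   the page size being a power of two, which holds on all common platforms. */
static size_t RoundUpToPageSize(size_t size)
{
    const size_t pageSize = (size_t)sysconf(_SC_PAGESIZE); /* e.g. 4096 */
    return (size + pageSize - 1u) & ~(pageSize - 1u);
}
```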
The details of address translation are actually quite involved, and here is a very good post describing this process for x86 and Cell architectures: Memory Address Translation.
As an example, allocating just 10 bytes of memory using VirtualAlloc (the low-level allocation function on Windows) will allocate a whole page, that is, 4096 bytes. You can access all of the 4096 bytes without triggering an access violation, even though you only requested 10 bytes.
Of course, page sizes differ across platforms (consoles). Some platforms even offer more than just one page size. The reason is that because the TLB is normally of limited size, increasing the page size can lead to fewer TLB misses (similar to cache misses), resulting in an increase in performance. However, larger pages can also lead to more wasted memory if you're not careful. Thus, it's a typical space/time trade-off.
Normally, the runtime's general-purpose allocator (e.g. malloc/free) takes care of allocating pages from the OS, coalescing free blocks into contiguous regions, putting several small allocations on the same page, etc. However, as soon as we want to implement our own general-purpose allocator or any other custom allocation scheme, we need to be aware of such details, and cannot use malloc/free for our purposes.
Furthermore, knowing about such low-level details enables us to use a wealth of new debugging techniques like using protected pages, guard pages, etc. As an example, some pages could be marked read-only in order to find memory stomps, race conditions (writes on shared data), and more. Guard pages serve as a one-shot alarm for memory page access, and are e.g. used for growing an application’s stack. Applications like PageHeap use those features for finding memory accesses beyond an allocation’s boundary.
MMU, TLB, pages, address translation, memory protection… that all sounds wonderful, but what can we do with it?
Because the OS clearly distinguishes between reserving address space (see MEM_RESERVE for the VirtualAlloc function) and allocating physical memory for address space (see MEM_COMMIT), we can build allocators that can grow to a specified upper limit, but only allocate the memory they actually need.
This is very, very useful when implementing growing allocators, because we can reserve a contiguous region of memory (=virtual address space), but only commit physical memory to it whenever we need it.
Virtual memory addressing is supported by all common desktop OSs (Windows, Linux, Mac), almost all consoles (cannot go into details because of NDAs) and even on mobiles like the iPhone. How different growing allocators can be built using virtual memory on those platforms will be the topic of the next posts!