Memory allocation strategies: a growing stack-like (LIFO) allocator

Continuing from where we left off last time, I would like to discuss how we can build growing allocators using a virtual memory system. This post describes how to build a stack-like allocator that can automatically grow up to a given maximum size.

As explained in one of the last posts, virtual memory allows us to reserve address space without allocating any physical memory for it. We can use this to our benefit, and build an allocator that reserves address space up-front, and allocates physical memory whenever we need it. In that regard, the allocator behaves similarly to the non-growing stack allocator, but is not restricted to working within a fixed region of memory.

Such an allocator is extremely useful for situations such as loading level data, where the amount of memory needed varies between levels. Using a growing allocator saves development time because there is no need to constantly change the upper bound of a fixed-size allocator whenever a certain level exceeds the previous worst-case memory usage.

An example

Consider a game where the maximum amount of memory that is spent on level-resident resources is defined to be 128 MB. What we want to have is an allocator that reserves address space for e.g. 256 MB (to be on the safe side), and allocates physical memory in 1 MB blocks whenever needed.

The basic steps of building such an allocator are simple:

  • Reserve address space for 256 MB.
  • Store the start and end of the reserved address space.
  • Store the current position and the end of the physical memory (= 0 committed bytes in the beginning).
  • For each allocation:
    • Work out the required amount of physical memory.
    • If there is still enough physical memory left, bump the current position.
    • If there is not enough memory left, allocate 1 MB pages at the end of the current physical memory until the allocation fits.
    • Update the current position and the end of the physical memory.

In code this could look like the following:

GrowingStackAllocator::GrowingStackAllocator(unsigned int maxSizeInBytes, unsigned int growSize)
  : m_virtualStart(virtualMemory::ReserveAddressSpace(maxSizeInBytes))
  , m_virtualEnd(m_virtualStart + maxSizeInBytes)
  , m_physicalCurrent(m_virtualStart)
  , m_physicalEnd(m_virtualStart)
  , m_growSize(growSize)
{
}

In the constructor, we just reserve address space and store the start and end of it.
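The class declaration itself is never shown in the post; judging from the constructor and the methods used below, it could look roughly like this (a sketch, with member names taken from the code above; the real Molecule class likely contains more):

```cpp
#include <cstddef>

class GrowingStackAllocator
{
public:
  GrowingStackAllocator(unsigned int maxSizeInBytes, unsigned int growSize);

  void* Allocate(size_t size, size_t alignment, size_t offset);
  void Purge(void);

private:
  char* m_virtualStart;     // start of the reserved address space
  char* m_virtualEnd;       // end of the reserved address space
  char* m_physicalCurrent;  // current top of the stack
  char* m_physicalEnd;      // end of the committed physical memory
  unsigned int m_growSize;  // granularity in which physical memory is committed
};
```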

Allocating memory

void* GrowingStackAllocator::Allocate(size_t size, size_t alignment, size_t offset)
{
  const size_t allocationSize = size;

  // work out proper allocation sizes and possible offsets
  ...

  m_physicalCurrent = core::pointerUtil::AlignTop(m_physicalCurrent + offset, alignment) - offset;

  // is there enough physical memory left?
  if (m_physicalCurrent + size > m_physicalEnd)
  {
    // out of physical memory. check if there is still address space left from which new physical pages can be allocated.
    // remember that virtual memory must always be allocated in page-size chunks, that is why we round the needed size to
    // the next grow-size multiple.
    const size_t neededPhysicalSize = bitUtil::RoundUpToMultiple(size, m_growSize);
    if (m_physicalEnd + neededPhysicalSize > m_virtualEnd)
    {
      // the allocation doesn't fit into the address space, we're out of memory
      return nullptr;
    }

    // allocate new memory pages at the end of our currently allocated pages
    virtualMemory::AllocatePhysicalMemory(m_physicalEnd, neededPhysicalSize);
    m_physicalEnd += neededPhysicalSize;
  }

  // store book-keeping information if necessary
  ...

  // return the allocation to the user, and bump the current pointer past it
  void* userPtr = m_physicalCurrent;
  m_physicalCurrent += size;
  return userPtr;
}

As described above, we allocate new pages whenever there is not enough physical memory left for the allocation to fit into. In that case, new pages are allocated at the end of the address range that already has physical memory committed to it. If there is not enough space left in the address space dedicated to this allocator, then we are out of memory.

The granularity with which the allocator grows is specified by the user in the constructor. It is a typical trade-off between allocation performance and wasted memory. If the allocator grows in 64 KB pages, it has to make many small allocations via the virtual memory system, but wastes at most 64 KB, which is not much. In comparison, growing in 1 MB pages leads to far fewer allocations, but can leave up to 1 MB of memory committed to physical memory yet never used.

Freeing memory

One thing that is left to discuss is how the allocator behaves when freeing allocations. The primary thing to consider is whether the allocator should return physical memory pages whenever enough allocations have been freed. Depending on the allocation behaviour, that can be either good or bad.

When freeing level-resident resources, we mostly have several large allocations, and want to release physical memory as soon as possible. In situations where the allocator is used to make many small allocations, it could be more beneficial to keep physical memory that is committed to the address space. Otherwise, the allocator could suffer performance penalties whenever allocations straddle a page boundary, causing pages to be constantly allocated, freed, allocated again, and so on.

The way this is handled in Molecule is that whenever an allocation is freed, the allocator does not return physical memory to the OS. Instead, physical memory which is no longer needed must be explicitly released by calling Purge(), an extra method offered by growing allocators. In the case of the growing stack allocator, it simply checks which pages are no longer needed, and returns them to the OS:

void GrowingStackAllocator::Purge(void)
{
  // we need to free all physical memory pages which are no longer needed by any allocation while making sure that we don't free the page
  // we're currently pointing to (remember that virtual memory only works in page-size granularity).
  // we do this by rounding the current physical memory pointer to the next grow-size boundary, and freeing all physical memory from there.
  char* addressToFree = pointerUtil::AlignTop(m_physicalCurrent, m_growSize);
  const unsigned int sizeToFree = safe_static_cast<unsigned int>(m_physicalEnd - addressToFree);
  virtualMemory::FreePhysicalMemory(addressToFree, sizeToFree);

  m_physicalEnd = addressToFree;
}

Conclusion

Having growing allocators that make use of the virtual memory system allows us to specify a safe upper-bound for the amount of memory needed during development without wasting physical memory. Furthermore, knowing in which address range certain allocations lie can be a tremendous help when debugging.

2 thoughts on “Memory allocation strategies: a growing stack-like (LIFO) allocator”

  1. Very interesting, thanks for sharing. I’m just wondering how custom allocators can be helpful for resources which are stored in the memory managed by external systems, e.g. textures which are uploaded into GPU’s memory. Once loaded into the temporary memory allocated with our custom allocator the texture is then transferred to the GPU memory and we must deallocate this temporary memory in order to avoid memory duplication… I guess, in this case custom allocator only allows us to quickly allocate this temporary memory but there still can be performance hit when the external system allocates its own memory and copies data.

    • I think that depends on the type of resource, and the platform and API you’re working with.

      There are certain resources where it’s better to duplicate memory so you don’t pay huge performance penalties. An example would be a dynamic vertex buffer which is generated on the CPU and uploaded to a GPU vertex buffer each frame (using a ring-buffer scheme for maximum performance).

      On the PC, there’s generally the GPU driver which will take care of managing memory for GPU resources for us, so there’s not much we can do. On consoles however, you have to allocate memory for every vertex buffer, index buffer, texture, etc. yourself. There you have to pay attention to the correct alignment so the GPU doesn’t crash (varies between GPUs/APIs), map memory in order for the GPU to see it, allocate memory with the “correct” properties depending on usage (e.g. write-combined memory), and much more. It’s more work, but ultimately you are in complete control. This allows you to do things not possible on the PC, e.g. aliasing the same eDRAM addresses for different things because you know you don’t need them at the same time during rendering. An example would be allocating memory for a shadow map, using that during the main render pass, then at a later point using the same memory for a completely different rendertarget.

      Custom allocators surely come in handy in such situations!
