Memory allocation strategies: a pool allocator

As promised last time, today we will see how pool allocators can help with allocating/freeing allocations of a certain size, in any order, in O(1) time.

Use cases

Pool allocators are extremely helpful for allocating/freeing objects of a certain size which often have to be created/destroyed dynamically, such as weapon bullets, entity instances, rigid bodies, etc.

Most of those objects are created/destroyed in completely random order, due to their dynamic nature. Therefore, it is desirable to be able to allocate/free memory with as little fragmentation as possible. Pool allocators are extremely well suited for that.

How does it work?

Simply put, a pool allocator allocates a chunk of memory once, and divides that memory into equally-sized slots/bins, each fitting exactly one instance of size N, with M slots in total. As an example, consider that we want to have a maximum of 256 bullets in flight at the same time, each bullet having a size of 32 bytes. The pool allocator would thus allocate 256*32 = 8192 bytes once, dividing the memory into slots which are then used for allocating/freeing objects of size 32.

But how are those allocations made? How can we guarantee O(1) time? How can allocations be made in any order, without fragmentation?

Freelists

The answer to all of the above boils down to the use of what is called a free list. Free lists internally store a linked-list of free slots inside the allocated memory. Storing them in place is crucial – there is no std::vector, std::list, or similar that keeps track of free slots. It is all stored inside the pre-allocated pool of memory.

The way it is usually done is the following: each slot (32 bytes in our example) in the pool of memory is connected to the next slot simply by storing the pointer to the next slot in the first few bytes of the slot.

Assuming our pool of memory sits at location 0x0 in memory, the layout would be something like the following:

         +---------+---------+---------+---------+
         | 0x20    | 0x40    | 0x60    | nullptr |
         +---------+---------+---------+---------+

         ^         ^         ^         ^
         |         |         |         |
address: 0x0       0x20      0x40      0x60

The blocks denote the slots in memory, the bottom row shows the address in memory. As can be seen, the memory at 0x0 would contain a pointer to 0x20, which would contain a pointer to 0x40, and so on. We have just formed an intrusive linked-list in our memory pool.

There is one thing to note here: as long as slots are free, we can store anything we want inside those 32 bytes. When a slot is in use, we don’t need to store anything, because that slot is occupied anyway and so no longer part of our free list. All we have to do is remove a slot from the free list whenever it is allocated, and add it to the linked-list again whenever it is freed.

Implementation

Let us take a look at the inner workings of a free list:

class Freelist
{
public:
  Freelist(void* start, void* end, size_t elementSize, size_t alignment, size_t offset);

  inline void* Obtain(void);

  inline void Return(void* ptr);

private:
  Freelist* m_next;
};

The only member we need to store is a pointer to the free list, which simply acts as an alias in memory, and stores a pointer to a currently free slot in our memory pool.

Leaving alignment and offset requirements out of the equation for now, initializing a free list is quite simple:

union
{
  void* as_void;
  char* as_char;
  Freelist* as_self;
};

// point the union at the first entry in the free list
as_void = start;
m_next = as_self;
as_char += elementSize;

// initialize the free list - make every m_next of each element point to the next element in the list
Freelist* runner = m_next;
for (size_t i=1; i<numElements; ++i)
{
  runner->m_next = as_self;
  runner = as_self;
  as_char += elementSize;
}

runner->m_next = nullptr;

With the intrusive linked-list in place, allocating/freeing memory really becomes an O(1) operation, and is just ordinary linked-list manipulation code:

inline void* Freelist::Obtain(void)
{
  // is there an entry left?
  if (m_next == nullptr)
  {
    // we are out of entries
    return nullptr;
  }

  // obtain one element from the head of the free list
  Freelist* head = m_next;
  m_next = head->m_next;
  return head;
}

inline void Freelist::Return(void* ptr)
{
  // put the returned element at the head of the free list
  Freelist* head = static_cast<Freelist*>(ptr);
  head->m_next = m_next;
  m_next = head;
}

I moved the free list implementation to its own class so it can be used by both the non-growing and growing variant of the pool allocator. Furthermore, it is handy for other things, too.

Alignment requirements

In order to satisfy alignment requirements, we need to offset our slots into the memory pool once, so all slots can satisfy the same offset and alignment requirements. The disadvantage of this approach is that a pool allocator can never satisfy more than one offset requirement, but such cases should be very rare.

The way it is done in Molecule is that users have to provide their maximum object size and maximum alignment requirement when constructing the pool allocator. The allocator can then satisfy all alignments <= maximumAlignment and object sizes <= maximumSize, and simply asserts in all other cases. This at least allows the user to allocate objects of different sizes (such as e.g. 4, 8, 12, 16) out of the same pool, with different alignments (such as e.g. 4, 8, 16, 32), if desired.

Control is in the user’s hands, so it is up to the user to either use different pools for different allocations (often recommended), or live with some wasted memory inside the memory pool.

Usage

Usage is simple. The following shows a free list which is able to satisfy allocations up to a size of 32 bytes and an alignment of 8 bytes. Note that the free list takes a range of memory in which it initializes itself. This ensures that free lists can be used for allocations on the stack, on the heap, and by different allocators (non-growing and growing).

ME_ALIGNED_SYMBOL(char memory[1024], 8) = {};

core::Freelist freelist(memory, memory+1024, 32, 8, 0);

// allocates a slot of 32 bytes, aligned to an 8-byte boundary
void* object0 = freelist.Obtain();

// allocates another slot of 32 bytes, aligned to an 8-byte boundary
void* object1 = freelist.Obtain();

// obtained slots can be returned in any order
freelist.Return(object1);
freelist.Return(object0);

The pool allocator described in this post simply has a Freelist instance as member, and forwards all Allocate() calls to freelist.Obtain(), and all Free() calls to freelist.Return(). Additionally, it asserts that allocation sizes and alignment requests fit the initial maximum sizes provided by the user.

Outlook

The pool allocator was the last remaining non-growing allocator we haven’t discussed yet. Starting with the next post, we will take a look at how to implement growing allocators by means of virtual memory and allocating physical pages from the OS.


13 thoughts on “Memory allocation strategies: a pool allocator”

  1. Thanks for sharing! I’m just wondering where you are storing book keeping info in the described case when the user is allowed to allocate objects with different sizes and different alignments.

    • Short answer: I don’t.

      Long answer: With the given maximum alignment and maximum size, I first determine the minimum slot size, and align the first slot accordingly. E.g., given a maximum size of 24 bytes and an alignment of 32 bytes, each slot would become 32 bytes in size.

Whenever an allocation is made, I can simply assert(allocSize <= maxSize) and assert(allocAlignment <= maxAlignment). By definition this works because if memory is aligned to N bytes, it is also aligned to all powers-of-two smaller than N. Similarly, if a user wants to allocate e.g. 20 bytes out of the above-mentioned pool, he will actually get a 32-byte slot, but doesn't know about that.

      Of course that creates some wasted memory which cannot be reclaimed by other allocators, but I find it to be very convenient, and the user has complete freedom over when to use which pools. Alternatively, one could be very strict, and only allow allocations and alignments of the same size. This would ensure that no memory is wasted.

      • “By definition this works because if memory is aligned to N bytes, it is also aligned to all powers-of-two smaller than N.”

        Oh, this is what actually slipped my mind. Thank you very much for clarification.

  2. Great post! Thanks for sharing!

    I have actually two questions:
1) How would I make a growable pool? The straightforward solution is obviously returning indices instead of pointers. But I wonder if I can have several free lists that I link together?
    2) Is there an easy way to iterate the free list and skip unused elements? I am thinking about using the free list as an “array with holes”, but if I use a templated implementation I cannot easily put some magic value into the unused elements. Do you have any suggestions for this use case or do you recommend something else?

    • Thanks Dirk!

Regarding your first question, there are two possibilities:
      1) Request new chunks of memory (=pages) from the OS whenever the pool is full. These chunks store book-keeping information, and are held in a doubly-linked list so they can be easily added/removed.

      So whenever the pool is full, you allocate a chunk, initialize its header and a freelist in that chunk. Chunks that are full can be placed at the back of the list, so that you only need to check the freelist of *one* chunk to see if a slot is available from the freelist. That still keeps Allocate() and Free() at O(1). Purging is O(n) because you need to walk the list of chunks/pages to see which one can be returned to the OS.

      2) An easier solution is to reserve virtual address space up-front, and only initialize a freelist as you would with a non-growing pool. Now whenever a pool is full, you just request physical pages from the OS for a part of the virtual address space you reserved. This means that all freelists live right next to each other, and can simply be initialized without having to keep a doubly-linked list of pages.

      As an example, you could reserve address space for 64 MB, request physical memory for 32 KB, and initialize the freelist. As soon as the pool is full, request another 32 KB of physical memory from the OS, and initialize a freelist there.
      Note that there is no need to “connect” these two freelists using a linked list or anything like that.

      With this implementation, you have to specify an upper-limit for how large the pool can grow. Some consider this a bad thing, I consider it a good thing because it forces people to think about how much memory they are going to need. And because it’s only virtual address space you reserve (not allocate!) you can always specify some ridiculously high value, if you want to. Especially on 64-bit.
      Because virtual memory (or at least reserving address space, backing it with physical pages later) is available on all console platforms (and even on mobiles like the iPhone), this is the way it’s done in Molecule.

Regarding your second question, let me think about that :). Can you give me a use case for what kind of allocations/algorithms you’d like to use that?

      • A simple solution for your second question might be to just add an extra byte (or more likely 4 extra bytes) to each slot which you use for storing magic values, or some kind of extra information. E.g. a pool for allocations of 32 bytes could consist of slots having 36 bytes – the only difference being that you still only hand 32 bytes to the user, and use the remaining 4 bytes for whatever you want.

        When iterating the freelist, you can simply walk through the pool with 36 byte increments, and check the magic value to see whether a slot is free or not. Depending on the pool and allocation size that could be slow because you likely need to pull a lot of cache-lines in :(.

  3. Hi Stefan!

    Thanks for the fast reply. So regarding your first example the simplest chunk I can come up with would look something like this:

    struct Chunk
    {
      Freelist mFreelist;
      Chunk* mNext;
    };

What else would go into the chunk header/footer? Is there really no need to link the free lists? Say I grab the last element of the free list in some chunk. I need to link it to the new freelist, right? Or am I missing something.

    Your second approach sounds very interesting. I am not too much into virtual address spaces. Do you have a link where I can study a bit about these things (in our context here)? Your outlook for your blog gives me the impression you might talk about this in the future. Is this correct? Obviously this would be awesome for me :)

    • For storing the chunks in a doubly-linked list, you need both a Chunk* mNext and a Chunk* mPrevious in your Chunk struct. Chunks should be stored in a doubly-linked list so that adding and removing chunks is both an O(1) operation.

      At first I also thought that I somehow need to link the freelists of newly created chunks to previous ones, but if you think about it, you will realize that you actually don’t have to link them.

      Consider the following example:
      struct GrowingPoolAllocator
      {
        Freelist m_freelist;
        // other members not needed for now
      };

      At first, the freelist is initialized in whatever memory region you allocated for it, nothing fancy. Now consider the case where the last slot has been allocated: The Freelist* pointer stored inside the freelist (that is, m_freelist.m_next) will be a nullptr, indicating that there are no more slots left.
      If you now allocate a new chunk, all you have to do is initialize your freelist in the following way:

      m_freelist.~Freelist();
      new (&m_freelist) Freelist(startOfNewChunk, endOfNewChunk, …);

      This first properly destructs the freelist by calling its destructor, and creates a new instance by using placement-new at the address of m_freelist. This initializes a new freelist in the newly allocated chunk.

      Now think about what allocating a new slot will do: it will simply grab a slot out of the freelist, which now has plenty of space again.
      What happens if we free any of the slots that live in the “old” freelist?
      Actually, it’s quite simple: the slot will “automagically” be added to the freelist living in the newly created chunk, because its m_next member will now point to exactly that slot. This means that whenever allocations from the old slots are freed, they will automatically become part of the new freelist.

      Essentially, the slots from all freelists in every chunk allocated will automatically merge and be available without you having to link them explicitly. That’s the beauty of the system – if you go for the solution that stores individual chunks instead of reserving address space, all that’s left to do is link the chunks together in the doubly-linked list.

      Regarding the outlook on future posts: Yes, I will talk about virtual memory, address space, physical pages and that stuff in future posts :). I will also discuss how to implement growing allocators – stack-based, pool, and micro allocators for small-size allocations.

I see. That is indeed a good idea. I guess the only gotcha is that the free list doesn’t own the memory, so you have to keep track of the allocated chunks in order to free them properly later. Very nice solution. Thanks for pointing this out :).

I thought about this too and I didn’t like it for the same reasons you point out. Any header or footer will add extra memory overhead and also doesn’t play nicely with alignment requirements.

    Another idea was to let the user specify some fill value which you can check or some bit field at the end of the free list which marks free/used nodes in the pool. There was something like this in GPG8. I will check this out.

I am a physics programmer and I am looking for a good allocation strategy for rigid body contacts. You allocate and free quite a lot of them each frame, and for obvious reasons I don’t want them to create any traffic on the heap. Even though we have a pretty good small object allocator, I prefer a specialized solution.

    • Yes, storing bits at the end of a pool sounds like a good idea! It should not create too much wasted memory because you’re only storing individual bits, and for many object sizes you will end up having unused space at the end of the pool anyway. That space can be used for storing bits that indicate free/used slots.

      I assume rigid body contacts are always the same size, storing info about penetration depth, contact normals, and the objects involved in the collision (or something along those lines)? If so, then rigid body contacts sound like the perfect candidate for a pool allocator :).

There are two types of contacts essentially: contacts between convex shapes, and contacts between convex shapes and meshes. Both contact types store so-called manifolds, which describe the contact geometry. The contacts should *not* own the manifolds, since this wastes quite a lot of memory because not every contact is actually touching – sometimes only the AABBs in the broad phase overlap. So manifolds should most definitely come from a fast pool allocator. Also, if you think about sleeping, you might want to purge the manifolds, but not the contacts (although I am not sure if this is still necessary, since it affects quality when things wake up and can lead to gameplay bugs like in Halo. Maybe on low-memory devices you still want to do this though). I think the actual contacts should come from a pool allocator as well; they are a little bit more persistent than manifolds, but if things move you create a lot of traffic on their memory allocator as well.

For creating the contact constraints for the manifolds in the solver I need to iterate the free list (hence the second part of my question). Initially I just had an intrusive linked list of manifolds, but for the large worlds that I am seeing for next-generation hardware, just iterating such a linked list is too slow due to the cache misses. Even if you do nothing but iterate, it takes a significant number of milliseconds. So I was thinking about an “array with holes” to iterate faster.

        I hope this gives you some context. Ideally I would like the manifolds in a contiguous array, but this would involve too much copying – I guess. Maybe there exists some other solution, but I am not aware of any at the moment.

  5. Pingback: Adventures in data-oriented design – Part 3c: External References | Molecular Musings
