Drawing a Line Under Aligned Memory

When we allocate memory we often forget about alignment. Paul Floyd reminds us about the various aligned allocation functions.

Recently I’ve been doing some work with the implementations, on various Unix-like systems, of the C functions that allocate aligned memory. These are memalign, aligned_alloc and posix_memalign. Typically, you would use these functions to get memory that is aligned with cache lines or virtual memory pages. As an example of this, imagine a networking application that needs to allocate struct msghdr and to have the fastest memory access possible. This struct has a size of 56 bytes. If you use malloc to allocate your memory you are likely to get back a pointer that, depending on the system, is 8- or 16-byte aligned. That means that there is a fair chance that the memory will straddle a 64-byte alignment boundary. That is bad because 64-byte boundaries are what cache lines map to, meaning that accessing fields of the structure will hit two cache lines. This increases the risk of cache misses, resulting in lower performance.

I’m not going to detail the performance benefits (or drawbacks) of using these functions. Instead in this article I’ll be discussing some of the issues that I saw. The implementations that I’ve looked at are Linux glibc [GNU libc], Linux musl [musl], FreeBSD jemalloc [FreeBSD], macOS [XNU] and Illumos [illumos]. There are other malloc libraries (Illumos umem, tcmalloc, rpmalloc and snmalloc for instance) but I haven’t looked at them. Also, (almost) no Windows as I don’t use it enough to make fair comment.

History

These functions go back a long way. memalign goes back to SunOS 4.1.3 (Aug 1992 according to Wikipedia). Despite its age it is not a ‘standard’ function: it doesn’t figure in either the C standard or the POSIX standard. The non-standard-ness shows, as we’ll see shortly. It doesn’t exist on macOS. glibc and musl both have implementations. Finally, FreeBSD gained a version late in the game, in 2020, to add glibc compatibility.

posix_memalign, as the name implies, is a bona fide part of the POSIX spec. IEEE Std 1003.1d-1999 Additional Realtime Extensions to be precise. All the systems and libraries that I looked at implement posix_memalign.

aligned_alloc was standardized in C11. Again, this was implemented on all the systems that I looked at.

What they claim to do

Here is what the Linux man page says:

The function posix_memalign() allocates size bytes and places the address of the allocated memory in *memptr. The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *). This address can later be successfully passed to free(3). If size is 0, then the value placed in *memptr is either NULL or a unique pointer value.

The obsolete function memalign() allocates size bytes and returns a pointer to the allocated memory. The memory address will be a multiple of alignment, which must be a power of two.

The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment.

That all sounds very reasonable. The POSIX standard has similar wording for posix_memalign. The spec can be accessed from The Open Group [opengroup], but you need to create an account and log in to access it.

Sadly, C11 does not have very much to say about aligned_alloc:

The value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment.

Great, so the alignment can be anything, but the size needs to be a multiple of the same anything. The final draft of C11 can be found here [C11 final].

I can’t comment on memalign since it isn’t standardized.

Musl, and more specifically Alpine Linux, doesn’t change the man page.

The FreeBSD description for posix_memalign is very similar. For aligned_alloc it says:

The aligned_alloc() function allocates size bytes of memory such that the allocation’s base address is a multiple of alignment. The requested alignment must be a power of 2. Behavior is undefined if size is not an integral multiple of alignment.

There is no manpage for memalign on FreeBSD.

Illumos has the following to say of memalign:

The memalign() function allocates size bytes on a specified alignment boundary and returns a pointer to the allocated block. The value of the returned address is guaranteed to be an even multiple of alignment. The value of alignment must be a power of two and must be greater than or equal to the size of a word.

The Illumos wording for posix_memalign is again similar to the others, but with one exception. This time the behaviour when the size is zero is specified:

If the size of the space requested is 0, the value returned in memptr will be a null pointer.

The macOS manpages are quite similar to FreeBSD.

To summarize so far, posix_memalign is fairly well defined. memalign is a bit hazy for a size of zero and I’m not sure what Solaris was going on about in saying that the return address will be an even multiple of the alignment. All of the descriptions of aligned_alloc say that the alignment must be a power of two and the size an integral multiple of the alignment.

What they actually do

So how do the implementations match up to the specs? I’m not going to go into internal details – all the functions may allocate more than asked or be aligned to a higher value.

Thus far I’ve been describing the functions in chronological order. This time I’m going to let posix_memalign jump the queue. All the implementations behave as specified. Illumos does indeed not allocate if the size is zero. The other implementations allocate some unspecified amount.

The man page for Linux glibc memalign claimed that the alignment must be a power of two. In fact, any value of alignment will be accepted and silently bumped up to the next power of two.

Two of the memalign implementations were buggy. FreeBSD would crash if the alignment was zero – I’ve submitted a patch for that which has been merged. Illumos only restricts the memalign alignment to being a multiple of four. That can result in some peculiar values for the alignment. I’ve opened a bug tracker item for that. There was nothing wrong with musl that I could see.

On to the last of the trio, aligned_alloc. The Linux man page claims that this is the same as memalign except that the size should be a multiple of the alignment. For glibc, doing that would be an amazing technical feat. The two functions are in fact the same. To be more precise they are both weak aliases of __libc_memalign. So, there is no extra constraint on the size.

What is a 'weak alias'?

It is a mechanism that allows one or more symbols to refer to the same object or function. I shall now digress into the world of the link editor and the link loader. I expect that everyone reading this is familiar with compiling and linking libraries and executables. You compile some source files into object files and then link them. There isn’t always a 1:1 relationship between names in your source and symbols in object files. There are several ways in which this can happen, and one of them is to explicitly request a ‘weak alias’. These aliases can refer to any other symbol, and unlike regular symbols it is not an error if weak aliases do not get resolved. That makes them ideal to use for functions such as the malloc family that are specified to be replaceable.

Consider this small program:

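  #include <cstdint>
  #include <iostream>

  // The one real definition.
  extern "C" void hello()
  {
    std::cout << "Hello from " << __func__
      << " address " << std::hex
      << reinterpret_cast<std::uintptr_t>(&hello)
      << '\n';
  }

  // A second, weak symbol bound to the same code (GCC/Clang extension).
  extern "C" void hello_alias() __attribute__
    ((weak, alias ("hello")));

  int main()
  {
    hello();
    hello_alias();
  }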

As you can see, main() calls two functions, but only one is defined!

If I compile and run this, I get

paulf> ./weak_alias 
Hello from hello address 202740 
Hello from hello address 202740

As you see, both calls print the same function address, confirming that the weak alias calls the original strong function. The nm tool can show this in the binary:

paulf> nm weak_alias | grep hello 
0000000000202740 T hello 
0000000000202740 W hello_alias

T means a global function and W a weak alias. Getting back to the GNU libc case of weak aliases, nm can again be used to show them. First of all, aligned_alloc:

paulf> nm /lib64/libc.so.6 | grep aligned_alloc 
000000000009a6f0 W aligned_alloc

Then all symbols with the same address:

paulf> nm /lib64/libc.so.6 | grep 000000000009a6f0 
000000000009a6f0 W aligned_alloc 
000000000009a6f0 t __GI___libc_memalign 
000000000009a6f0 T __libc_memalign 
000000000009a6f0 W memalign 
000000000009a6f0 t __memalign

Here, __libc_memalign is the real, private, implementation and aligned_alloc and memalign are the public aliases for __libc_memalign.

Other platforms also use a lot of code sharing. FreeBSD memalign calls aligned_alloc but with the size rounded up to a multiple of alignment. If anything, I would have expected the opposite, but anything goes when functions are non-standard, or implementation defined. Musl memalign just calls aligned_alloc. And with a nice bit of symmetry, Illumos aligned_alloc just calls memalign.

Just when I thought I’d covered everything, I discovered that if you use a huge value of alignment with musl aligned_alloc then it will crash with a segfault. The crash is in version 1.2.2 and it has apparently been fixed in 1.2.3.

So far, no platform has done anything about the “the value of size shall be an integral multiple of alignment” part of the C11 standard. macOS is the remaining platform and it DOES do something about it. If the size isn’t an integral multiple of the alignment, then it will return NULL and set errno to EINVAL.

One thing that is generally not documented is that most of the functions will fail if the alignment is huge (over half the memory space). In that case they will return NULL and set errno to EINVAL.

Windows almost got away without a mention. Whilst Windows doesn’t have any of the Unix aligned allocation functions (not even C11 aligned_alloc), it does have its own variation. It’s called _aligned_malloc [Microsoft].

Other than having an underscore and an extra ‘m’, Microsoft also has the order of the alignment and the size arguments reversed. That seems to me a source of confusion and potential bugs. I’m not sure if _aligned_malloc predates memalign, but I see references to it going as far back as VC++ 6.0 (1998). That means that by the time C11 came around there were already functions with different argument ordering.

Advice

Whilst I must say that I was quite underwhelmed by the quality of what I saw, I don’t think that in practice these are big issues. I do recommend that you avoid using an alignment that is zero or a non-power of two. Unfortunately, Hyrum’s law [hyrum] says that there is probably code out there that is taking advantage of Linux glibc working out the next power of two for the alignment. For portability, posix_memalign and aligned_alloc have the edge, and of the two, aligned_alloc is easier to adapt to its Windows counterpart, _aligned_malloc. However, you still need to take care that the size is an integral multiple of the alignment if you also port to macOS.

References

[C11 final] International Standard: https://open-std.org/JTC1/SC22/WG14/www/docs/n1570.pdf

[FreeBSD] Source for FreeBSD: https://github.com/freebsd/freebsd-src

[GNU libc] Source for glibc v2.37: https://elixir.bootlin.com/glibc/glibc-2.37/source

[hyrum] Hyrum’s Law: https://www.hyrumslaw.com/

[illumos] Illumos is the continuation of OpenSolaris: https://github.com/illumos/illumos-gate

[Microsoft] _aligned_malloc: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/aligned-malloc?view=msvc-170

[musl] Source for musl: https://elixir.bootlin.com/musl/v1.2.3/source

[opengroup] Open Group Library: https://publications.opengroup.org

[XNU] Source browser: https://opensource.apple.com/source/xnu/ (there are also GitHub mirrors)

Paul Floyd has been writing software, mostly in C++ and C, for about 30 years. He lives near Grenoble, on the edge of the French Alps and works for Siemens EDA developing tools for analogue electronic circuit simulation. In his spare time, he maintains Valgrind.

Idalia is a freelance artist operating at the intersection of art and geek, using a myriad of techniques and styles to produce works that both delight and entertain.