When we allocate memory we often forget about alignment. Paul Floyd reminds us about the various aligned allocation functions.
Recently I’ve been doing some work with the implementations, on various Unix-like systems, of the C functions that allocate aligned memory. These are memalign, aligned_alloc and posix_memalign. Typically, you would use these functions to get memory that is aligned with cache lines or virtual memory pages. As an example of this, imagine a networking application that needs to allocate a struct msghdr and to have the fastest memory access possible. On 64-bit Linux this struct has a size of 56 bytes. If you use malloc to allocate your memory you are likely to get back a pointer that, depending on the system, is 8- or 16-byte aligned. That means that there is a fair chance that the memory will straddle a 64-byte alignment boundary. That is bad because cache lines are typically 64 bytes, so accessing the fields of the structure will hit two cache lines instead of one. This increases the risk of cache misses, resulting in lower performance.
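To make the “fair chance” concrete: with 16-byte alignment a block can start at offset 0, 16, 32 or 48 within a 64-byte line, and only offset 0 keeps all 56 bytes in a single line; with 8-byte alignment only offsets 0 and 8, out of eight possibilities, do. The sketch below assumes a 64-byte cache line (the real size varies by CPU) and uses a hypothetical helper, straddles_cache_line(), to do that arithmetic, then asks posix_memalign for a block that starts on a line boundary.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values for illustration: a 64-byte cache line and the
   56-byte object size quoted above. */
#define CACHE_LINE 64u
#define OBJ_SIZE   56u

/* True if an object of `size` bytes starting at address `addr` spans more
   than one `line`-byte cache line, i.e. its first and last bytes fall in
   different lines. */
bool straddles_cache_line(uintptr_t addr, size_t size, size_t line)
{
    assert(size > 0 && line > 0);
    return addr / line != (addr + size - 1) / line;
}

/* Allocate `size` bytes starting on a cache-line boundary; NULL on failure. */
void *alloc_cache_aligned(size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, CACHE_LINE, size) != 0)
        return NULL;
    return p;
}
```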
I’m not going to detail the performance benefits (or drawbacks) of using these functions. Instead, in this article I’ll be discussing some of the issues that I saw. The implementations that I’ve looked at are Linux glibc [GNU libc], Linux musl [musl], FreeBSD jemalloc [FreeBSD], macOS [XNU] and Illumos [illumos]. There are other malloc libraries (Illumos umem, tcmalloc, rpmalloc and snmalloc, for instance) but I haven’t looked at them. Also, I’ll (almost) leave Windows out, as I don’t use it enough to comment fairly.
These functions go back a long way.
memalign goes back to SunOS 4.1.3 (Aug 1992 according to Wikipedia). Despite its age it is not a ‘standard’ function: it doesn’t figure in either the C standard or the POSIX standard. The non-standard-ness shows, as we’ll see shortly. It doesn’t exist on macOS. glibc and musl both have implementations. Finally, FreeBSD gained a version late in the game, in 2020, to add glibc compatibility.
posix_memalign, as the name implies, is a bona fide part of the POSIX spec (IEEE Std 1003.1d-1999 Additional Realtime Extensions, to be precise). All the systems and libraries that I looked at implement it.
aligned_alloc was standardized in C11. Again, this was implemented on all the systems that I looked at.
What they claim to do
Here is what the Linux man page says:
The function posix_memalign() allocates size bytes and places the address of the allocated memory in *memptr. The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *). This address can later be successfully passed to free(3). If size is 0, then the value placed in *memptr is either NULL or a unique pointer value.
The obsolete function memalign() allocates size bytes and returns a pointer to the allocated memory. The memory address will be a multiple of alignment, which must be a power of two.
The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment.
That all sounds very reasonable. The POSIX standard has similar wording for posix_memalign. The spec is available from The Open Group [opengroup], but you need to create an account and log in to read it.
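One practical difference from malloc is hidden in that description: posix_memalign hands the pointer back through *memptr and reports failure through its return value, an error number, rather than through errno. The hypothetical helper below, alloc_aligned_report(), simply surfaces that error number to the caller. An alignment of 24 is rejected with EINVAL because it is not a power of two.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper. posix_memalign() returns 0 on success or an error
   number (such as EINVAL or ENOMEM) on failure; it does not report through
   errno. The error number is passed back via *err. */
void *alloc_aligned_report(size_t alignment, size_t size, int *err)
{
    void *p = NULL;
    int rc = posix_memalign(&p, alignment, size);
    *err = rc;
    return rc == 0 ? p : NULL;
}
```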
Sadly, C11 does not have very much to say about aligned_alloc:
The value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment.
Great, so the alignment can be anything, but the size needs to be a multiple of the same anything. The final draft of C11 can be found here [C11 final].
I can’t comment on memalign since it isn’t standardized.
Musl (or, more specifically, Alpine Linux) uses the Linux man page unchanged.
The FreeBSD description for posix_memalign is very similar. For aligned_alloc it says:
The aligned_alloc() function allocates size bytes of memory such that the allocation’s base address is a multiple of alignment. The requested alignment must be a power of 2. Behavior is undefined if size is not an integral multiple of alignment.
There is no man page for memalign on FreeBSD.
Illumos has the following to say of memalign:
The memalign() function allocates size bytes on a specified alignment boundary and returns a pointer to the allocated block. The value of the returned address is guaranteed to be an even multiple of alignment. The value of alignment must be a power of two and must be greater than or equal to the size of a word.
The Illumos wording for posix_memalign is again similar to the others, but with one exception. This time the behaviour when the size is zero is specified:
If the size of the space requested is 0, the value returned in memptr will be a null pointer.
The macOS man pages are quite similar to FreeBSD’s.
To summarize so far, posix_memalign is fairly well defined. memalign is a bit hazy for a size of zero, and I’m not sure what Solaris is on about when it says that the returned address will be an even multiple of the alignment. All of the descriptions of aligned_alloc say that the alignment must be a power of two and the size an integral multiple of the alignment.
What do they actually do?
So how do the implementations match up to the specs? I’m not going to go into internal details – all the functions may allocate more than asked or be aligned to a higher value.
Thus far I’ve been describing the functions in chronological order. This time I’m going to let posix_memalign jump the queue. All the implementations behave as specified. As documented, Illumos does not allocate anything if the size is zero; the other implementations allocate some unspecified amount.
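Since a zero-sized request may legitimately give back either NULL (Illumos) or a unique pointer (the others), portable code should look at posix_memalign’s return value, not at the pointer, to decide whether the call failed. A minimal sketch, with a hypothetical helper name and an arbitrary alignment of 64:

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: ask for a zero-byte, 64-byte-aligned block.
   Success or failure is decided by the return code alone. On success
   *out may be NULL (as on Illumos) or a unique pointer (elsewhere);
   either value can safely be passed to free(). */
int alloc_zero_bytes(void **out)
{
    *out = NULL;
    return posix_memalign(out, 64, 0);
}
```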
The man page for Linux glibc memalign claims that the alignment must be a power of two. In fact, any value of alignment will be accepted and silently bumped up to the next power of two.
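The rounding itself is easy to model. This is a sketch of the observable behaviour only, not glibc’s actual implementation, and next_power_of_two() is a hypothetical name. An alignment of 24, for example, ends up as 32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: round `alignment` up to the next power of two.
   Powers of two map to themselves, 0 and 1 map to 1, and 0 is returned
   if no power of two representable in size_t is large enough. */
size_t next_power_of_two(size_t alignment)
{
    size_t a = 1;
    while (a < alignment) {
        if (a > SIZE_MAX / 2)
            return 0;  /* doubling again would overflow */
        a <<= 1;
    }
    assert((a & (a - 1)) == 0);  /* result is always a power of two */
    return a;
}
```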
Two of the memalign implementations were buggy. FreeBSD would crash if the alignment was zero; I’ve submitted a patch for that, which has been merged. Illumos only requires the memalign alignment to be a multiple of four, which allows some peculiar values for the alignment. I’ve opened a bug tracker item for that. There was nothing wrong with musl that I could see.
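Given those bugs, and the glibc rounding, one defensive option is to validate the alignment yourself before it ever reaches the library. The wrapper below is a hypothetical sketch, not any library’s API: it rejects zero and non-powers of two with EINVAL and forwards everything else to posix_memalign, bumping alignments smaller than sizeof(void *) up to that value. It assumes, as on mainstream platforms, that sizeof(void *) is itself a power of two, so the bumped alignment still satisfies the one requested.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True for 1, 2, 4, 8, ...; false for 0 and for values such as 12 or 24. */
static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical wrapper: refuse the alignments that implementations handle
   inconsistently instead of passing them on. Sets errno and returns NULL
   on failure. */
void *checked_memalign(size_t alignment, size_t size)
{
    void *p = NULL;
    int rc;

    if (!is_power_of_two(alignment)) {
        errno = EINVAL;
        return NULL;
    }
    /* posix_memalign also needs a multiple of sizeof(void *). */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);
    rc = posix_memalign(&p, alignment, size);
    if (rc != 0) {
        errno = rc;  /* posix_memalign returns the error, so copy it */
        return NULL;
    }
    return p;
}
```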
On to the last of the trio, aligned_alloc. The Linux man page claims that this is the same as memalign except that the size should be a multiple of the alignment. For glibc, doing that would be an amazing technical feat, because the two functions are in fact the same. To be more precise, they are both weak aliases of __libc_memalign. So, there is no extra constraint on the size.
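A weak alias is a second symbol name bound to the same code; because it is weak, a strong definition of the same name elsewhere takes precedence when linking. The sketch below shows the mechanism with the GCC/Clang alias attribute, which works on ELF targets such as Linux but not on macOS. All three function names are hypothetical stand-ins for the glibc ones.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The single real implementation (hypothetical name, playing the role of
   glibc's __libc_memalign). */
void *impl_memalign(size_t alignment, size_t size)
{
    void *p = NULL;
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);  /* posix_memalign's minimum */
    return posix_memalign(&p, alignment, size) == 0 ? p : NULL;
}

/* Two public names for the same function. Each alias is just another
   symbol at impl_memalign's address; marking it weak lets a strong
   definition of the same name elsewhere take precedence when linking. */
void *my_memalign(size_t alignment, size_t size)
    __attribute__((weak, alias("impl_memalign")));
void *my_aligned_alloc(size_t alignment, size_t size)
    __attribute__((weak, alias("impl_memalign")));
```

Because both public names resolve to the same code, neither can check the size any differently from the other, which is why glibc’s aligned_alloc places no extra constraint on it.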
Other platforms also use a lot of code sharing. FreeBSD memalign is aligned_alloc but with the size rounded up to a multiple of alignment. If anything, I would have expected the opposite, but anything goes when functions are non-standard, or implementation defined. Musl memalign just calls aligned_alloc. And, with a nice bit of symmetry, Illumos aligned_alloc just calls memalign.
Just when I thought I’d covered everything, I discovered that if you use a huge value of alignment with musl aligned_alloc then it will crash with a segfault. The crash is in version 1.2.2 and it has apparently been fixed in 1.2.3.
So far, no platform has done anything about the “the value of size shall be an integral multiple of alignment” part of the C11 standard. macOS is the remaining platform and it DOES do something about it. If the size isn’t an integral multiple of the alignment, then it will return NULL and set errno to EINVAL.
One thing that is generally not documented is that most of the functions will fail if the alignment is huge (over half the address space). In that case they will return NULL and set errno.
Windows almost got away without a mention. Whilst Windows doesn’t have any of the Unix aligned allocation functions (not even C11 aligned_alloc), it does have its own variation. It’s called _aligned_malloc [Microsoft].
As well as adding an underscore and an extra ‘m’, Microsoft also reversed the order of the alignment and size arguments. That seems to me a source of confusion and potential bugs. I’m not sure when _aligned_malloc first appeared, but I see references to it going as far back as VC++ 6.0 (1998). That means that by the time C11 came around there were already functions with different argument ordering.
Whilst I must say that I was quite underwhelmed by the quality of what I saw, I don’t think that in practice these are big issues. I do recommend that you avoid using an alignment that is zero or not a power of two. Unfortunately, Hyrum’s law [hyrum] says that there is probably code out there that relies on Linux glibc working out the next power of two for the alignment. For portability, posix_memalign and aligned_alloc have the edge, and of the two, aligned_alloc is easier to adapt to its Windows counterpart, _aligned_malloc. However, you still need to take care that the size is an integral multiple of the alignment if you also port to macOS.
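One way to follow that advice is a thin wrapper. This is a sketch with hypothetical names: it keeps the C11 (alignment, size) argument order, swaps the arguments for _aligned_malloc on Windows, and rounds the size up to a multiple of the alignment so that macOS’s aligned_alloc does not reject it. It assumes the alignment is a nonzero power of two. Memory from _aligned_malloc must be released with _aligned_free rather than free, hence the matching free function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>  /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical wrapper with the C11 argument order (alignment, size).
   `alignment` is assumed to be a nonzero power of two. */
void *portable_aligned_alloc(size_t alignment, size_t size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Round size up to a multiple of alignment; fail rather than wrap. */
    if (size > SIZE_MAX - (alignment - 1))
        return NULL;
    size = (size + alignment - 1) & ~(alignment - 1);

#ifdef _WIN32
    return _aligned_malloc(size, alignment);  /* note: (size, alignment) */
#else
    return aligned_alloc(alignment, size);
#endif
}

/* Blocks from _aligned_malloc must go back to _aligned_free, not free(). */
void portable_aligned_free(void *p)
{
#ifdef _WIN32
    _aligned_free(p);
#else
    free(p);
#endif
}
```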
[C11 final] International Standard: https://open-std.org/JTC1/SC22/WG14/www/docs/n1570.pdf
[FreeBSD] Source for FreeBSD: https://github.com/freebsd/freebsd-src
[GNU libc] Source for glibc v2.37: https://elixir.bootlin.com/glibc/glibc-2.37/source
[hyrum] Hyrum’s Law: https://www.hyrumslaw.com/
[illumos] Illumos is the continuation of OpenSolaris: https://github.com/illumos/illumos-gate
[Microsoft] _aligned_malloc: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/aligned-malloc?view=msvc-170
[musl] Source for musl: https://elixir.bootlin.com/musl/v1.2.3/source
[opengroup] Open Group Library: https://publications.opengroup.org
[XNU] Source browser: https://opensource.apple.com/source/xnu/ (there are also GitHub mirrors)
Paul Floyd has been writing software, mostly in C++ and C, for about 30 years. He lives near Grenoble, on the edge of the French Alps, and works for Siemens EDA developing tools for analogue electronic circuit simulation. In his spare time, he maintains Valgrind.
Idalia is a freelance artist operating at the intersection of art and geek, using a myriad of techniques and styles to produce works that both delight and entertain.