From 9376443d9d795fff5b4e4b1e1d936327c86b4776 Mon Sep 17 00:00:00 2001
From: vsrinivas
Date: Tue, 22 Mar 2011 18:45:13 -0700
Subject: [PATCH] /* Add nmalloc GSoC project idea */

---
 docs/developer/gsocprojectspage/index.mdwn | 54 ++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/docs/developer/gsocprojectspage/index.mdwn b/docs/developer/gsocprojectspage/index.mdwn
index 30490ccb..0a654581 100644
--- a/docs/developer/gsocprojectspage/index.mdwn
+++ b/docs/developer/gsocprojectspage/index.mdwn
@@ -381,5 +381,59 @@ Meta information:
 * Difficulty: Hard
 * Contact point: Samuel J. Greear
 
+---
+##### nmalloc (libc malloc) measurements and performance work
+
+nmalloc is our libc memory allocator; it is a slab-like allocator. It recently had some work done to add per-thread caches, but there is much more work that could be done. A project on this might characterize fragmentation, try out a number of techniques to improve per-thread caching and reduce the total number of syscalls, and see if any are worth applying.
+
+Possible things to work on:
+(thread caches)
+* The per-thread caches are fixed-size; at larger object sizes (say 4K), this can result in a lot of memory tied up. Perhaps they should scale their max size inversely to the object size.
+
+* The per-thread caches are filled one-at-a-time from free(). Perhaps the per-thread caches should be burst-filled.
+
+* Perhaps the per-thread caches should age items out.
+
+(slab zone allocation)
+* zone_alloc() currently burst-allocates slab zones while holding the zone magazine spinlock.
+
+* zone_free() holds the zone magazine lock around bzero()ing a slab zone header.
+
+* zone_free() madvise()s one slab at a time; it'd be nice to madvise() runs of contiguous slabs.
+
+* zone_free() madvise()s very readily (for every slab freed). Perhaps it should only madvise() slabs that have been idle for some time.
+
+* zone_free() burst-frees slabs. It's not clear whether this is a good idea.
+
+(VMEM):
+* Currently, allocations larger than either 4K or 8K are forced directly to mmap(); this means that idle memory from free slabs cannot be used to service those allocations and that we do no caching for allocations above that size. This is almost certainly a mistake.
+
+* We could use a small (embeddable) data structure with these properties:
+1. efficient coalescing of adjacent mmap space for madvise()
+2. efficient queries for vmem_alloc() (w/ alignment!)
+3. compact, and doesn't use any space in the zone header (dirty/cold!)
+4. traversal in address order to fight fragmentation
+5. we would keep two such data structures (one for dirty pages, one for cold pages)
+
+(Note)
+* These are just ideas; there are many more things possible, and many of these need a lot of measurement to evaluate. It'd be interesting to see if any of them are appropriate for nmalloc.
+
+References:
+* http://www.usenix.org/event/usenix01/bonwick.html
+
+A description of the Sun Solaris work on which the DragonFly allocator is based; use this as an overview, but do not take it as gospel for how the DFly allocator works.
+
+* http://leaf.dragonflybsd.org/~vsrinivas/jemalloc-tech-talk.ogv (Jason Evans tech talk about jemalloc, 1/2011)
+
+jemalloc is the malloc used by FreeBSD and Firefox (and NetBSD and GNASH and ...); in this tech talk, Jason Evans reviews how jemalloc works, how it has changed recently, and how it avoids fragmentation.
+
+* http://endeavour.zapto.org/src/malloc-thesis.pdf (Ayelet Wasik's thesis 'Features of a Multi-Threaded Memory Allocator')
+
+This thesis is an excellent overview of many techniques to reduce contention and the effects these techniques have on fragmentation.
+
+* Prerequisites: C, a taste of data structures
+* Difficulty: moderate
+* Contact point: Venkatesh Srinivas
+
 ---
 (please add)
-- 
2.41.0
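
Illustrative note (not part of the patch): the first thread-cache idea, scaling a cache's maximum size inversely to the object size, might look roughly like the sketch below. The function name `cache_max_items`, the byte budget, and the clamp limits are all hypothetical; none of them are taken from nmalloc, and real values would have to come from measurement.

```c
#include <stddef.h>

/* Hypothetical tuning knobs; these names and values are NOT from nmalloc. */
#define CACHE_BYTE_BUDGET (64 * 1024) /* assumed bytes a cache may tie up per size class */
#define CACHE_MIN_ITEMS   2           /* assumed floor: always cache a couple of objects */
#define CACHE_MAX_ITEMS   256         /* assumed ceiling for very small objects */

/*
 * Return how many objects of obj_size bytes a per-thread cache may hold.
 * The count shrinks as obj_size grows, so the memory tied up in the cache
 * stays near CACHE_BYTE_BUDGET instead of growing with the object size.
 */
static size_t
cache_max_items(size_t obj_size)
{
	size_t n;

	if (obj_size == 0)
		return CACHE_MAX_ITEMS;

	n = CACHE_BYTE_BUDGET / obj_size;
	if (n < CACHE_MIN_ITEMS)
		n = CACHE_MIN_ITEMS;
	if (n > CACHE_MAX_ITEMS)
		n = CACHE_MAX_ITEMS;
	return n;
}
```

With these assumed constants, 16-byte objects hit the ceiling of 256 cached items, while 4K objects are limited to 16 items (64K of memory) rather than a fixed count that would tie up far more.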