performance/index.html

[[!template id=performance-includes.tmpl]]

<p><strong>Please note:</strong> Many of the figures on this page use SVG. If your browser does not display them, you may need to update it or install a plugin.</p>

<p>DragonFly BSD has numerous attributes that make it compare favorably to other operating systems under a wide range of workloads. This page includes selected benchmarks that represent DragonFly's general performance characteristics. If you have a DragonFly BSD performance success story, the developers would like to hear about it on the mailing lists!</p>

<h2>Symmetric Multi-Processor Scaling (2012)</h2>

<p>One of the original goals of the DragonFly BSD project was performance: the project sought to implement SMP in ways that are more straightforward, composable, understandable and algorithmically superior to the approaches taken in other operating system kernels. The results became strikingly clear with the 3.0 and 3.2 releases of DragonFly, which saw a significant amount of polishing and general scalability work. The graph below shows the culmination of that work.</p>

<p>The following graph charts the performance of the PostgreSQL 9.3 development version as of <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b0fc0df9364d2d2d17c0162cf3b8b59f6cb09f67" target="_blank">late June 2012</a> on DragonFly BSD 3.0 and 3.2, FreeBSD 9.1, NetBSD 6.0 and Scientific Linux 6.2 running Linux kernel version 2.6.32. The tests were performed using system defaults on each platform, with pgbench as the test client and a scaling factor of 800. The test system was a dual-socket Intel Xeon X5650 with 24GB RAM.</p>

[[!template id=performance-scaling-postgresql.tmpl]]

<p>NetBSD 6.0 was unable to complete the benchmark run.</p>

<p>In this particular test, which other operating systems have often used to show how well they scale, you can see the immense improvement in scalability of DragonFly BSD 3.2. PostgreSQL's scalability on DragonFly 3.2 far surpasses that on other BSD-derived codebases and is roughly on par with the Linux kernel (which has been the object of expensive, multi-year optimization efforts). DragonFly performs better than Linux at lower concurrencies, and the small performance deficit at high client counts was accepted deliberately to ensure that acceptable interactivity is maintained even at extremely high throughput levels.</p>

<p><em>Note: single-host pgbench runs like the ones above are not directly useful for estimating PostgreSQL scaling behavior. For example, any real-world setup that needs to handle that many transactions would use multiple PostgreSQL servers, and the clients themselves would run on a set of different hosts. This would be much less demanding of the underlying OS. If you plan to use PostgreSQL on DragonFly and are targeting high throughput, we encourage you to do your own testing and would appreciate any reports of inadequate performance. That said, the above workload does demonstrate the effect of algorithmic improvements that have been incorporated into the 3.2 kernel and should positively affect many real-world setups (not just PostgreSQL ones).</em></p>
<br/>

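<p>For readers who want to run their own tests, the following is a minimal sketch of how a pgbench client-count sweep at a scaling factor of 800 might be scripted. Only the use of pgbench and the scaling factor come from the text above. The database name, the client counts, the 60-second duration and the select-only mode are assumptions made for illustration, not the settings used for the published graph. The script only builds and prints command lines; it does not run them.</p>

```python
import shlex


def pgbench_init_cmd(dbname, scale=800):
    """Command that initializes the pgbench tables at the given scaling factor."""
    return ["pgbench", "-i", "-s", str(scale), dbname]


def pgbench_run_cmd(dbname, clients, seconds=60, select_only=True):
    """Command for one timed run with `clients` concurrent clients.

    One worker thread per client (-j) and select-only mode (-S) are
    assumptions, not settings documented on this page.
    """
    cmd = ["pgbench", "-c", str(clients), "-j", str(clients), "-T", str(seconds)]
    if select_only:
        cmd.append("-S")
    cmd.append(dbname)
    return cmd


def sweep(dbname, client_counts):
    """One run command per client count, in the order given."""
    return [pgbench_run_cmd(dbname, c) for c in client_counts]


if __name__ == "__main__":
    # "bench" and the client counts below are hypothetical placeholders.
    print(shlex.join(pgbench_init_cmd("bench")))
    for cmd in sweep("bench", [1, 2, 4, 8, 16, 24]):
        print(shlex.join(cmd))
```

<p>Building the commands as argument lists keeps the sweep easy to review before anything is executed; each list can be passed to <code>subprocess.run</code> once the settings match your environment.</p>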
<h2>Swapcache</h2>

<p>One of the novel features in DragonFly that can boost the throughput of a large number of workloads is called swapcache. Swapcache gives the kernel the ability to retire cached pages to one or more interleaved swap devices, usually commodity solid-state disks. By caching filesystem metadata, data or both on an SSD, the performance of many read-centric workloads is improved and worst-case performance is kept well bounded.</p>

<p>The following chart depicts the relative performance of a system with and without swapcache. The application being tested is a PostgreSQL database under a read-only workload, with database sizes ranging from smaller than the total RAM in the system to double the total available memory.</p>

[[!template id=performance-swapcache.tmpl]]

<p>As you can plainly see, performance with swapcache is more than just well bounded: it is dramatically improved. Similar gains can be seen in many other scenarios. As with all benchmarks, the above numbers are indicative only of the specific test performed; to get a true sense of whether swapcache will benefit a specific workload, it must be tested in that environment. Disclaimers aside, swapcache is appropriate for a huge variety of common workloads, and the DragonFly team invites you to try it and see what a difference it can make.</p>

<h2>Symmetric Multi-Processor Scaling (2018)</h2>

<p>It's time to update! We ran a new set of pgbench tests on a Threadripper 2990WX (32-core/64-thread) with 128GB of ECC 2666C14 memory (8 sticks) and a power cap of 250W at the wall. The power cap was set below stock, at a more efficient point on the power curve. Stock power consumption usually runs in the 330W range.</p>

<p>With the huge amount of SMP work done since 2012, the entire query path no longer has any SMP contention whatsoever, and we expect a roughly linear curve until we run out of CPU cores and start digging into hyperthreads. The only likely remaining sources of non-linearity are increasing memory latencies as more cores load memory, and probably a certain degree of cache mastership ping-ponging that occurs even when locks are shared and do not contend.</p>

<p>The purple curve is the TPS, the green curve is the measured TPS per CPU thread (with 16 threads selected as the baseline 1.0 ratio), and the light blue curve shows the approximate drop in CPU core frequency as the thread count increases and more CPU cores get loaded. This particular CPU was running at around 4.2 GHz with one thread and 2.8 GHz or so with all 32 cores loaded. pgbench was allowed to warm up during each run, and the average of the last three 60-second samples is graphed.</p>

[[!img tr2990wx01.png align="right" size="" alt=""]]

<p>As we can see, query TPS scales very well with cores on this CPU. Keeping in mind that this CPU starts digging into its hyperthreads past 32 threads (really past 28 or so given the client-vs-server breakdown), the expectation that the curve goes non-linear past 32 threads and then flattens out at 64 threads is borne out. In particular, hyperthreading gives us approximately 50% more TPS, capping out at around 1.18M TPS.</p>
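<p>The per-thread normalization and sample averaging described above can be sketched as follows. This is an illustrative reconstruction, not the script used to produce the graph. The function names are hypothetical, and the sample values in the demo are placeholders rather than measurements. The last helper only restates the arithmetic in the text: a peak of about 1.18M TPS with a roughly 50% hyperthreading gain implies about 0.79M TPS before hyperthreads come into play.</p>

```python
from statistics import mean


def steady_state_tps(samples, keep=3):
    """Average the last `keep` samples, discarding earlier warm-up samples."""
    if len(samples) < keep:
        raise ValueError("need at least `keep` samples")
    return mean(samples[-keep:])


def per_thread_ratio(tps_by_threads, baseline_threads=16):
    """TPS per thread at each thread count, relative to the baseline count.

    The baseline thread count maps to exactly 1.0; values below 1.0 mean
    each thread delivers less than it did at the baseline.
    """
    base = tps_by_threads[baseline_threads] / baseline_threads
    return {n: (tps / n) / base for n, tps in sorted(tps_by_threads.items())}


def implied_tps_before_ht(peak_tps, ht_gain=0.50):
    """TPS implied before hyperthreading, given the peak and fractional gain."""
    return peak_tps / (1.0 + ht_gain)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; these are NOT measurements.
    runs = {
        16: [90.0, 150.0, 160.0, 161.0, 159.0],
        32: [200.0, 287.0, 288.0, 289.0],
    }
    tps = {n: steady_state_tps(s) for n, s in runs.items()}
    print(per_thread_ratio(tps))
    print(round(implied_tps_before_ht(1.18e6)))
```

<p>Normalizing against a 16-thread baseline rather than a single thread presumably keeps the ratio from being dominated by the higher clock frequency available when only one core is loaded.</p>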