Benchmarking Ruby Enterprise Edition

January 31, 2011 at 11:42 AM

As part of a perpetual quest to make things run faster, I benchmarked various Ruby configurations using the Urbanspoon codebase. We have a smoke test that hits our key pages regularly to ensure things are running well, and I decided to use this for speed comparison as well. My methodology was simple.

  1. Run a local staging server with MySQL & memcache. Run smoke against it a lot to get the caches really hot – we want to test the variations in Ruby, not in other layers.
  2. For each test configuration, start Unicorn and run smoke against that server 10 times [1]. This basically tests how quickly Rails can load a few simple objects from the database and squish them together with a pre-rendered page from memcache.
  3. Repeat #2, but clear out memcache[2] before the start of each run of the smoke test. This tests a lot more of the application's internals, since it makes each test run rebuild the important pages and repopulate memcache. The MySQL cache stays hot throughout both tests and it's not anywhere near breaking a sweat.
  4. Now that we have 2 views of speed recorded 10 times each for all of the configurations, we can crunch some numbers. Throw out the best and worst times, take the average, and learn.
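The timing and number-crunching in steps 2 through 4 can be sketched in a few lines of Ruby. This is a minimal illustration and not the script actually used for these benchmarks: `time_runs`, `trimmed_mean`, and `run_smoke_test` are hypothetical names, and it assumes "best and worst" means dropping exactly one sample from each end.

```ruby
require 'benchmark'

# Time the given block `runs` times; returns an array of elapsed seconds.
def time_runs(runs = 10)
  Array.new(runs) { Benchmark.realtime { yield } }
end

# Throw out the single best and single worst time, then average the rest.
def trimmed_mean(times)
  raise ArgumentError, 'need at least 3 samples' if times.size < 3
  kept = times.sort[1..-2]
  kept.inject(0.0) { |sum, t| sum + t } / kept.size
end

# Hypothetical usage; run_smoke_test stands in for whatever launches the
# smoke suite against the server under test.
#   times = time_runs(10) { run_smoke_test }
#   puts trimmed_mean(times)
```

Dropping the extremes keeps a single slow run (a cold disk, a stray cron job) from skewing the average of only ten samples.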

I benchmarked four REE configs, and I used the standard MRI 1.8.7 version as a baseline. The results were gratifying.

Five Pairs of Ruby Benchmarks

Configuration                 Cache hot (s)  vs. MRI   Cache cold (s)  vs. MRI
MRI 1.8.7                     10.526875      0.0%      33.77225        0.0%
REE (default)                 7.90675        -24.89%   25.43275        -24.69%
REE (tuned GC[3])             6.972          -33.76%   22.948875       -32.05%
REE (copy-on-write[4])        9.0375         -14.14%   26.539625       -21.41%
REE (tuned GC, copy-on-write) 7.117625       -32.4%    23.1595         -31.4%

Figures shown are total seconds elapsed and difference relative to MRI 1.8.7
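
For anyone checking the percentages, the relative difference is just the change in elapsed time divided by the MRI baseline. A small sketch, using the hot-cache figures for MRI and default REE from the table (`percent_change` is a name made up for this example):

```ruby
# Percentage change of a timing relative to a baseline; negative means faster.
def percent_change(seconds, baseline)
  (seconds - baseline) / baseline * 100.0
end

baseline_hot    = 10.526875 # MRI 1.8.7, cache hot
ree_default_hot = 7.90675   # REE (default), cache hot

puts format('%.2f%%', percent_change(ree_default_hot, baseline_hot)) # prints -24.89%
```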

Hey, cool! There's some validation, all right. Garbage collection has a huge impact on the performance of web applications, and proper tuning can make a world of difference. REE is about 25% faster than MRI out of the box, and roughly a third faster with a little GC tuning. Copy-on-write, which reduces overall memory usage, definitely carries a performance penalty on its own: it gives back part of REE's default gain, most visibly with a hot cache (14% faster than MRI instead of 25%). But with the GC flags set it hardly degrades things at all (32.4% versus 33.76% hot, 31.4% versus 32.05% cold). This could be a huge win.

I'd love to test some other implementations when I get the time, but for now we're going to slowly migrate things to REE.