GeeCon - Solr at Allegro
One talk particularly interesting to me was on Allegro's (the Polish eBay) Solr usage. In terms of numbers: they have 20 million
offers in Poland and another 10 million active offers in partnering countries. In addition, their index contains 50 million
inactive offers in Poland and 40 million closed offers outside that country. They handle 8 million updates a day, that is roughly 100
updates a second. Those are related to the start and end of bidding phases, buy-now actions, cancelled bids, and the bids
themselves.
They serve 105 million requests per day; at peak time in the evening that is 3.5k requests per
second. Of those, 75% are answered in less than 5ms and 90% in less than 20ms.
To achieve that performance they are
using Solr. Having started with a database-backed system and moved on through a proprietary search product, they are now happy users of
Solr, with much better support both from the community and from contractors than with their previous
paid-for solution.
The speakers went into some detail on how they solved particular technical issues: they decided
to go for an external data feeder to avoid putting the database itself under too much load even when just
indexing the updates. On updates they have to reconstruct the whole document, as updates for Solr
right now mean deleting the old document and indexing the new one. In addition, commits are pretty expensive, so they
ended up delaying commits for as long as the SLA would allow (one minute) and committing them as a batch (see the sketch below).
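The talk did not show code, but a minimal sketch of what such delayed, batched committing might look like with SolrJ is below. The core URL and field names are made up for illustration; the one-minute commitWithin window corresponds to the SLA they mentioned.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class OfferFeeder {
    public static void main(String[] args) throws Exception {
        // Hypothetical Solr core URL; the real endpoint was not part of the talk.
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/offers").build();

        // Updates in Solr replace the whole document, so the feeder has to
        // rebuild the full offer document for every incoming change.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "offer-42");           // assumed field names
        doc.addField("title", "Vintage bicycle");
        doc.addField("state", "ACTIVE");
        doc.addField("price", 120.0);

        // commitWithin = 60000 ms: instead of committing every single update,
        // Solr is allowed to batch commits within the one-minute SLA window.
        solr.add(doc, 60_000);

        solr.close();
    }
}
```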
They
tried to shard the index by the category facet; that did not work particularly well, as with their users' behaviour it
resulted in too many cross-shard requests. Index size was an issue for them, so they reduced the amount of data indexed
and stored in Solr to the absolute minimum; everything else was outsourced to a key-value store (in their case
MongoDB), as sketched below.
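A rough sketch of that split, assuming Solr only stores the offer IDs needed for searching while the full offer body lives in a MongoDB collection (database, collection and field names are invented for illustration):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;

public class OfferLookup {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/offers").build();
        MongoClient mongo = MongoClients.create("mongodb://localhost:27017");
        MongoCollection<Document> offers = mongo.getDatabase("shop").getCollection("offers");

        // Solr only indexes and stores the bare minimum needed for searching...
        SolrQuery query = new SolrQuery("title:bicycle");
        query.setFields("id");   // only the key is stored in Solr
        QueryResponse response = solr.query(query);

        // ...the full offer documents are then fetched from the key-value store by ID.
        for (SolrDocument hit : response.getResults()) {
            String id = (String) hit.getFieldValue("id");
            Document offer = offers.find(eq("_id", id)).first();
            System.out.println(offer);
        }

        solr.close();
        mongo.close();
    }
}
```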
Caching proved to be the component that needed the most tweaks. They put a Varnish cache
in front (Solr speaks XML over HTTP, which is simple enough to put caches in front of); in combination with the indexing delay they
had in place they could tune eviction times. The result were cache hit rates of about 30 to 40 percent. When it comes to
Solr's internal caches: high eviction and low hit rates are a problem. Watch the Solr admin console for more statistics. Are
there too many unique objects in your index? Are the caches too small? Are there too many unique queries? They ended up
binding users to Solr backends by making the routing sticky on the user's cookie; as users tend to drill down into
the same dataset over and over again, in their case that raised hit rates substantially. When tuning filter queries:
keep them as independent as possible and don't use many unique combinations of the same filters over and over again.
Instead filter on each criterion individually to make better use of the filter cache (see the example below).
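A small illustration of that advice with SolrJ (field names are assumed): each independent fq parameter gets its own entry in Solr's filterCache and can be reused across queries, whereas one combined filter only helps queries that use exactly that combination.

```java
import org.apache.solr.client.solrj.SolrQuery;

public class FilterQueryExample {
    public static void main(String[] args) {
        // Reusable: each filter is cached on its own and shared between
        // any queries that use either of them.
        SolrQuery good = new SolrQuery("bicycle");
        good.addFilterQuery("category:sports");      // assumed field names
        good.addFilterQuery("state:ACTIVE");

        // Hard to reuse: the combined filter is a single cache entry that
        // only benefits queries with exactly this combination.
        SolrQuery bad = new SolrQuery("bicycle");
        bad.addFilterQuery("category:sports AND state:ACTIVE");

        System.out.println(good);
        System.out.println(bad);
    }
}
```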
For them Solr proved to be a stable, efficient, flexible
system that is easy to monitor, maintain and change, and that ran without failure for the first eight months, with the whole
architecture being designed and prototyped (at near production quality) by one developer in six
months.
Currently the system is running on 10 Solr slaves (plus backup) compared to 25 nodes before. A full
index rebuild takes four hours, bottlenecked at the feeder; potentially that could be pushed down to one hour. Updates of course
flow in continuously.