How to cache Elastic Search Queries?
####################################
:status: other

These days, I'm working on Marketplace, Mozilla's app store. Internally,
we're using Elastic Search to power search, and after some load tests we
eventually found out that Elastic Search was our bottleneck.

So we want to reduce the number of requests we make to this server. Currently,
the requests are done through HTTP.

The first thing to figure out is: what exactly do we want to cache? There is
a fair number of things we might want to cache. Let's start with the most
queried pages: the home page and the list of apps per category.

Different approaches
====================

You can put the cache in many different locations. The code powering
Marketplace is kind of fuzzy sometimes: the requests to Elastic Search are
made in a number of different places, sometimes directly with HTTP calls,
sometimes through the ElasticUtils library, sometimes using some other lib…

That makes it hard to tell where and how to add the caching layer. So what
did we do? We started to work on an HTTP caching proxy. This proxy could
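
The rough idea is to key cached responses on the HTTP request itself, so that
two identical queries only hit Elastic Search once. Here is a minimal sketch
of that logic (not the actual proxy; the ``ES_URL`` endpoint, the plain dict
used as a store and the use of ``requests`` are assumptions made for the
example)::

    import hashlib
    import json

    import requests

    ES_URL = 'http://localhost:9200'  # placeholder Elastic Search endpoint
    _store = {}  # stand-in for a real cache backend (memcached, redis, ...)

    def cached_search(path, query):
        """Run an Elastic Search query over HTTP, caching the response body."""
        # Serialize with sorted keys so equivalent queries map to the same key.
        payload = json.dumps(query, sort_keys=True)
        key = hashlib.md5((path + payload).encode('utf-8')).hexdigest()
        if key not in _store:
            response = requests.post(ES_URL + path, data=payload,
                                     headers={'Content-Type': 'application/json'})
            _store[key] = response.json()
        return _store[key]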

Find a key
==========

Caching things
==============

Caching is easy; in terms of Python code, it's something like this::

    if key in cache:
        value = cache.get(key)
    else:
        value = do_the_request()
        cache.set(key, value)
    return value

Back to business logic: I want to cache the requests that are made for the
homepage. The code currently looks like this::

    popular = Webapp.popular(region=region, gaia=request.GAIA)[:10]

Nothing fancy going on here: we're displaying the lists of popular and latest
apps on the Marketplace. I can cache the results for each region, varying them
depending on whether ``request.GAIA`` is ``True`` or not. The key will look
like ``popular-{region}``, to which I will append ``-gaia`` when it applies.
More generally, I will turn a number of criteria into a single key, like
this::

    from django.core.cache import cache as cache_backend  # Django's low-level cache API

    def cache(qs, *keys):
        """Cache the given queryset under a key built from `keys`."""
        key = '-'.join(keys)
        if key in cache_backend:
            result = cache_backend.get(key)
        else:
            result = list(qs)  # force evaluation so we store data, not a lazy object
            cache_backend.set(key, result)
        return result

I can then use it like this::

    keys = [region.slug]
    if request.GAIA:
        keys.append('gaia')

    popular = cache(Webapp.popular(region=region, gaia=request.GAIA)[:10],
                    'popular', *keys)

Invalidation is hard?
=====================

Right, we got this cached, good. But what happens when the data we cached
changes? Currently, nothing: we keep returning the same data over and over
again. We need to tell our application when this cache isn't good anymore; we
need to *invalidate* it.

We can use different strategies for this (both are sketched below):

* Set a timeout on the cache entry.
* Invalidate the cache manually when something changes, for instance using
  signals.
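
As a rough sketch of both options, assuming Django's cache framework and the
key scheme used earlier (the ``Webapp`` import path and the list of region
slugs are made up for the example)::

    from django.core.cache import cache
    from django.db.models.signals import post_save
    from django.dispatch import receiver

    from mkt.webapps.models import Webapp  # hypothetical import path

    REGION_SLUGS = ['us', 'br', 'es']  # hypothetical list of region slugs

    # Option 1: time-based expiry -- just pass a timeout when storing the value.
    def cache_popular(key, qs):
        cache.set(key, list(qs), timeout=60 * 15)  # stale for at most 15 minutes

    # Option 2: explicit invalidation -- drop the keys when the data changes.
    @receiver(post_save, sender=Webapp)
    def invalidate_popular(sender, instance, **kwargs):
        """When an app is saved, throw away the cached homepage lists."""
        for slug in REGION_SLUGS:
            cache.delete('popular-%s' % slug)
            cache.delete('popular-%s-gaia' % slug)

The timeout is the cheapest option to put in place, but it can serve stale
data for up to its duration; signals keep the cache fresh at the cost of more
moving pieces.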

I started by having a look at how I wanted to invalidate all of this, case by
case. The problem