mirror of
https://github.com/almet/notmyidea.git
synced 2025-04-28 19:42:37 +02:00
92 lines
2.9 KiB
ReStructuredText
92 lines
2.9 KiB
ReStructuredText
How to cache Elastic Seach Queries?
|
|
###################################
|
|
|
|
:status: other
|
|
|
|
These days, I'm working on Marketplace, the Mozilla's Appstore. Internally,
|
|
we're doing Elastic Search to do search, and after some load tests, we
|
|
eventually found out that Elastic Search was our bottleneck.
|
|
|
|
So we want to reduce the number of requests we do to this server. Currently,
|
|
the requests are done trough HTTP.
|
|
|
|
The first thing to realize is what do we want to cache exactly? There is a fair
|
|
bit of things we might want to cache. Let's start by the most queried pages:
|
|
the home and the list of apps per category.
|
|
|
|
Different approaches
|
|
====================
|
|
|
|
You can put the cache in many different locations. The code powering
|
|
Marketplace is kinda fuzzy sometimes. The requests to Elastic Search are done
|
|
in a number of different parts of the code. They're done sometimes directly
|
|
with HTTP calls, sometimes using the ElasticUtils library, sometimes using some
|
|
other lib…
|
|
|
|
That's kind of hard to get where and how to add the caching layer here. What
|
|
did we do? We started to work on an HTTP caching proxy. This proxy could
|
|
|
|
Find a key
|
|
==========
|
|
|
|
|
|
|
|
Caching things
|
|
==============
|
|
|
|
Caching is easy, it's something like that, in term of python code::
|
|
|
|
if key in cache:
|
|
value = cache.get(key)
|
|
else:
|
|
value = do_the_request()
|
|
cache.set(key, value)
|
|
return value
|
|
|
|
Back to business logic: I want to cache the request that are done on the
|
|
homepage. The code currently looks like this::
|
|
|
|
popular = Webapp.popular(region=region, gaia=request.GAIA)[:10]
|
|
|
|
Nothing fancy going on here, we're displaying the list of popular and latest
|
|
apps on the marketplace. I can cache the results for each region, and depending
|
|
if `request.GAIA` is `True` or not. The key will look like `popular-{region}`,
|
|
to which I will eventually append `-gaia` if it makes sense. More generally,
|
|
I will turn a number of criterias into a single key, like this::
|
|
|
|
def cache(qs, *keys):
|
|
"""Cache the given querystring"""
|
|
key = '-'.join(keys)
|
|
if key in cache:
|
|
result = cache.get(key)
|
|
else:
|
|
result = qs
|
|
cache.set(key, result)
|
|
return result
|
|
|
|
Which I can use like this::
|
|
|
|
keys = [region.slug, ]
|
|
if request.GAIA:
|
|
keys.append('gaia')
|
|
|
|
popular = cache(Webapp.popular(region=region, gaia=request.GAIA)[:10],
|
|
'popular', *keys)
|
|
|
|
Invalidation is hard?
|
|
=====================
|
|
|
|
Right, we got this cached, good. But what happens when the data we cached
|
|
changes? Currently, nothing, we return the same data over and over again.
|
|
|
|
We need to tell our application when this cache isn't good anymore; we need to
|
|
*invalidate* it.
|
|
|
|
We can use different strategies for this:
|
|
|
|
* Set a timeout for the cache.
|
|
* Invalidate manually the cache when something changes, for instance using
|
|
signals.
|
|
|
|
I started by having a look at how I wanted to invalidate all of this, case by
|
|
case. The problem
|