Retention policy in Elasticsearch 6.x on AWS: deleting a document by query

Sooner or later, every system administrator has to define a retention policy for the data stored in a time series database in order to keep disk usage under control.

This article explains how to introduce a retention policy that deletes documents older than a given age from Elasticsearch 6.x (hosted on Amazon Web Services in this case) using the Elasticsearch 6.x API.

The technology stack used in this article:

Elasticsearch 6.0, hosted by AWS
Python 3.6.5

The API we are going to use is called Delete By Query (the _delete_by_query endpoint), with the following syntax:

POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "message": "some message"
    }
  }
}

The above query deletes the documents whose “message” field matches the given text.

To achieve the desired retention policy, we will delete all the documents created before a certain date, and we will do so on a regular basis with a Python script. For this we will use the “range” query combined with “lte”, the “less than or equal to” operator.

This time the query will look like:

POST twitter/_delete_by_query
{
  "query": {
    "range": {
      "timestamp": {
        "lte": "2018-06-07T23:59:59.999"
      }
    }
  }
}

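This deletes all the stored documents created before June 8th 2018. Note that the date must be in a format accepted by the mapping of the timestamp field, which by default is ISO 8601 (or epoch milliseconds).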
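
The cut-off date does not have to be typed by hand. Below is a minimal sketch of how it could be computed and turned into a query body, assuming the timestamp field accepts ISO 8601 strings. The helper names end_of_day_cutoff and range_delete_body are made up for this illustration and are not part of any Elasticsearch API.

```python
from datetime import datetime, timedelta


def end_of_day_cutoff(now, retention_days):
    """Return the last millisecond of the day lying retention_days before now."""
    day = now - timedelta(days=retention_days)
    return day.replace(hour=23, minute=59, second=59, microsecond=999000)


def range_delete_body(field, cutoff):
    """Build a _delete_by_query body matching documents with field <= cutoff."""
    return {
        'query': {
            'range': {
                field: {'lte': cutoff.isoformat(timespec='milliseconds')}
            }
        }
    }


# Example: a 7-day retention evaluated on June 14th 2018 selects
# everything up to the end of June 7th 2018.
cutoff = end_of_day_cutoff(datetime(2018, 6, 14, 10, 30), 7)
body = range_delete_body('timestamp', cutoff)
print(body)
```

Passing the current time as an argument, instead of calling datetime.now() inside the helper, keeps the computation easy to check against a known date.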

To do this systematically, we can embed the deletion query in a scheduled Python script that runs, for instance, every night. The following Python script, which uses the Elasticsearch Python client, deletes all the documents older than seven days that are stored in test-index with “tweet” as doc_type.

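from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

retention_days = 7
# Cut-off date: the end of the day that lies retention_days in the past.
cutoff = datetime.now() - timedelta(days=retention_days)
retention_policy = cutoff.replace(hour=23, minute=59, second=59, microsecond=999999)
es = Elasticsearch(['ES_EndPoint'])  # placeholder for the AWS Elasticsearch endpoint
body = {
  'query': {
    'range': {
      'timestamp': {
        'lte': retention_policy.isoformat()
      }
    }
  }
}
# Delete the matching "tweet" documents in test-index and print the response.
response = es.delete_by_query(index='test-index', doc_type='tweet', body=body)
print(response)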
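
Since deleted documents cannot easily be recovered, it may help to check first how many documents the query would match. Below is a minimal sketch that passes the same body to the client's count call. The name count_expired is a hypothetical helper, and es is assumed to be an already configured Elasticsearch client.

```python
def count_expired(es, index, body):
    """Return how many documents in index match the deletion query."""
    # count() accepts the same query body and only reports the number of hits,
    # so nothing is modified on the cluster.
    return es.count(index=index, body=body)['count']
```

It could be called as count_expired(es, 'test-index', body) right before the deletion, for example to log the number or to skip the run when it looks suspiciously high.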

This is the situation in the AWS Elasticsearch dashboard before running the deletion script:

AWS ES Dashboard before deletion

Running the script deletes the only 3 available documents and, if everything went fine, returns the following response:

{
  'took': 7,
  'timed_out': False,
  'total': 3,
  'deleted': 3,
  'batches': 1,
  'version_conflicts': 0,
  'noops': 0,
  'retries': {'bulk': 0, 'search': 0},
  'throttled_millis': 0,
  'requests_per_second': -1.0,
  'throttled_until_millis': 0,
  'failures': []
}
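
A scheduled job usually runs unattended, so it can be worth checking the response instead of only printing it. Below is a minimal sketch, assuming the response is a dictionary with the fields shown above. The name check_delete_response is a hypothetical helper.

```python
def check_delete_response(response):
    """Return the number of deleted documents, raising if the run was incomplete."""
    if response['timed_out']:
        raise RuntimeError('delete_by_query timed out')
    if response['failures']:
        raise RuntimeError('delete_by_query failures: %r' % (response['failures'],))
    if response['version_conflicts']:
        raise RuntimeError('%d version conflicts' % response['version_conflicts'])
    return response['deleted']


# The response shown above, reduced to the fields that are checked.
sample = {'timed_out': False, 'total': 3, 'deleted': 3,
          'version_conflicts': 0, 'failures': []}
print(check_delete_response(sample))
```

Raising an exception makes the script exit with a non-zero status, which most schedulers can report.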

Afterwards, the AWS dashboard shows the following final state:

AWS ES Dashboard after deletion

Coffee from the world, Kankan

Tasting a coffee that is not bad at all, on a Sunday on our terrace in Kankan. It comes with a small glass of Croatian liquor, kindly given to me by a colleague from Amsterdam.

@Kankan, Briqueterie – Guinea

Hello world, from the midpoint of the world

Office @Kankan

“Ciao Mondo”, in Italian or in its English version “Hello World”, is probably my favorite string for “craft-debugging”.

After 7 months this string seems richer and more meaningful, since I now code and “bang” it on standard output from the midpoint of the world: from Guinea and, to be more precise, from the sunny town of Kankan.

Hot temperatures, tropical rains, mosquitoes, a poor variety of food and an endless humanity have transformed me into a “tubabu” (white man, in Malinkè) freelancer, citizen of the world.

Home Office @Kankan

from termcolor import cprint

# Italian for "hello world, from the navel of the world"
cprint("ciao mondo, dall'ombelico del mondo", 'yellow')