Quick use case: Remove a field from a batch of documents in an Elasticsearch index


Product Manager at Synapticiel, IT & Telecoms Professional, Interested in ML, AI, IoT, BigData, Cyber Security, RA, FMS, Next Generation BSS/OSS, Mobile Money

I have an index that holds inventory data from network devices, and I tried to enrich these documents using Logstash with attributes from other systems, such as billing.

Surprise! After 2 hours of batch running with Logstash, I discovered that the new object it had added (a set of attributes from billing) was not what I expected, and I started looking for a quick way to delete this object from all documents.

The solution was to use update_by_query, like this:

POST snmp-inventory-devices/_update_by_query?wait_for_completion=true&conflicts=proceed
{
  "script": "ctx._source.remove('billing')",
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "billing"
          }
        }
      ]
    }
  }
}

Internally, Elasticsearch does a scan/scroll to collect batches of matching documents and then updates them much like the bulk update interface would. This is faster than doing it manually with my own scan/scroll loop from a client, because it avoids the network and serialization overhead. Each document still has to be loaded into RAM, modified, and then written back.
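To make the batch idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the select, batch, and modify pattern described above, not Elasticsearch's actual implementation: the names `in_batches` and `remove_field`, the sample documents, and the batch size are all assumptions made for the example.

```python
from typing import Dict, Iterator, List

Doc = Dict[str, object]


def in_batches(docs: List[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def remove_field(index: List[Doc], field: str, batch_size: int = 1000) -> int:
    """Remove `field` from every document that has it; return the update count."""
    # Rough stand-in for the `exists` query: keep documents with a non-null value.
    matching = [doc for doc in index if doc.get(field) is not None]
    updated = 0
    for batch in in_batches(matching, batch_size):
        for doc in batch:
            # Same effect as the script ctx._source.remove('billing').
            doc.pop(field, None)
            updated += 1
    return updated


# Hypothetical sample data standing in for the inventory index.
index = [
    {"host": "sw-01", "billing": {"plan": "gold"}},
    {"host": "sw-02"},
    {"host": "sw-03", "billing": {"plan": "basic"}},
]
updated = remove_field(index, "billing", batch_size=2)
print(updated)
```

Only the two documents that carried a `billing` object are touched; the one without it is never loaded into a batch, which is why the `exists` filter in the query is worth keeping.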

Look into setting conflicts=proceed if the cluster has other update traffic; otherwise the whole job will stop as soon as it hits a version conflict, which happens when one of the documents is updated underneath one of the batches.
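The difference between the two conflict modes can be sketched with a toy version check in Python. This is a simplified model under stated assumptions, not the real engine: `write_back`, `VersionConflict`, and `make_store` are hypothetical names, and the in-memory `store` stands in for the index. The `version_conflicts` counter mirrors the field of the same name in the update_by_query response.

```python
from typing import Dict, List


class VersionConflict(Exception):
    """Raised when a document changed after it was read into a batch."""


def write_back(store: Dict[str, dict], snapshot: List[dict],
               conflicts: str = "abort") -> Dict[str, int]:
    """Apply the field removal, comparing versions seen at read time."""
    result = {"updated": 0, "version_conflicts": 0}
    for seen in snapshot:
        current = store[seen["id"]]
        if current["version"] != seen["version"]:
            if conflicts != "proceed":
                # Abort mode: stop here; earlier writes are not rolled back.
                raise VersionConflict(seen["id"])
            # Proceed mode: count the conflict and move on.
            result["version_conflicts"] += 1
            continue
        current["source"].pop("billing", None)
        current["version"] += 1
        result["updated"] += 1
    return result


def make_store() -> Dict[str, dict]:
    """Build three hypothetical documents, all at version 1."""
    return {key: {"version": 1, "source": {"billing": {}}} for key in "abc"}


store = make_store()
snapshot = [{"id": key, "version": doc["version"]} for key, doc in store.items()]
store["b"]["version"] += 1  # simulate another writer updating document "b"
result = write_back(store, snapshot, conflicts="proceed")
print(result)
```

In proceed mode, document "b" is skipped and still carries its `billing` object, so rerunning the same request later picks up whatever was missed.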

Similarly, setting wait_for_completion=false makes update_by_query run as a background task that can be followed through the tasks interface. Otherwise, the job will terminate if the connection is closed due to a timeout.
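As a rough sketch of the task-based flow, the Python snippet below only builds the two URLs involved and parses a sample response; it does not contact a cluster. The base URL, the helper names, and the task ID in `sample_response` are assumptions made up for illustration. With wait_for_completion=false, the response carries a `task` identifier that can then be looked up under `_tasks`.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured cluster


def update_by_query_url(index: str, wait_for_completion: bool,
                        conflicts: str = "proceed") -> str:
    """Build the _update_by_query URL with the two parameters discussed above."""
    params = urlencode({
        "wait_for_completion": str(wait_for_completion).lower(),
        "conflicts": conflicts,
    })
    return f"{BASE_URL}/{index}/_update_by_query?{params}"


def task_url(task_id: str) -> str:
    """Build the tasks URL used to check on a background job."""
    return f"{BASE_URL}/_tasks/{task_id}"


# POST the request body from the example above to this URL.
submit_url = update_by_query_url("snmp-inventory-devices", wait_for_completion=False)

# Made-up response body; a real task ID has the form "<node_id>:<number>".
sample_response = '{"task": "node-1:12345"}'
task_id = json.loads(sample_response)["task"]

# GET this URL to see whether the job is still running and how far it got.
status_url = task_url(task_id)
print(submit_url)
print(status_url)
```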
