ElasticSearch

Akansha Bansal
2 min read · Sep 20, 2020


High-Level Design of Elastic Search

ElasticSearch (ES) is a distributed, near-real-time search engine well suited to full-text search. It is built on the Apache Lucene library, and each ES shard is itself a Lucene index. Lucene is good at compressing data; when you have log-like data, it's best to partition it into one index per day, which reduces the search space for time-bounded queries. Shards distribute the data of an index across the nodes of a cluster. First released in 2010, ES is one of the most widely used open-source technologies for full-text search today.
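To make the one-index-per-day idea concrete, here is a minimal Python sketch. The `logs` prefix and both helper names are assumptions made for illustration; the dotted date suffix follows the naming convention Logstash uses by default.

```python
from datetime import date, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Build a per-day index name such as 'logs-2020.09.20'."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_for_range(prefix: str, start: date, end: date) -> list:
    """List the daily indices covering start..end (inclusive)."""
    days = (end - start).days
    return [daily_index_name(prefix, start + timedelta(days=i)) for i in range(days + 1)]


# A query over two days only needs to touch two indices, not the whole data set.
targets = indices_for_range("logs", date(2020, 9, 19), date(2020, 9, 20))
print(",".join(targets))
```

Searching only the indices for the requested days is what shrinks the search space.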

Its applications are in the space of big data, faceting, and geo queries. ES is a document store much like MongoDB: documents look and feel like JSON and are accessed over HTTP, by default on port 9200. More powerful and complex queries, including those that involve faceting and statistical operations, should use the full ElasticSearch query language and API. In the query language, queries are written as a JSON structure that is then sent to the query endpoint (the Query DSL is listed under Concepts below).
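As a hedged illustration of "a JSON structure sent to the query endpoint", the sketch below builds (but does not send) a search request using only the Python standard library. The host `http://localhost:9200`, the index name, the `message` field, and the helper name `build_search_request` are assumptions made for the example.

```python
import json
from urllib import request


def build_search_request(index, query, host="http://localhost:9200"):
    """Wrap a query in a JSON body addressed to the index's _search endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return request.Request(
        url=f"{host}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A full-text match query on a hypothetical "message" field.
req = build_search_request("logs-2020.09.20", {"match": {"message": "timeout"}})
print(req.get_method(), req.full_url)

# Against a running cluster, the request could be sent like this:
# with request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]
```

The official `elasticsearch-py` client listed under Resources wraps the same HTTP call.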

When search meets analytics at scale (in near real time) ~ElasticSearch

Every time text data is inserted into an ElasticSearch index, it is analyzed and then stored in the inverted index. How you configure the analyzer affects your search capabilities, because the same analysis is also applied to the query text during full-text search.

The analyzer pipeline consists of three stages:

Character filter (0+) → Tokeniser (1) → Token filter (0+)
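The three stages can be imitated in a few lines of plain Python. This is a toy model, not how ElasticSearch or Lucene implement analysis: the tag-stripping character filter, the word tokenizer, the lowercase and stop-word token filters, and the tiny stop-word list are all choices made for illustration.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "is"}  # illustrative stop-word list


def char_filter(text):
    """Character filter: replace HTML-like tags with spaces."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokeniser: split the text into word tokens."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Token filters: lowercase each token, then drop stop words."""
    lowered = (t.lower() for t in tokens)
    return [t for t in lowered if t not in STOPWORDS]


def analyze(text):
    """Run the pipeline: character filter -> tokeniser -> token filters."""
    return token_filters(tokenize(char_filter(text)))


def build_inverted_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


docs = {1: "<p>The Quick brown fox</p>", 2: "A quick search"}
index = build_inverted_index(docs)

# The query text goes through the same analyzer before the lookup.
query_terms = analyze("QUICK")
matches = set.union(*(index.get(t, set()) for t in query_terms))
print(sorted(matches))
```

Because the query goes through the same pipeline, "QUICK" finds documents that were indexed with "Quick" and "quick".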

Step-by-step guide

Concepts (more to come, to be continued):

  1. Data Structures: Inverted Index, Stored Fields, Doc Values
  2. Mapping Schema: Mapping is the process of defining how a document should be mapped to the search engine, including its searchable characteristics, such as which fields are searchable and whether and how they are tokenized. In older ElasticSearch releases, an index could store documents of different “mapping types”, each with its own mapping definition. Since version 6, a newly created index holds a single mapping type, and later releases drop the concept altogether.
  3. Filters and Queries
  4. Document Store
  5. Query DSL
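As a rough illustration of item 2, a mapping for log documents might look like the sketch below, written in the typeless form used by ElasticSearch 7 and later. The field names `timestamp`, `level`, and `message` and the helper `analyzed_fields` are hypothetical.

```python
import json

# Body one could send when creating an index; all field names are illustrative.
log_mapping = {
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},      # enables range filters on time
            "level": {"type": "keyword"},       # stored as-is, matched exactly
            "message": {"type": "text", "analyzer": "standard"},  # tokenized
        }
    }
}


def analyzed_fields(mapping):
    """Return the names of fields that go through an analyzer (type 'text')."""
    props = mapping["mappings"]["properties"]
    return sorted(name for name, spec in props.items() if spec["type"] == "text")


print(json.dumps(log_mapping, indent=2))
print(analyzed_fields(log_mapping))
```

Only `message` is tokenized here; `level` is matched exactly, and `timestamp` supports date range filters.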

Application

Let’s get familiar with writing queries in ElasticSearch that filter logs by (a) timestamp and (b) removing repetitive messages within a preset buffer time duration.

https://github.com/ak-b/ElasticSearch101
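Independently of the repository above, the two steps can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the `timestamp` field name, the 30-second default buffer, and both helper names are hypothetical. The repetition filter runs client-side on events that have already been fetched and sorted by time.

```python
from datetime import datetime, timedelta


def timestamp_range_query(start, end, field="timestamp"):
    """Query DSL body fragment: keep only documents whose date field is in [start, end]."""
    return {
        "bool": {
            "filter": [
                {"range": {field: {"gte": start.isoformat(), "lte": end.isoformat()}}}
            ]
        }
    }


def drop_repeats(events, buffer=timedelta(seconds=30)):
    """Keep a message only if the same text was not kept within the buffer.

    `events` is an iterable of (timestamp, message) pairs sorted by timestamp.
    """
    last_kept = {}  # message text -> timestamp of the last kept occurrence
    kept = []
    for ts, message in events:
        previous = last_kept.get(message)
        if previous is None or ts - previous >= buffer:
            kept.append((ts, message))
            last_kept[message] = ts
    return kept


t0 = datetime(2020, 9, 20, 12, 0, 0)
query = timestamp_range_query(t0, t0 + timedelta(hours=1))
events = [
    (t0, "disk full"),
    (t0 + timedelta(seconds=10), "disk full"),  # repeat inside the buffer
    (t0 + timedelta(seconds=20), "login ok"),
    (t0 + timedelta(seconds=45), "disk full"),  # buffer has elapsed
]
for ts, message in drop_repeats(events):
    print(ts.isoformat(), message)
```

Placing the range clause under `bool.filter` means it does not contribute to relevance scoring, which is the usual choice for yes/no conditions such as a time window.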

Resources

The ElasticSearch website is the best place to start learning about, and finding documentation on, Logstash, ElasticSearch, and Kibana.
http://www.elasticsearch.org/overview/

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html

https://okfnlabs.org/blog/2013/07/01/elasticsearch-query-tutorial.html

http://www.elasticsearch.org/guide/reference/api/search/uri-request.html

https://elasticsearch-py.readthedocs.io/en/master/

https://towardsdatascience.com/deep-dive-into-querying-elasticsearch-filter-vs-query-full-text-search-b861b06bd4c0
