Introduction

Elasticsearch

Elasticsearch is a full-text search engine accesible through a restful interface. It is a document-based store, and can thus be used to replace other document stores e.g. MongoDB or RavenDB.

Anytime you start an instande of Elasticsearch, you are starting a node. A collection of nodes is called a cluster. If you are running a single node of Elasticsearch, then you have a cluster of one node. There are different types of nodes for different purposes.

A role of a Elasticsearch server can be either master or slave. Master servers are responsible for the cluster health and stability. In large deployments with a lot of cluster nodes, it’s recommended to have more than one dedicated master. Typically, a dedicated master will not store data or create indexes. Thus, there should be no chance of being overloaded, by which the cluster health could be endangered.

Slave servers are used as workhorses which can be loaded with data tasks. Even if a slave node is overloaded, the cluster health shouldn’t be affected seriously, provided there are other nodes to take additional load.

A table comparing terminologies

MySQL (RDBMS) Terminology ElasticSearch Terminology
Database Index
Table Type
Row Document

Example response

{ "name" : "Domo", "cluster_name" : "elasticsearch_root", "version" : { "number" : "2.2.0", "build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe", "build_timestamp" : "2016-01-27T13:32:39Z", "build_snapshot" : false, "lucene_version" : "5.4.1" }, "tagline" : "You Know, for Search" }

Types of nodes

Master-eligible node

A node that has node.master set to true (default), which makes it eligible to be elected as the master node, which controls the cluster.

Data node

A node that has node.data set to true (default). Data nodes hold data and perform data related operations such as CRUD, search, and aggregations.

Ingest node

A node that has node.ingest set to true (default). Ingest nodes are able to apply an ingest pipeline to a document in order to transform and enrich the document before indexing. With a heavy ingest load, it makes sense to use dedicated ingest nodes and to mark the master and data nodes as node.ingest: false.

Tribe node

A tribe node, configured via the tribe.* settings, is a special type of coordinating only node that can connect to multiple clusters and perform search and other operations across all connected clusters.

Shards and replicas

An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.

Shards

To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent “index” that can be hosted on any node in the cluster.

Having more shards improves the indexing performance.

Replicas

In a network/cloud environment where failures can be expected anytime, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.

Having more replicas makes searching faster.

To get started read Quickstart followed by Configuration.