Migrating Elasticsearch 2.x to Elasticsearch 5.x

In my previous blog post, I described how to install and configure an Elasticsearch 5.x cluster.
In this blog post, we will look at how to migrate data from an existing 2.x cluster.
Consult the upgrade compatibility table in the Elasticsearch documentation to verify whether rolling upgrades are supported for your version of Elasticsearch.
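Before you begin, you can confirm the exact version your existing cluster is running; the root endpoint of any node reports it under version.number:
    curl -XGET 'http://localhost:9200/?pretty'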

Full cluster upgrade (2.x to 5.x)

Upgrading from 2.x to 5.x requires a full cluster upgrade and restart.

  1. Install the Elasticsearch Migration Helper plugin on the old cluster. This plugin helps you check whether you can upgrade directly to the next major version of Elasticsearch, or whether you need to make changes to your data and cluster before doing so.
    cd /work/elk/elasticsearch-2.4.3/
    curl -O -L https://github.com/elastic/elasticsearch-migration/releases/download/v2.0.4/elasticsearch-migration-2.0.4.zip
    ./bin/plugin install file:///work/elk/elasticsearch-2.4.3/elasticsearch-migration-2.0.4.zip
    
  2. Start the old Elasticsearch node:
    ./bin/elasticsearch &
    
  3. Open the elasticsearch-migration plugin in your web browser (site plugins are served by the node itself, typically under http://localhost:9200/_plugin/elasticsearch-migration/).
  4. Click on "Cluster Checkup" > "Run checks now". Check all the suggestions.
  5. "Reindex Helper": Elasticsearch is able to read indices created in the previous major version only. For instance, Elasticsearch 5.x can use indices created in Elasticsearch 2.x, but not those created in Elasticsearch 1.x or before. Hence, all indices created before v2.0.0 need to be reindexed before they can be used in Elasticsearch 5.x. The reindex helper upgrades old indices at the click of a button. Works in Elasticsearch 2.3.x and 2.4.x only.
  6. Use "Deprecation Logging" to log a message whenever deprecated functionality is used. This tool enables or disables deprecation logging on your cluster. Deprecation logging is available in Elasticsearch 2.4.x only.
  7. Disable shard allocation:
    curl -XPUT 'http://localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{
      "persistent": {
        "cluster.routing.allocation.enable": "none"
      }
    }'
    
  8. Perform a synced flush: Shard recovery will be much faster if you stop indexing and issue a synced-flush request. A synced flush request is a "best effort" operation. It will fail if there are any pending indexing operations, but it is safe to reissue the request multiple times if necessary.
    curl -XPOST 'http://localhost:9200/_flush/synced?pretty'
    
  9. Register a repository to save your snapshot: Check the path that you configured as "path.repo" in your elasticsearch.yml. In my case it was: "/work/elk/data/backup/"
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup_2_to_5?pretty' -H 'Content-Type: application/json' -d '{
      "type": "fs",
      "settings": {
        "location": "/work/elk/data/backup/my_backup_2_to_5",
        "compress": true
      }
    }'
    
    When a repository is registered, it’s immediately verified on all master and data nodes to make sure that it is functional on all nodes currently present in the cluster.
    You can manually verify using:
    curl -XPOST 'http://localhost:9200/_snapshot/my_backup_2_to_5/_verify?pretty'
    
  10. Take the snapshot of indices. By default, a snapshot of all open and started indices in the cluster is created. This behaviour can be changed by specifying the list of indices in the body of the snapshot request like: "indices": "index_1,index_2".
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup_2_to_5/snapshot_1?wait_for_completion=true&pretty' -H 'Content-Type: application/json' -d '{
      "ignore_unavailable": true,
      "include_global_state": false
    }'
    
    NOTE: None of the closed indices are included in the snapshot.
    NOTE: "include_global_state": false is required when doing full upgrades, as we don't want templates to be part of the upgrade process. Make it to true, when doing rollover upgrades.
    Check your snapshot details using:
    curl -XGET 'http://localhost:9200/_snapshot/my_backup_2_to_5/_all?pretty'
    
  11. Shut down and upgrade all nodes: Stop the Elasticsearch service on every node in the old cluster. Each node can be upgraded following the same procedure described in upgrade-node. For rolling upgrades, you can back up the config, data, logs and plugins folders and make sure the new Elasticsearch installation picks up the files from each of these folders. For full upgrades, you need to use the snapshot-and-restore procedure explained below.
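    As a minimal sketch for a single tarball-based node (the paths and the 5.x archive name are assumptions from my setup; adjust them to yours):
    # stop the old 2.4.3 node that was started in the background earlier
    kill $(pgrep -f elasticsearch-2.4.3)
    # unpack the new 5.x release alongside the old installation
    cd /work/elk
    tar -xzf elasticsearch-5.1.2.tar.gz
    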
  12. Upgrade any plugins: Elasticsearch plugins must be upgraded when upgrading a node. Use the elasticsearch-plugin script to install the correct version of any plugins that you need.
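    For example, to see what is installed on the new node and reinstall a plugin (the ICU analysis plugin here is only an example; install whatever your old cluster actually used):
    ./bin/elasticsearch-plugin list
    ./bin/elasticsearch-plugin install analysis-icu
    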
  13. Copy the scripts folder from the old Elasticsearch installation to the new one.
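    Assuming the default config/scripts location and the same paths as above (both assumptions from my setup):
    cp -r /work/elk/elasticsearch-2.4.3/config/scripts /work/elk/elasticsearch-5.1.2/config/
    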
  14. Start the cluster. If you have dedicated master nodes — nodes with node.master set to true (the default) and node.data set to false — then it is a good idea to start them first. Wait for them to form a cluster and to elect a master before proceeding with the data nodes. You can check progress by looking at the logs.

    As soon as the minimum number of master-eligible nodes have discovered each other, they will form a cluster and elect a master. From that point on, the _cat/health and _cat/nodes APIs can be used to monitor nodes joining the cluster:
    curl -XGET 'http://localhost:9200/_cat/health?pretty'
    
    curl -XGET 'http://localhost:9200/_cat/nodes'
    
    Use these APIs to check that all nodes have successfully joined the cluster.
  15. Wait for yellow cluster state. As soon as each node has joined the cluster, it will start to recover any primary shards that are stored locally. Initially, the _cat/health request will report a status of red, meaning that not all primary shards have been allocated.

    Once each node has recovered its local shards, the status will become yellow, meaning all primary shards have been recovered, but not all replica shards are allocated. This is to be expected because allocation is still disabled.
  16. Reenable allocation: Delaying the allocation of replicas until all nodes have joined the cluster allows the master to allocate replicas to nodes which already have local shard copies. At this point, with all the nodes in the cluster, it is safe to reenable shard allocation:
    curl -XPUT 'http://localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{
      "persistent": {
        "cluster.routing.allocation.enable": "all"
      }
    }'
    
    The cluster will now start allocating replica shards to all data nodes. At this point, it is safe to resume indexing and searching, but your cluster will recover more quickly if you can delay indexing and searching until all shards have recovered.

    You can monitor progress with the _cat/health and _cat/recovery APIs:
    curl -XGET 'http://localhost:9200/_cat/health?pretty'
    
    curl -XGET 'http://localhost:9200/_cat/recovery'
    
    Once the status column in the _cat/health output has reached green, all primary and replica shards have been successfully allocated.
  17. Add updated index templates.
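    For example, a minimal template for time-based indices (the template name, pattern and settings below are placeholders; note that in 5.x the match pattern still goes in the "template" field):
    curl -XPUT 'http://localhost:9200/_template/my_logs_template?pretty' -H 'Content-Type: application/json' -d '{
      "template": "logs-*",
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1
      }
    }'
    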
  18. Register the snapshot repository in new cluster:
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup_2_to_5?pretty' -H 'Content-Type: application/json' -d '{
      "type": "fs",
      "settings": {
        "location": "/work/elk/data/backup/my_backup_2_to_5",
        "compress": true
      }
    }'
    
  19. Restore snapshot:
    curl -XPOST 'http://localhost:9200/_snapshot/my_backup_2_to_5/snapshot_1/_restore?wait_for_completion=true' -H 'Content-Type: application/json' -d '{
      "indices": "*",
      "index_settings": {
        "index.number_of_replicas": 0
      },
      "ignore_index_settings": [
        "index.refresh_interval"
      ],
      "ignore_unavailable": true,
      "include_global_state": false
    }'
    
    If the restore fails because any of the indices is already open, you might need to close it, restore the snapshot, and finally reopen the index:
    curl -XPOST 'http://localhost:9200/.kibana/_close?pretty'
    
    curl -XPOST 'http://localhost:9200/.kibana/_open?pretty'
    
  20. After verifying that data is properly migrated to new cluster, delete the snapshot:
    curl -XDELETE 'http://localhost:9200/_snapshot/my_backup_2_to_5/snapshot_1?pretty'
    

Rolling Upgrades

A rolling upgrade allows the Elasticsearch cluster to be upgraded one node at a time, with no downtime for end users. Running multiple versions of Elasticsearch in the same cluster for any length of time beyond that required for an upgrade is not supported, as shards will not be replicated from the more recent version to the older version.
  1. Disable shard allocation: When you shut down a node, the allocation process will wait for one minute before starting to replicate the shards that were on that node to other nodes in the cluster, causing a lot of wasted I/O. This can be avoided by disabling allocation before shutting down a node:
    curl -XPUT 'http://localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{
      "transient": {
        "cluster.routing.allocation.enable": "none"
      }
    }'
    
  2. Stop non-essential indexing and perform a synced flush (optional). You may happily continue indexing during the upgrade. However, shard recovery will be much faster if you temporarily stop non-essential indexing and issue a synced-flush request:
    curl -XPOST 'http://localhost:9200/_flush/synced?pretty'
    
    A synced flush request is a “best effort” operation. It will fail if there are any pending indexing operations, but it is safe to reissue the request multiple times if necessary.
  3. Stop and upgrade a single node. Move the config, data, logs, scripts and plugins directories from the old Elasticsearch installation to the new one.
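    For a tarball install this again comes down to copying directories over (the directory names below are placeholders; same idea as the sketch in the full-upgrade section):
    cp -r /work/elk/elasticsearch-OLD/config /work/elk/elasticsearch-NEW/
    cp -r /work/elk/elasticsearch-OLD/data /work/elk/elasticsearch-NEW/
    # repeat for the logs, scripts and plugins directories
    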
  4. Elasticsearch plugins must be upgraded when upgrading a node. Use the elasticsearch-plugin script to install the correct version of any plugins that you need.
  5. Start the now upgraded node and confirm that it joins the cluster by checking the log file or by checking the output of this request:
    curl -XGET 'http://localhost:9200/_cat/nodes?pretty'
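
    To also confirm that the node came back with the new version, ask _cat/nodes for the version column:
    curl -XGET 'http://localhost:9200/_cat/nodes?v&h=name,version'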
    
  6. Once the node has joined the cluster, reenable shard allocation to start using the node:
    curl -XPUT 'http://localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{
      "transient": {
        "cluster.routing.allocation.enable": "all"
      }
    }'
    
  7. Wait for the node to recover. You should wait for the cluster to finish shard allocation before upgrading the next node. You can check on progress with the _cat/health request:
    curl -XGET 'http://localhost:9200/_cat/health?pretty'
    
    Wait for the status column to move from yellow to green. Status green means that all primary and replica shards have been allocated.
  8. Shards that have not been sync-flushed may take some time to recover. The recovery status of individual shards can be monitored with the _cat/recovery request:
    curl -XGET 'http://localhost:9200/_cat/recovery?pretty'
    
  9. When the cluster is stable and the node has recovered, repeat the above steps for all remaining nodes.
  10. If you stopped indexing, then it is safe to resume indexing as soon as recovery has completed.
