
Migrating ElasticSearch 2.x to ElasticSearch 5.x

In my previous blog post, I described how to install and configure an ElasticSearch 5.x cluster.
In this blog post, we will look at how to migrate data.
Consult the upgrade compatibility table in the official Elasticsearch upgrade documentation to verify whether a rolling upgrade is supported for your version of Elasticsearch.

Full cluster upgrade (2.x to 5.x)

Upgrading from 2.x to 5.x requires a full cluster upgrade and restart.

  1. Install the Elasticsearch Migration Helper on the old cluster. This plugin helps you check whether you can upgrade directly to the next major version of Elasticsearch, or whether you need to make changes to your data and cluster before doing so.
    cd /work/elk/elasticsearch-2.4.3/
    curl -O -L https://github.com/elastic/elasticsearch-migration/releases/download/v2.0.4/elasticsearch-migration-2.0.4.zip
    ./bin/plugin install file:///work/elk/elasticsearch-2.4.3/elasticsearch-migration-2.0.4.zip
    
  2. Start the old Elasticsearch node:
    ./bin/elasticsearch &
    
  3. Browse to the elasticsearch-migration plugin UI.
  4. Click on "Cluster Checkup" > "Run checks now" and review all the suggestions.
  5. "Reindex Helper": Elasticsearch is able to read indices created in the previous major version only. For instance, Elasticsearch 5.x can use indices created in Elasticsearch 2.x, but not those created in Elasticsearch 1.x or before. Hence, all indices created before v2.0.0 need to be reindexed before they can be used in Elasticsearch 5.x. The reindex helper upgrades old indices at the click of a button. Works in Elasticsearch 2.3.x and 2.4.x only.
  6. Use "Deprecation Logging" to log a message whenever deprecated functionality is used. This tool enables or disables deprecation logging on your cluster. Deprecation logging is available in Elasticsearch 2.4.x only.
  7. Disable shard allocation:
    curl -XPUT 'http://localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{
      "persistent": {
        "cluster.routing.allocation.enable": "none"
      }
    }'
    
  8. Perform a synced flush: Shard recovery will be much faster if you stop indexing and issue a synced-flush request. A synced flush request is a "best effort" operation. It will fail if there are any pending indexing operations, but it is safe to reissue the request multiple times if necessary.
    curl -XPOST 'http://localhost:9200/_flush/synced?pretty'
    
  9. Register a repository to save your snapshot. Check the path that you configured as "path.repo" in your elasticsearch.yml; in my case it was "/work/elk/data/backup/".
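    As a quick sanity check (the installation path below is mine; adjust it to yours), you can confirm the configured value with:
    grep path.repo /work/elk/elasticsearch-2.4.3/config/elasticsearch.yml
    
    Then register the repository: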
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup_2_to_5?pretty' -H 'Content-Type: application/json' -d '{
      "type": "fs",
      "settings": {
        "location": "/work/elk/data/backup/my_backup_2_to_5",
        "compress": true
      }
    }'
    
    When a repository is registered, it’s immediately verified on all master and data nodes to make sure that it is functional on all nodes currently present in the cluster.
    You can manually verify using:
    curl -XPOST 'http://localhost:9200/_snapshot/my_backup_2_to_5/_verify?pretty'
    
  10. Take the snapshot of indices. By default, a snapshot of all open and started indices in the cluster is created. This behaviour can be changed by specifying the list of indices in the body of the snapshot request like: "indices": "index_1,index_2".
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup_2_to_5/snapshot_1?wait_for_completion=true&pretty' -H 'Content-Type: application/json' -d '{
      "ignore_unavailable": true,
      "include_global_state": false
    }'
    
    NOTE: None of the closed indices are included in the snapshot.
    NOTE: "include_global_state": false is required when doing full upgrades, as we don't want templates to be part of the upgrade process. Make it to true, when doing rollover upgrades.
    Check your snapshot details using:
    curl -XGET 'http://localhost:9200/_snapshot/my_backup_2_to_5/_all?pretty'
    
  11. Shut down and upgrade all nodes: stop all Elasticsearch services on all nodes in the old cluster. Each node can be upgraded following the same procedure described in upgrade-node. For rolling upgrades, you can back up the config, data, logs and plugins folders and make sure that the new Elasticsearch installation has the files from these folders. For full upgrades, you need to use the snapshot-and-restore procedure as explained below.
  12. Upgrade any plugins: Elasticsearch plugins must be upgraded when upgrading a node. Use the elasticsearch-plugin script to install the correct version of any plugins that you need.
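    For example, to list installed plugins and reinstall one on the upgraded node (the 5.x installation path below is a placeholder for my setup, and analysis-icu is only an example plugin; install whichever plugins you actually use):
    cd /work/elk/elasticsearch-5.x/
    ./bin/elasticsearch-plugin list
    ./bin/elasticsearch-plugin install analysis-icu
    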
  13. Copy the scripts folder from the old Elasticsearch installation to the new one:
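    A minimal sketch, assuming both installations live under /work/elk (the 5.x path is a placeholder; adjust it to your layout):
    cp -r /work/elk/elasticsearch-2.4.3/config/scripts /work/elk/elasticsearch-5.x/config/
    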
  14. Start the cluster. If you have dedicated master nodes (nodes with node.master set to true, the default, and node.data set to false), then it is a good idea to start them first. Wait for them to form a cluster and to elect a master before proceeding with the data nodes. You can check progress by looking at the logs.

    As soon as the minimum number of master-eligible nodes have discovered each other, they will form a cluster and elect a master. From that point on, the _cat/health and _cat/nodes APIs can be used to monitor nodes joining the cluster:
    curl -XGET 'http://localhost:9200/_cat/health?pretty'
    
    curl -XGET 'http://localhost:9200/_cat/nodes'
    
    Use these APIs to check that all nodes have successfully joined the cluster.
  15. Wait for yellow cluster state. As soon as each node has joined the cluster, it will start to recover any primary shards that are stored locally. Initially, the _cat/health request will report a status of red, meaning that not all primary shards have been allocated.

    Once each node has recovered its local shards, the status will become yellow, meaning all primary shards have been recovered, but not all replica shards are allocated. This is to be expected because allocation is still disabled.
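    Instead of polling, you can also ask the cluster health API to block until the cluster reaches at least yellow:
    curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=60s&pretty'
    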
  16. Reenable allocation: Delaying the allocation of replicas until all nodes have joined the cluster allows the master to allocate replicas to nodes which already have local shard copies. At this point, with all the nodes in the cluster, it is safe to reenable shard allocation:
    curl -XPUT 'http://localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{
      "persistent": {
        "cluster.routing.allocation.enable": "all"
      }
    }'
    
    The cluster will now start allocating replica shards to all data nodes. At this point, it is safe to resume indexing and searching, but your cluster will recover more quickly if you can delay indexing and searching until all shards have recovered.

    You can monitor progress with the _cat/health and _cat/recovery APIs:
    curl -XGET 'http://localhost:9200/_cat/health?pretty'
    
    curl -XGET 'http://localhost:9200/_cat/recovery'
    
    Once the status column in the _cat/health output has reached green, all primary and replica shards have been successfully allocated.
  17. Add updated index templates.
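    A minimal sketch of registering a template on the new cluster (the template name, index pattern and settings below are placeholders; use your own templates updated for 5.x):
    curl -XPUT 'http://localhost:9200/_template/my_template?pretty' -H 'Content-Type: application/json' -d '{
      "template": "my_index-*",
      "settings": {
        "number_of_shards": 1
      }
    }'
    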
  18. Register the snapshot repository in new cluster:
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup_2_to_5?pretty' -H 'Content-Type: application/json' -d '{
      "type": "fs",
      "settings": {
        "location": "/work/elk/data/backup/my_backup_2_to_5",
        "compress": true
      }
    }'
    
  19. Restore snapshot:
    curl -XPOST 'http://localhost:9200/_snapshot/my_backup_2_to_5/snapshot_1/_restore?wait_for_completion=true' -H 'Content-Type: application/json' -d '{
      "indices": "*",
      "index_settings": {
        "index.number_of_replicas": 0
      },
      "ignore_index_settings": [
        "index.refresh_interval"
      ],
      "ignore_unavailable": true,
      "include_global_state": false
    }'
    
    If the restore fails because any of the indices is already open, you might need to close it, restore the snapshot, and finally reopen the index:
    curl -XPOST 'http://localhost:9200/.kibana/_close?pretty'
    
    curl -XPOST 'http://localhost:9200/.kibana/_open?pretty'
    
  20. After verifying that the data has been properly migrated to the new cluster, delete the snapshot:
    curl -XDELETE 'http://localhost:9200/_snapshot/my_backup_2_to_5/snapshot_1?pretty'
    

Rolling Upgrades

A rolling upgrade allows the Elasticsearch cluster to be upgraded one node at a time, with no downtime for end users. Running multiple versions of Elasticsearch in the same cluster for any length of time beyond that required for an upgrade is not supported, as shards will not be replicated from the more recent version to the older version.
  1. Disable shard allocation: When you shut down a node, the allocation process will wait for one minute before starting to replicate the shards that were on that node to other nodes in the cluster, causing a lot of wasted I/O. This can be avoided by disabling allocation before shutting down a node:
    curl -XPUT 'http://localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{
      "transient": {
        "cluster.routing.allocation.enable": "none"
      }
    }'
    
  2. Stop non-essential indexing and perform a synced flush (optional). You may happily continue indexing during the upgrade. However, shard recovery will be much faster if you temporarily stop non-essential indexing and issue a synced-flush request:
    curl -XPOST 'http://localhost:9200/_flush/synced?pretty'
    
    A synced flush request is a “best effort” operation. It will fail if there are any pending indexing operations, but it is safe to reissue the request multiple times if necessary.
  3. Stop and upgrade a single node. Move the config, data, logs, scripts and plugins directories from the old Elasticsearch installation to the new one.
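    A rough sketch of carrying the data and scripts over, assuming tarball installations under /work/elk with the default directory layout and that the node has already been stopped (the 5.x path is a placeholder; adjust it to your setup):
    cp -r /work/elk/elasticsearch-2.4.3/data /work/elk/elasticsearch-5.x/
    cp -r /work/elk/elasticsearch-2.4.3/config/scripts /work/elk/elasticsearch-5.x/config/
    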
  4. Elasticsearch plugins must be upgraded when upgrading a node. Use the elasticsearch-plugin script to install the correct version of any plugins that you need.
  5. Start the now upgraded node and confirm that it joins the cluster by checking the log file or by checking the output of this request:
    curl -XGET 'http://localhost:9200/_cat/nodes?pretty'
    
  6. Once the node has joined the cluster, reenable shard allocation to start using the node:
    curl -XPUT 'http://localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{
      "transient": {
        "cluster.routing.allocation.enable": "all"
      }
    }'
    
  7. Wait for the node to recover. You should wait for the cluster to finish shard allocation before upgrading the next node. You can check on progress with the _cat/health request:
    curl -XGET 'http://localhost:9200/_cat/health?pretty'
    
    Wait for the status column to move from yellow to green. Status green means that all primary and replica shards have been allocated.
  8. Shards that have not been sync-flushed may take some time to recover. The recovery status of individual shards can be monitored with the _cat/recovery request:
    curl -XGET 'http://localhost:9200/_cat/recovery?pretty'
    
  9. When the cluster is stable and the node has recovered, repeat the above steps for all remaining nodes.
  10. If you stopped indexing, then it is safe to resume indexing as soon as recovery has completed.
