
ElasticSearch Curator

Curator is a tool from Elastic to help manage your ElasticSearch cluster.
For certain logs/data, we use one ElasticSearch index per year/month/day and might keep a rolling 7 day window of history.
This means that every day we need to create, backup, and delete some indices.
Curator helps make this process automated and repeatable.

Installation

Curator is written in Python, so you will need pip to install it:
pip install elasticsearch-curator
curator --config ./curator_cluster_config.yml curator_actions.yml --dry-run

Configuration

Create a file named curator_cluster_config.yml with the following contents:
---
# Remember, leave a key empty if there is no value.  None will be a string, not a Python "NoneType"
client:
  hosts:
    - "es_coordinating_01.singhaiuklimited.com"
  port: 9200
  url_prefix:
  use_ssl: True
  # The certificate file is the CA certificate used to sign all ES node certificates.
  # Use same CA certificate to generate and sign the certificate running curator (specified in properties client_cert and client_key)
  certificate: '/work/elk/elasticsearch-6.3.2/config/x-pack/certificate-bundle/ca/ca.crt'
  client_cert: '/work/elk/elasticsearch-6.3.2/config/x-pack/certificate-bundle/myhostname/myhostname.crt'
  client_key: '/work/elk/elasticsearch-6.3.2/config/x-pack/certificate-bundle/myhostname/myhostname.key'
  ssl_no_validate: False
  # Username password to connect to ES using basic auth
  http_auth: "username:password"
  timeout: 30
  master_only: False
 
logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

A non-SSL cluster configuration is much simpler:
---
# Remember, leave a key empty if there is no value.  None will be a string, not a Python "NoneType"
client:
  hosts:
    - "es_coordinating_01.singhaiuklimited.com"
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth: "username:password"
 
logging:
  loglevel: WARNING
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

Now we need to define an action, i.e. what Curator will do. There are many actions to choose from; check the documentation for more information.

Removing time-series indices

ElasticSearch is a great choice for storing time-series data for a number of reasons: index templates automatically create indices, and aliases let you search seamlessly across many indices.
However, ElasticSearch doesn’t provide automatic removal of old data.

As an example, we will delete all .watcher-history-* and .monitoring-* indices that are older than 3 days. We will use Delete Indices as the action.
Our indices are named with a YYYY.MM.DD suffix, so we have to tell Curator about our date format and which indices to remove.
Below is the sample action file delete3DaysOldUselessIndices.yml, which will delete the watcher and monitoring indices that are older than 3 days:
---
# Remember, leave a key empty if there is no value.  None will be a string, not a Python "NoneType"
actions:
  1:
    action: delete_indices
    description: >-
      "Delete indices older than 3 days (based on index name), for .watcher-history-
      or .monitoring-es-6- or .monitoring-kibana-6- or .monitoring-logstash-6-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly."
    options:
      timeout_override: 300
      continue_if_exception: True
      ignore_empty_list: True
      disable_action: False
    filters:
      - filtertype: pattern
        kind: regex
        value: '^\.(monitoring-es-6-|monitoring-kibana-6-|monitoring-logstash-6-|watcher-history-6-).*$'
        exclude:
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 3
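
The two filters above are combined: an index is selected only if its name matches the regex and the date embedded in its name is older than the cutoff. The following is a minimal Python sketch of that selection logic, not Curator's actual implementation. The function name select_for_deletion, the helper regex DATE_IN_NAME, and the sample index names are assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Same regex as the "pattern" filter in the action file.
PATTERN = re.compile(
    r'^\.(monitoring-es-6-|monitoring-kibana-6-|monitoring-logstash-6-|watcher-history-6-).*$'
)
# Hypothetical helper: locates a date shaped like the timestring '%Y.%m.%d'.
DATE_IN_NAME = re.compile(r'\d{4}\.\d{2}\.\d{2}')


def select_for_deletion(index_names, now, unit_count=3):
    """Return the names that pass both the pattern filter and the age filter."""
    cutoff = now - timedelta(days=unit_count)
    selected = []
    for name in index_names:
        if not PATTERN.match(name):
            continue  # rejected by the pattern filter
        found = DATE_IN_NAME.search(name)
        if found is None:
            continue  # no date in the name, so the age cannot be determined
        index_date = datetime.strptime(found.group(), '%Y.%m.%d')
        index_date = index_date.replace(tzinfo=timezone.utc)
        if index_date < cutoff:
            selected.append(name)  # older than unit_count days
    return selected


if __name__ == '__main__':
    now = datetime(2017, 10, 4, 12, 15, tzinfo=timezone.utc)
    names = [
        '.monitoring-es-6-2017.09.30',
        '.monitoring-kibana-6-2017.10.03',
        '.watcher-history-6-2017.09.29',
        'logstash-2017.09.01',
    ]
    print(select_for_deletion(names, now))
```

In this sketch the logstash index is skipped even though it is old, because it does not match the pattern; this mirrors how the pattern filter protects unrelated indices from the age filter.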

To run this action, simply use the command:
curator --config ./curator_cluster_config.yml ./delete3DaysOldUselessIndices.yml --dry-run
2017-10-04 12:15:38,544 INFO      Preparing Action ID: 1, "delete_indices"
2017-10-04 12:15:38,900 INFO      Trying Action ID: 1, "delete_indices": "Delete indices older than 1 day (based on index name), for .watcher-history- or .monitoring-es-6- or .monitoring-kibana-6- or .monitoring-logstash-6- prefixed indices. Ignore the error if the filter does not result in an actionable list of indices (ignore_empty_list) and exit cleanly."
2017-10-04 12:15:39,351 INFO      DRY-RUN MODE.  No changes will be made.
2017-10-04 12:15:39,351 INFO      (CLOSED) indices may be shown that may not be acted on by action "delete_indices".
2017-10-04 12:15:39,351 INFO      Action ID: 1, "delete_indices" completed.
2017-10-04 12:15:39,352 INFO      Job completed.
The --dry-run mode will not actually delete the index. It can be used to test the output of the action.

Managing snapshots

Another task that Curator helps us automate is using Elasticsearch snapshots.
---
# Remember, leave a key empty if there is no value.  None will be a string, not a Python "NoneType"
actions:
  1:
    action: snapshot
    description: >-
      Snapshot selected indices to 'repository' with the snapshot name or name
      pattern in 'name'.  Use all other options as assigned
    options:
      repository: <repository name>
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name:
      wait_for_completion: True
      max_wait: 3600
      wait_interval: 10
    filters:
      - filtertype: ...
This will create a snapshot of all your indices with a name such as curator-20170928193030, which works fine for our use case (of course you can customize the name and date format).
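
To see how a name such as curator-20170928193030 comes out of the default pattern, here is a small Python sketch that expands the same strftime-style pattern. The function snapshot_name and the custom pattern 'nightly-%Y.%m.%d' are hypothetical and only illustrate the date formatting; they are not part of Curator.

```python
from datetime import datetime

# Default name pattern mentioned in the action file comment.
DEFAULT_NAME_PATTERN = 'curator-%Y%m%d%H%M%S'


def snapshot_name(moment, pattern=DEFAULT_NAME_PATTERN):
    """Expand a strftime-style snapshot name pattern for the given moment."""
    return moment.strftime(pattern)


if __name__ == '__main__':
    taken_at = datetime(2017, 9, 28, 19, 30, 30)
    print(snapshot_name(taken_at))                       # curator-20170928193030
    print(snapshot_name(taken_at, 'nightly-%Y.%m.%d'))   # nightly-2017.09.28
```
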
You’ll also want to remove snapshots after a certain time; otherwise snapshot performance will degrade dramatically as the number of snapshots grows.
---
# Remember, leave a key empty if there is no value.  None will be a string, not a Python "NoneType"
actions:
  1:
    action: delete_snapshots
    description: "Delete selected snapshots from 'repository'"
    options:
      repository: <repository name>
      retry_interval: 120
      retry_count: 3
    filters:
    - filtertype: ...

Changing Index Settings

For indices that aren’t being actively written to, you can make them read only.
With date-based indices, only the current index is being written to, so it is safe to make older indices read-only.
---
# Remember, leave a key empty if there is no value.  None will be a string, not a Python "NoneType"
actions:
  1:
    action: index_settings
    description: >-
      Set Monitoring and watcher indices older than 1 day to be read only (block writes)
    options:
      disable_action: False
      index_settings:
        index:
          blocks:
            write: True
      ignore_unavailable: False
      preserve_existing: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '^\.(monitoring-es-6-|monitoring-kibana-6-|monitoring-logstash-6-|watcher-history-6-).*$'
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 1

To run this action, simply use the command:
curator --config ./curator_cluster_config.yml readOnly1DayOldUselessIndices.yml --dry-run
2017-10-03 16:39:10,237 INFO      Preparing Action ID: 1, "index_settings"
2017-10-03 16:39:10,602 INFO      Trying Action ID: 1, "index_settings": Set Monitoring ES indices older than 1 day to be read only (block writes)
2017-10-03 16:39:11,075 INFO      DRY-RUN MODE.  No changes will be made.
2017-10-03 16:39:11,075 INFO      (CLOSED) indices may be shown that may not be acted on by action "indexsettings".
2017-10-03 16:39:11,075 INFO      DRY-RUN: indexsettings: .monitoring-es-6-2017.10.02 with arguments: {'index': {'blocks': {'write': True}}}
2017-10-03 16:39:11,075 INFO      DRY-RUN: indexsettings: .monitoring-kibana-6-2017.10.02 with arguments: {'index': {'blocks': {'write': True}}}
2017-10-03 16:39:11,075 INFO      Action ID: 1, "index_settings" completed.
2017-10-03 16:39:11,075 INFO      Job completed.

Shrinking static indices

For indices that aren’t being actively written to, you can shrink them to reduce and merge the shards/segments that represent the index’s data on disk. Shrinking an index is a similar concept to defragmenting your hard drive. Indices can only be shrunk if they satisfy the following requirements:
  • The source index must be marked as read-only
  • A (primary or replica) copy of every shard in the index must be relocated to the same node
  • The cluster must have health green
  • The target index must not exist
  • The source index must have more primary shards than the target index.
  • The number of primary shards in the target index must be a factor of the number of primary shards in the source index.
  • The index must not contain more than 2,147,483,519 documents in total across all shards that will be shrunk into a single shard on the target index as this is the maximum number of docs that can fit into a single shard.
  • The node handling the shrink process must have sufficient free disk space to accommodate a second copy of the existing index.
When an index is being written to, the segment merge process happens automatically, so you don’t want to explicitly call shrink on an active index. With date-based indices, only the current index is being written to, so it is safe to shrink older indices.
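
Two of the requirements above are simple arithmetic and can be checked before writing the action file. The sketch below is an illustration under stated assumptions, not part of Curator or ElasticSearch: the function names are hypothetical, and the document-limit check only covers the case of shrinking to a single shard, where every document ends up in one shard.

```python
# Maximum number of documents that fit into a single shard (from the list above).
MAX_DOCS_PER_SHARD = 2_147_483_519


def valid_target_shard_counts(source_shards):
    """List target shard counts that are smaller than, and a factor of, the source count."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def fits_in_single_shard(total_docs):
    """Check the document limit when shrinking the whole index to one shard."""
    return total_docs <= MAX_DOCS_PER_SHARD


if __name__ == '__main__':
    print(valid_target_shard_counts(8))  # [1, 2, 4]
    print(valid_target_shard_counts(5))  # [1]
    print(fits_in_single_shard(3_000_000_000))  # False
```

An index with a prime number of primary shards can only be shrunk to a single shard, which is one reason number_of_shards: 1 is a safe default for small, date-based indices.
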
---
# Remember, leave a key empty if there is no value.  None will be a string, not a Python "NoneType"
actions:
  1:
    action: shrink
    description: >-
      Shrink monitoring and watcher-history indices older than 1 day on the node with the most available space.
      Delete source index after successful shrink, then reroute the shrunk
      index with the provided parameters.
    options:
      ignore_empty_list: True
      
      # The shrinking will take place on the node identified by shrink_node,
      # unless DETERMINISTIC is specified, in which case Curator will evaluate
      # all of the nodes to determine which one has the most free space.
      # If multiple indices are identified for shrinking by the filter block,
      # and DETERMINISTIC is specified, the node selection process will be
      # repeated for each successive index, preventing all of the space being
      # consumed on a single node.
      shrink_node: DETERMINISTIC
      
      node_filters:
        # If you have a small cluster with only master/data nodes, you must set permit_masters to True in order to select one of those nodes as a potential shrink_node.
        permit_masters: False
        # exclude_nodes: ['some_named_node']
      
      # The resulting index will have number_of_shards primary shards, and number_of_replicas replica shards
      number_of_shards: 1
      number_of_replicas: 1
      
      # Name of target index will be shrink_prefix + the source index name + shrink_suffix
      shrink_prefix:
      shrink_suffix: '-shrink'
      
      # By default, Curator will delete the source index after a successful shrink.
      # This can be disabled by setting delete_after to False.
      # If the source index is not deleted after a successful shrink, Curator will
      # remove the read-only setting and the shard allocation routing applied to the
      # source index to put it on the shrink node.
      # Curator will wait for the shards to stop rerouting before continuing.
      delete_after: True
      
      # The post_allocation option applies to the target index after the shrink is complete.
      # If set, this shard allocation routing will be applied (after a successful shrink) and
      # Curator will wait for all shards to stop rerouting before continuing.
      # post_allocation:
        # allocation_type: include
        # Following will allocate shards to nodes that have "node_tag" attribute with value "cold"
        # key: node_tag
        # value: cold
      
      wait_for_active_shards: 1
      
      # The only extra_settings which are acceptable are settings and aliases.
      # Please note that in this example, while best_compression is being
      # applied to the new index, it will not take effect until new writes are
      # made to the index, such as when force-merging the shard to a single segment.
      extra_settings:
        settings:
          index.codec: best_compression
      
      wait_for_completion: True
      wait_interval: 9
      max_wait: -1
    filters:
      - filtertype: pattern
        kind: regex
        value: '^\.(monitoring-es-6-|monitoring-kibana-6-|monitoring-logstash-6-|watcher-history-6-).*$'
        exclude:
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 1
  2:
    action: index_settings
    description: >-
      Set monitoring and watcher-history indices older than 1 day to be read only (block writes)
    options:
      disable_action: False
      index_settings:
        index:
          blocks:
            write: True
      ignore_unavailable: False
      preserve_existing: False
    filters:
      - filtertype: pattern
        kind: regex
        value: '^\.(monitoring-es-6-|monitoring-kibana-6-|monitoring-logstash-6-|watcher-history-6-).*$'
      - filtertype: pattern
        kind: suffix
        value: -shrink
  3:
    action: alias
    description: "Add/Remove selected indices to or from the .monitoring-es alias"
    options:
      name: monitoring-es
    add:
      filters:
        - filtertype: pattern
          kind: regex
          value: '^\.monitoring-es.*$'
        - filtertype: pattern
          kind: suffix
          value: -shrink
  4:
    action: alias
    description: "Add/Remove selected indices to or from the .monitoring-kibana alias"
    options:
      name: monitoring-kibana
    add:
      filters:
        - filtertype: pattern
          kind: regex
          value: '^\.monitoring-kibana.*$'
        - filtertype: pattern
          kind: suffix
          value: -shrink
  5:
    action: alias
    description: "Add/Remove selected indices to or from the .monitoring-logstash alias"
    options:
      name: monitoring-logstash
    add:
      filters:
        - filtertype: pattern
          kind: regex
          value: '^\.monitoring-logstash.*$'
        - filtertype: pattern
          kind: suffix
          value: -shrink
  6:
    action: alias
    description: "Add/Remove selected indices to or from the .watcher-history alias"
    options:
      name: watcher-history
    add:
      filters:
        - filtertype: pattern
          kind: regex
          value: '^\.watcher-history.*$'
        - filtertype: pattern
          kind: suffix
          value: -shrink

Scheduling Curator Jobs

If you want to schedule it with cron, you can do so using crontab -e:
0 6 * * * curator --config ./curator_cluster_config.yml ./delete3DaysOldUselessIndices.yml
The above entry will clean up the indices older than 3 days every day at 6 AM. Cron runs jobs from the user's home directory, so use absolute paths if the files live elsewhere.
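
If you prefer to call Curator from a small wrapper script rather than directly from the crontab, the sketch below shows one way to build the command with absolute paths so that it does not depend on the working directory. The functions build_curator_command and run_curator are hypothetical helpers; the only Curator options used are --config and --dry-run, as in the commands shown earlier.

```python
import os
import subprocess


def build_curator_command(config_file, action_file, dry_run=False):
    """Build the curator command line, converting both files to absolute paths."""
    command = [
        'curator',
        '--config', os.path.abspath(config_file),
        os.path.abspath(action_file),
    ]
    if dry_run:
        command.append('--dry-run')
    return command


def run_curator(config_file, action_file, dry_run=False):
    """Run curator and raise CalledProcessError if it exits with a non-zero status."""
    command = build_curator_command(config_file, action_file, dry_run)
    subprocess.run(command, check=True)


if __name__ == '__main__':
    # Only prints the command; call run_curator(...) to actually execute it.
    print(build_curator_command(
        './curator_cluster_config.yml',
        './delete3DaysOldUselessIndices.yml',
        dry_run=True,
    ))
```
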
