
Posts

Logstash throws error while installing plugins

Recent posts

Read and parse CSV containing Key-value pairs using Akka Streams

Let's say we want to read and parse a CSV file containing key-value pairs.
We will use Alpakka's CSV parser (CsvParsing) for this.

A snippet of the file (src/main/resources/CountryNicCurrencyKeyValueMap.csv), which maps country NIC codes to currency codes with pipe (|) as the field delimiter:
AD|EUR
AE|AED
AF|AFN
AG|XCD
AI|XCD
AL|ALL
AM|AMD
AN|ANG
AO|AOA
AQ|AQD
AR|ARS
AS|EUR
AT|EUR
AU|AUD
AW|ANG
AX|EUR
AZ|AZN
BA|BAM
BB|BBD
BD|BDT
BE|EUR
BF|XOF
BG|BGN
BH|BHD
BI|BIF
BJ|XOF
BL|EUR
BM|BMD
BN|BND
BO|BOB
BR|BRL
BS|BSD
BT|INR
Following is the code:
import java.io.File
import java.nio.charset.StandardCharsets

import akka.actor.ActorSystem
import akka.stream._
import akka.stream.alpakka.csv.scaladsl.CsvParsing
import akka.stream.scaladsl.{FileIO, Flow, Sink}
import akka.util.ByteString

import scala.collection.immutable
import scala.concurrent.{ExecutionContext, _}
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("TestApplication")
implicit val ma…
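The code above is cut off by the excerpt; as a minimal, hedged sketch of the same approach (folding the pairs into a Map and the value names are illustrative, not the original program), the Alpakka line scanner can be used like this:

import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.alpakka.csv.scaladsl.CsvParsing
import akka.stream.scaladsl.{FileIO, Sink}

import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("TestApplication")
implicit val materializer: ActorMaterializer = ActorMaterializer()

// Parse the pipe-delimited file and fold the key/value pairs into a Map.
val countryToCurrency: Future[Map[String, String]] =
  FileIO.fromPath(Paths.get("src/main/resources/CountryNicCurrencyKeyValueMap.csv"))
    .via(CsvParsing.lineScanner(delimiter = '|'.toByte)) // split each line on '|'
    .map(fields => fields.map(_.utf8String))             // ByteString -> String
    .collect { case key :: value :: _ => key -> value }  // keep well-formed rows
    .runWith(Sink.fold(Map.empty[String, String])(_ + _))

Materialising the whole map is fine for a small lookup file like this one; for larger files the pairs would normally be streamed onward instead.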

Kafka performance tuning

Performance tuning of Kafka is critical when your cluster grows in size. Below are a few points to consider to improve Kafka performance:
Consumer group ID: Never use the exact same consumer group ID for dozens of machines consuming from different topics. All of those offset commits will end up on the same partition of __consumer_offsets, hence on the same broker, and this can in turn cause performance problems. Instead, set the consumer group ID to group_id+topic_name (see the sketch after this list).
Skewed: A broker is skewed if its number of partitions is greater than the average number of partitions per broker on the given topic. Example: if 2 brokers share 4 partitions and one of them has 3, that broker is skewed (3 > 2). Try to make sure that none of the brokers is skewed.
Spread: Broker spread is the percentage of brokers in the cluster that have partitions for the given topic. Example: 3 brokers share a topic that has 2 partitions, so 66% of the brokers have partitions for this topic. Try to achieve 100% broker spread.
Leader skew…
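As a hedged illustration of the per-topic group ID convention above (the broker address, application name and topic are assumptions, not from the post), a consumer could be configured like this:

import java.util.{Collections, Properties}

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

val applicationGroupId = "billing-service" // illustrative
val topicName          = "payments"        // illustrative

val props = new Properties()
props.put("bootstrap.servers", "broker-01:9092")
// group_id + topic_name: commits for different topics then hash to different
// partitions of __consumer_offsets instead of all landing on the same broker.
props.put("group.id", s"$applicationGroupId-$topicName")
props.put("key.deserializer", classOf[StringDeserializer].getName)
props.put("value.deserializer", classOf[StringDeserializer].getName)

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList(topicName))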

Migrating ElasticSearch 2.x to ElasticSearch 5.x

In my previous blog post, I described how to install and configure an ElasticSearch 5.x cluster.
In this blog post, we will look at how to migrate data.
Consult this table to verify that rolling upgrades are supported for your version of Elasticsearch.

Full cluster upgrade (2.x to 5.x)
We will have to do a full cluster upgrade and restart.

Install the Elasticsearch Migration Helper plugin on the old cluster. This plugin helps you check whether you can upgrade directly to the next major version of Elasticsearch, or whether you need to make changes to your data and cluster before doing so.
cd /work/elk/elasticsearch-2.4.3/
curl -O -L https://github.com/elastic/elasticsearch-migration/releases/download/v2.0.4/elasticsearch-migration-2.0.4.zip
./bin/plugin install file:///work/elk/elasticsearch-2.4.3/elasticsearch-migration-2.0.4.zip
Start old ElasticSearch:
./bin/elasticsearch &
Browse to the elasticsearch-migration plugin.
Click on "Cluster Checkup" > "Run checks now". Check all the suggest…

ElasticSearch max file descriptors too low error

ElasticSearch 5.x requires max file descriptors of at least 65536 and max virtual memory areas (vm.max_map_count) of at least 262144.
It throws an error on start-up if these are set too low.
ERROR: bootstrap checks failed
max file descriptors [16384] for elasticsearch process is too low, increase to at least [65536]
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Check current values using:
$ cat /proc/sys/fs/file-max
16384
$ cat /proc/sys/vm/max_map_count
65530
$ ulimit -Hn
16384
$ ulimit -Sn
4096
To fix this, change or add the settings below in the following files:
Recommended: Add a new file 99-elastic.conf under /etc/security/limits.d with the following settings:
elasticsearch - nofile 800000
elasticsearch - nproc 16384
defaultusername - nofile 800000
defaultusername - nproc 16384
Also edit /etc/sysctl.conf with the following settings (vm.max_map_count is a kernel parameter and can only be raised via sysctl):
fs.file-max = 800000
vm.max_map_count = 300000
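The kernel settings can be applied without a reboot (the limits.d changes take effect at the next login session of the elasticsearch user); a quick sketch:

sudo sysctl -p                    # reload /etc/sysctl.conf
cat /proc/sys/vm/max_map_count    # verify the new value
ulimit -Hn                        # check in a fresh session for the elasticsearch user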

ElasticSearch Curator

Curator is a tool from Elastic to help manage your ElasticSearch cluster.
For certain logs/data, we use one ElasticSearch index per year/month/day and might keep a rolling 7 day window of history.
This means that every day we need to create, backup, and delete some indices.
Curator helps make this process automated and repeatable.

Installation
Curator is written in Python, so you will need pip to install it:
pip install elasticsearch-curator
curator --config ./curator_cluster_config.yml curator_actions.yml --dry-run
Configuration
Create a file curator_cluster_config.yml with the following contents:
---
# Remember, leave a key empty if there is no value. None will be a string, not a Python "NoneType"
client:
  hosts:
    - "es_coordinating_01.singhaiuklimited.com"
  port: 9200
  url_prefix:
  use_ssl: True
  # The certificate file is the CA certificate used to sign all ES node certificates.
  # Use same CA certificate to generate and sign the certificate running curator (specif…
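The run command above also references curator_actions.yml; as a rough sketch of what it might contain (the index prefix, timestring and retention are assumptions for daily logstash- indices, not from the post), a delete action for the 7-day window could look like:

---
actions:
  1:
    action: delete_indices
    description: "Delete indices older than 7 days, based on the date in the index name"
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 7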

ElasticSearch cluster SSL/TLS configuration

The ElasticSearch X-Pack documentation has a good description of how to secure your ElasticSearch cluster using SSL/TLS.
I used certgen to generate certificates for all the nodes as below:

Create an instances.yml file:
vim /work/elk/elasticsearch-5.6.2/config/x-pack/instances.yml

instances:
  - name: "hostname-00"
    ip:
      - "192.126.0.163"
      - "192.0.2.2"
      - "198.51.100.1"
    dns:
      - "hostname-00"
      - "hostname-00.mydomain.name"
  - name: "hostname-01"
    ip:
      - "192.126.0.164"
    dns:
      - "hostname-01"
      - "hostname-01.mydomain.name"
  - name: "hostname-02"
  - name: "CN=hostname-03,C=GB,ST=Greater London,L=London,O=OrgName,OU=OrgUnit,DC=mydomain,DC=com"
    dns:
      - "hostname-03.mydomain.name"
      - "hostname-03.internal"
      - "hostname-03"

Run below command to generate a CA certificate and private…
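Once the certificates have been generated and copied to each node, transport TLS is typically switched on in elasticsearch.yml; the paths below are illustrative and the setting names should be verified against the X-Pack 5.x documentation:

# Illustrative paths for hostname-00; adjust per node.
xpack.ssl.key: /work/elk/elasticsearch-5.6.2/config/x-pack/hostname-00/hostname-00.key
xpack.ssl.certificate: /work/elk/elasticsearch-5.6.2/config/x-pack/hostname-00/hostname-00.crt
xpack.ssl.certificate_authorities: [ "/work/elk/elasticsearch-5.6.2/config/x-pack/ca/ca.crt" ]
xpack.security.transport.ssl.enabled: true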