Enabling ElasticSearch after the update - CDCR implementation
In this article we provide the procedure to create a single cluster containing all ElasticSearch nodes from the 2 datacenters.
During this procedure the datacenters will be named DC1 and DC2.
1) Requirements
Elasticsearch reuses the ports already used by Kafka (9092 - cross Kafka cluster communication and 9095 - application to Kafka communication), so there is no need to open new network connections. Still, please check that there is network connectivity between the datacenters. How to test connectivity:
# port 9095 is certainly opened correctly; please test between servers in different datacenters whether port 9092 is opened as well
# check if the port is opened on a persistence server:
netstat -tulpna | grep 9092
# if it is not opened, run the following to open a temporary port:
nc -l 9092
# try to connect from a different machine to the one where the port was opened:
nc -zv IP 9092
On each PERSISTENCE node, during the data migration it is necessary to have enough disk space to perform the migration and also to keep the data both in Elasticsearch and in Cassandra (session and history tables). The following space needs to be allocated:
for data - the same amount of space as is currently used in Cassandra
for backup - the same amount in the partition where backups are kept
## check the space used by the session and history tables
du -sh /opt/veridiumid/cassandra/data/veridium/history-*
du -sh /opt/veridiumid/cassandra/data/veridium/session_finished-*
## there should be the same amount of free space here:
df -h /opt/veridiumid/elasticsearch/data/
## and also the same amount here, for backup:
df -h /opt/veridiumid/backup/
## if the same disk is mounted in both places, it will be necessary to have double the space on that disk
2) Starting the cluster in DC1 and DC2
2.1) Disable Kafka and Kafka Streams applications
Please run the following command as root on ALL nodes, starting with the webapp nodes:
python3 /etc/veridiumid/scripts/elasticsearch_enable.py --disable --kafka
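Optionally, you can confirm on each node that the Kafka-related processes were actually stopped; this is a generic check, not part of the script itself:
## check that no Kafka processes are still running on this node
ps -ef | grep -i [k]afka
## check that nothing is listening on the Kafka ports any more
netstat -tulpn | grep -E '9092|9095'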
2.2) Enable ElasticSearch on all persistence nodes in DC1 and DC2
Connect to all persistence nodes and run the following command as root, in parallel, to enable and start the ElasticSearch service on all nodes. This command waits for all nodes to join the cluster.
python3 /etc/veridiumid/scripts/elasticsearch_enable.py --enable --elastic
If you also want to modify the password of the default elastic user, use the following command instead of the previous one:
python3 /etc/veridiumid/scripts/elasticsearch_enable.py --enable --elastic --password PASSWORD
Where PASSWORD is the new password you wish to set for the elastic user; it must be the same in both DCs.
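Once the service has started on all nodes, you can optionally confirm that the cluster has formed by querying the cluster health through the elasticsearch_ops.sh wrapper (aliased as eops later in this article); this check assumes the wrapper forwards the path, including query parameters, to the Elasticsearch REST API:
bash /opt/veridiumid/elasticsearch/bin/elasticsearch_ops.sh -x=GET -p=/_cluster/health?pretty
## the status should be green and number_of_nodes should match the number of persistence nodes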
3) Configure nodes - Run this only for CDCR environments
3.1) Publish datacenter-related configuration
Connect to all persistence nodes and run the following command as root to publish configuration in Cassandra.
Run the command in DC1 on all persistence nodes sequentially:
bash /opt/veridiumid/migration/bin/elk_ops.sh --cdcr-publish --primary
Run the command in DC2 on all persistence nodes sequentially, then stop all Elasticsearch nodes in DC2:
bash /opt/veridiumid/migration/bin/elk_ops.sh --cdcr-publish
service ver_elasticsearch stop
At this point there are 2 distinct Elastic clusters deployed in DC1 and DC2.
Next steps will be to join nodes from DC2 to the initialised Elastic cluster from DC1.
3.2) Run create CDCR command sequentially in each DC
Run this command in DC1 on all persistence nodes sequentially:
bash /opt/veridiumid/elasticsearch/bin/create_cdcr.sh primary
Run the command in DC2 on all persistence nodes sequentially:
bash /opt/veridiumid/elasticsearch/bin/create_cdcr.sh
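To verify that the nodes from DC2 have joined the cluster from DC1, you can optionally list the cluster members (eops is the alias for elasticsearch_ops.sh described in the Troubleshooting section; this assumes the path is forwarded as-is to the Elasticsearch REST API):
eops -x=GET -p=/_cat/nodes?v
## all persistence nodes from DC1 and DC2 should now appear in the output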
3.3) Update replication factor
Update the ES replication factor to 3. This command should be run on one persistence server in each DC.
bash /opt/veridiumid/migration/bin/elk_ops.sh --update-factor
The replication factor is calculated automatically as follows: if the total number of ES nodes is >= 6 the factor is 3, otherwise it is 1.
This command updates the replication factor in Zookeeper, in the Elastic index templates and in the existing ES indices.
The command must be executed in each DC so that the same replication factor is set in both Zookeeper clusters.
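To check the resulting replication factor on the existing indices, an optional query can be used (a sketch, assuming eops forwards query parameters such as filter_path to the Elasticsearch REST API):
eops -x=GET -p=/veridium.*/_settings?filter_path=*.settings.index.number_of_replicas
## with 6 or more ES nodes in total the value should be 3, otherwise 1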
4) Change the Zookeeper configuration to start sending data to ElasticSearch server
Run this command on one server in each datacenter:
python3 /etc/veridiumid/scripts/elasticsearch_enable.py --enable --zk-change
This needs to be executed only if the version is less than 3.5: please connect to webseacadmin → Settings → Advanced → elasticsearch directory → component-templates.json and change the type of httpContextIP from ip to keyword. It should look like this:
"httpContextIP": {
"type": "keyword"
},
Please delete any incorrectly created indices:
##check template
eops -x=GET -p=/_component_template/veridium.session-mappings | less
## check created indices
eops -l
## delete them
eops -x=DELETE -p=/veridium.sessions_history-2023-10
eops -x=DELETE -p=/veridium.sessions-2023-10
5) Migrate session and history data from Cassandra to Elastic
Run this command on ONE persistence node:
nohup bash /etc/veridiumid/scripts/migrateDataCassandraToElastic.sh &
Follow the nohup.out log to see when the migration has finished.
At the beginning of the nohup.out log, search for “number of rows per each month” to see how many rows there are in Cassandra.
Run the command below to see the data imported into Elasticsearch. The migration script RUNS IN BACKGROUND, so it will continue to run even if the PuTTY session is closed.
bash /opt/veridiumid/elasticsearch/bin/elasticsearch_ops.sh -l
4 million sessions are migrated in around half an hour; the equivalent history is migrated in around one hour.
From a disk space point of view:
In Elasticsearch, sessions will take around 2/3 of the space currently needed for sessions in Cassandra (this is because an index is not stored on all servers, as the data is in Cassandra). In a single-node deployment the size is the same.
History takes around 1/3 of the current space, due to better data compression and the fact that an index is not stored on all servers. In a single-node deployment it is around 2/3.
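After the migration you can compare the actual disk usage against these estimates, using the same paths as in the Requirements section:
## space used by the migrated data in Elasticsearch
du -sh /opt/veridiumid/elasticsearch/data/
## space still used by the same data in Cassandra
du -sh /opt/veridiumid/cassandra/data/veridium/session_finished-*
du -sh /opt/veridiumid/cassandra/data/veridium/history-*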
In case of a timeout error while reading data from the ‘session_finished’ table, please run the following command:
/opt/veridiumid/cassandra/bin/nodetool garbagecollect veridium session_finished
6) After all the data is migrated, disable writing session and history data to Cassandra
Run this command as root on one persistence node in both DCs:
python3 /etc/veridiumid/scripts/elasticsearch_enable.py --cassandra-write --disable
7) Enable Backups for ElasticSearch
Select one node in each existing datacenter and uncomment the following line in the root user’s crontab (run crontab -e to edit the crontab):
#15 0 * * * bash /opt/veridiumid/elasticsearch/bin/elasticsearch_backup.sh /opt/veridiumid/elasticsearch/bin/elasticsearch_backup.conf
This will enable daily backups for ElasticSearch done at 00:15.
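To verify that backups work without waiting for the scheduled run, the backup script can be launched manually with the same arguments as in the crontab line; the snapshot listing below is an optional check that assumes eops forwards the path to the Elasticsearch snapshot API (replace REPOSITORY_NAME with the repository returned by the first query):
## run one backup manually, using the same command as in crontab
bash /opt/veridiumid/elasticsearch/bin/elasticsearch_backup.sh /opt/veridiumid/elasticsearch/bin/elasticsearch_backup.conf
## list the configured snapshot repositories
eops -x=GET -p=/_snapshot/_all
## list the snapshots in a repository
eops -x=GET -p=/_snapshot/REPOSITORY_NAME/_all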
8) Troubleshooting
If there are problems during the migration, the migration script is able to restart from the last chunk of data that failed.
If there are problems with just a specific chunk of data (for example, not being able to read it from Cassandra), the process generates a control file with the data that was not read. Restarting the process will resend only the missing data to Elasticsearch.
If there are errors reading from Cassandra, it is necessary to run compaction and repair on that specific table.
Files that might contain unsent chunks of data are:
sessions-failed-pages.json
sessions-import-page-failures.json
If it is necessary to rerun the migration from the beginning, please remove the indices from Elasticsearch, delete the files sessions-last-page-state.dat, sessions-failed-pages.json and sessions-import-page-failures.json, and rerun the job.
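A minimal sketch of such a full restart, assuming the control files are in the directory from which the migration script was originally started and that the indices follow the naming shown earlier in this article:
## delete the indices created by the previous run (adjust the year/month to your data)
eops -x=DELETE -p=/veridium.sessions-2023-10
eops -x=DELETE -p=/veridium.sessions_history-2023-10
## remove the migration control files
rm -f sessions-last-page-state.dat sessions-failed-pages.json sessions-import-page-failures.json
## rerun the migration
nohup bash /etc/veridiumid/scripts/migrateDataCassandraToElastic.sh &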
#run on one cassandra node
/opt/veridiumid/cassandra/bin/nodetool repair -local veridium session_finished
/opt/veridiumid/cassandra/bin/nodetool repair -local veridium history
#run on each cassandra node
/opt/veridiumid/cassandra/bin/nodetool compact veridium session_finished
/opt/veridiumid/cassandra/bin/nodetool compact veridium history
## to monitor the progress of repair and compaction, use the command below
/opt/veridiumid/cassandra/bin/nodetool compactionstats
## how to see indices:
check_services
## eops is an alias for bash /opt/veridiumid/elasticsearch/bin/elasticsearch_ops.sh
## how to delete an index
eops -x=DELETE -p=/veridium.sessions_history-2023-04
eops -x=DELETE -p=/veridium.sessions-2023-04
## how to flush the data from memory to disk and update counters:
eops -x=POST -p=/veridium.*/_flush
/opt/veridiumid/migration/bin/elk_ops.sh --update-settings