ElasticSearch Migration Guide

In this version, account, devices and identities data has been indexed in ElasticSearch together with their list of history events, for search purposes.

First of all, the esSettingsUpdate task has to be executed to create the needed indices and update mappings

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esSettingsUpdate

In order to migrate the data existing in Cassandra, the following commands can be used:

Devices

The first index that needs to be handled is the devices index

First of all, devices and their history need to be exported into multiple files with the following command:

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esDevicesMigration --export [--outDir=path-to-outdir] [--partitions=partitions] [--exportPageSize=exportPageSize]

The optional outDir parameter refer to the directory where the exported files will be created (default is “devices”), the partitions parameter refer to the number of files the devices are partitioned into (default 1000) and exportPageSize refer to the page size used to iterate over the tables in Cassandra (default 1000)

If there are batches of data that fail to be exported, they will be saved in a file on disk. In order to retry to export them, the following command can be used

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esDevicesMigration --retry-devices-export [--outDir=path-to-outdir] [--partitions=partitions] [--exportPageSize=exportPageSize]

In order to import the data exported in files in Elastic, the following command can be used:

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esDevicesMigration --import [--inDir=path-to-indir]

Secondly, the devices_history index needs to be handled. This is done using the following command:

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esDevicesHistoryRebuild [--keyspace=keyspace] [--pageSize=pageSize] [--requestTimeoutMin=requestTimeoutMin] [connectionTimeoutMin=connectionTimeoutMin]

The keyspace parameter allows to specify a different keyspace (by default the one in zk is used) and the pageSize describes the page size used when iterating over the first index in Elastic (default 1000). The requestTimeoutMin and connectionTimeoutMin are the request and connection timeouts in minutes for ElasticSearch (default 5 and 1).

Accounts

The first index that needs to be handled is the accounts index

First of all, accounts adnd their history need to be exported into multiple files with the following command:

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esAccountsMigration --export [--outDir=path-to-outdir] [--partitions=partitions] [--exportPageSize=exportPageSize]

The optional outDir parameter refer to the directory where the exported files will be created (default is “accounts”), the partitions parameter refer to the number of files the accounts are partitioned into (default 1000) and exportPageSize refer to the page size used to iterate over the tables in Cassandra (default 1000)

If there are batches of data that fail to be exported, they will be saved in a file on disk. In order to retry to export them, the following command can be used

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esAccountsMigration --retry-accounts-export [--outDir=path-to-outdir] [--partitions=partitions] [--exportPageSize=exportPageSize]

In order to import the data exported in files in Elastic, the following command can be used:

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esAccountsMigration --import [--inDir=path-to-indir]

Secondly, the accounts_history index needs to be handled. This is done using the following command:

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esAccountsHistoryRebuild [--keyspace=keyspace] [--pageSize=pageSize] [--requestTimeoutMin=requestTimeoutMin] [connectionTimeoutMin=connectionTimeoutMin]

The keyspace parameter allows to specify a different keyspace (by default the one in zk is used) and the pageSize describes the page size used when iterating over the first index in Elastic (default 1000). The requestTimeoutMin and connectionTimeoutMin are the request and connection timeouts in minutes for ElasticSearch (default 5 and 1).

Identities

Identities are migrated using the following command:

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esProfilesMigration [--exportPageSize=exportPageSize] [--taskSubmitAttempts=taskSubmitAttempts]

exportPageSize refers to the page size used to iterate over the entries in Elastic (default 100), taskSubmitAttempts refers to the number of retries per each batch (default 5).

If there are batches of data that fail to be exported, they will be saved in a file on disk. In order to retry to export them, the following command can be used

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esProfilesMigration --retry-profiles

Action Logs

Identities are migrated using the following command:

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esActionLogsMigration [--exportPageSize=exportPageSize] [--taskSubmitAttempts=taskSubmitAttempts]

exportPageSize refers to the page size used to iterate over the entries in Elastic (default 100), taskSubmitAttempts refers to the number of retries per each batch (default 5).

If there are batches of data that fail to be exported, they will be saved in a file on disk. In order to retry to export them, the following command can be used

CODE

java -DZKProperties=${PROJECT_DIR}/configs/zkConfigFiles/zookeeper.properties \
    -Dspring.profiles.active=cassandra \
    -Dlog4j.configuration=file:${PROJECT_DIR}/projects/standalonerunner/src/main/resources/log4j2.xml \
    -jar persistence-migration-fat.jar esActionLogsMigration --retry-action-logs

Other configs

In order for new data to be saved in elastic, make sure that elasticsearch is enabled in elasticsearch.json. The four flags at the bottom can be used to control whether lucene searches are performed in cassandra or in elastic.