Session Retention

1. Overview

In production deployments with many users (corporate scale), the Veridium DB grows to sizes that are difficult to manage (displaying, searching, and operating on information in tables that hold consistently high-volume data). The solution is to divide the data into two types: hot data, available instantly from Cassandra, and cold data, available only on demand, in special cases, from disk.

Data type	Persistence	Availability	Location
hot data	relatively short	real-time (ms)	Cassandra Hot
cold data	long period	on demand	Binary storage (cold)

For each type of temporary record, stages and rules are defined that govern its persistence period.

Cassandra Hot is the main storage used by the VeridiumId platform.

2. Temporary record types (Cassandra tables)

The Cassandra tables that are considered temporary data and follow the flow above are:

session_finished
action_log
history

These data types and their configurations (retention policy) are defined in ZooKeeper, in config.json.

Below are the configurations to keep the data for one year.

    "retentionPolicy": {
        "cassandraDetails": {
            "retentionKeyspace": "veridium-retention",
            "maxNumRetrievedRecords": 500
        },
        "kafkaDetails": {
            "defaultGroupId": "consumerId"
        },
        "data": [
            {
                "dataType": "history",
                "topic": "history-events",
                "retention": {
                    "archived": true,
                    "hot": 365
                }
            },
            {
                "dataType": "action_log",
                "topic": "action-log-events",
                "retention": {
                    "archived": true,
                    "hot": 365
                }
            },
            {
                "dataType": "session_finished",
                "topic": "session-finished-events",
                "retention": {
                    "archived": true,
                    "warm": 0,
                    "hot": 365
                }
            }
        ],
        "generalSettings": {
            "schedulerFrequency": "0 1 0 * * *",
            "useWarmLayer": false,
            "archivingPath": "/opt/veridiumid/backup/data_retention"
        }
    },
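
As a rough illustration of how the per-table "hot" values could be read, the sketch below parses a trimmed copy of the policy and computes, for each data type, the date before which records are past their hot period. It assumes that "hot" is expressed in days (consistent with 365 meaning one year); the function name hot_cutoffs is hypothetical and is not part of the platform.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the "retentionPolicy" block above, wrapped in braces
# so that it parses as a standalone JSON document.
CONFIG_TEXT = """
{
  "retentionPolicy": {
    "data": [
      {"dataType": "history", "topic": "history-events",
       "retention": {"archived": true, "hot": 365}},
      {"dataType": "action_log", "topic": "action-log-events",
       "retention": {"archived": true, "hot": 365}},
      {"dataType": "session_finished", "topic": "session-finished-events",
       "retention": {"archived": true, "warm": 0, "hot": 365}}
    ]
  }
}
"""


def hot_cutoffs(config_text, now):
    """Return {dataType: cutoff}; records older than the cutoff are past the hot period.

    Assumption: the "hot" value is a number of days.
    """
    policy = json.loads(config_text)["retentionPolicy"]
    return {
        entry["dataType"]: now - timedelta(days=entry["retention"]["hot"])
        for entry in policy["data"]
    }


if __name__ == "__main__":
    now = datetime(2024, 12, 31, tzinfo=timezone.utc)
    for data_type, cutoff in hot_cutoffs(CONFIG_TEXT, now).items():
        print(f"{data_type}: hot period covers records created on or after {cutoff.date()}")
```

How the service itself evaluates these values is not described here, so treat this only as a way to sanity-check a configuration by hand.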

3. Archival process

The archival process is performed by a series of cron jobs created in DataRetentionService. These jobs produce CSV files stored in a folder tree with the following format:

{CONFIGURED_PATH}/archives/{table_name}/{data_1}/{table_name}_archive.csv

{CONFIGURED_PATH} = the configured archiving path
{table_name} = type of entry that is archived
{data_1} = date on which the data was created in the system
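
The folder layout above can be sketched as a small path builder. This is only an illustration: the function name archive_path is hypothetical, and the ISO date format (YYYY-MM-DD) used for the date folder is an assumption, since the actual folder naming is not specified in this document.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_path(archiving_path, table_name, created_on):
    """Build the expected CSV location for one table and one creation date.

    Assumption: the date folder is named in ISO format (YYYY-MM-DD).
    """
    return (
        PurePosixPath(archiving_path)
        / "archives"
        / table_name
        / created_on.isoformat()
        / f"{table_name}_archive.csv"
    )


if __name__ == "__main__":
    # Uses the archivingPath value from the example configuration.
    print(archive_path("/opt/veridiumid/backup/data_retention", "action_log", date(2024, 1, 15)))
```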

The process that performs data retention is ver_data_retention; it is installed on only one persistence node.

Name	Basic Description	Default Value
Cassandra Details	Retention Keyspace - The Cassandra keyspace used to store retention data.	veridium-retention
Cassandra Details	Max Num Retrieved Records - Maximum number of records to be retrieved at once.	50
Kafka Details	Default Group Id - The default groupId used by data retention consumers.	consumerId
Data	The map that contains the configuration for each table.
General Settings	Scheduler Frequency - The cron expression that describes the frequency of the archive/clean job.	0 1 0 * * *
General Settings	Archiving Path - The path where the archives are written.	/existing/path
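
To make the Scheduler Frequency value easier to read, the sketch below labels each field of the expression. It assumes a six-field cron layout with seconds first (the layout used by Spring-style schedulers); this document does not state which cron dialect the service uses, and describe_cron is a hypothetical helper.

```python
# Assumption: six fields, seconds first (Spring-style cron layout).
FIELD_NAMES = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each whitespace-separated cron field to its assumed meaning."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


if __name__ == "__main__":
    for name, value in describe_cron("0 1 0 * * *").items():
        print(f"{name}: {value}")
```

Under that assumption, the default value 0 1 0 * * * would run the archive/clean job once a day at 00:01:00.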

Don't change default values without reason.