
Session Retention

1. Overview

In production deployments with many users (corporate scale), the Veridium DB grows to a size that is difficult to manage: displaying, searching, and operating on tables that hold consistently high-volume data becomes hard. The approach taken is to divide the data into two types: hot data, available instantly from Cassandra, and cold data, available only on demand, in special cases, from disk.

| Data type | Persistence | Availability | Location |
| hot data | relatively short | real-time (ms) | Cassandra Hot |
| cold data | long period | on demand | Binary storage (cold) |

For each type of temporary record, the stages and the rules governing the persistence period are defined.

Cassandra Hot is the main storage used by the VeridiumId platform.

2. Temporary record types (Cassandra tables)

The Cassandra tables that are considered temporary data and are subject to the flow above are:

  1. session_finished

  2. action_log

  3. history

This data and its configuration (policy) are defined in ZooKeeper, in config.json.

The configuration below keeps the data for one year.

CODE
    "retentionPolicy": {
        "cassandraDetails": {
            "retentionKeyspace": "veridium-retention",
            "maxNumRetrievedRecords": 500
        },
        "kafkaDetails": {
            "defaultGroupId": "consumerId"
        },
        "data": [
            {
                "dataType": "history",
                "topic": "history-events",
                "retention": {
                    "archived": true,
                    "hot": 365
                }
            },
            {
                "dataType": "action_log",
                "topic": "action-log-events",
                "retention": {
                    "archived": true,
                    "hot": 365
                }
            },
            {
                "dataType": "session_finished",
                "topic": "session-finished-events",
                "retention": {
                    "archived": true,
                    "warm": 0,
                    "hot": 365
                }
            }
        ],
        "generalSettings": {
            "schedulerFrequency": "0 1 0 * * *",
            "useWarmLayer": false,
            "archivingPath": "/opt/veridiumid/backup/data_retention"
        }
    },
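
The policy above can also be inspected programmatically, for example to see which cutoff date each table would get. The following is a minimal Python sketch, not part of the platform: it assumes that `hot` is expressed in days (consistent with 365 meaning one year), and the helper name `hot_cutoffs` and the trimmed `POLICY_JSON` sample are hypothetical.

```python
import json
from datetime import date, timedelta

# Trimmed copy of the retentionPolicy fragment, wrapped in braces so that
# it is valid standalone JSON. Only two of the three tables are shown.
POLICY_JSON = """
{
  "retentionPolicy": {
    "data": [
      {"dataType": "history", "topic": "history-events",
       "retention": {"archived": true, "hot": 365}},
      {"dataType": "session_finished", "topic": "session-finished-events",
       "retention": {"archived": true, "warm": 0, "hot": 365}}
    ]
  }
}
"""


def hot_cutoffs(policy_json: str, today: date) -> dict:
    """Map each dataType to the oldest date still expected in hot storage.

    Assumption: "hot" is a number of days. This is an interpretation of
    the config, not confirmed platform behaviour.
    """
    policy = json.loads(policy_json)["retentionPolicy"]
    return {
        entry["dataType"]: today - timedelta(days=entry["retention"]["hot"])
        for entry in policy["data"]
    }


if __name__ == "__main__":
    for data_type, cutoff in hot_cutoffs(POLICY_JSON, date(2023, 12, 31)).items():
        print(f"{data_type}: hot window starts at {cutoff}")
```

Under this reading, records older than the computed date would no longer be served from Cassandra Hot and, with `archived` set to true, would be expected in the archive instead.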

3. Archival process

The archival process is carried out by a series of cron jobs created in DataRetentionService. These jobs produce CSV files stored in a folder tree with the following format:

{CONFIGURATE_PATH}/archives/{table_name}/{data_1}/{table_name}_archive.csv

  • {table_name} = the type of entry that is archived

  • {data_1} = the date on which the data was created in the system
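
The folder layout above can be sketched as a small path builder. This is an illustrative Python example under stated assumptions: the root is taken to be the configured `archivingPath`, the date folder is assumed to use the ISO `YYYY-MM-DD` form (the actual format is not stated here), and the function name `archive_csv_path` is hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_csv_path(archiving_path: str, table_name: str, created: date) -> PurePosixPath:
    """Build the expected location of an archive CSV file.

    Assumption: the date folder is named in ISO format (YYYY-MM-DD).
    The real service may use a different date format.
    """
    return (
        PurePosixPath(archiving_path)
        / "archives"
        / table_name
        / created.isoformat()
        / f"{table_name}_archive.csv"
    )


if __name__ == "__main__":
    print(archive_csv_path(
        "/opt/veridiumid/backup/data_retention", "action_log", date(2024, 1, 15)
    ))
```

A helper like this can be useful when locating cold data for a given table and day on the persistence node.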

The process that performs data retention is ver_data_retention; it is installed on only one persistence node.

| Name | Basic Description | Default Value |
| Cassandra Details | Retention Keyspace - The Cassandra keyspace used to store retention data. | veridium-retention |
| | Max Num Retrieved Records - Maximum number of records retrieved at once. | 50 |
| Kafka Details | Default Group Id - The default groupId used by data retention consumers. | consumerId |
| Data | The map that contains the configuration for each table. | |
| General Settings | Scheduler Frequency - The cron expression that describes the frequency of the archive/clean job. | 0 1 0 * * * |
| | Archiving Path - The path where the archives are written. | /existing/path |

Don't change default values without reason.
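
To make the default Scheduler Frequency easier to read, the sketch below splits the expression into named fields. It assumes the six-field layout with a leading seconds field (second, minute, hour, day of month, month, day of week), as used by Spring-style schedulers; whether DataRetentionService parses it exactly this way is an assumption, and `describe_cron` is a hypothetical helper.

```python
# Field order assumed for a six-field cron expression with leading seconds.
FIELD_NAMES = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Split a six-field cron expression into a name -> value mapping."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


if __name__ == "__main__":
    for name, value in describe_cron("0 1 0 * * *").items():
        print(f"{name}: {value}")
```

Read this way, the default value would trigger the archive/clean job once a day, at one minute past midnight.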
