
Session Retention

1. Overview

In production deployments with many users (corporate scale), the Veridium DB grows to a size that is difficult to manage: displaying, searching, and operating on tables that hold consistently high-volume data becomes hard. The solution is to divide the data into two types: hot data, available instantly from Cassandra, and cold data, available only on demand, in special cases, from disk.

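| Data type | Persistence      | Availability   | Location              |
|-----------|------------------|----------------|-----------------------|
| hot data  | relatively short | real-time (ms) | Cassandra Hot         |
| cold data | long period      | on demand      | Binary storage (cold) |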

For each type of temporary record, the stages and the rules governing its persistence period are defined.

Cassandra Hot is the main storage used by the VeridiumId platform.

2. Temporary record types (Cassandra tables)

The Cassandra tables that are considered temporary data and must follow the flow above are:

  1. session_finished

  2. action_log

  3. history

These data types and their configurations (policies) are defined in ZooKeeper, in config.json.

The configuration below keeps the data for one year.

CODE
    "retentionPolicy": {
        "cassandraDetails": {
            "retentionKeyspace": "veridium-retention",
            "maxNumRetrievedRecords": 500
        },
        "kafkaDetails": {
            "defaultGroupId": "consumerId"
        },
        "data": [
            {
                "dataType": "history",
                "topic": "history-events",
                "retention": {
                    "archived": true,
                    "hot": 365
                }
            },
            {
                "dataType": "action_log",
                "topic": "action-log-events",
                "retention": {
                    "archived": true,
                    "hot": 365
                }
            },
            {
                "dataType": "session_finished",
                "topic": "session-finished-events",
                "retention": {
                    "archived": true,
                    "warm": 0,
                    "hot": 365
                }
            }
        ],
        "generalSettings": {
            "schedulerFrequency": "0 1 0 * * *",
            "useWarmLayer": false,
            "archivingPath": "/opt/veridiumid/backup/data_retention"
        }
    },
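To illustrate how such a policy might be read, here is a minimal Python sketch that parses a trimmed copy of the fragment above and computes, for each data type, the oldest date that would still fall inside the hot window. It assumes that "hot" is expressed in days (consistent with 365 meaning one year); the function name hot_cutoffs is hypothetical and not part of the product.

```python
import json
from datetime import date, timedelta

# Trimmed copy of the retentionPolicy fragment, wrapped in braces so it is valid JSON.
CONFIG = json.loads("""
{
  "retentionPolicy": {
    "data": [
      {"dataType": "history", "retention": {"archived": true, "hot": 365}},
      {"dataType": "action_log", "retention": {"archived": true, "hot": 365}},
      {"dataType": "session_finished",
       "retention": {"archived": true, "warm": 0, "hot": 365}}
    ]
  }
}
""")


def hot_cutoffs(config, today):
    """Return, per dataType, the oldest date still inside the hot window.

    Assumption: the "hot" value is a number of days.
    """
    cutoffs = {}
    for entry in config["retentionPolicy"]["data"]:
        days = entry["retention"]["hot"]
        cutoffs[entry["dataType"]] = today - timedelta(days=days)
    return cutoffs


cutoffs = hot_cutoffs(CONFIG, date(2023, 6, 1))
for data_type, cutoff in cutoffs.items():
    print(f"{data_type}: hot window starts at {cutoff.isoformat()}")
```

Under this reading, records created before the printed date would be candidates for the archive/clean job.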

3. Archival process

The archival process is performed by a series of cron jobs created in DataRetentionService. These jobs produce CSV files stored in a folder tree with the following format:

{CONFIGURATE_PATH}/archives/{table_name}/{data_1}/{table_name}_archive.csv

  • {CONFIGURATE_PATH} = the configured archiving path (archivingPath in the general settings)

  • {table_name} = type of entry that is archived

  • {data_1} = date on which the data was created in the system
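The folder layout above can be sketched in Python as follows. This is only an illustration: the helper name archive_csv_path is hypothetical, and the format of the date folder is not stated in this document, so ISO 8601 (YYYY-MM-DD) is assumed here.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_csv_path(archiving_path, table_name, created_on):
    """Build the expected location of an archive CSV file.

    Assumption: the date folder uses the ISO 8601 form YYYY-MM-DD.
    """
    return (
        PurePosixPath(archiving_path)
        / "archives"
        / table_name
        / created_on.isoformat()
        / f"{table_name}_archive.csv"
    )


path = archive_csv_path(
    "/opt/veridiumid/backup/data_retention", "history", date(2023, 6, 1)
)
print(path)
```

PurePosixPath is used so that the result is built with forward slashes regardless of the machine running the sketch.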

The process that performs data retention is ver_data_retention; it is installed on only one persistence node.

| Name                      | Basic Description                                                          | Default Value  |
|---------------------------|----------------------------------------------------------------------------|----------------|
| General Settings          |                                                                            |                |
| Scheduler Frequency       | The cron expression that describes the frequency of the archive/clean job. | 0 1 0 * * *    |
| Use Warm Layer            | Decides whether data is also placed in the warm layer.                     | Switched off   |
| Archiving Path            | The path where the archives are written.                                   | /existing/path |
| Cassandra Details         |                                                                            |                |
| Max Num Retrieved Records | Maximum number of records retrieved at once.                               | 500            |

Don't change default values without reason.
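As an aid to reading the default Scheduler Frequency value, the following Python sketch splits a six-field cron expression into named fields. It assumes the field order used by Spring-style schedulers (second, minute, hour, day of month, month, day of week), which this document does not confirm; the helper name describe_cron is hypothetical.

```python
# Assumed field order for a six-field cron expression (Spring-style).
FIELD_NAMES = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each field of a six-field cron expression to its assumed name."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


fields = describe_cron("0 1 0 * * *")
for name, value in fields.items():
    print(f"{name}: {value}")
```

Under that assumption, the default value would trigger the archive/clean job once a day, at one minute past midnight.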
