Session Retention
1. Overview
In production deployments with a large number of users (corporate scale), the Veridium DB grows to a size that is difficult to manage: it becomes hard to display, search, and operate on information in tables that hold certain data types that are always high volume. The adopted solution is to split the data into two types: hot data, available instantly from Cassandra, and cold data, available only on demand in special cases, from disk.
Data type | Persistence | Availability | Location |
---|---|---|---|
hot data | relatively short | real-time (ms) | Cassandra Hot |
cold data | long period | on demand | Binary storage (cold) |
For each type of temporary record, the stages and the rules governing the persistence period are defined.
Cassandra Hot is the main storage used by the VeridiumId platform.
2. Temporary record types (Cassandra tables)
The Cassandra tables that are considered temporary data and should follow the flow above are:
session_finished
action_log
history
These data types and their configurations (policies) are defined in ZooKeeper, in config.json.
The configuration below keeps the data for one year.
"retentionPolicy": {
  "cassandraDetails": {
    "retentionKeyspace": "veridium-retention",
    "maxNumRetrievedRecords": 500
  },
  "kafkaDetails": {
    "defaultGroupId": "consumerId"
  },
  "data": [
    {
      "dataType": "history",
      "topic": "history-events",
      "retention": {
        "archived": true,
        "hot": 365
      }
    },
    {
      "dataType": "action_log",
      "topic": "action-log-events",
      "retention": {
        "archived": true,
        "hot": 365
      }
    },
    {
      "dataType": "session_finished",
      "topic": "session-finished-events",
      "retention": {
        "archived": true,
        "warm": 0,
        "hot": 365
      }
    }
  ],
  "generalSettings": {
    "schedulerFrequency": "0 1 0 * * *",
    "useWarmLayer": false,
    "archivingPath": "/opt/veridiumid/backup/data_retention"
  }
},
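As a rough illustration of how these settings could be read, the following Python sketch parses a trimmed copy of the fragment above and computes, for each data type, the oldest creation date that still falls inside the hot window. It assumes that the `hot` value is a number of days (consistent with the one-year example, but not stated explicitly here). The function name `hot_cutoffs` is hypothetical and is not part of the product.

```python
import json
from datetime import date, timedelta

# Trimmed copy of the retentionPolicy fragment shown above.
POLICY_JSON = """
{
  "retentionPolicy": {
    "data": [
      {"dataType": "history", "topic": "history-events",
       "retention": {"archived": true, "hot": 365}},
      {"dataType": "action_log", "topic": "action-log-events",
       "retention": {"archived": true, "hot": 365}},
      {"dataType": "session_finished", "topic": "session-finished-events",
       "retention": {"archived": true, "warm": 0, "hot": 365}}
    ]
  }
}
"""


def hot_cutoffs(policy_json, today):
    """Map each dataType to the oldest date still inside the hot window.

    Assumption: the "hot" value is a number of days.
    """
    policy = json.loads(policy_json)["retentionPolicy"]
    return {
        entry["dataType"]: today - timedelta(days=entry["retention"]["hot"])
        for entry in policy["data"]
    }


if __name__ == "__main__":
    for data_type, cutoff in hot_cutoffs(POLICY_JSON, date.today()).items():
        print(f"{data_type}: hot window starts on {cutoff}")
```

Passing `today` explicitly keeps the calculation deterministic, which makes it easier to check a policy change before applying it.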
3. Archival process
The archival process is carried out by a series of cron jobs created in DataRetentionService. These jobs produce CSV files stored in a folder tree with the following format:
{CONFIGURED_PATH}/archives/{table_name}/{date_1}/{table_name}_archive.csv
{table_name} = the type of entry that is archived
{date_1} = the date on which the data was created in the system
Data retention is performed by the ver_data_retention process, which is installed on only one persistence node.
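The folder layout described above can be sketched in Python as follows. This is only an illustration: it assumes the configured path is the `archivingPath` setting and that the date folder is named in ISO format (YYYY-MM-DD), neither of which is confirmed here. The helper name `archive_file_path` is hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_file_path(archiving_path, table_name, created_on):
    """Build the archive CSV path for one table and one creation date.

    Assumption: the date folder uses ISO format (YYYY-MM-DD); the real
    folder naming used by the service is not specified in this page.
    """
    return (
        PurePosixPath(archiving_path)
        / "archives"
        / table_name
        / created_on.isoformat()
        / f"{table_name}_archive.csv"
    )


if __name__ == "__main__":
    print(archive_file_path("/opt/veridiumid/backup/data_retention",
                            "session_finished", date(2024, 1, 15)))
```

A helper of this kind can be useful when locating the archive for a given table and day on the persistence node that runs ver_data_retention.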
Name | Basic Description | Default Value |
---|---|---|
General Settings | Scheduler Frequency - The cron expression that defines how often the archive/clean job runs. | 0 1 0 * * * |
General Settings | Use Warm Layer - Whether data is also placed in the warm layer. | Switched off |
General Settings | Archiving Path - The path where the archives are written. | /existing/path |
Cassandra Details | Max Num Retrieved Records - Maximum number of records retrieved at once. | 500 |
Do not change the default values without a good reason.
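The default Scheduler Frequency value has six fields rather than the five of a classic crontab entry. Assuming it follows the six-field ordering used by Spring-style schedulers (second, minute, hour, day of month, month, day of week), which this page does not state explicitly, the default would run the job once a day at 00:01:00. The Python sketch below simply labels the fields under that assumption; `describe_cron` is a hypothetical helper, not part of the product.

```python
# Assumed field order (Spring-style six-field cron expression).
FIELD_NAMES = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Split a six-field cron expression into named fields."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(fields)}: {expression!r}"
        )
    return dict(zip(FIELD_NAMES, fields))


if __name__ == "__main__":
    for name, value in describe_cron("0 1 0 * * *").items():
        print(f"{name}: {value}")
```

Labelling the fields before editing the value helps avoid accidentally scheduling the job every minute or every second.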