Operational runbook
This article lists the troubleshooting tools and scripts available on deployed environments.
Disk path | Script Name | Node Type (webapp/persistence) |
---|---|---|
/etc/veridiumid/scripts/ | backup_configs.sh | All |
/etc/veridiumid/scripts/ | config_revert.sh | All |
/etc/veridiumid/scripts/ | convert_haproxy_cert.sh | Webapp |
/etc/veridiumid/scripts/ | manage_alerts.py | All |
/etc/veridiumid/scripts/ | check_license.py | Webapp |
/etc/veridiumid/scripts/ | check_prereqs.sh | All |
/etc/veridiumid/scripts/ | check_services.sh | All |
/etc/veridiumid/scripts/ | server_pins.sh | Webapp |
/etc/veridiumid/scripts/ | veridium_services.sh | All |
/etc/veridiumid/scripts/ | check_certificates.py | All |
/etc/veridiumid/scripts/ | check_domain_cert.sh | Webapp |
/etc/veridiumid/scripts/ | getLogs.sh | All |
/etc/veridiumid/scripts/ | gather_env_info.sh | All |
/etc/veridiumid/scripts/ | campaign_results.sh | Persistence |
/etc/veridiumid/scripts/ | znode_cleanup.py | Webapp |
/etc/veridiumid/scripts/ | check_db_ops.sh | Persistence |
/etc/veridiumid/scripts/ | check_readyness.sh | Webapp |
/etc/veridiumid/scripts/ | change_env_config.py | Webapp |
/etc/veridiumid/scripts/ | veridiumid_cdcr.sh | All |
/etc/veridiumid/scripts/ | veridiumid_crontab.sh | All |
/opt/veridiumid/elasticsearch/bin/ | elasticsearch_ops.sh | Persistence |
Directory where the VeridiumID installation package is extracted, or /home/veridiumid | check_prereqs.sh | Installation node |
/etc/veridiumid/update-procedure/current | update_manager.sh | All |
/opt/veridiumid/migration/bin | migration.sh | All |
/opt/veridiumid/cassandra/conf | recreateCassandraLuceneIndexes.sh | Persistence |
CLI Command - Alias | Full command | Node (webapp/persistence) |
---|---|---|
check_services | bash /etc/veridiumid/scripts/check_services.sh | all |
ver_stop | bash /etc/veridiumid/scripts/veridium_services.sh stop | all |
ver_start | bash /etc/veridiumid/scripts/veridium_services.sh start | all |
ver_disable | bash /etc/veridiumid/scripts/veridium_services.sh disable | all |
ver_enable | bash /etc/veridiumid/scripts/veridium_services.sh enable | all |
check_certificates | python3 /etc/veridiumid/scripts/check_certificates.py | persistence |
cqlsh | /opt/veridiumid/cassandra/bin/cqlsh --cqlshrc=/opt/veridiumid/cassandra/conf/veridiumid_cqlshrc --ssl | persistence |
zkcli | /opt/veridiumid/zookeeper/bin/zkCli.sh | persistence |
ver_getLogs | bash /etc/veridiumid/scripts/getLogs.sh | all |
eops | bash /opt/veridiumid/elasticsearch/bin/elasticsearch_ops.sh | persistence |
All scripts must be executed as the root user.
1. backup_configs.sh
The backup configs script will perform a backup of all required configurations present on disk on each node, including the JSON configuration files.
Usage:
bash /etc/veridiumid/scripts/backup_configs.sh /etc/veridiumid/scripts/backup_configs.conf
2. config_revert.sh
The script restores the node from a configuration backup (the one created by backup_configs.sh).
Usage:
Usage: ./config_revert.sh <ARGS>
Args: -c CONFIG_FILE - full path to the configuration file (same one used for the config backup)
      -b BACKUP_ZIP - full path to the backup archive
      -t TRANSITION_FILE - full path to the transition file (containing the IP address transitions)
      -j - run just the JSON upload
Local config revert:
bash /etc/veridiumid/scripts/config_revert.sh -c /etc/veridiumid/scripts/backup_configs.conf -b /opt/veridiumid/backup/all_configs/BACKUP_ARCHIVE_NAME -t PATH_TO_TRANSITION_FILE
JSON Upload:
bash /etc/veridiumid/scripts/config_revert.sh -c /etc/veridiumid/scripts/backup_configs.conf -b /opt/veridiumid/backup/all_configs/BACKUP_ARCHIVE_NAME -t PATH_TO_TRANSITION_FILE -j
Example Transition file:
OLD_IP1:NEW_IP1
OLD_IP2:NEW_IP2
OLD_IP3:NEW_IP3
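Before passing a transition file to config_revert.sh, it can help to check that every line really has the OLD_IP:NEW_IP shape. The following is a minimal sketch, not part of the VeridiumID tooling: the function name parse_transition_file is hypothetical, and it assumes IPv4 addresses and one mapping per line, as in the example above.

```python
import ipaddress


def parse_transition_file(text):
    """Parse OLD_IP:NEW_IP lines into a dict, validating each IPv4 address.

    Assumption: one mapping per line, IPv4 only, ':' as the separator.
    """
    mapping = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        old, sep, new = line.partition(":")
        if not sep:
            raise ValueError(f"line {line_no}: expected OLD_IP:NEW_IP, got {line!r}")
        # IPv4Address raises a ValueError subclass on malformed input
        old_ip = str(ipaddress.IPv4Address(old.strip()))
        new_ip = str(ipaddress.IPv4Address(new.strip()))
        if old_ip in mapping:
            raise ValueError(f"line {line_no}: duplicate old IP {old_ip}")
        mapping[old_ip] = new_ip
    return mapping


if __name__ == "__main__":
    sample = "10.0.0.1:10.1.0.1\n10.0.0.2:10.1.0.2\n"
    for old_ip, new_ip in parse_transition_file(sample).items():
        print(f"{old_ip} -> {new_ip}")
```

A malformed line raises ValueError with the offending line number, so the file can be corrected before the revert is started.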
3. convert_haproxy_cert.sh
The script converts a PKCS12 certificate into the server.pem file required by the HAProxy service.
Usage: ./convert_haproxy_cert.sh PATH_TO_PKCS_FILE
Example:
bash /etc/veridiumid/scripts/convert_haproxy_cert.sh /home/veridiumid/veridium.p12
4. manage_alerts.py
This script triggers specific alerts.
Usage: python3 /etc/veridiumid/alerts/manage_alerts.py --config /etc/veridiumid/alerts.conf
## The following alerts should be enabled in crontab.
Create two configuration files and schedule them in crontab as follows:
28 15 * * * python3 /etc/veridiumid/alerts/manage_alerts.py --config /etc/veridiumid/alertsDaily.conf
/etc/veridiumid/alertsDaily.conf
## Daily, on one webapp node and on one persistence node
check_certificates.py:/etc/veridiumid/:ENABLE:/usr/bin/python3:warn:raep:toUser@domain.com:Check certificates results
*/10 * * * * python3 /etc/veridiumid/alerts/manage_alerts.py --config /etc/veridiumid/alertsEvery10Min.conf
/etc/veridiumid/alertsEvery10Min.conf
## Every 10 minutes, on each webapp node and each persistence node:
check_services.sh:/etc/veridiumid/scripts/:ENABLE:/bin/bash:starting,stopped:NONE:toUser@domain.com:Check services result
checkLdapConnections.sh:/etc/veridiumid/scripts/:ENABLE:/bin/bash:ERROR,RECOVERED:NONE:toUser@domain.com:LDAP connection status has changed
resource:disk_check:ENABLE:5:toUser@domain.com:Check disk usage
resource:mem_check:ENABLE:5:toUser@domain.com:Check memory usage
5. check_license.py
The check license script is used to validate that the license imported into the VeridiumID server is valid (it has not expired and the server pinning provided is the correct one).
Usage:
python3 /etc/veridiumid/scripts/check_license.py
6. check_services.sh
The check services script will validate if the correct services are running on the node.
Usage:
bash /etc/veridiumid/scripts/check_services.sh
or using an alias:
check_services
7. server_pins.sh
The server pins script is used to provide the server pinning of the domain certificate (required for generating a license for VeridiumID services).
Usage:
With the path to the HAProxy domain certificate:
bash /etc/veridiumid/scripts/server_pins.sh /etc/veridiumid/haproxy/server.pem
With domain name:
bash /etc/veridiumid/scripts/server_pins.sh veridium.my-domain.com
8. veridium_services.sh
The veridium services script is used to stop/start/disable/enable all VeridiumID services present on the node.
Usage:
bash /etc/veridiumid/scripts/veridium_services.sh stop
bash /etc/veridiumid/scripts/veridium_services.sh start
bash /etc/veridiumid/scripts/veridium_services.sh disable
bash /etc/veridiumid/scripts/veridium_services.sh enable
or using aliases:
ver_stop
ver_start
ver_disable
ver_enable
9. check_certificates.py
The check certificates script is used to check the validity of all certificates used in the VeridiumID server (Friend, Default, Admin and device certificates).
The script should be executed on one webapp node and one persistence node in each datacenter, because it takes its information from Zookeeper, HAProxy and Cassandra.
Usage:
python3 /etc/veridiumid/scripts/check_certificates.py
or using an alias:
check_certificates
10. check_domain_cert.sh
The check domain certificate script validates that all certificates are present in the domain certificate set at the HAProxy level.
Usage:
bash /etc/veridiumid/scripts/check_domain_cert.sh /etc/veridiumid/haproxy/server.pem
11. getLogs.sh
This script collects the logs for a specific date so that they can be sent to Veridium for troubleshooting.
The script generates an archive with all the Veridium logs in the location where the script is executed.
Usage:
#get the logs from today
bash /etc/veridiumid/scripts/getLogs.sh
#get the logs for a specific date, YYYY - year, MM - month, DD - day; example: 20220730
bash /etc/veridiumid/scripts/getLogs.sh YYYYMMDD
#get the logs for a specific interval, from first date to second date
bash /etc/veridiumid/scripts/getLogs.sh YYYYMMDD YYYYMMDD
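When log collection is scripted, the YYYYMMDD arguments are easy to get wrong. The sketch below only builds the argument list for the three invocations shown above. The helper name getlogs_command is hypothetical, and the check that the second date is not earlier than the first is an assumption based on the "from first date to second date" wording.

```python
from datetime import date, timedelta

GETLOGS = "/etc/veridiumid/scripts/getLogs.sh"


def getlogs_command(start=None, end=None):
    """Build the getLogs.sh argument list for today, a single date, or an interval."""
    cmd = ["bash", GETLOGS]
    if start is None:
        if end is not None:
            raise ValueError("an end date requires a start date")
        return cmd  # no arguments: logs from today
    if end is not None and end < start:
        raise ValueError("end date must not be earlier than start date")
    cmd.append(start.strftime("%Y%m%d"))  # e.g. 20220730
    if end is not None:
        cmd.append(end.strftime("%Y%m%d"))
    return cmd


if __name__ == "__main__":
    yesterday = date.today() - timedelta(days=1)
    # Print the command instead of running it; run it as root on the node.
    print(" ".join(getlogs_command(yesterday, date.today())))
```

The list can be handed to subprocess.run on the node itself. The script is still expected to run as root, as stated at the top of this article.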
12. gather_env_info.sh
This script gathers data required for troubleshooting, including:
- The memory, CPU and disk allocated to the node
- Which VeridiumID services are running on the node and their statuses
- How much disk space each VeridiumID service occupies
- VeridiumID services startup parameters
- Node kernel and OS version
- Java version
- Sysctl configurations (used for tuning)
- Content of the hosts file
- Which processes are consuming the most resources (using top)
Usage:
bash /etc/veridiumid/scripts/gather_env_info.sh
13. campaign_results.sh
This script generates a list of users who logged in and did not change their PIN.
It also shows the campaign progress (how many users have changed their PIN and how many users have logged in).
Usage:
bash /etc/veridiumid/reports/campaign_results.sh
14. check_prereqs.sh
This script runs the pre-installation checks:
- port connectivity, using wget and nc
- whether the prerequisites are installed
- whether NTP is in sync
- RAM and disk
Usage:
Usage: ./check_prereqs.sh <args>
Args: -l -> Run on local node
-r -> Run on remote nodes
-w IPS_LIST -> webapp node IP list delimited by commas, for example: '10.0.0.1,10.0.0.2'
-p IPS_LIST -> persistence node IP list delimited by commas, for example: '10.0.0.1,10.0.0.2'
Examples:
## local server
./check_prereqs.sh -l
## one server, checks remote
./check_prereqs.sh -r -w 10.109.20.133 -p 10.109.20.133
## 2 webapps and 3 persistence nodes
./check_prereqs.sh -r -w 10.109.20.133,10.109.21.164 -p 10.109.50.171,10.109.51.185,10.109.52.72
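For larger deployments the comma-delimited lists can be assembled programmatically. This is a small sketch under stated assumptions: prereqs_command is a hypothetical helper, it requires both lists because every remote example above passes both -w and -p, and it validates addresses with the standard ipaddress module before joining them.

```python
import ipaddress


def prereqs_command(webapp_ips, persistence_ips):
    """Build the remote check_prereqs.sh invocation from two lists of node IPs."""

    def joined(ips):
        # ip_address() raises ValueError for anything that is not a valid IP
        return ",".join(str(ipaddress.ip_address(ip)) for ip in ips)

    if not webapp_ips or not persistence_ips:
        raise ValueError("both webapp and persistence IP lists are required")
    return ["./check_prereqs.sh", "-r",
            "-w", joined(webapp_ips),
            "-p", joined(persistence_ips)]


if __name__ == "__main__":
    cmd = prereqs_command(
        ["10.109.20.133", "10.109.21.164"],
        ["10.109.50.171", "10.109.51.185", "10.109.52.72"],
    )
    print(" ".join(cmd))
```

With the addresses from the last example above, the printed command matches the "2 webapps and 3 persistence nodes" invocation.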
15. znode_cleanup.py
This script removes old Zookeeper configuration nodes left over from previous versions (created when environments are updated).
Usage:
usage: znode_cleanup.py [-h] [--print_znodes] [--znode ZNODE] [--limit LIMIT]
[--delete] [--debug]
VeridiumID Zookeeper config read script
optional arguments:
-h, --help show this help message and exit
--print_znodes Print the existing configuration Zookeeper nodes
--znode ZNODE The name of the Zookeeper node that will be deleted
--limit LIMIT The number of Zookeeper nodes that should remain after the
deletion is done
--delete To configure the delete operation
--debug If Debug mode is enabled
Examples:
## To get existing Zookeeper configuration nodes:
python3 znode_cleanup.py --print_znodes
Example Output:
[INFO]The available Zookeeper configuration nodes are the following: ['6.1.20', '6.1.22', '6.1.24', '6.1.26', '6.1.27', '6.1.28', '6.1.29', '6.1.35', '6.1.36', '6.2.3', '6.2.19', '6.3.1', '6.4.3']
## Delete a single Zookeeper configuration node using the node's name
python3 znode_cleanup.py --delete --znode 6.1.20
Example Output:
[INFO]Removing /veridiumid/6.1.20
## Delete all but the last N Zookeeper configuration nodes using the limit argument
python3 znode_cleanup.py --delete --limit 10
Example Output:
[INFO]Removing /veridiumid/6.1.20
[INFO]Removing /veridiumid/6.1.22
[INFO]Removing /veridiumid/6.1.24
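To preview what a --limit run would remove, the selection can be reproduced offline. The sketch below is not the script's actual implementation: it assumes the nodes are ordered by numeric version (consistent with the example list, where 6.2.3 precedes 6.2.19) and that the newest LIMIT nodes are kept. The names version_key and znodes_to_remove are hypothetical.

```python
def version_key(name):
    """Turn '6.2.19' into (6, 2, 19) so that versions sort numerically."""
    return tuple(int(part) for part in name.split("."))


def znodes_to_remove(znodes, limit):
    """Return the oldest nodes, leaving at most `limit` of the newest ones."""
    if limit < 0:
        raise ValueError("limit must be zero or positive")
    ordered = sorted(znodes, key=version_key)
    return ordered[:max(len(ordered) - limit, 0)]


if __name__ == "__main__":
    existing = ['6.1.20', '6.1.22', '6.1.24', '6.1.26', '6.1.27', '6.1.28',
                '6.1.29', '6.1.35', '6.1.36', '6.2.3', '6.2.19', '6.3.1', '6.4.3']
    for node in znodes_to_remove(existing, 10):
        print(f"/veridiumid/{node}")
```

For the 13 nodes listed above and a limit of 10, this selects 6.1.20, 6.1.22 and 6.1.24, which agrees with the example output. A plain string sort would place 6.2.19 before 6.2.3, which is why the key converts each part to an integer.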
16. check_db_ops.sh
This script checks whether the last Cassandra backup/maintenance task completed successfully. It can be used together with manage_alerts.py.
Usage:
bash /etc/veridiumid/scripts/check_db_ops.sh
Example Output:
[INFO] Successful operation found in /var/log/veridiumid/cassandra/backup.log: 20-dec-2022 11:15:35 INFO Finished snapshot: dc1_127.0.0.1
[INFO] Log file /var/log/veridiumid/cassandra/maintenance.log is empty. Checking previous log file...
[INFO] Successful operation found in /var/log/veridiumid/cassandra/maintenance.log-20221220: 20-dec-2022 07:23:34 INFO Nodetool repair finishes successfully
17. update_manager.sh
This script provides an alternative way to perform the upgrade.
The node from which the upgrade is started must have a user with SSH key-based connectivity to all other deployment nodes.
The remote user must have sudo permissions or must be the root user.
On the deployment node, install the latest veridiumid_update_procedure RPM and, from the /etc/veridiumid/update-procedure/current directory, run the following command to start the upgrade process:
# In case of using YUM repositories
bash update_manager.sh -v VERSION
# In case of using local RPM packages
bash update_manager.sh -v VERSION -r RPM_PATH
Where:
- VERSION -> is the build number of the new VeridiumID version, for example: 6.2.2
- RPM_PATH -> is the full path to the directory containing all RPMs required by the update process, for example: /home/veridiumid/rpms
18. elasticsearch_ops.sh
Usage:
Required parameters:
-b, --backup Runs a backup operation on the ElasticSearch cluster
-r=, --restore= Runs a restore operation on the ElasticSearch cluster. Requires the name of the snapshot that will be restored
-l, --list Lists the ElasticSearch snapshots
-i, --indices List the indices and their settings
-v, --debug Enable debug logging
-x=, --request= The request command: GET/PUT/POST/DELETE
-p=, --path= The API path
-d=, --data= The data required for the PUT API call
-h, --help Prints in the standard output the script's usage
To run API calls, the following parameters must be used:
-x=/--request=
    To configure the request method
    Available request methods: GET/PUT/POST/DELETE
-p=/--path=
    To configure the API path
    For example: /_cat/repositories
    More details can be found at: ElasticSearch APIs
-d=/--data=
    Optional parameter
    To add the JSON request string
    PUT requests require a JSON data string.
Examples:
To check the current index states and their distribution among all cluster nodes, use the following command:
eops -i
Check available snapshot repositories:
eops -x=GET -p=/_cat/repositories
Delete an index:
eops -x=DELETE -p=/index_name
Modify the cluster.routing.allocation.enable value:
eops -x=PUT -p=/_cluster/settings -d='{"persistent":{"cluster.routing.allocation.enable":"primaries"}}'
The JSON request must not contain spaces.
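One way to produce a request string without spaces is to serialize it with compact separators. The following sketch uses only the Python standard library. The helper name eops_put_command is hypothetical, and it returns a command line to paste into an interactive root shell, since eops is a shell alias.

```python
import json
import shlex


def eops_put_command(path, payload):
    """Return an eops PUT command line with a space-free JSON body."""
    # separators=(",", ":") removes the spaces json.dumps adds by default
    data = json.dumps(payload, separators=(",", ":"))
    if " " in data:
        # Compact separators cannot remove spaces inside keys or values
        raise ValueError("payload contains a space inside a key or value")
    return f"eops -x=PUT -p={path} -d={shlex.quote(data)}"


if __name__ == "__main__":
    settings = {"persistent": {"cluster.routing.allocation.enable": "primaries"}}
    print(eops_put_command("/_cluster/settings", settings))
```

For the settings shown, the printed line is identical to the PUT example above. shlex.quote wraps the JSON in single quotes so the shell passes it through unchanged.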
19. check_readyness.sh
This script shows the contents of the Health Ready API of the primary webapps.
Usage:
bash /etc/veridiumid/scripts/check_readyness.sh
Example output:
Component Websec:
{
"opa": {
"status": "READY",
"details": "{}\n"
},
"cassandra": {
"status": "READY",
"details": ""
},
"adservice": {
"status": "SKIPPED",
"details": ""
}
}
Component ADService: READY
Component Shibboleth:
{
"websec": {
"status": "READY",
"details": "{\"opa\":{\"status\":\"READY\",\"details\":\"{}\\n\"},\"cassandra\":{\"status\":\"READY\",\"details\":\"\"},\"adservice\":{\"status\":\"SKIPPED\",\"details\":\"\"}}"
}
}
Component WebsecAdmin:
{
"cassandra": {
"status": "READY",
"details": ""
}
}
Component SelfServicePortal:
{
"websec": {
"status": "READY",
"details": "{\"opa\":{\"status\":\"READY\",\"details\":\"{}\\n\"},\"cassandra\":{\"status\":\"READY\",\"details\":\"\"},\"adservice\":{\"status\":\"SKIPPED\",\"details\":\"\"}}"
},
"cassandra": {
"status": "READY",
"details": ""
}
}
Component Fido:
{
"cassandra": {
"status": "READY",
"details": ""
}
}
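In the output above, the "details" field of some components carries another report encoded as an escaped JSON string. The sketch below shows one way to unpack that nesting and list every dependency whose status is not considered healthy. It is based only on the sample output: the function name not_ready and the healthy parameter are hypothetical, and whether SKIPPED counts as healthy is left to the caller.

```python
import json


def not_ready(report, healthy=("READY",), prefix=""):
    """Return {dependency path: status} for statuses outside `healthy`.

    `report` is the parsed JSON object printed for one component.
    """
    problems = {}
    for name, info in report.items():
        if not isinstance(info, dict) or "status" not in info:
            continue  # not a status entry, skip it
        path = prefix + name
        if info["status"] not in healthy:
            problems[path] = info["status"]
        # "details" may hold a nested report as an escaped JSON string
        try:
            nested = json.loads(info.get("details") or "{}")
        except json.JSONDecodeError:
            nested = {}
        if isinstance(nested, dict):
            problems.update(not_ready(nested, healthy, path + "."))
    return problems


if __name__ == "__main__":
    shibboleth = json.loads(r'''
    {
      "websec": {
        "status": "READY",
        "details": "{\"opa\":{\"status\":\"READY\",\"details\":\"{}\\n\"},\"cassandra\":{\"status\":\"READY\",\"details\":\"\"},\"adservice\":{\"status\":\"SKIPPED\",\"details\":\"\"}}"
      }
    }
    ''')
    print(not_ready(shibboleth))
    print(not_ready(shibboleth, healthy=("READY", "SKIPPED")))
```

With the Shibboleth block from the example output, the first call reports the skipped adservice dependency under websec, and the second call, which accepts SKIPPED, reports nothing.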
20. migration.sh
This script handles the content of Zookeeper. It uses the /etc/veridiumid/zookeeper.properties file and, based on this file, downloads or uploads the content of a directory.
Usage:
Usage: ./migration.sh <args>
-z -> Zookeeper migration
-c -> Cassandra migration
-s IDP_HOME -> Shibboleth migration
-a -> Both Cassandra migration and Zookeeper migration (default value)
-d PATH -> Download the zookeeper configuration to PATH
-u PATH -> Upload the zookeeper configuration from PATH
-x PATH -> Delete Zookeeper PATH
To copy between paths in Zookeeper:
./migration.sh -from PATH1 -to PATH2
To copy and overwrite between paths in Zookeeper:
./migration.sh -from PATH1 -to PATH2 -force
Example:
##download zookeeper configuration (current one, for connection defined in zookeeper.properties )
/opt/veridiumid/migration/bin/migration.sh -d /tmp/zookBck1
#edit the files in /tmp/zookBck1; do not leave any additional files in this folder that should not be uploaded to Zookeeper
##upload zookeeper configuration
/opt/veridiumid/migration/bin/migration.sh -u /tmp/zookBck1
21. recreateCassandraLuceneIndexes.sh
This script drops all Lucene indexes and recreates them. This is useful during a Cassandra migration or if the indexes get corrupted.
## This can take a long time if the database is big. It can be executed in production without downtime; only admin is affected.
bash /opt/veridiumid/cassandra/conf/recreateCassandraLuceneIndexes.sh -c /opt/veridiumid/cassandra/conf/maintenance.conf
22. change_env_config.py
This script is used to change the current ports or SNI configuration. Run the script as the root user.
Usage:
VeridiumID FQDN change script
optional arguments:
-h, --help show this help message and exit
--debug If debug mode is enabled
--generate Generate current configuration file
--config CONFIG The full path to the configuration file containing the changes
--update Change the FQDN/ports
Generate the current configuration template
python3 /etc/veridiumid/scripts/change_env_config.py --generate
# The file will have the following format: DESCRIPTION|CURRENT_VALUE|CHANGED_VALUE
# Example PORTS config:
websec|443|443
dmz|8544|8544
websecadmin|9444|9444
shibboleth_ext|8944|8944
shibboleth_int|8945|8945
shibboleth_cert_ext|8946|8946
shibboleth_cert_int|8947|8947
selfservice|9987|9987
fqdn_1|test.veridium-dev.com|test.veridium-dev.com
# Example SNI config:
websec|develop.veridium-dev.com|develop.veridium-dev.com
dmz|dmz.develop.veridium-dev.com|dmz.develop.veridium-dev.com
shibboleth|shib.develop.veridium-dev.com|shib.develop.veridium-dev.com
websecadmin|admin.develop.veridium-dev.com|admin.develop.veridium-dev.com
selfservice|ssp.develop.veridium-dev.com|ssp.develop.veridium-dev.com
https_port|443|443
Change the configuration
After changing the values in the change_fqdn.config file run the following command to change the Ports/FQDN:
python3 /etc/veridiumid/scripts/change_env_config.py --update --config PATH_TO_CONFIG_FILE
# Where PATH_TO_CONFIG_FILE is the full path to the configuration file, for example: /etc/veridiumid/scripts/change_fqdn.config
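Before running the update, it can be useful to review which entries of the edited file actually differ from their current values. This is a minimal sketch based only on the DESCRIPTION|CURRENT_VALUE|CHANGED_VALUE format described above. The function name changed_entries is hypothetical, and skipping lines that start with # is an assumption.

```python
def changed_entries(text):
    """Return {description: (current, changed)} for entries whose value changes."""
    changes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' lines carry no settings
        parts = line.split("|")
        if len(parts) != 3:
            raise ValueError(
                f"expected DESCRIPTION|CURRENT_VALUE|CHANGED_VALUE, got {line!r}")
        description, current, changed = (part.strip() for part in parts)
        if current != changed:
            changes[description] = (current, changed)
    return changes


if __name__ == "__main__":
    edited = (
        "websec|443|443\n"
        "websecadmin|9444|9445\n"
        "fqdn_1|test.veridium-dev.com|test.veridium-dev.com\n"
    )
    for name, (old, new) in changed_entries(edited).items():
        print(f"{name}: {old} -> {new}")
```

An empty result means the file is still identical to the generated template, so running --update would have nothing to change.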
23. veridiumid_cdcr.sh
This script will be used to add a second datacenter to the VeridiumID deployment:
Usage: ./veridiumid_cdcr.sh <args>
-g -> Generate archive (must be used on a Webapp node on the primary datacenter)
-w -> Configure Webapp node on secondary datacenter
-c -> Configure Cassandra node
-e -> Configure ElasticSearch node
-f -> First part of Cassandra configuration
-s -> Secondary datacenter
-z -> Upload modified Zookeeper configuration (can be done just once in the secondary datacenter).
-a PATH -> The path to the CDCR archive created using the '-g' argument.
24. veridiumid_crontab.sh
This script is used when the client requires crontab tasks to run as the veridiumid user instead of root.
Usage: bash /etc/veridiumid/scripts/veridiumid_crontab.sh
After the script runs, the permissions of the backup and log directories are changed to allow the veridiumid user to access and modify them.
Afterwards, the root user’s current crontab entries must be copied manually into the veridiumid user’s crontab.