Troubleshooting update errors
When upgrading KUMA, you may encounter the following errors:
- Timeout error
When upgrading from version 2.0.x on systems that contain large amounts of data and operate with limited resources, the system may return the Wrong admin password error message after you enter the administrator password. Even if you specify the correct password, KUMA may return this error because it could not start the Core service due to resource limits and a timeout. If you enter the administrator password three times without waiting for the installation to complete, the update may end with a fatal error.
Follow these steps to resolve the timeout error and successfully complete the update:
- Open a second terminal and run the following command to verify that its output contains the timeout error line:
journalctl -u kuma-core | grep 'start operation timed out'
Timeout error message:
kuma-core.service: start operation timed out. Terminating.
- After you find the timeout error message, in the /usr/lib/systemd/system/kuma-core.service file, change the value of the TimeoutSec parameter from 300 to 0 to remove the timeout limit and temporarily prevent the error from recurring.
- After modifying the service file, run the following commands in sequence:
systemctl daemon-reload
service kuma-core restart
- After the commands complete and the service starts successfully in the second terminal, return to the first terminal, where the installer is prompting for the password, and enter the administrator password again.
KUMA will continue the installation. In resource-limited environments, installation may take up to an hour.
- After the installation finishes successfully, in the /usr/lib/systemd/system/kuma-core.service file, set the TimeoutSec parameter back to 300.
- After modifying the service file, run the following commands in the second terminal:
systemctl daemon-reload
service kuma-core restart
After you run these commands, the update completes successfully.
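If you prefer to script the workaround, the steps above can be sketched as a few shell helpers. This is a minimal sketch, not a KUMA utility: the function names (has_core_timeout, set_unit_timeout, apply_core_timeout) are hypothetical, and set_unit_timeout assumes that the unit file contains the parameter on its own line in the form TimeoutSec=300. Check the file before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the TimeoutSec workaround (not shipped with KUMA).

KUMA_CORE_UNIT=/usr/lib/systemd/system/kuma-core.service

# Succeeds if the log text read from stdin contains the Core start timeout message.
has_core_timeout() {
  grep -q 'start operation timed out'
}

# Rewrites the TimeoutSec value in the given unit file.
# Assumption: the parameter sits on its own line as TimeoutSec=<number>.
set_unit_timeout() {
  local unit_file="$1" value="$2"
  sed -i -E "s/^TimeoutSec=[0-9]+[[:space:]]*$/TimeoutSec=${value}/" "$unit_file"
}

# Changes TimeoutSec for kuma-core, reloads systemd, and restarts the service.
# Run as root.
apply_core_timeout() {
  set_unit_timeout "$KUMA_CORE_UNIT" "$1" &&
    systemctl daemon-reload &&
    service kuma-core restart
}

# Usage in the second terminal:
#   journalctl -u kuma-core | has_core_timeout && apply_core_timeout 0
# After the installer finishes:
#   apply_core_timeout 300
```

Keeping the file edit in a separate function that takes the unit file path makes it easy to try the substitution on a copy of the file first.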
- Invalid administrator password
The admin user password is needed to automatically populate the storage settings during the upgrade. If you enter the admin user password incorrectly nine times at the TASK [Prompt for admin password] step, the installer still performs the update and the web interface is available, but the storage settings are not migrated and the storages have the red status.
To fix the error and make the storages available again, update the storage settings:
- Go to the storage settings, manually fill in the fields of the ClickHouse cluster, and click Save.
- Restart the storage service.
The storage service starts with the specified settings, and its status is green.
- DB::Exception error
After upgrading KUMA, the storage may have the red status, and its logs may contain errors about suspicious strings.
Example error:
DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int, bool) @ 0xda0553a in /opt/kaspersky/kuma/clickhouse/bin/clickhouse
To fix the error, run the following command on the KUMA storage server. The command creates the force_restore_data flag for ClickHouse and restarts the storage service:
touch /opt/kaspersky/kuma/clickhouse/data/flags/force_restore_data && systemctl restart kuma-storage-<ID of the storage that encountered the error>
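The same command can be wrapped in a small function that refuses to run with an empty storage ID, so that systemctl is never called with an incomplete service name. This is a minimal sketch: the function names are hypothetical, and the flag path and the kuma-storage-<ID> service name are taken from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the force_restore_data command (not shipped with KUMA).

# Creates the force_restore_data flag in the flags directory
# (defaults to the path used in the command above).
mark_force_restore() {
  local flags_dir="${1:-/opt/kaspersky/kuma/clickhouse/data/flags}"
  touch "${flags_dir}/force_restore_data"
}

# Creates the flag and restarts the storage service with the given ID.
# Returns 2 without doing anything if no ID is given. Run as root.
force_restore_storage() {
  local storage_id="$1"
  if [ -z "$storage_id" ]; then
    echo "usage: force_restore_storage <storage-id>" >&2
    return 2
  fi
  mark_force_restore && systemctl restart "kuma-storage-${storage_id}"
}

# Usage on the storage server:
#   force_restore_storage <ID of the storage that encountered the error>
```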
- Expiration of k0s cluster certificates
Symptoms
Controllers or worker nodes cannot connect; pods cannot be moved from one worker node to another.
Logs of the k0scontroller and k0sworker services contain multiple records with the following substring:
x509: certificate has expired or is not yet valid
Cause
Cluster service certificates are valid for 1 year from the time of creation. The k0s cluster used in the high-availability KUMA installation automatically rotates all the service certificates it needs, but the rotation is performed only at startup of the k0scontroller service. If k0scontroller services on cluster controllers run without a restart for more than 1 year, service certificates become invalid.
How to fix
To fix the error, restart the k0scontroller services one by one as root on each controller of the cluster. This reissues the certificates:
systemctl restart k0scontroller
To check the expiration dates of certificates on controllers, run the following commands as root:
find /var/lib/k0s/pki/ -type f -name "*.crt" -print | grep -v 'ca.crt$' | xargs -t -I {} bash -c 'openssl x509 -noout -text -in {} | grep After'
find /var/lib/k0s/pki/etcd -type f -name "*.crt" -print | grep -v 'ca.crt$' | xargs -t -I {} bash -c 'openssl x509 -noout -text -in {} | grep After'
You can find the names of certificate files and their expiration dates in the output of these commands.
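Instead of reading the expiration dates manually, you can let openssl report which certificates are close to expiry. The following minimal sketch relies on the standard openssl x509 -checkend option, which exits with a non-zero status if the certificate expires within the given number of seconds. The function name and the 30-day default threshold are assumptions chosen for illustration. Like the commands above, it skips files whose names end in ca.crt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists certificates that expire within N days (not shipped with k0s or KUMA).

# Arguments: PKI directory (default /var/lib/k0s/pki), threshold in days (default 30).
# Prints the path of every matching certificate, one per line.
expiring_k0s_certs() {
  local pki_dir="${1:-/var/lib/k0s/pki}" days="${2:-30}"
  local seconds=$(( days * 86400 ))
  find "$pki_dir" -type f -name '*.crt' ! -name '*ca.crt' -print0 |
    while IFS= read -r -d '' crt; do
      # -checkend exits non-zero if the certificate expires within $seconds.
      if ! openssl x509 -noout -checkend "$seconds" -in "$crt" >/dev/null; then
        printf '%s\n' "$crt"
      fi
    done
}

# Usage as root on each controller:
#   expiring_k0s_certs /var/lib/k0s/pki 30
```

Because find searches the PKI directory recursively, one call also covers the etcd subdirectory. If the function prints any paths, restart the k0scontroller service on that controller as described above.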
Fix the errors to successfully complete the update.