UI Slowness, Nodes temporarily showing Offline, or Instances temporarily showing Error
Scenarios
- Loading the UI seems to be very slow
- Node statuses all seem to be delayed or alternate randomly between states
- Instance states all seem to be delayed or alternate randomly between states
Confirm an average_queue_time of > 5 when curling the controller’s api/v1/stats endpoint.
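The check above can be sketched in Python. This is a minimal sketch, not an official client: the exact layout of the stats response is not described here, so the code assumes only that the endpoint returns JSON and searches the whole document for any average_queue_time key. The helper names (find_values, queue_is_backed_up, fetch_stats) are hypothetical.

```python
import json
import urllib.request

QUEUE_TIME_THRESHOLD = 5  # threshold taken from the text above


def find_values(data, key):
    """Recursively collect every value stored under `key` in parsed JSON."""
    found = []
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                found.append(v)
            else:
                found.extend(find_values(v, key))
    elif isinstance(data, list):
        for item in data:
            found.extend(find_values(item, key))
    return found


def queue_is_backed_up(stats, threshold=QUEUE_TIME_THRESHOLD):
    """Return True if any numeric average_queue_time exceeds the threshold."""
    values = [
        v for v in find_values(stats, "average_queue_time")
        if isinstance(v, (int, float))
    ]
    return any(v > threshold for v in values)


def fetch_stats(base_url):
    """GET <base_url>/api/v1/stats and parse the body as JSON (assumed format)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/stats", timeout=10) as resp:
        return json.load(resp)
```

To use it, call queue_is_backed_up(fetch_stats(base_url)), where base_url is the address of your own controller. Searching by key rather than by a fixed path keeps the sketch from depending on a response structure that is only assumed.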
Common Causes
- Lots of VMs and Nodes, without enough load balancing or instances of the controller, etcd, etc.
- etcd is very sensitive to disk latency, so not using SSDs for the etcd storage can cause these symptoms
Solutions
- Increase the space quota from the default 2GB: [https://etcd.io/docs/current/op-guide/maintenance/#space-quota]
- Rejoin your nodes with a higher --heartbeat value (> 20s)
- Upgrade the host’s disk to an SSD or faster disk
- Set ANKA_NUM_WORKERS to more than the default 2 in the Controller configuration.
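For the space quota item, etcd takes the quota as a byte count through its --quota-backend-bytes flag, as described on the linked maintenance page. The arithmetic is easy to get wrong by hand, so here is a small sketch that converts GiB to bytes and formats the flag. The helper names are hypothetical, and how the flag reaches etcd in your deployment is not covered here.

```python
GIB = 1024 ** 3  # bytes in one GiB


def quota_backend_bytes(gib):
    """Convert a quota expressed in GiB to the byte count etcd expects."""
    return gib * GIB


def etcd_quota_flag(gib):
    """Format the etcd --quota-backend-bytes flag for a quota given in GiB."""
    return f"--quota-backend-bytes={quota_backend_bytes(gib)}"
```

For example, etcd_quota_flag(8) produces the flag for an 8 GiB quota, and quota_backend_bytes(2) gives the byte value of the 2GB default mentioned above.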
State changes can also be caused by network issues between the Node and the controller. Check the node’s /var/log/veertu/anka_agent.ERROR log to confirm you’re not seeing timeouts or connection errors.
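The log check above can be scripted as a rough sketch. The log's line format is not documented here, so this assumes only that relevant entries contain words such as "timeout" or "connection"; the keyword list and the function names are assumptions to adjust for what your log actually contains.

```python
from pathlib import Path

# Assumed keywords; matching is case-insensitive.
KEYWORDS = ("timeout", "timed out", "connection")


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return the lines that mention any keyword, with newlines stripped."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in keywords)
    ]


def scan_log(path="/var/log/veertu/anka_agent.ERROR"):
    """Read the agent error log on the node and return matching lines."""
    with Path(path).open(errors="replace") as fh:
        return suspicious_lines(fh)
```

Run scan_log() on the node itself. An empty result suggests the state changes are not caused by the network, within the limits of the assumed keywords.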