Alertmanager is used by Calico Cloud to route alerts from Prometheus to the administrators. It handles routing, deduplicating, grouping, silencing and inhibition of alerts.
More detailed information about Alertmanager is available in the upstream documentation.
Updating the Alertmanager config
Save the current Alertmanager secret, usually named
alertmanager-<your-alertmanager-name>. Our manifests create a secret called alertmanager-calico-node-alertmanager:
kubectl -n tigera-operator get secrets alertmanager-calico-node-alertmanager -o yaml > alertmanager-secret.yaml
The current alertmanager.yaml file is base64-encoded and stored in the
alertmanager.yaml key under the
data field. You can decode it by copying the value of
alertmanager.yaml and using the base64 command:
echo "<whatever-you-copied>" | base64 --decode > alertmanager-config.yaml
Make the necessary changes to
alertmanager-config.yaml. Once this is done, re-encode the file so that it can be saved back into
alertmanager-secret.yaml. On Linux, you can do this with:
cat alertmanager-config.yaml | base64 -w 0
Paste the output of the command above into
alertmanager-secret.yaml, replacing the existing value of the
alertmanager.yaml field. Then apply the updated manifest:
kubectl -n tigera-operator apply -f alertmanager-secret.yaml
Your changes should be applied within a few seconds by the config-reloader
container inside the Alertmanager pod launched by the prometheus-operator.
For more advice on writing alertmanager configuration files, see the alertmanager configuration documentation.
Configure Inhibition Rules
Alertmanager can suppress certain notifications according to
defined rules. A typical use case for
inhibit rules is to suppress
notifications from a lower priority alert when one with a higher priority is
firing. Inhibition rules are defined in the Alertmanager configuration
file. You can define one by adding a snippet like the following to that file:
# Apply inhibition for alerts generated by the same alerting rule
# and on the same node.
equal: ['alertname', 'instance']
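The fragment above shows only the comment and the equal clause of a rule. For illustration, here is a minimal sketch of a complete inhibit_rules entry. It assumes that alerts carry a severity label with the values critical and warning; those label names and values are illustrative and are not taken from the Calico Cloud defaults. It uses the source_match and target_match fields; newer Alertmanager releases also accept source_matchers and target_matchers, so check which form your version expects.

```yaml
inhibit_rules:
  # Assumption: alerts are labelled with `severity`; adjust to your own labels.
  - source_match:
      severity: 'critical'   # the higher priority alert that is firing
    target_match:
      severity: 'warning'    # the lower priority alert to suppress
    # Apply inhibition for alerts generated by the same alerting rule
    # and on the same node.
    equal: ['alertname', 'instance']
```

With a rule of this shape, a warning alert is muted only while a critical alert with the same alertname and instance label values is firing.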
Configure Grouping of Alerts
Alertmanager can also group alerts based on labels and lets you tune timing, such as how often an alert is resent. In the case of Denied Packet metrics, simply defining a Prometheus alerting rule would mean that you get a page (if so defined in your Alertmanager configuration) for every policy on every node for every source IP. All of these alerts can be combined into a single alert by configuring grouping. The Alertmanager configuration file that is provided with Calico Cloud by default groups alerts on a per-node basis. If the goal is instead to group all alerts with the same name, edit (and apply) the Alertmanager configuration file like so:
- name: 'webhook'
- url: 'http://calico-alertmanager-webhook:30501/'
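The two lines above show only the receiver name and its webhook URL. As a hedged sketch of how they might sit inside a full route and receivers configuration that groups by alert name, consider the following. The group_by value follows from the stated goal; the timing values are illustrative assumptions rather than the values shipped with Calico Cloud, and should be tuned for your environment.

```yaml
route:
  # Group all alerts that share the same alert name into one notification.
  group_by: ['alertname']
  # Assumed timings, shown for illustration only.
  group_wait: 30s        # wait before sending the first notification for a new group
  group_interval: 5m     # wait before notifying about new alerts added to a group
  repeat_interval: 4h    # wait before resending a notification that was already sent
  receiver: 'webhook'
receivers:
  - name: 'webhook'
    webhook_configs:
      - url: 'http://calico-alertmanager-webhook:30501/'
```

To group per node and per alert name instead, group_by could list both labels, for example ['alertname', 'instance'], assuming the node is carried in the instance label.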
More information, including descriptions of the various options, can be found under the route section of the Alertmanager configuration guide.