Configure Alertmanager
Calico Enterprise uses Alertmanager to route alerts from Prometheus to administrators. Alertmanager handles routing, deduplication, grouping, silencing, and inhibition of alerts.
More detailed information about Alertmanager is available in the upstream documentation.
Update the Alertmanager configuration
- Save the current Alertmanager secret, usually named alertmanager-<your-alertmanager-name>. Our manifests create a secret called alertmanager-calico-node-alertmanager:
    kubectl -n tigera-operator get secrets alertmanager-calico-node-alertmanager -o yaml > alertmanager-secret.yaml
- The current Alertmanager configuration is stored base64-encoded under the alertmanager.yaml key of the secret's data field. To decode it, copy the value of alertmanager.yaml and pass it to the base64 command:
    echo "<whatever-you-copied>" | base64 --decode > alertmanager-config.yaml
- Make the necessary changes to alertmanager-config.yaml. Then re-encode the file so you can save it back to alertmanager-secret.yaml. On Linux, you can do this with the following command (-w 0 disables line wrapping, so the output is a single line):
    cat alertmanager-config.yaml | base64 -w 0
- Paste the output of the command above into alertmanager-secret.yaml, replacing the existing value of the alertmanager.yaml field. Then apply the updated manifest:
    kubectl -n tigera-operator apply -f alertmanager-secret.yaml
The config-reloader container inside the Alertmanager pod launched by the prometheus-operator (the pod is usually named alertmanager-<your-alertmanager-instance-name>) should apply your changes within a few seconds.
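If you prefer to script the decode and re-encode steps above instead of copying values by hand, the following Python sketch shows one way to do it. It assumes you exported the secret as JSON (for example with kubectl -n tigera-operator get secret alertmanager-calico-node-alertmanager -o json) and loaded it into a dictionary. The function names decode_config and with_encoded_config are hypothetical, and the sample secret is illustrative only.

```python
import base64
import copy

# Key under the Secret's "data" field that holds the Alertmanager config.
CONFIG_KEY = "alertmanager.yaml"


def decode_config(secret: dict) -> str:
    """Return the decoded Alertmanager config stored in a Secret dict."""
    encoded = secret["data"][CONFIG_KEY]
    return base64.b64decode(encoded).decode("utf-8")


def with_encoded_config(secret: dict, config_text: str) -> dict:
    """Return a copy of the Secret whose CONFIG_KEY holds config_text,
    base64-encoded on a single line (the equivalent of base64 -w 0)."""
    updated = copy.deepcopy(secret)  # leave the caller's dict untouched
    updated["data"][CONFIG_KEY] = base64.b64encode(
        config_text.encode("utf-8")
    ).decode("ascii")
    return updated


if __name__ == "__main__":
    # Illustrative Secret; a real one would come from kubectl ... -o json.
    sample = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "alertmanager-calico-node-alertmanager"},
        "data": {
            CONFIG_KEY: base64.b64encode(b"global:\n  resolve_timeout: 5m\n").decode("ascii")
        },
    }
    config = decode_config(sample)
    print(config)
    # Edit the plain-text config, then re-encode it into a new Secret dict.
    edited = config.replace("5m", "10m")
    print(decode_config(with_encoded_config(sample, edited)))
```

You would then write the updated dictionary back to a file with json.dump and apply it with kubectl apply -f, just as with the YAML manifest. Returning a copy rather than mutating the input keeps the original secret available for comparison.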
For more advice on writing Alertmanager configuration files, see the Alertmanager configuration documentation.
Configure Inhibition Rules
Alertmanager can suppress certain notifications according to defined rules. A typical use case for inhibit rules is to suppress notifications from a lower-priority alert while a higher-priority one is firing. Inhibition rules are defined in the Alertmanager configuration file. You can define one by adding this snippet to your alertmanager.yaml:
[...]
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'info'
  # Apply inhibition for alerts generated by the same alerting rule
  # and on the same node.
  equal: ['alertname', 'instance']
[...]
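To make the semantics of the rule above concrete, here is a simplified Python model of how an inhibition rule with source_match, target_match, and equal decides whether a target alert is muted. It is an illustration, not Alertmanager's implementation: it only does exact label matching and ignores regex matchers, alert resolution, and other edge cases. The function names and the alert name DeniedPacketsRate are hypothetical.

```python
from typing import Mapping, Sequence

Labels = Mapping[str, str]


def matches(labels: Labels, matcher: Mapping[str, str]) -> bool:
    """Exact label equality, as used by source_match / target_match."""
    return all(labels.get(name) == value for name, value in matcher.items())


def is_inhibited(target: Labels, firing: Sequence[Labels], rule: Mapping) -> bool:
    """Return True if some firing source alert mutes the target alert."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing:
        if not matches(source, rule["source_match"]):
            continue
        # Every label listed under "equal" must have the same value on both
        # alerts; in this sketch a label missing from both counts as equal.
        if all(source.get(name) == target.get(name) for name in rule["equal"]):
            return True
    return False


# The rule from the snippet above, expressed as a Python dict.
RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "info"},
    "equal": ["alertname", "instance"],
}

if __name__ == "__main__":
    firing = [{"alertname": "DeniedPacketsRate", "instance": "node-1", "severity": "critical"}]
    same_node = {"alertname": "DeniedPacketsRate", "instance": "node-1", "severity": "info"}
    other_node = {"alertname": "DeniedPacketsRate", "instance": "node-2", "severity": "info"}
    print(is_inhibited(same_node, firing, RULE))   # True: muted by the critical alert
    print(is_inhibited(other_node, firing, RULE))  # False: different instance
```

The info alert on node-1 is suppressed because a critical alert with the same alertname and instance is firing, while the info alert on node-2 still goes out.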
Configure Grouping of Alerts
Alertmanager can also group alerts based on labels and fine-tune timing, such as how long to wait before sending a group's first notification and how often to resend it. For Denied Packet metrics, simply defining a Prometheus alerting rule would mean that you get a page (if your Alertmanager configuration pages you) for every policy on every node for every source IP. Grouping lets you combine all of these alerts into a single notification. The Alertmanager configuration file provided with Calico Enterprise groups alerts on a per-node basis by default. To group all alerts with the same name instead, edit and apply the Alertmanager configuration file as follows:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 5m
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://calico-alertmanager-webhook:30501/'
More information, including descriptions of the various options, can be found in the route section of the Alertmanager configuration guide.
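The effect of group_by on the Denied Packet example can be illustrated with a small Python sketch: alerts that share the same values for every group_by label land in the same group, and each group is notified together. The sketch assumes the node is identified by an instance label and uses hypothetical alert and label names (DeniedPackets, policy, src_ip). It models only the grouping key, not the timing controlled by group_wait, group_interval, and repeat_interval.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple

Labels = Mapping[str, str]
GroupKey = Tuple[Tuple[str, str], ...]


def group_alerts(alerts: Sequence[Labels], group_by: Sequence[str]) -> Dict[GroupKey, List[Labels]]:
    """Bucket alerts by their values for the group_by labels.

    Alerts with identical values for every group_by label share a group;
    a missing label is treated as the empty string in this sketch."""
    groups: Dict[GroupKey, List[Labels]] = defaultdict(list)
    for alert in alerts:
        key = tuple((name, alert.get(name, "")) for name in group_by)
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical denied-packet alerts: one per policy, node, and source IP.
    alerts = [
        {"alertname": "DeniedPackets", "instance": "node-1", "policy": "p1", "src_ip": "10.0.0.1"},
        {"alertname": "DeniedPackets", "instance": "node-1", "policy": "p2", "src_ip": "10.0.0.2"},
        {"alertname": "DeniedPackets", "instance": "node-2", "policy": "p1", "src_ip": "10.0.0.3"},
    ]
    print(len(group_alerts(alerts, ["instance"])))   # 2 groups: one per node
    print(len(group_alerts(alerts, ["alertname"])))  # 1 group: all alerts together
```

Grouping on instance mirrors the per-node default, while grouping on alertname collapses every denied-packet alert into one group, which is what the configuration above aims for.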