 Robusta has special features for handling Prometheus alerts in Kubernetes clusters, including:

 1. **Enrichers:** playbooks that enrich alerts with extra information based on the alert type
@@ -17,38 +17,34 @@ These features are still in beta and therefore have been implemented differently
 of operation, you configure a root ``alerts_integration`` playbook in ``active_playbooks.yaml`` and then add special enrichment
 and silencer playbooks underneath that playbook. In the future, this functionality will likely be merged into regular playbooks.

-Setup and configuration
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+Configure Robusta
+---------------------------------

-Configure Prometheus Alertmanager
-----------------------------------
-Before you can enrich Prometheus alerts, you must forward them to Robusta by adding a webhook receiver to Alertmanager.
-See :ref:`Setting up the webhook` for details.
+.. admonition:: Configure Prometheus Alertmanager

-Configure Robusta
--------------------------------
-Let's look at the simplest possible ``active_playbooks.yaml``, which instructs Robusta to forward Prometheus alerts to Slack without any enrichment:
+    Before you can enrich Prometheus alerts, you must forward them to Robusta by adding a webhook receiver to Alertmanager.

-|**Enabling it:**
+    See :ref:`Setting up the webhook` for details.
+
+
+Let's look at the simplest possible ``active_playbooks.yaml``, which instructs Robusta to forward Prometheus alerts to Slack without any enrichment:

 .. code-block:: yaml

     active_playbooks:
       - name: "alerts_integration"

 The above configuration isn't very useful because we haven't enriched any alerts yet.
-However, we do get a minor aesthetic benefit because Robusta adds pretty formatting to alerts as you can see below:
+However, Robusta still sends default information for every alert, as you can see below.

 .. image:: /images/default-slack-enrichment.png
     :width: 30 %
     :align: center

 Adding an Enricher
--------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Now let's add an enricher to ``active_playbooks.yaml`` that enriches the ``HostHighCPULoad`` alert:

-|**Enabling it:**
-
 .. code-block:: yaml

     active_playbooks:
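The Alertmanager admonition above only points to :ref:`Setting up the webhook`. For orientation, a webhook receiver in the Alertmanager configuration generally takes the shape sketched below. The receiver name ``robusta`` and the URL are hypothetical placeholders rather than values from this page; take the real endpoint from the webhook setup guide.

.. code-block:: yaml

    # Hypothetical Alertmanager fragment; receiver name and URL are placeholders.
    route:
      receiver: "robusta"        # send every alert to the receiver defined below
    receivers:
      - name: "robusta"
        webhook_configs:
          - url: "http://<robusta-runner-service>/<alerts-endpoint>"  # placeholder endpoint
            send_resolved: true  # also forward notifications when alerts resolve

Routing every alert to a single receiver keeps the sketch short. In a cluster that already routes alerts elsewhere, you would more likely add Robusta as an extra child route (see the ``continue`` option in the Alertmanager routing documentation) than replace the top-level receiver.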
@@ -78,7 +74,7 @@ Therefore, in the above example, we explicitly added back the ``AlertDefaults``
 Make sure to check out the full list of enrichers to see what you can add.

 Setting the default enricher
-------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 You can change the default enricher(s) for all alerts using the ``default_enrichers`` parameter.

@@ -91,10 +87,8 @@ You can change the default enricher(s) for all alerts using the ``default_enrich
           - name: "AlertDefaults"

 Adding a Silencer
------------------
-Now let's look at an example ``active_playbooks.yaml`` which silences ``KubePodCrashLooping`` alerts in the first ten minutes after a node (re)starts:
-
-|**Enabling it:**
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Let's silence ``KubePodCrashLooping`` alerts in the first ten minutes after a node (re)starts:

 .. code-block:: yaml

@@ -109,8 +103,8 @@ Now lets look at an example ``active_playbooks.yaml`` which silences KubePodCras
             post_restart_silence: 600  # seconds

 Full example
-----------------
-Here is an example which shows all the features discussed above working together:
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Here are all the above features working together:

 .. code-block:: yaml

@@ -133,108 +127,179 @@ Here is an example which shows all the features discussed above working together
           params:
             post_restart_silence: 600  # seconds

-Available enrichers
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+Available Enrichers
+-----------------------

-**AlertDefaults:** send the alert message and labels to Slack
+AlertDefaults
+^^^^^^^^^^^^^^^^
+Send the alert message and labels to Slack.

-**NodeCPUAnalysis:** provide deep analysis of node CPU usage
-This enricher uses ``prometheus``. The ``prometheus`` URL can be overridden in the ``global_config`` section.
-For example: ``prometheus_url: "http://prometheus-k8s.monitoring.svc.cluster.local:9090"``
+NodeCPUAnalysis
+^^^^^^^^^^^^^^^^^^^^^
+Provide analysis of node CPU usage.

-**OOMKillerEnricher:** shows which pods were recently OOM-killed on a node
+.. note::
+    This enricher uses ``prometheus``. The ``prometheus`` URL can be overridden in the ``global_config`` section.

-**GraphEnricher:** display a graph of the Prometheus query which triggered the alert
-This enricher uses ``prometheus``. The ``prometheus`` URL can be overridden in the ``global_config`` section.
-For example: ``prometheus_url: "http://prometheus-k8s.monitoring.svc.cluster.local:9090"``
+    For example: ``prometheus_url: "http://prometheus-k8s.monitoring.svc.cluster.local:9090"``

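The note on ``NodeCPUAnalysis`` says the Prometheus URL can be overridden in the ``global_config`` section. A minimal sketch of that override follows. Placing ``global_config`` at the top level of ``active_playbooks.yaml``, next to ``active_playbooks``, is an assumption, because this page does not show the surrounding file layout.

.. code-block:: yaml

    # Assumption: global_config is a top-level key of active_playbooks.yaml.
    global_config:
      # Value copied from the note; replace it with your own Prometheus service URL.
      prometheus_url: "http://prometheus-k8s.monitoring.svc.cluster.local:9090"

    active_playbooks:
      - name: "alerts_integration"

The ``GraphEnricher`` section points back to this same ``prometheus_url`` parameter, so a single override is meant to serve both Prometheus-backed enrichers.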
-**StackOverflowEnricher:** add a button in Slack to search for the alert name on StackOverflow
+GraphEnricher
+^^^^^^^^^^^^^^^^^^^^^
+Display a graph of the Prometheus query which triggered the alert.

-**NodeRunningPodsEnricher:** add a list of the pods running on the node, with the pod Ready status
+See the ``NodeCPUAnalysis`` note above regarding the ``prometheus_url`` parameter.

-.. image:: /images/node-running-pods.png
-    :width: 80 %
-    :align: center
+.. admonition:: Example

-**NodeAllocatableResourcesEnricher:** add the allocatable resources available on the node
+    .. image:: /images/graph-enricher.png
+        :width: 50 %
+        :align: center

-.. image:: /images/node-allocatable-resources.png
-    :width: 80 %
-    :align: center
+TemplateEnricher
+^^^^^^^^^^^^^^^^^^^^^
+Add a paragraph to the alert's description containing templated markdown. You can inject any of the alert's Prometheus labels into the markdown.

-**DaemonsetEnricher:** for DaemonSet-related alerts, adds details about the DaemonSet status
+A variable like ``$foo`` is replaced by the value of the Prometheus label ``foo``. If the label isn't present, the text "<missing>" is used instead.

-.. image:: /images/daemonset-enricher.png
-    :width: 80 %
-    :align: center
+Common variables to use are ``$alertname``, ``$deployment``, ``$namespace``, and ``$node``.

-**DaemonsetMisscheduledAnalysis:** analyze the known Prometheus alert ``KubernetesDaemonsetMisscheduled`` and provide
-actionable advice on how to fix it. This enricher **only** displays output when it can verify that the alert is a false
-positive.
+The template can include all markdown directives supported by Slack. Note that Slack markdown links use a different format than GitHub markdown links.

-.. image:: /images/daemonset-misscheduled.png
+.. admonition:: Example

-**PodBashEnricher:** runs the specified bash command on the **pod** associated with the alert
+    .. code-block:: yaml

-|**Note:** The bash command must be installed on the target pod
+        active_playbooks:
+          (...)
+          - alert_name: "ContainerVolumeUsage"
+            enrichers:
+              - name: "TemplateEnricher"
+                params:
+                  template: "The alertname is $alertname and the pod is $pod"

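Slack links use the ``<url|text>`` form rather than GitHub's ``[text](url)``, so a ``TemplateEnricher`` template that links to a runbook could look like the sketch below. The ``runbooks.example.com`` address is a hypothetical placeholder; ``$alertname`` and ``$namespace`` are label variables from the list above.

.. code-block:: yaml

    active_playbooks:
      (...)
      - alert_name: "ContainerVolumeUsage"
        enrichers:
          - name: "TemplateEnricher"
            params:
              # Slack link syntax is <url|text>; the runbook URL is a placeholder.
              template: "See <https://runbooks.example.com|the runbooks> for $alertname in namespace $namespace"

If the alert carries no ``namespace`` label, the text "<missing>" takes its place, as described above.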
-|**Example Usage:**
+LogsEnricher
+^^^^^^^^^^^^^^^^^^^^^
+Fetch logs related to the alert and attach them to the alert as a file.

-.. code-block:: yaml
+The pod to fetch logs for is determined by the alert's ``pod`` label from Prometheus.

-    active_playbooks:
-      (...)
-      - alert_name: "ContainerVolumeUsage"
-        enrichers:
-          - name: "PodBashEnricher"
-            params:
-              bash_command: "df -h"
+By default, if the alert has no label named ``pod``, this enricher silently does nothing. To show an explicit error, set the ``warn_on_missing_label`` parameter to ``true``.

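A sketch of enabling ``LogsEnricher`` with an explicit warning, reusing the layout of the ``TemplateEnricher`` example. Attaching it to ``KubePodCrashLooping`` (the alert used in the silencer section) is an illustrative choice; any alert that carries a ``pod`` label would do.

.. code-block:: yaml

    active_playbooks:
      (...)
      - alert_name: "KubePodCrashLooping"
        enrichers:
          - name: "LogsEnricher"
            params:
              # Report an error instead of silently skipping alerts without a "pod" label.
              warn_on_missing_label: true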
-|**The results:**
+OOMKillerEnricher
+^^^^^^^^^^^^^^^^^^^^^
+Show which pods were recently OOM-killed on a node.

-.. image:: /images/disk-usage.png
-    :width: 80 %
-    :align: center
+StackOverflowEnricher
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Add a button in Slack to search for the alert name on StackOverflow.

-**NodeBashEnricher:** runs the specified bash command on the **node** associated with the alert
+NodeRunningPodsEnricher
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Add a list of the pods running on the node, with each pod's Ready status.

-|**Example Usage:**
+.. admonition:: Example

-.. code-block:: yaml
+    .. image:: /images/node-running-pods.png
+        :width: 80 %
+        :align: center

-    active_playbooks:
-      (...)
-      - alert_name: "HostOutOfDiskSpace"
-        enrichers:
-          - name: "NodeBashEnricher"
-            params:
-              bash_command: "df -h"
+NodeAllocatableResourcesEnricher
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Add the allocatable resources available on the node.