Alertmanager Alerting: Introduction and Deployment (1)



2024-05-10 03:18 | Source: compiled from the web

I. Overview

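Alerting is an essential part of any monitoring system. In the Prometheus monitoring stack, metric collection and storage are kept separate from alerting. Alerting rules are defined on the Prometheus server; when a rule fires, the alert is sent to a separate component, Alertmanager, which processes it and finally notifies users through a receiver (such as email).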

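We use the Prometheus server to collect metrics and then define threshold-based alerting rules (Rules) on them in PromQL. The Prometheus server evaluates the alerting rules periodically; when a rule's condition is met, it generates an alert and pushes it to Alertmanager. Alertmanager processes the alerts it receives, groups them (grouping), and routes them (routing) to the correct receiver, such as email, PagerDuty, or HipChat.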

1. Alert grouping

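Grouping means that Alertmanager combines alerts of the same kind into a single notification. This prevents a sudden flood of notifications that would keep administrators from pinpointing the problem quickly. For example, in a large cluster, a database failure can leave many applications unable to connect to the database. If the Prometheus alerting rule fires for every service instance, a large number of alerts reaches Alertmanager; with grouping, they can be delivered as one notification that lists all affected instances.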

Grouping, notification timing, and receivers are all configured in the Alertmanager configuration file.
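To make the grouping idea concrete, here is a minimal Python sketch. It is a simplified model, not Alertmanager's implementation: each alert is represented as a plain dict of labels, and alerts whose `group_by` label values are identical land in the same group, which corresponds to one notification. The function name `group_alerts` and the sample alerts are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Group alert label sets by the values of the group_by labels.

    Simplified model, not Alertmanager's code: alerts whose group_by
    label values are identical (a missing label counts as "") end up
    in the same group, i.e. the same notification.
    """
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical data: 50 app instances lose their database connection,
    # plus one unrelated latency alert in the same cluster.
    alerts = [
        {"alertname": "DBConnectionFailed", "cluster": "A", "instance": f"app-{i}"}
        for i in range(50)
    ]
    alerts.append({"alertname": "LatencyHigh", "cluster": "A", "instance": "api-1"})
    for key, members in group_alerts(alerts, ["alertname", "cluster"]).items():
        print(dict(key), "->", len(members), "alert(s) in one notification")
```

Grouping by `alertname` and `cluster` turns 51 alerts into 2 notifications. Adding `instance` to `group_by` would produce one notification per instance again, which is exactly the flood that grouping is meant to avoid.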

2. Alert inhibition

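Inhibition means that while a certain alert is firing, notifications for other alerts caused by the same underlying problem are suppressed. Inhibition helps keep administrators from receiving an excessive number of notifications. It is also configured in the Alertmanager configuration file.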
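The following sketch models a single inhibit rule in Python. It is a simplified illustration, not Alertmanager's code, and it supports equality matchers only. A target alert is muted when it matches `target_match` and some other firing alert matches `source_match` with the same values for every label listed in `equal`. The helper names `matches` and `is_inhibited` and the sample alerts are hypothetical.

```python
def matches(labels, matchers):
    """Equality matchers only: every name=value pair must be present."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_inhibited(target, firing, source_match, target_match, equal):
    """Return True if `target` is muted by some firing source alert.

    Simplified model of one inhibit rule: the target must match
    target_match, some *other* firing alert must match source_match, and
    both must agree on every label in `equal`. A label missing from both
    alerts compares as "" == "", so the rule still applies (the CAUTION
    in the official example config).
    """
    if not matches(target, target_match):
        return False
    for source in firing:
        if source is target:
            continue  # this sketch never lets an alert inhibit itself
        if matches(source, source_match) and all(
            source.get(name, "") == target.get(name, "") for name in equal
        ):
            return True
    return False


if __name__ == "__main__":
    critical = {"alertname": "HighCPU", "instance": "node-1", "severity": "critical"}
    warning = {"alertname": "HighCPU", "instance": "node-1", "severity": "warning"}
    disk = {"alertname": "DiskFull", "instance": "node-1", "severity": "warning"}
    firing = [critical, warning, disk]
    for alert in firing:
        muted = is_inhibited(alert, firing, {"severity": "critical"},
                             {"severity": "warning"}, ["alertname"])
        print(alert["alertname"], alert["severity"], "muted" if muted else "notified")
```

With `equal: ['alertname']`, the HighCPU warning is muted by the HighCPU critical alert, while the DiskFull warning is still sent because its `alertname` differs.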

3. Alert silencing

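Silences provide a simple way to mute alerts quickly based on their labels. Each incoming alert is checked against the silences. If the alert matches a silence, Alertmanager sends no notification for it. Administrators can create silences directly in the Alertmanager web UI to mute specific notifications temporarily.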
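A silence is essentially a set of label matchers plus a time window. The Python sketch below models that idea under simplifying assumptions. Only the `=` and `=~` matcher types are supported here, although real silences also support `!=` and `!~`. The class and method names (`Matcher`, `Silence`, `mutes`) are hypothetical and are not Alertmanager's API.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Matcher:
    name: str
    value: str
    is_regex: bool = False

    def matches(self, labels):
        actual = labels.get(self.name, "")
        if self.is_regex:
            # Regex matchers are anchored, so the whole value must match.
            return re.fullmatch(self.value, actual) is not None
        return actual == self.value


@dataclass
class Silence:
    matchers: list
    starts_at: datetime
    ends_at: datetime

    def mutes(self, labels, now):
        """True if the silence is active at `now` and all matchers match."""
        active = self.starts_at <= now < self.ends_at
        return active and all(m.matches(labels) for m in self.matchers)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical maintenance window: mute Disk* alerts on node-1 for 2 hours.
    silence = Silence(
        [Matcher("instance", "node-1"), Matcher("alertname", "Disk.*", is_regex=True)],
        starts_at=now,
        ends_at=now + timedelta(hours=2),
    )
    print(silence.mutes({"alertname": "DiskFull", "instance": "node-1"}, now))
    print(silence.mutes({"alertname": "DiskFull", "instance": "node-2"}, now))
```

Once the window has passed, the same alert is no longer muted. This is why silences suit planned maintenance, while inhibition suits cause-and-effect relationships between alerts.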

II. Deployment and configuration

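Alertmanager is also written in Go, so you only need to download and extract the release archive to use it. After extracting it, check the version. The version used here is v0.24.0.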

```shell
[root@iZwz97yqubb71vyxhuskfyZ alertmanager]# pwd
/root/prometheus/alertmanager
[root@iZwz97yqubb71vyxhuskfyZ alertmanager]# ./alertmanager --version
alertmanager, version 0.24.0 (branch: HEAD, revision: f484b17fa3c583ed1b2c8bbcec20ba1db2aa5f11)
  build user: root@265f14f5c6fc
  build date: 20220325-09:31:33
  go version: go1.17.8
  platform: linux/amd64
```

The most important Alertmanager command-line options are described below. Run ./alertmanager -h to see the full list.

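| Option | Description |
|---|---|
| --config.file | Path to the alertmanager.yml configuration file |
| --web.external-url | URL under which Alertmanager is externally reachable, used to build links back to it, e.g. http://0.0.0.0:9093. The listening address and port are set with --web.listen-address (default :9093) |
| --data.retention | How long to keep data such as silences and the notification log; default 120h |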

Start Alertmanager as follows:

```shell
nohup ./alertmanager --config.file=alertmanager.yml --web.external-url=http://0.0.0.0:9093 > nohup.out &
```

III. Configuration overview

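An Alertmanager configuration file usually consists of the following top-level sections: global (global settings), templates (notification templates), route (alert routing), receivers (receivers), and inhibit_rules (inhibition rules).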

Below is an example alertmanager.yml that shows this layout. For the full set of options, see https://prometheus.io/docs/alerting/latest/configuration/#filepath

```yaml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: '[email protected]'

# The root route on which each incoming alert enters.
route:
  # The root route must not have any matchers as it is the entry point for
  # all alerts. It needs to have a receiver configured so alerts that do not
  # match any of the sub-routes are sent to someone.
  receiver: 'team-X-mails'

  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  #
  # To aggregate by all possible labels use '...' as the sole label name.
  # This effectively disables aggregation entirely, passing through all
  # alerts as-is. This is unlikely to be what you want, unless you have
  # a very low alert volume or your upstream notification system performs
  # its own grouping. Example: group_by: [...]
  group_by: ['alertname', 'cluster']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 30s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 3h

  # All the above attributes are inherited by all child routes and can
  # overwritten on each.

  # The child route trees.
  routes:
  # This routes performs a regular expression match on alert labels to
  # catch alerts that are related to a list of services.
  - match_re:
      service: ^(foo1|foo2|baz)$
    receiver: team-X-mails
    # The service has a sub-route for critical alerts, any alerts
    # that do not match, i.e. severity != critical, fall-back to the
    # parent node and are sent to 'team-X-mails'
    routes:
    - match:
        severity: critical
      receiver: team-X-pager

  - match:
      service: files
    receiver: team-Y-mails
    routes:
    - match:
        severity: critical
      receiver: team-Y-pager

  # This route handles all alerts coming from a database service. If there's
  # no team to handle it, it defaults to the DB team.
  - match:
      service: database
    receiver: team-DB-pager
    # Also group alerts by affected database.
    group_by: [alertname, cluster, database]
    routes:
    - match:
        owner: team-X
      receiver: team-X-pager
    - match:
        owner: team-Y
      receiver: team-Y-pager

# Inhibition rules allow to mute a set of alerts given that another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_matchers:
    - severity="critical"
  target_matchers:
    - severity="warning"
  # Apply inhibition if the alertname is the same.
  # CAUTION:
  #   If all label names listed in `equal` are missing
  #   from both the source and target alerts,
  #   the inhibition rule will apply!
  equal: ['alertname']

receivers:
- name: 'team-X-mails'
  email_configs:
  - to: '[email protected], [email protected]'

- name: 'team-X-pager'
  email_configs:
  - to: '[email protected]'
  pagerduty_configs:
  - routing_key:

- name: 'team-Y-mails'
  email_configs:
  - to: '[email protected]'

- name: 'team-Y-pager'
  pagerduty_configs:
  - routing_key:

- name: 'team-DB-pager'
  pagerduty_configs:
  - routing_key:
```

3.1 global

Global settings act as defaults for the other sections and can be overridden by settings inside those sections.

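smtp_smarthost: address (host:port) of the SMTP server used to send mail.

smtp_from: sender address used in the From header of notification emails.

smtp_auth_username: username of the mailbox.

smtp_auth_password: password or authorization code of the mailbox.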

3.2 templates

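The templates section sits at the same level as global. Notification templates let you customize how notifications look and which alert data they include. The templates section contains a list of paths to existing template files, for example: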

```yaml
templates:
  - 'templates/*.tmpl'
```

3.3 route

The route section defines the rules by which alerts received from the Prometheus server are dispatched to the destinations defined by receivers. Here is an example:

```yaml
route:
  receiver: 'admin-receiver'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  group_by: [cluster, alertname]
  routes:
  - match:
      team: developers
    group_by: [product, environment]
    receiver: 'developer-pager'
  - match_re:
      service: mysql|redis
    receiver: 'database-pager'
```

Options:

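route is the root route.

routes holds the child routes. match matches alerts by exact string comparison: it checks whether the alert has the label labelname with a value equal to labelvalue.

group_by specifies the labels used for grouping. Alerts that have the same values for all labels listed in group_by are merged into one notification to the receiver, which is how grouping is implemented.

match_re matches alerts by regular expression: it checks whether the alert's label value matches the given regular expression. Since v0.22, match and match_re are deprecated in favor of matchers, but they are still accepted.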

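Explanation of the example:

By default, alerts are sent to the administrator receiver admin-receiver, and the root route groups alerts by cluster and alertname.

If an alert's team label equals developers, the first child route matches. Alertmanager groups such alerts by product and environment before notifying, so developers can pinpoint the failure quickly.

Finally, the regex rule applies: if an alert has a service label whose value matches mysql or redis, the notification is sent to the database administrators through database-pager.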
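The routing walk described above can be modeled in a few lines of Python. This is a simplified sketch, not Alertmanager's implementation, and it assumes the default `continue: false`. A child route inherits unset settings from its parent, the first matching child wins, and an alert that matches no child stays with the current route. The `Route` class and `resolve` function are hypothetical names. The tree built at the bottom mirrors the YAML example above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    receiver: Optional[str] = None
    group_by: Optional[List[str]] = None
    match: dict = field(default_factory=dict)     # exact label matches
    match_re: dict = field(default_factory=dict)  # anchored regex matches
    routes: list = field(default_factory=list)    # child routes, in order

    def matches(self, labels):
        exact = all(labels.get(k) == v for k, v in self.match.items())
        regex = all(re.fullmatch(p, labels.get(k, "")) is not None
                    for k, p in self.match_re.items())
        return exact and regex


def resolve(route, labels, receiver=None, group_by=None):
    """Return (receiver, group_by) for an alert's label set.

    Children inherit unset settings from their parent; the first matching
    child wins (as with the default continue: false), and if no child
    matches, the current route handles the alert itself.
    """
    receiver = route.receiver or receiver
    group_by = route.group_by if route.group_by is not None else group_by
    for child in route.routes:
        if child.matches(labels):
            return resolve(child, labels, receiver, group_by)
    return receiver, group_by


# The route tree from the YAML example above.
root = Route(
    receiver="admin-receiver",
    group_by=["cluster", "alertname"],
    routes=[
        Route(match={"team": "developers"}, group_by=["product", "environment"],
              receiver="developer-pager"),
        Route(match_re={"service": "mysql|redis"}, receiver="database-pager"),
    ],
)

if __name__ == "__main__":
    for labels in [{"alertname": "BuildBroken", "team": "developers"},
                   {"alertname": "SlowQuery", "service": "redis"},
                   {"alertname": "NodeDown"}]:
        print(labels, "->", resolve(root, labels))
```

Because Alertmanager anchors regular expressions, `service: mysql|redis` does not match a value such as `mysql-proxy`, so that alert falls back to `admin-receiver`.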

3.4 receivers

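Receiver is an umbrella term. Each receiver must have a globally unique name and maps to one or more notification methods, such as email, WeChat, or webhooks. The official documentation recommends building custom notification integrations on the webhook receiver, which lets users implement whatever delivery they need.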
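As an illustration of the webhook approach, here is a minimal receiver built with Python's standard `http.server` module. Alertmanager POSTs a JSON body whose `status` and `alerts` fields (each alert carrying its own `status` and `labels`) come from the documented webhook payload. Everything else is an assumption for this sketch: the names `summarize` and `AlertHandler`, and printing as the delivery method. Port 5001 is chosen only because the default configuration in section IV points `web.hook` at `http://127.0.0.1:5001/`. This is suitable for experiments, not production.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload):
    """Turn an Alertmanager webhook payload into one text line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        status = alert.get("status", payload.get("status", "unknown"))
        lines.append(
            f"[{status}] {labels.get('alertname', '?')} on {labels.get('instance', '?')}"
        )
    return lines


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in summarize(payload):
            print(line)  # replace with a real integration (chat, SMS, ...)
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    # Listen where the default config's web.hook receiver sends notifications.
    HTTPServer(("127.0.0.1", 5001), AlertHandler).serve_forever()
```

Keeping the payload handling in a plain function (`summarize`) separate from the HTTP handler makes it easy to test the formatting logic without running a server.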

3.5 inhibit_rules

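The inhibit_rules section configures alert inhibition by specifying the conditions under which certain alerts are ignored. Well-designed inhibition rules reduce the number of "noise" alerts.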

IV. The default Alertmanager configuration

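```yaml
route:                        # routing configuration block
  group_by: ['alertname']     # label(s) used to group alerts
  group_wait: 30s             # alerts of the same group arriving within 30s go out in one notification
  group_interval: 5m          # interval between notifications about new alerts in the same group
  repeat_interval: 1h         # period after which a notification for still-firing alerts is resent
  receiver: 'web.hook'        # receiver to use: web.hook
receivers:                    # receiver configuration block
  - name: 'web.hook'          # receiver named web.hook
    webhook_configs:          # webhook endpoint
      - url: 'http://127.0.0.1:5001/'
inhibit_rules:                # inhibition block
  - source_match:
      severity: 'critical'    # while an alert with the source labels is firing,
    target_match:
      severity: 'warning'     # suppress alerts with the target labels
    equal: ['alertname', 'dev', 'instance']  # only if these labels have the same values in both alerts
```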
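To see how group_wait, group_interval, and repeat_interval interact, the sketch below computes notification times for the simplest case. It is a simplified model under stated assumptions, not Alertmanager's scheduler. It assumes a single alert that starts firing at t=0 and never changes, with all times in integer seconds. The group is flushed first after group_wait and then checked every group_interval. A repeat is sent at the first check where at least repeat_interval has passed since the last notification. The function name `notification_times` is hypothetical.

```python
def notification_times(firing_for, group_wait, group_interval, repeat_interval):
    """Simplified model: send times (seconds) for one unchanging firing alert.

    The first notification goes out after group_wait; afterwards the group
    is checked every group_interval, and a repeat is sent only when at
    least repeat_interval has elapsed since the previous notification.
    """
    times = []
    t = group_wait
    last_sent = None
    while t <= firing_for:
        if last_sent is None or t - last_sent >= repeat_interval:
            times.append(t)
            last_sent = t
        t += group_interval
    return times


if __name__ == "__main__":
    # Default config values: group_wait 30s, group_interval 5m, repeat_interval 1h,
    # for an alert that keeps firing for 3 hours.
    print(notification_times(3 * 3600, 30, 300, 3600))
```

In this model, repeats can only land on group_interval checks. A repeat_interval that is not a multiple of group_interval is therefore effectively rounded up to the next check.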
