Troubleshooting a case where Alertmanager failed to send email
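Table of contents
0 Notes
    Environment
    Reading guide
1 First, verify that the SMTP settings are correct
2 Configure the alertmanager configuration file and trigger an alert
3 Resolving smtp.plainAuth failed: wrong host name
4 Resolving dial tcp 127.0.0.1:5001: connect: connection refused
5 Resolving the configuration file path mismatch
6 Resolving the mismatch between the configmap key and the mounted file name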
0 Notes
Environment
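The alertmanager in this article, and the prometheus it belongs to, are both deployed as containers on k8s (Huawei Cloud CCE).
To keep the article short, basic operations such as restarting a pod, editing a configmap, or editing a deployment are omitted below.
To follow the errors and fixes below, the reader needs this background:
- container basics
- the commands for editing a configmap and a deployment
- other basic commands such as get / logs / delete
- how to read and configure the volumes of a deployment
- basic docker commands
- basic python
- basic prometheus monitoring

Reading guide
What follows is the complete troubleshooting process I went through after alertmanager stopped sending alert emails, with the full error messages included. If you have time, read it through. If not, search the text for your own error message to find the relevant section quickly.

1 First, verify that the SMTP settings are correct
I used the following script to confirm that my own SMTP settings were fine and could send mail.

    #!/usr/bin/python3
    import smtplib
    from email.mime.text import MIMEText

    # Third-party SMTP service
    mail_host = "smtp.xxx.com"        # SMTP server
    mail_user = "[email protected]"   # mailbox address
    mail_pass = "smtp_password"       # SMTP authorization password
    sender = "[email protected]"      # sender address
    receivers = ['[email protected]'] # recipient addresses

    content = 'Python Send Mail !'
    title = 'Python SMTP Mail Test'   # mail subject

    message = MIMEText(content, 'plain', 'utf-8')  # body, subtype, charset
    message['From'] = "{}".format(sender)
    message['To'] = ",".join(receivers)
    message['Subject'] = title

    try:
        smtpObj = smtplib.SMTP_SSL(mail_host, 465)  # send over SSL; the port is usually 465
        smtpObj.login(mail_user, mail_pass)         # authenticate
        smtpObj.sendmail(sender, receivers, message.as_string())  # send
        smtpObj.quit()                              # close the connection cleanly
        print("mail has been sent successfully.")
    except smtplib.SMTPException as e:
        print(e)

2 Configure the alertmanager configuration file and trigger an alert
After the configuration was done, the result was as follows (the original figure is not preserved).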
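For the "trigger an alert" part of this step, one option is to push a hand-made alert straight into alertmanager instead of waiting for a prometheus rule to fire. The following is a minimal sketch, not the method used in this article. The helper names `build_test_alert` and `post_alert`, the label values, and the base URL in the usage note are all assumptions. The default path `/api/v1/alerts` is the v1 API of older alertmanager releases; newer releases may need `/api/v2/alerts` instead.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(alertname="EmailTest", minutes=5):
    """Build a one-element alert list in the JSON shape alertmanager accepts.

    The label and annotation values are made up for illustration.
    """
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": alertname, "severity": "warning"},
        "annotations": {"summary": "Manual test alert for the email receiver"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alert(base_url, alerts, path="/api/v1/alerts"):
    """POST the alerts to alertmanager and return the HTTP status code.

    base_url is an assumption, e.g. a local port-forward to port 9093.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

With alertmanager reachable on a hypothetical local address, `post_alert("http://127.0.0.1:9093", build_test_alert())` would submit the alert; the alertmanager log then shows whether the email notification succeeded or failed.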
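3 Resolving smtp.plainAuth failed: wrong host name
I found a fix in this blog post (https://blog.csdn.net/qq_22543991/article/details/88356928). Following it, I upgraded my own image to v0.16.1 as well.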
The ports listening inside the container are 9093 and 9094:

    [root@xxxxx ~]# kubectl -n monitoring exec alertmanager-7d5c68df6f-4dzl7 -it -- sh
    /alertmanager $ netstat -anpt | grep LISTEN
    tcp        0      0 :::9093      :::*      LISTEN      1/alertmanager
    tcp        0      0 :::9094      :::*      LISTEN      1/alertmanager

So I suspected that the new image was using a new configuration file, and that this new file was where port 5001 came from.

4 Resolving dial tcp 127.0.0.1:5001: connect: connection refused
First, check how the configuration file is set up in my own deployment. As shown below, the configuration file is /etc/alertmanager/config.yml:

    volumeMounts:
    - mountPath: /etc/localtime
      name: localtime
      readOnly: true
    - mountPath: /etc/alertmanager/config.yml
      name: alertmanager-conf
      readOnly: true
      subPath: config.yml

Next, on the host that runs the alertmanager container, inspect the container:

    [root@yyyyy ~]# docker ps | grep alertmanager
    2d074e0ab797 rancher/prom-alertmanager "/bin/alertmanager -…" 3 minutes ago Up 3 minutes k8s_container-0_alertmanager-7d5c68df6f-4dzl7_monitoring_bcc9065c-52d6-4ee0-ab12-9e8bbe2b09a8_0
    003d2bba2ebd cce-pause:3.1 "/pause" 4 minutes ago Up 4 minutes k8s_POD_alertmanager-7d5c68df6f-4dzl7_monitoring_bcc9065c-52d6-4ee0-ab12-9e8bbe2b09a8_0
    [root@yyyyy ~]# docker inspect 2d074e0ab797
    [
        {
            "Id": "2d074e0ab797ba255fdeedbf68502fc82913ed24bc65704a835da9bd2345ff92",
            "Created": "2022-03-23T02:34:40.824251211Z",
            "Path": "/bin/alertmanager",
            "Args": [
                "--config.file=/etc/alertmanager/alertmanager.yml",
                "--storage.path=/alertmanager"
            ],
            "State": {
                "Status": "running",
                "Running": true,
                ......
(The rest of the output is omitted.)

The output above shows that the new image reads its configuration from /etc/alertmanager/alertmanager.yml. The paths do not match, so the correct configuration was never loaded.

5 Resolving the configuration file path mismatch
Since the paths do not match, the fix is to make them match. The approach taken here is to change how my configmap is mounted: edit the deployment and change the volume mount to the following:

    volumeMounts:
    - mountPath: /etc/localtime
      name: localtime
      readOnly: true
    - mountPath: /etc/alertmanager/alertmanager.yml
      name: alertmanager-conf
      readOnly: true
      subPath: alertmanager.yml

After this change the pod failed to start. The error, shown below, says "Are you trying to mount a directory onto a file":

    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Normal Scheduled 36s Successfully assigned monitoring/alertmanager-5c4664d45f-gq9fn to 172.16.0.216
    Normal SuccessfulMountVolume 35s (x2 over 36s) kubelet, 172.16.0.216 Successfully mounted volumes for pod "alertmanager-5c4664d45f-gq9fn_monitoring(b78797dd-7ce8-44fc-b23b-a8bb0c3b5e7e)"
    Warning FailedStart 35s kubelet, 172.16.0.216 Error: failed to start container "container-0": Error response from daemon: OCI runtime create failed: container_linux.go:330: starting container process caused "process_linux.go:381: container init caused \"rootfs_linux.go:61: mounting \\\"/mnt/paas/kubernetes/kubelet/pods/b78797dd-7ce8-44fc-b23b-a8bb0c3b5e7e/volume-subpaths/alertmanager-conf/container-0/1\\\" to rootfs \\\"/var/lib/docker/devicemapper/mnt/8f3de32f6add251b9b040c16fc26a86d653c850550f075d05831459b8fb17a83/rootfs\\\" at \\\"/var/lib/docker/devicemapper/mnt/8f3de32f6add251b9b040c16fc26a86d653c850550f075d05831459b8fb17a83/rootfs/etc/alertmanager/alertmanager.yml\\\" caused \\\"not a directory\\\"\"": unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
    Warning FailedStart 34s kubelet, 172.16.0.216 Error: failed to start container "container-0": Error response from daemon: OCI runtime create failed: container_linux.go:330: starting container process caused "process_linux.go:381: container init caused \"rootfs_linux.go:61: mounting \\\"/mnt/paas/kubernetes/kubelet/pods/b78797dd-7ce8-44fc-b23b-a8bb0c3b5e7e/volume-subpaths/alertmanager-conf/container-0/1\\\" to rootfs \\\"/var/lib/docker/devicemapper/mnt/c5d9b7e08e9d64bcda975995bf695b2292bf9745e5f5d8644e591dd7ee83fdc1/rootfs\\\" at \\\"/var/lib/docker/devicemapper/mnt/c5d9b7e08e9d64bcda975995bf695b2292bf9745e5f5d8644e591dd7ee83fdc1/rootfs/etc/alertmanager/alertmanager.yml\\\" caused \\\"not a directory\\\"\"": unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
    Normal Pulled 20s (x3 over 35s) kubelet, 172.16.0.216 Container image "rancher/prom-alertmanager:v0.16.1" already present on machine
    Normal SuccessfulCreate 20s (x3 over 35s) kubelet, 172.16.0.216 Created container container-0
    Warning FailedStart 19s kubelet, 172.16.0.216 Error: failed to start container "container-0": Error response from daemon: OCI runtime create failed: container_linux.go:330: starting container process caused "process_linux.go:381: container init caused \"rootfs_linux.go:61: mounting \\\"/mnt/paas/kubernetes/kubelet/pods/b78797dd-7ce8-44fc-b23b-a8bb0c3b5e7e/volume-subpaths/alertmanager-conf/container-0/1\\\" to rootfs \\\"/var/lib/docker/devicemapper/mnt/04bb9820df0c996854cebfbe4606cc0fc56e9791b83af3183ea5ca70f56374c4/rootfs\\\" at \\\"/var/lib/docker/devicemapper/mnt/04bb9820df0c996854cebfbe4606cc0fc56e9791b83af3183ea5ca70f56374c4/rootfs/etc/alertmanager/alertmanager.yml\\\" caused \\\"not a directory\\\"\"": unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
    Warning BackOffStart 6s (x2 over 34s) kubelet, 172.16.0.216 the failed container exited with ExitCode: 127
    Warning BackOffStart 6s (x2 over 34s) kubelet, 172.16.0.216 Back-off restarting failed container

A look at the alertmanager configmap showed the cause: the file name in the configmap did not match.

    apiVersion: v1
    data:
      config.yml: |
        global:
          resolve_timeout: 5m
          smtp_from: [email protected]
          smtp_smarthost: 'smtp.xxx.com:465'
          smtp_auth_username: [email protected]
          smtp_auth_password: smtp_password
          smtp_require_tls: false
        route:
          receiver: email
          group_by:
          - alertname
          - kubernetes_namespace
          - kubernetes_pod
          group_wait: 10s
          group_interval: 10s
          repeat_interval: 1h
        inhibit_rules:

6 Resolving the mismatch between the configmap key and the mounted file name
Edit the alertmanager configmap and rename the configuration file key to alertmanager.yml:

    apiVersion: v1
    data:
      alertmanager.yml: |
        global:
          resolve_timeout: 5m
          smtp_from: [email protected]
          smtp_smarthost: 'smtp.xxx.com:465'
          smtp_auth_username: [email protected]
          smtp_auth_password: smtp_password
          smtp_require_tls: false
        route:
          receiver: email
          group_by:
          - alertname
          - kubernetes_namespace
          - kubernetes_pod
          group_wait: 10s
          group_interval: 10s
          repeat_interval: 1h
        inhibit_rules:

I restarted the alertmanager pod and it failed again. The events below show why: the volumes section of the deployment yaml had not been updated and still referenced the old key.

    Warning FailedMount 25s kubelet, 172.16.0.216 Unable to attach or mount volumes: unmounted volumes=[alertmanager-conf], unattached volumes=[localtime alertmanager-conf default-token-z4nz6]: timed out waiting for the condition
    Warning FailedMount 20s (x9 over 2m28s) kubelet, 172.16.0.216 MountVolume.SetUp failed for volume "alertmanager-conf" : configmap references non-existent config key: config.yml

After fixing the volumes section it looked as follows (the original figure is not preserved). I restarted alertmanager once more. This time the pod reached Running, and the alert email arrived.
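Sections 4 to 6 all come down to four names that have to agree: the `--config.file` argument of the image, the `mountPath` and `subPath` of the volume mount, the `items` of the configmap volume, and the key in the configmap itself. The following is a minimal sketch of that consistency check, written as plain Python over values copied by hand from the manifests. The function name `check_config_wiring` and its parameters are hypothetical, and the shape of `volume_items` (a list of `key`/`path` pairs) is an assumption based on the "non-existent config key" event, since the volumes section itself is not shown in the article.

```python
def check_config_wiring(config_file_arg, mount_path, sub_path,
                        volume_items, configmap_keys):
    """Return a list of mismatches; an empty list means the wiring is consistent.

    config_file_arg: path passed to alertmanager via --config.file
    mount_path, sub_path: values from the container's volumeMounts entry
    volume_items: list of {"key": ..., "path": ...} from the configmap volume
                  (empty list if the volume has no items)
    configmap_keys: keys under `data` in the configmap
    """
    problems = []

    # The file must be mounted where the binary looks for it (section 4).
    if mount_path != config_file_arg:
        problems.append(
            f"mountPath {mount_path} differs from --config.file {config_file_arg}")

    # Every key referenced by the volume must exist in the configmap (section 6).
    for item in volume_items:
        if item["key"] not in configmap_keys:
            problems.append(
                f"volume item references non-existent configmap key: {item['key']}")

    # Files present in the volume: the item paths, or every key when
    # the volume has no items list.
    if volume_items:
        projected = {item["path"] for item in volume_items}
    else:
        projected = set(configmap_keys)

    # A subPath that is not a file in the volume cannot be mounted
    # onto a file path (section 5).
    if sub_path not in projected:
        problems.append(
            f"subPath {sub_path} is not a file in the volume "
            f"(available: {sorted(projected)})")

    return problems


if __name__ == "__main__":
    # State after the change in section 5, assuming the volume still
    # maps key config.yml to path config.yml.
    for problem in check_config_wiring(
            "/etc/alertmanager/alertmanager.yml",
            "/etc/alertmanager/alertmanager.yml",
            "alertmanager.yml",
            [{"key": "config.yml", "path": "config.yml"}],
            ["config.yml"]):
        print(problem)
```

Running the same check with the key, the item path, and the subPath all set to alertmanager.yml returns an empty list, which corresponds to the final working state.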