Prometheus + Grafana Deployment Practice Notes

References

https://prometheus.io/docs/instrumenting/writing_exporters/
https://grafana.com/docs/grafana/latest/getting-started/build-first-dashboard/
https://grafana.com/docs/grafana/latest/alerting/alerting-rules/create-grafana-managed-rule/


Installation

Download the packages

Prometheus: https://prometheus.io/download/
Grafana: https://grafana.com/grafana/download

Packages used in these notes:
prometheus-2.41.0.windows-amd64.zip
grafana-9.3.2.windows-amd64.zip

Prometheus exporters:
windows_exporter: https://github.com/prometheus-community/windows_exporter
blackbox_exporter: https://github.com/prometheus/blackbox_exporter

Wrapping the executables as Windows services with WinSW

Config File:
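
A WinSW service definition is a small XML file that sits next to the WinSW executable and shares its base name. The sketch below renders one for Prometheus. The install path, the service id and the description are assumptions made for illustration, not values taken from these notes; only the D:\A_Prometheus root and the --config.file flag are grounded in the setup described here.

```python
from xml.dom import minidom
from xml.etree import ElementTree as ET


def winsw_service_xml(service_id, name, executable, arguments, description=""):
    """Render a minimal WinSW service definition as an XML string."""
    service = ET.Element("service")
    fields = (
        ("id", service_id),
        ("name", name),
        ("description", description),
        ("executable", executable),
        ("arguments", arguments),
    )
    for tag, text in fields:
        ET.SubElement(service, tag).text = text
    # Roll log files instead of letting a single file grow without bound.
    ET.SubElement(service, "log", mode="roll")
    raw = ET.tostring(service, encoding="unicode")
    return minidom.parseString(raw).toprettyxml(indent="  ")


# Hypothetical install layout; adjust the paths to the real one.
PROMETHEUS_XML = winsw_service_xml(
    service_id="prometheus",
    name="Prometheus",
    executable=r"D:\A_Prometheus\prometheus\prometheus.exe",
    arguments=r"--config.file=D:\A_Prometheus\prometheus\prometheus.yml",
    description="Prometheus monitoring server",
)

if __name__ == "__main__":
    print(PROMETHEUS_XML)
```

The same helper could be reused for Grafana and the two exporters by changing the executable and arguments. Saving the output under the same base name as the renamed WinSW executable and running that executable with the install command should register the service.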

Configuration

Prometheus

prometheus.yml
# my global config
global:
  scrape_interval: 60s # Scrape targets every 60 seconds (the Prometheus default is also 1 minute).
  evaluation_interval: 60s # Evaluate rules every 60 seconds (the default is also 1 minute).
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
# alerting:
#   alertmanagers:
#     - static_configs:
#         - targets:
#           - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# here it is Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  # Node status metrics for server 79
  - job_name: 'ServerNodeStatus_79'
    static_configs:
      - targets: ['10.170.10.79:9182']

  # Node status metrics for server 78
  - job_name: 'ServerNodeStatus_78'
    static_configs:
      - targets: ['10.170.10.78:9182']

  # Node status metrics for server 82
  - job_name: 'ServerNodeStatus_82[DB]'
    static_configs:
      - targets: ['10.170.10.82:9182']

  # HTTP status monitoring for the applications on server 10.170.10.78
  - job_name: "AppStatus_78"
    metrics_path: /probe # path used to fetch the probe metrics
    params:
      module: [http_2xx] # use the HTTP GET 2xx module
    file_sd_configs:
      - refresh_interval: 1m # how often the probe target files are re-read; edits to the Probe_*.yml files take effect without a restart
        files:
          - "D:/A_Prometheus/Probe_*_78.yml"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.170.10.79:9115 # address the blackbox_exporter listens on

  # HTTP status monitoring for the applications on server 10.170.10.79
  - job_name: "AppStatus_79"
    metrics_path: /probe # path used to fetch the probe metrics
    params:
      module: [http_2xx] # use the HTTP GET 2xx module
    file_sd_configs:
      - refresh_interval: 1m # how often the probe target files are re-read; edits to the Probe_*.yml files take effect without a restart
        files:
          - "D:/A_Prometheus/Probe_*_79.yml"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.170.10.79:9115 # address the blackbox_exporter listens on

  # HTTP GET status monitoring for the interfaces integrated with the LIS platform
  - job_name: "ServiceStatus_LIS"
    metrics_path: /probe # path used to fetch the probe metrics
    params:
      module: [http_2xx] # use the HTTP GET 2xx module
    file_sd_configs:
      - refresh_interval: 1m # how often the probe target files are re-read; edits to the Probe_*.yml files take effect without a restart
        files:
          - "D:/A_Prometheus/Probe_ServiceStatus_LIS.yml"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.170.10.79:9115 # address the blackbox_exporter listens on

  # HTTP POST status monitoring for the interfaces integrated with the LIS platform
  - job_name: "ServiceStatus_LIS_POST"
    metrics_path: /probe # path used to fetch the probe metrics
    params:
      module: [http_post_2xx] # use the HTTP POST 2xx module
    file_sd_configs:
      - refresh_interval: 1m # how often the probe target files are re-read; edits to the Probe_*.yml files take effect without a restart
        files:
          - "D:/A_Prometheus/Probe_ServiceStatus_LIS_POST.yml"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.170.10.79:9115 # address the blackbox_exporter listens on

  # HTTP GET status monitoring for the information-collection interfaces
  - job_name: "ServiceStatus_COLL"
    metrics_path: /probe # path used to fetch the probe metrics
    params:
      module: [http_2xx] # use the HTTP GET 2xx module
    file_sd_configs:
      - refresh_interval: 1m # how often the probe target files are re-read; edits to the Probe_*.yml files take effect without a restart
        files:
          - "D:/A_Prometheus/Probe_ServiceStatus_COLL.yml"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.170.10.79:9115 # address the blackbox_exporter listens on

  # TCP monitoring for the lab-report / collected-information intermediate database
  - job_name: "DBStatus_LIS"
    metrics_path: /probe # path used to fetch the probe metrics
    params:
      module: [tcp_connect] # use the tcp_connect module
    file_sd_configs:
      - refresh_interval: 1m # how often the probe target files are re-read; edits to the Probe_*.yml files take effect without a restart
        files:
          - "D:/A_Prometheus/Probe_DBStatus_LIS.yml"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.170.10.79:9115 # address the blackbox_exporter listens on
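
The three relabel rules repeated in every blackbox job can be hard to read at first. The sketch below mimics them in plain Python to show which URL Prometheus ends up scraping for one discovered target. The function name is hypothetical and the code is only a model of the relabeling described above, not Prometheus's own implementation.

```python
from urllib.parse import urlencode


def blackbox_probe_url(target, module, blackbox_addr, metrics_path="/probe"):
    """Mimic the three relabel steps; return (scrape URL, instance label)."""
    # Before relabeling, __address__ holds the target read from the file_sd file
    # and __param_module holds the value configured under `params`.
    labels = {"__address__": target, "__param_module": module}
    # Step 1: copy the discovered address into the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the probed address as the `instance` label.
    labels["instance"] = labels["__param_target"]
    # Step 3: point the actual scrape at the blackbox_exporter.
    labels["__address__"] = blackbox_addr
    query = urlencode(
        {"module": labels["__param_module"], "target": labels["__param_target"]}
    )
    url = f"http://{labels['__address__']}{metrics_path}?{query}"
    return url, labels["instance"]


url, instance = blackbox_probe_url(
    "https://10.170.10.78:8466", "http_2xx", "10.170.10.79:9115"
)

if __name__ == "__main__":
    print(url)
    print(instance)
```

Opening the printed URL in a browser is a quick way to check a single probe by hand, because the blackbox_exporter answers the same request that Prometheus would send.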

windows_exporter

config.yml
collectors:
  enabled: "[defaults],memory,iis,process"
# Per-collector filters: limit the service, iis and process collectors to the listed names
collector:
  service:
    services-where: "Name='nginx' OR Name='Redis'"
  iis:
    site-whitelist: "Plat|PatLis"
    app-whitelist: "Plat|PatLis"
  process:
    whitelist: "(dbprobe-local|afw|Enjoyor|enjoyor|PushDataWorkService|Redis|nginx).*"
log:
  level: warn
telemetry:
  addr: ":9182"
  path: /metrics
  # Maximum number of concurrent requests. 0 disables the limit.
  max-requests: 5

blackbox_exporter

blackbox.yml
modules:
  http_2xx:
    prober: http
    timeout: 60s
    http:
      method: GET
      preferred_ip_protocol: "ip4"
      no_follow_redirects: false
      tls_config:
        insecure_skip_verify: true
      # fail_if_ssl: false
      # fail_if_not_ssl: false
      # fail_if_body_matches_regexp:
      #   - "Could not connect to database"
      # fail_if_body_not_matches_regexp:
      #   - "Download the latest version here"
      # fail_if_header_matches: # Verifies that no cookies are set
      #   - header: Set-Cookie
      #     allow_missing: true
      #     regexp: '.*'
      # fail_if_header_not_matches:
      #   - header: Access-Control-Allow-Origin
      #     regexp: '(\*|example\.com)'
      # ip_protocol_fallback: false # no fallback to "ip6"
  http_post_2xx:
    prober: http
    http:
      method: POST
      preferred_ip_protocol: "ip4"
      no_follow_redirects: false
      tls_config:
        insecure_skip_verify: true
      headers:
        Content-Type: application/json
      body: '{}'
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  grpc:
    prober: grpc
    grpc:
      tls: true
      preferred_ip_protocol: "ip4"
  grpc_plain:
    prober: grpc
    grpc:
      tls: false
      service: "service1"
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^SSH-2.0-"
        - send: "SSH-2.0-blackbox-ssh-check"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
        - send: "NICK prober"
        - send: "USER prober prober prober :prober"
        - expect: "PING :([^ ]+)"
          send: "PONG ${1}"
        - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
  icmp_ttl5:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
      ttl: 5

Probe types

Web application:

Probe_AppStatus_78.yml
- targets:
  # web application
  - https://10.170.10.78:8466

Database:

Probe_DBStatus_LIS.yml
- targets:
  # database address
  - 10.110.10.110:1433

Web service:

Probe_ServiceStatus_COLL.yml
- targets:
  # web service
  - http://192.168.33.99:8060/WebService/WebServce.asmx

HTTP POST Web API:

Probe_ServiceStatus_LIS_POST.yml
- targets:
  # HTTP POST Web API
  - http://192.134.24.120:8902/Interface/GetIndex

Grafana

Configure the access protocol and port

custom.ini
[server]
# Protocol (http, https, h2, socket)
protocol = https

# The http port to use
http_port = 7070

Configure the Grafana default theme and locale

custom.ini
# Default UI theme ("dark" or "light")
default_theme = light

# Default locale (supported IETF language tag, such as en-US)
default_locale = zh-CN

Grafana SSL Config

Grafana Config:

custom.ini
#################################### Server ####################################
[server]
# Protocol (http, https, h2, socket)
protocol = https
......
# https certs & key file
cert_file = D:\A_Prometheus\grafana-oss\cert\Grafana.crt
cert_key = D:\A_Prometheus\grafana-oss\cert\Grafana.key

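If the Grafana certificate was signed by your own root CA, append the contents of the root CA certificate (CRT) to Grafana's CRT file so that the file carries the full chain. Use >> to append; a single > would overwrite the Grafana certificate with the root certificate.

cat ROOTCA.crt >> Grafana.crt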
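
On a Windows host where cat is not at hand, the same merge can be scripted. The following is a minimal sketch, assuming both files are PEM text; the function name is hypothetical and the demo uses placeholder certificate bodies in a temporary directory rather than real certificates.

```python
import tempfile
from pathlib import Path


def append_ca_to_cert(server_cert: Path, root_ca: Path) -> None:
    """Append the root CA PEM block to the server certificate file, once."""
    server_pem = server_cert.read_text(encoding="utf-8")
    ca_pem = root_ca.read_text(encoding="utf-8").strip()
    if ca_pem in server_pem:
        return  # already merged; avoid duplicating the CA block
    # The server certificate stays first, the CA certificate follows it.
    merged = server_pem.rstrip("\n") + "\n" + ca_pem + "\n"
    server_cert.write_text(merged, encoding="utf-8")


if __name__ == "__main__":
    # Placeholder PEM bodies, for demonstration only.
    with tempfile.TemporaryDirectory() as tmp:
        cert = Path(tmp) / "Grafana.crt"
        ca = Path(tmp) / "ROOTCA.crt"
        cert.write_text(
            "-----BEGIN CERTIFICATE-----\nSERVER\n-----END CERTIFICATE-----\n",
            encoding="utf-8",
        )
        ca.write_text(
            "-----BEGIN CERTIFICATE-----\nROOT\n-----END CERTIFICATE-----\n",
            encoding="utf-8",
        )
        append_ca_to_cert(cert, ca)
        print(cert.read_text(encoding="utf-8"))
```

Running the function a second time leaves the file unchanged, which makes it safe to include in a deployment script.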

Grafana notification template configuration

Custom Chinese alert template for Grafana

Grafana dashboards

Repository: https://github.com/YuanjianZhang/Grafana_Deploy

Author: zhang

Published: 2022-11-23

Updated: 2023-09-19

License: CC BY-NC-SA 4.0
