Kubernetes PromQL (Prometheus Query Language) CPU aggregation walkthrough




Note: I am pretty much a beginner with PromQL, but I have used a lot of Graphite and InfluxDB queries. This is how I understand this query language; if I am wrong on any part, please feel free to comment and I'll correct it.

Prometheus comes with its own query language called PromQL. Understanding PromQL is difficult, let alone its scary syntax, especially if you are expected to come up with queries on your own.

I'm not going to cover how to install and configure Prometheus here (the easiest way is via a Helm chart). Instead, I'm going to walk you through the parts of a query until we get the desired output.

I like to approach huge and scary tasks by breaking them into small chunks; it helps me understand exactly what I'm doing while building confidence that I'll accomplish the goal.

So let's break it into very small chunks:

What do we have: Multiple pods running as replicas of a deployment, often spread across multiple hosts.

What do we want to monitor: We want to aggregate the CPU usage by a pod label called app.

Where do we start: cAdvisor (Container Advisor) monitors resource usage (CPU, memory, etc.) and exports it as metrics. Every Kubernetes node runs the kubelet, which has cAdvisor compiled into it.

For the sake of simplicity, all the pods we want to collect metrics on are in the same namespace.

The tools:

- Prometheus server, port-forwarded to the local computer.
- A simple cURL with jq; that's all you really need.
- Grafana, for visualization.

Query building:

What we want to end up with is a graph of CPU usage aggregated by deployment:

Step 1 — Get the cpu usage per container:

We'll start with the simplest metric, container_cpu_usage_seconds_total. If we only run this, we'll get all the containers in all namespaces, and each value is a cumulative total of CPU seconds used, which isn't what we want. We want to derive a "CPU per second" metric from it.
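On its own, the raw selector looks like this and returns one cumulative counter series per container, per CPU:

```promql
container_cpu_usage_seconds_total
```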

Note how the metric name container_cpu_usage_seconds_total ends in seconds_total; this suffix indicates that the metric is a counter (an accumulator). If we want the usage per second, we need a function that derives a rate from it.

The rate(v range-vector) function takes a range vector (a selector with a [time] suffix, e.g. [5m]) and calculates the per-second average rate of increase over that window.

Now we can filter by a namespace and get the CPU usage per second, averaged over 5 minutes.
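At this stage the query might look like this (the namespace name my-namespace is a placeholder; substitute your own):

```promql
rate(container_cpu_usage_seconds_total{namespace="my-namespace"}[5m])
```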

You can use the following to run your query from the command line:

- Make sure you have a port-forward to your Prometheus server.
- Run the following command:

curl -s http://127.0.0.1:9090/api/v1/query\?query\=container_cpu_usage_seconds_total | jq

which returns a JSON result with one entry per matching container series.

a few things to notice here:

- The cpu label value is cpu00, which means the containers might be running on different CPUs too.
- Each metric only has pod_name and is missing the pod's labels, so we don't yet have a label that can aggregate all the pods of a deployment / replicaSet.

If we count the pods in that namespace using kubectl get pods, we get 11 pods, but the metrics above show more entries, since we are looking at containers, not pods.

To aggregate the results by pod_name, we add the sum() function with a by clause.
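Summing by pod_name collapses the per-container, per-CPU series into one series per pod (my-namespace is again a placeholder):

```promql
sum(rate(container_cpu_usage_seconds_total{namespace="my-namespace"}[5m])) by (pod_name)
```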

Now the number of series will match the number of pods we saw using kubectl get pods, and we get the CPU usage per pod.

If you want, feel free to add another label, instance (comma-separated), to the by clause, which will aggregate by host as well.

However, we are not entirely where we want to be. We want to combine the CPU usage of all the pods per deployment / replicaSet, and the common element every pod of a deployment shares is its labels. Here the pod label we want is app, so now we need to find the labels of the pods in order to aggregate them further.

Step 2 — Get the pod labels:

We can get the pod labels from kube-state-metrics by querying the kube_pod_labels metric:
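Filtered to the same placeholder namespace, that query is simply:

```promql
kube_pod_labels{namespace="my-namespace"}
```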

Labels do not produce a metric value, so these series all have the value 1 (meaning "exists").

Notice that here we have another label, pod, whose value is the same as pod_name from the container_cpu_usage_seconds_total query; we can use it as the joining element between both queries.

We don’t care about all the other labels right now, all we need is label_app and pod.

If we want to keep only a few labels in a result, we need to use the by clause (similar to GROUP BY in SQL). It has to be part of an aggregation function, so we will use max() to return these labels; using max() keeps the values at 1.
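Applied to the label series, that looks like this (keeping only label_app and pod):

```promql
max(kube_pod_labels{namespace="my-namespace"}) by (label_app, pod)
```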

Step 3 — Prepare the results for a one-to-many match

Now we have 2 sets of results:

- The many side: the CPU usage results, a list of series each carrying a pod_name label.
- The one side: the pod label results, a list of series each carrying a pod label whose value matches the pod_name from the previous results.

That is, one set of results from kube_pod_labels, as opposed to many results for the CPU usage.

The problem is that when trying to match (join) these two result sets, PromQL needs the exact same labels to exist on both sides, or else the combined result will be empty (no match).

In order to do that, we need to add a pod_name label (copied from pod) to the label results. To do that we will use the following function:

label_replace(<vector>, "<destination-label>", "$1", "<source-label>", "(.+)")

So let's plug our values into the placeholders:
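With our values plugged in (my-namespace is a placeholder), the label series gain a pod_name label copied from pod:

```promql
label_replace(
  max(kube_pod_labels{namespace="my-namespace"}) by (label_app, pod),
  "pod_name", "$1", "pod", "(.+)"
)
```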

The name label_replace is a bit misleading, as it essentially adds a label rather than replacing one.

Step 4 — Join the results with a one-to-many match:

First we need to understand that each result set's type is a vector: we are looking at a set of time series, not a single one. So in order to join them, we need some sort of binary operation.

Prometheus has the following operators:

- Arithmetic binary operators: +, -, *, /, %, ^
- Comparison binary operators: ==, !=, >, <, >=, <=
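Putting all the steps together, here is a sketch of the final one-to-many join under the assumptions above (my-namespace and the app pod label). Since the label side always has the value 1, multiplying leaves the CPU values intact, while group_left(label_app) copies label_app onto the CPU series so we can aggregate by it:

```promql
sum by (label_app) (
  sum(rate(container_cpu_usage_seconds_total{namespace="my-namespace"}[5m])) by (pod_name)
  * on (pod_name) group_left (label_app)
  label_replace(
    max(kube_pod_labels{namespace="my-namespace"}) by (label_app, pod),
    "pod_name", "$1", "pod", "(.+)"
  )
)
```

This yields one series per app label value, which is the per-deployment CPU graph we set out to build.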

