Configuring Prometheus and Grafana to monitor the Kubernetes cluster
We will deploy the Prometheus Operator to collect metrics from the nodes, pods, and services in the cluster. This is the foundation that supplies data to the dashboards.
Why: Kubernetes itself keeps only short-lived metrics, so we need Prometheus to retain them over time and to query them flexibly across our DataOps services.
Expected result: The Prometheus and Grafana pods are in the Running state, and the dashboards are reachable through port-forward.
First, install the stack. The recommended route is the community Helm chart `kube-prometheus-stack`, which bundles the Prometheus Operator with Prometheus, Alertmanager, and Grafana. (Applying the operator's release manifest with kubectl is also possible, but it installs only the operator and its CRDs, so Prometheus and Grafana would still have to be deployed separately.)
# Install with Helm (recommended):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
Wait about 2-3 minutes for the pods to start, then check their status:
kubectl get pods -n monitoring
You should see pods whose names contain `prometheus`, `grafana`, and `alertmanager` in the `Running` state.
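To have Prometheus scrape metrics from our ML services (training/inference), we need to create a ServiceMonitor. This resource selects every Service labeled `app: ml-service` and scrapes the port named `metrics` on it.
Config file path: `/etc/k8s/configs/ml-service-monitor.yaml`
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ml-service-monitor
  namespace: monitoring
  labels:
    app: ml-service
    # By default kube-prometheus-stack only picks up ServiceMonitors carrying the Helm release label
    release: monitoring
spec:
  # Assumption: the ML Services run outside the `monitoring` namespace
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app: ml-service
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics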
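The ServiceMonitor only has something to scrape if the ML service really answers on `/metrics` in the Prometheus text exposition format. As a rough illustration, here is a minimal standard-library sketch of such an endpoint. The metric name `ml_inference_requests_total`, the port `8000`, and the helper names are assumptions made for this example, not part of the original setup; a real service would more likely use the `prometheus_client` library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real service would increment it
# inside its request handlers.
METRICS = {"ml_inference_requests_total": 0.0}


def render_metrics(metrics: dict[str, float]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current counters on GET /metrics, 404 elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics on the given port (assumed 8000)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. The Kubernetes Service in front of the pod would then need a port named `metrics` that targets this container port, since the ServiceMonitor refers to the port by name.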
Apply the ServiceMonitor to the cluster:
kubectl apply -f /etc/k8s/configs/ml-service-monitor.yaml
Verify: Open the Prometheus UI (usually on port 9090) and go to "Status" -> "Targets"; the ML service target should appear with state `UP`.
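The same check can be scripted against the Prometheus HTTP API, which exposes the target list at `/api/v1/targets`. The sketch below is only illustrative: the base URL `http://localhost:9090` assumes a local port-forward to Prometheus, and the function names are made up for this example.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(payload: dict) -> list[str]:
    """Return the scrape URLs of active targets whose health is not 'up'."""
    targets = payload["data"]["activeTargets"]
    return [t["scrapeUrl"] for t in targets if t["health"] != "up"]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch /api/v1/targets; base_url assumes a local port-forward."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

With the port-forward running, `unhealthy_targets(fetch_targets())` should return an empty list once every target, including the ML service, is being scraped successfully.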
Collecting logs from the Training and Inference containers
To centralize logs we could set up an EFK stack (Elasticsearch, Fluentd, Kibana) or, more simply, Loki + Promtail. Here we use Loki because it is lightweight and integrates well with Grafana.
Why: Training logs (often very long and full of tracebacks) must be stored for debugging, but they should not be written to Prometheus, which is built for numeric time series rather than log text.
Expected result: Logs from the `training-job` and `inference-service` containers appear in Grafana's Explore view when the Loki data source is selected.
Deploy Loki and Promtail into the `logging` namespace using Grafana's Helm charts.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
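# loki-stack ships its own Promtail by default; disable it because Promtail is installed from its own chart in the next command
helm install loki grafana/loki-stack -n logging --create-namespace --set promtail.enabled=false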
helm install promtail grafana/promtail -n logging --values /etc/k8s/configs/promtail-values.yaml
The `promtail-values.yaml` file must configure Promtail to read the containers' stdout logs (which the container runtime writes to files on each node) and to attach a `job` label based on the pod's `app` label.
File path: `/etc/k8s/configs/promtail-values.yaml`
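# Values for the grafana/promtail chart. The keys the chart accepts depend on its version
# (some versions expect scrape configs under `config.snippets`); compare with `helm show values grafana/promtail`.
config:
  clients:
    - url: http://loki.logging.svc.cluster.local:3100/loki/api/v1/push
  positions:
    filename: /var/lib/promtail/positions.yaml
  server:
    http_listen_port: 9080
    grpc_listen_port: 0
  scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
        - source_labels: [__meta_kubernetes_pod_label_app]
          action: replace
          target_label: job
        - source_labels: [__meta_kubernetes_pod_container_name]
          action: replace
          target_label: container
        # Keep only pods whose `app` label is exactly ml-service or training-job
        - source_labels: [__meta_kubernetes_pod_label_app]
          action: keep
          regex: ml-service|training-job
        # labelmap matches label names by regex (source_labels is ignored): copy pod labels onto the log stream
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        # Point Promtail at the files where the container runtime writes each container's stdout/stderr
        - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
          separator: /
          target_label: __path__
          replacement: /var/log/pods/*$1/*.log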
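Once logs are flowing, they can be checked outside Grafana by calling Loki's HTTP API directly. The sketch below only builds the request URL for the `query_range` endpoint; the base URL `http://localhost:3100` in the usage note assumes a local port-forward to the Loki service, and the helper name and the sample LogQL query are illustrative.

```python
from urllib.parse import urlencode


def loki_query_url(base_url: str, logql: str, limit: int = 100) -> str:
    """Build a Loki query_range URL for the given LogQL expression."""
    params = urlencode({"query": logql, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


# Lines from the training job that contain a Python traceback,
# relying on the `job` label set by the relabel rules.
TRACEBACK_QUERY = '{job="training-job"} |= "Traceback"'
```

Opening `loki_query_url("http://localhost:3100", TRACEBACK_QUERY)` with curl or a browser should return matching log lines as JSON; when no `start` or `end` is given, Loki searches the most recent hour.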