Part 4: Building an incident-forecasting model with an LLM (RAG or fine-tuning)

Author: nam_bui90, 21/03/2026

1. Choosing and deploying an open-source LLM for incident forecasting

We will use the open-source model Llama-3-8B-Instruct instead of a paid API, so that sensitive metrics data never leaves the internal network and operating costs stay lower.

This gives us control over the whole inference process and lets us integrate directly with the vector store built from Thanos metrics, without depending on a third party.

Install Ollama, a lightweight runtime for running LLMs locally, on the Linux server that has already been prepared with a GPU or CPU.

curl -fsSL https://ollama.com/install.sh | sh

The installer downloads and installs the Ollama daemon, then starts the service automatically.

Check the service status and pull the Llama-3-8B-Instruct model (the q4_0 tag is a 4-bit quantized build, which keeps memory usage low).

systemctl status ollama && ollama pull llama3:8b-instruct-q4_0

The Ollama service is running (active) and the model download finishes with the status "success".

Testing the model through the local API

Send an HTTP request to check that the LLM responds before integrating it into the RAG pipeline.

curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b-instruct-q4_0",
  "prompt": "You are an AIOps expert. CPU usage has been rising by 10% every hour for 24 hours. What do you forecast?",
  "stream": false
}'

The JSON response contains a "response" field. Its text should reason about the risk of CPU saturation and suggest early alerting, although the exact wording varies from run to run.
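As a sanity check on whatever the model answers, the trend in the test prompt can also be extrapolated by hand. The sketch below assumes that "10% every hour" means ten percentage points added per hour and that the current usage is known. The function name and its parameters are illustrative and not part of the pipeline.

```python
def hours_to_saturation(current_pct, slope_pct_per_hour, limit_pct=100.0):
    """Linear extrapolation: hours until usage reaches limit_pct.

    Returns None when the trend is flat or falling (no saturation expected).
    """
    if current_pct >= limit_pct:
        return 0.0
    if slope_pct_per_hour <= 0:
        return None
    return (limit_pct - current_pct) / slope_pct_per_hour


# Hypothetical reading: 40% CPU now, rising 10 percentage points per hour
print(hours_to_saturation(40.0, 10.0))  # 6.0
```

If the model's forecast is far from this simple baseline, that is a hint to inspect the prompt or the retrieved context.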

2. Building a knowledge base from historical data

We need to convert raw metrics from Thanos/Prometheus into text so that the LLM can understand them and retrieve relevant context (RAG).

We use Python with the langchain library and ChromaDB to create a vector database that stores historical events and failure patterns.

Install the libraries required for the ETL pipeline into the vector store.

pip install langchain langchain-community langchain-chroma langchain-text-splitters chromadb requests sentence-transformers

The packages install successfully and are ready for embedding and vector storage.

Script to convert metrics to vectors

Create a Python file that reads historical data from the Prometheus Query API, summarizes it as text, and stores it in ChromaDB.

Create the file /opt/aioops/scripts/ingest_metrics_to_kb.py with the following content:

import hashlib
from datetime import datetime, timedelta, timezone

import chromadb
import requests
# Provided by the langchain-text-splitters package
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Connection settings
PROMETHEUS_URL = "http://prometheus-server:9090"
CHROMA_PATH = "/opt/aioops/data/knowledge_base"

# Initialize the vector store. No embedding function is passed, so Chroma
# embeds documents with its default model (all-MiniLM-L6-v2).
client = chromadb.PersistentClient(path=CHROMA_PATH)
collection = client.get_or_create_collection(name="incident_history")


def fetch_historical_incidents():
    # Sample query: request-rate spikes over the last 7 days (simulated from Thanos)
    query = 'sum(rate(http_requests_total[5m])) by (status_code) > 10000'
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=7)
    params = {
        "query": query,
        "start": start.timestamp(),  # Prometheus accepts Unix timestamps or RFC 3339
        "end": end.timestamp(),
        "step": "1h",
    }
    response = requests.get(f"{PROMETHEUS_URL}/api/v1/query_range", params=params, timeout=30)
    if response.status_code != 200:
        print(f"Error fetching metrics: HTTP {response.status_code}")
        return []

    incidents = []
    for item in response.json()["data"]["result"]:
        status_code = item["metric"].get("status_code", "unknown")
        # Each sample is [unix_timestamp, "value"]; the peak is the largest value,
        # not necessarily the last sample in the window
        peak_ts, peak_value = max(item["values"], key=lambda sample: float(sample[1]))
        peak_time = datetime.fromtimestamp(float(peak_ts), tz=timezone.utc).isoformat()
        # Summarize the series as text (textualization)
        text_summary = (
            f"Metric: http_requests_total, status_code={status_code}. "
            f"Pattern: spike peaking at {peak_time}. Peak value: {peak_value}. "
            "Context: Request rate spike detected."
        )
        incidents.append(text_summary)
    return incidents


def ingest_to_kb(texts):
    # Split the summaries into chunks (create_documents accepts plain strings)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    docs = text_splitter.create_documents(texts)

    # Drop exact duplicates, then derive stable IDs from the content so that
    # hourly re-runs update existing entries instead of colliding on inc_0, inc_1, ...
    chunks = list(dict.fromkeys(d.page_content for d in docs))
    ids = ["inc_" + hashlib.sha1(c.encode("utf-8")).hexdigest()[:16] for c in chunks]

    # Write to ChromaDB with metadata
    collection.upsert(
        documents=chunks,
        metadatas=[{"source": "prometheus", "type": "historical_incident"} for _ in chunks],
        ids=ids,
    )
    print(f"Upserted {len(chunks)} documents into the knowledge base.")


if __name__ == "__main__":
    incidents = fetch_historical_incidents()
    if incidents:
        ingest_to_kb(incidents)
    else:
        print("No historical incidents found to ingest.")

The script runs successfully and prints the number of documents written to the knowledge base.
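The textualization step can be tried in isolation, without a Prometheus server. The sketch below is a standalone variant that reports the largest sample in a series as the peak. The function name `summarize_series` and the sample values are made up for illustration; only the `[[timestamp, "value"], ...]` layout follows the Prometheus range-query response format.

```python
from datetime import datetime, timezone


def summarize_series(labels, values):
    """Turn one Prometheus range-query series into a one-line text summary.

    `values` uses the query_range layout: [[unix_timestamp, "value"], ...].
    """
    # The peak is the sample with the largest numeric value
    peak_ts, peak_value = max(values, key=lambda sample: float(sample[1]))
    peak_time = datetime.fromtimestamp(float(peak_ts), tz=timezone.utc)
    label_text = ", ".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return (
        f"Series: {label_text}. "
        f"Peak value: {float(peak_value):.0f} at {peak_time:%Y-%m-%d %H:%M} UTC. "
        f"Samples: {len(values)}."
    )


# Made-up samples in the shape Prometheus returns
sample = [[1696118400, "12000"], [1696122000, "18500"], [1696125600, "15000"]]
print(summarize_series({"status_code": "500"}, sample))
```

Keeping this step as a pure function makes it easy to unit test the wording that ends up in the knowledge base.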

Scheduling the script with cron or a Kubernetes Job

Run the script once to load the initial data, then register a cron job that refreshes it every hour.

python /opt/aioops/scripts/ingest_metrics_to_kb.py && (crontab -l 2>/dev/null; echo "0 * * * * python /opt/aioops/scripts/ingest_metrics_to_kb.py") | crontab -

The script finishes and the cron entry is appended to the current user's crontab, so the knowledge base is refreshed every hour.

Verifying the knowledge base

Use Python to query the vector store and check how many vectors it holds.

python -c "import chromadb; client = chromadb.PersistentClient(path='/opt/aioops/data/knowledge_base'); col = client.get_collection('incident_history'); print(f'Total vectors: {col.count()}')"

The console prints a vector count greater than 0, confirming that the data was vectorized and stored.

3. Designing the prompt for trend analysis

The prompt must be tightly structured so that the LLM acts as an AIOps expert and uses the RAG context to produce a grounded forecast.

We create a prompt file that contains a dynamic template. The context retrieved from the knowledge base and the current metrics are inserted into it at run time.

Create the file /opt/aioops/prompts/aioops_analyzer_prompt.txt:

Role: You are an expert AIOps Engineer specializing in predictive maintenance and anomaly detection.

Task: Analyze the current system metrics and compare them with historical incident patterns provided in the context. Predict potential failures and suggest immediate actions.

Input Data:
- Current Metrics: {current_metrics}
- Historical Context (Retrieved from RAG): {retrieved_context}

Constraints:
1. If the current metric trend matches a historical pattern with >80% similarity, set "risk_level": "HIGH".
2. Provide a confidence score (0-100%) for the prediction.
3. Output must be in JSON format with keys: "risk_level", "prediction", "confidence_score", "recommended_action".
4. If no matching pattern is found, return "risk_level": "NORMAL".

Reasoning Steps:
1. Identify the trend in current metrics (e.g., linear growth, sudden spike).
2. Compare this trend against the descriptions in the Historical Context.
3. Calculate the similarity and determine the risk.
4. Formulate the JSON response.

Output Example:
{
  "risk_level": "HIGH",
  "prediction": "CPU saturation expected in 45 minutes based on historical pattern #1024.",
  "confidence_score": 85,
  "recommended_action": "Scale out pods or restart the service immediately."
}

The prompt file is saved and ready to be loaded and filled in dynamically by the Python code.
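One detail matters when filling this template: the output example contains literal JSON braces, which Python's str.format tries to interpret as placeholders. The sketch below shows the failure and a plain-replacement alternative. `fill_template` is an illustrative helper name, and the short template string stands in for the real file.

```python
def fill_template(template, **values):
    """Substitute {name} placeholders by plain string replacement.

    Unlike str.format, this leaves every other brace in the template alone.
    """
    for name, value in values.items():
        template = template.replace("{" + name + "}", str(value))
    return template


# Shortened stand-in for the prompt file: one placeholder plus a JSON example
template = 'Metrics: {current_metrics}\nExample: {"risk_level": "HIGH"}'

try:
    template.format(current_metrics="CPU 95%")
except (KeyError, ValueError, IndexError) as exc:
    print("str.format failed:", type(exc).__name__)

print(fill_template(template, current_metrics="CPU 95%"))
```

An alternative is to keep str.format and double every literal brace in the file, but that makes the JSON example harder to read and edit.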

Implementing the RAG logic with the prompt

Write a Python script that runs the flow: query the vector DB -> get context -> fill in the prompt -> send to Ollama.

Create the file /opt/aioops/scripts/predict_incident.py:

import json
import os
from pathlib import Path
from urllib.parse import urlparse

import chromadb
import requests

# Configuration (environment variables override the local defaults)
CHROMA_PATH = "/opt/aioops/data/knowledge_base"
CHROMA_URL = os.environ.get("CHROMA_URL")  # e.g. http://chromadb:8000 under Docker Compose
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
MODEL = "llama3:8b-instruct-q4_0"
# Resolves to <project>/prompts/ next to the scripts directory, on the host or in a container
PROMPT_TEMPLATE_PATH = Path(__file__).resolve().parent.parent / "prompts" / "aioops_analyzer_prompt.txt"


def get_collection():
    # Use the Chroma server when CHROMA_URL is set, otherwise the local on-disk store
    if CHROMA_URL:
        parsed = urlparse(CHROMA_URL)
        client = chromadb.HttpClient(host=parsed.hostname, port=parsed.port or 8000)
    else:
        client = chromadb.PersistentClient(path=CHROMA_PATH)
    return client.get_or_create_collection("incident_history")


def predict(current_metrics):
    # Step 1: RAG - retrieve similar historical context
    results = get_collection().query(
        query_texts=[current_metrics],
        n_results=3,
        include=["documents"],
    )
    docs = results["documents"][0] if results["documents"] else []
    retrieved_context = "\n".join(docs) if docs else "No historical patterns found."

    # Step 2: fill in the prompt. str.replace is used instead of str.format
    # because the template's JSON example contains literal braces.
    prompt_template = PROMPT_TEMPLATE_PATH.read_text(encoding="utf-8")
    final_prompt = (
        prompt_template
        .replace("{current_metrics}", current_metrics)
        .replace("{retrieved_context}", retrieved_context)
    )

    # Step 3: send to Ollama
    payload = {
        "model": MODEL,
        "prompt": final_prompt,
        "stream": False,
        "format": "json",  # ask Ollama to constrain the output to JSON
        "options": {"temperature": 0.2},  # low temperature for more repeatable output
    }
    response = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=120)
    response.raise_for_status()
    result_text = response.json()["response"].strip()

    # Remove Markdown code fences if the model added them anyway
    if result_text.startswith("`"):
        result_text = result_text.strip("`").removeprefix("json").strip()
    return json.loads(result_text)


if __name__ == "__main__":
    # Input from a user or a monitoring system
    current_metrics_query = "CPU usage increased by 15% in the last hour on node-1."
    try:
        print(json.dumps(predict(current_metrics_query), indent=2))
    except json.JSONDecodeError as exc:
        print("Warning: LLM output is not valid JSON. Raw output:", exc.doc)
    except requests.RequestException as exc:
        print("Error calling LLM:", exc)

When the script finishes, it prints JSON containing the risk forecast, the confidence score, and the recommended action.
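Because an LLM can return JSON that parses but does not follow the requested schema, it may be worth validating the result before acting on it. The following is a minimal sketch of such a check. The required keys come from the prompt, while the set of allowed risk levels is an assumption based on the prompt's example and fallback rule, and `validate_prediction` is an illustrative name.

```python
import json

REQUIRED_KEYS = {"risk_level", "prediction", "confidence_score", "recommended_action"}
# Assumption: only the levels that appear in the prompt's example and fallback rule
ALLOWED_RISK_LEVELS = {"HIGH", "NORMAL"}


def validate_prediction(raw_text):
    """Parse the model output and return (prediction, problems)."""
    try:
        prediction = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(prediction, dict):
        return None, ["top-level value is not a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - prediction.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if prediction.get("risk_level") not in ALLOWED_RISK_LEVELS:
        problems.append(f"unexpected risk_level: {prediction.get('risk_level')!r}")
    score = prediction.get("confidence_score")
    if not isinstance(score, (int, float)) or not 0 <= score <= 100:
        problems.append(f"confidence_score out of range: {score!r}")
    return prediction, problems
```

A caller could log the list of problems and fall back to a conservative alert when it is not empty, rather than trusting a malformed forecast.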

Verifying the analysis

Run the script with a specific input to check that it links the input to historical patterns.

python /opt/aioops/scripts/predict_incident.py

The console shows JSON output whose "risk_level" field changes with the input data, which suggests that the retrieved context is influencing the answer.
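The prompt leaves the ">80% similarity" judgment to the model, which is hard to audit. One option is to compute a similarity score in code and pass it along with the retrieved context. The sketch below shows cosine similarity on toy vectors. The 0.8 threshold mirrors the prompt, while the vectors and function names are made up. In a real pipeline the vectors would come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_pattern(current_vec, historical_vec, threshold=0.8):
    """True when the similarity exceeds the threshold used in the prompt."""
    return cosine_similarity(current_vec, historical_vec) > threshold


# Toy 3-dimensional vectors; real embeddings have hundreds of dimensions
current = [0.9, 0.1, 0.3]
historical = [0.8, 0.2, 0.4]
print(round(cosine_similarity(current, historical), 3), matches_pattern(current, historical))
```

With a score computed this way, the "HIGH" decision can be made deterministically in code, and the LLM is left to explain the pattern and recommend an action.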

4. Deploying the LLM stack in Docker containers

To standardize the environment and make scaling easier, we package the whole stack (Ollama + vector store + Python logic) with Docker.

Docker Compose orchestrates the containers: Ollama, ChromaDB (persistent), and the app logic.

Create the file /opt/aioops/docker-compose.yml:

version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: aioops-llm
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: serve

  chromadb:
    image: chromadb/chroma:latest
    container_name: aioops-kb
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - CHROMA_SERVER_HTTP_PORT=8000

  aioops-app:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: aioops-predictor
    ports:
      - "8080:8080"
    environment:
      - PROMETHEUS_URL=http://prometheus-server:9090
      - CHROMA_URL=http://chromadb:8000
      - OLLAMA_URL=http://ollama:11434
    depends_on:
      - ollama
      - chromadb
    volumes:
      - ./prompts:/app/prompts
      - ./scripts:/app/scripts

volumes:
  ollama_data:
  chroma_data:

The compose file is created. It defines the services and the volume mounts used for persistent data.

Creating the Dockerfile for the app logic

Create the file /opt/aioops/Dockerfile to package the Python environment, with FastAPI exposing the forecasting API.

# syntax=docker/dockerfile:1
FROM python:3.9-slim

WORKDIR /app

# Install the required libraries
RUN pip install --no-cache-dir fastapi uvicorn langchain langchain-community langchain-chroma langchain-text-splitters chromadb requests sentence-transformers

# Copy source code
COPY scripts/ ./scripts/
COPY prompts/ ./prompts/

# Create the API entry point. A multi-line echo is not valid Dockerfile syntax,
# so the file is written with a heredoc (requires BuildKit).
COPY <<'EOF' /app/app.py
from fastapi import FastAPI

app = FastAPI()


@app.post("/predict")
async def predict_incident(data: dict):
    current_metrics = data.get("metrics", "")
    # Placeholder response. In practice, refactor predict_incident.py into a
    # function and call it here with current_metrics.
    result = {"status": "success", "prediction": "Processing..."}
    return result
EOF

EXPOSE 8080

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

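The Dockerfile is created and provides a FastAPI runtime for handling forecast requests.

Starting the whole system

Run docker compose to start all three containers at once, then pull the model inside the Ollama container, because its volume starts out empty. If the host Ollama service from step 1 is still running, stop it first (sudo systemctl stop ollama), since it already listens on port 11434.

cd /opt/aioops && docker compose up -d --build && docker compose exec ollama ollama pull llama3:8b-instruct-q4_0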

All containers start successfully (shown as running in docker compose ps) and are ready to accept connections.
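Note that depends_on only orders container startup and does not wait for a service to accept requests. A small readiness loop can cover that gap. The sketch below is one way to write it, using only the standard library. The function names, retry counts, and the choice of the Ollama root URL as a probe are assumptions.

```python
import time
import urllib.request


def wait_until_ready(check, attempts=30, delay_seconds=2.0, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return attempt
        except OSError:
            pass  # connection refused or timed out: the service is not up yet
        if attempt < attempts:
            sleep(delay_seconds)
    raise TimeoutError(f"service not ready after {attempts} attempts")


def ollama_is_up(base_url="http://localhost:11434"):
    """Assumed probe: an HTTP 200 from the Ollama root URL counts as ready."""
    with urllib.request.urlopen(base_url, timeout=3) as resp:
        return resp.status == 200
```

Calling wait_until_ready(ollama_is_up) before the first prediction request avoids connection errors during the first seconds after docker compose up.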

Final verification

Send a test request to the API endpoint of the aioops-app container to check the whole RAG + LLM pipeline.

curl -X POST http://localhost:8080/predict -H "Content-Type: application/json" -d '{"metrics": "Memory usage spiked to 95% in last 10 minutes"}'

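The API returns a JSON response. With the placeholder handler in the Dockerfile, this is only the stub {"status": "success", "prediction": "Processing..."}. Once the handler calls the prediction logic, the response carries the LLM's analysis and confirms that the AIOps system works end to end, from data collection to forecasting.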
