Internal, air-gapped SaaS system that analyzes diffs/PRs for Java (primary), Python, and PHP; automatically tags owners, suggests fixes, performs RCA, and estimates impact and time to fix; retains raw diffs for 90 days; RAG powers the "Ask the project" feature.
rca-rag/
├─ apps/
│ ├─ gateway/ # FastAPI app (webhooks, REST API)
│ │ ├─ main.py # FastAPI app, routers
│ │ ├─ webhook/
│ │ │ └─ github.py # GitHub webhook handler
│ │ └─ api/ # REST endpoints (health, metrics, admin)
│ ├─ ingestion/ # Ingestion worker
│ │ ├─ worker.py # Main worker loop
│ │ ├─ processors/ # Event processors
│ │ ├─ storage/ # Storage operations
│ │ └─ retention/ # Retention job worker
│ ├─ notifier/ # Notification service
│ │ ├─ client.py # Google Chat client
│ │ └─ formatters.py # Message formatting
│ ├─ shared/ # Shared code
│ │ ├─ config.py # Pydantic Settings
│ │ ├─ db/ # Database setup, models
│ │ ├─ storage/ # S3/MinIO abstraction
│ │ ├─ mq/ # Message queue abstraction
│ │ └─ utils/ # Utilities
├─ infra/
│ ├─ migrations/ # Alembic migrations
│ └─ sql/ # Raw SQL scripts
├─ deployments/
│ ├─ docker-compose.dev.yml
│ └─ docker/
│ ├─ Dockerfile.api
│ └─ Dockerfile.worker
├─ configs/
│ ├─ app.example.env # Environment template
│ ├─ logging.yaml
│ ├─ ruff.toml
│ └─ mypy.ini
├─ tests/ # Test suite
│ ├─ unit/
│ ├─ integration/
│ └─ fixtures/
├─ pyproject.toml
├─ requirements.txt
└─ README.md
GET /health - Basic health check
GET /health/ready - Readiness check (Kubernetes)
GET /health/live - Liveness check (Kubernetes)
GET /metrics - Prometheus metrics
POST /webhooks/github - GitHub webhook endpoint
  push, pull_request, pull_request_review, check_run
  X-Hub-Signature-256 header for signature verification
POST /admin/service-map - Update service map for a repository
  {"repo_name": "org/repo", "service_map": {...}}
GET /admin/service-map/{repo_name} - Get service map for a repository
POST /rag/query - Query codebase using RAG
  {"question": "How does authentication work?", "repo": "org/repo", "top_k": 5, "files": ["app/auth/**"]}

New to the project? See GETTING_STARTED.md for a comprehensive step-by-step guide with troubleshooting tips.
git clone <repo-url>
cd rca-rag
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -e .
cp configs/app.example.env configs/.env
# Edit configs/.env with your settings
docker-compose -f deployments/docker-compose.dev.yml up -d
alembic -c infra/migrations/alembic.ini upgrade head
# Start API gateway
uvicorn apps.gateway.main:app --host 0.0.0.0 --port 8080
# Start ingestion worker (in separate terminal)
python -m apps.ingestion.worker
# Start analysis worker (in separate terminal)
python -m apps.analysis
# Start indexer worker in message queue mode (in separate terminal, optional)
python -m apps.indexer --mq
# Or manually index a repository
python -m apps.indexer <repo_id>
For detailed setup instructions, troubleshooting, and configuration options, see GETTING_STARTED.md.
Key configuration options in configs/.env:
DATABASE_URL: PostgreSQL connection string
MQ_TYPE: Message queue type (redis, kafka, rabbitmq)
STORAGE_TYPE: Storage type (s3, minio)
GITHUB_WEBHOOK_SECRET: GitHub webhook secret for signature verification
GOOGLE_CHAT_WEBHOOK_URL: Google Chat webhook URL
RETENTION_DAYS: Days to retain raw diffs (default: 90)
EMBEDDING_MODEL_NAME: Embedding model for RAG (default: all-MiniLM-L6-v2)
RULES_CONFIG_PATH: Path to rules YAML configuration file
CVE_DB_PATH: Path to CVE database file (optional)
INDEXING_ENABLED: Enable automatic indexing after analysis (default: true)
ANALYSIS_TIMEOUT: Analysis timeout in seconds (default: 300)

See configs/app.example.env for all available options.
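GITHUB_WEBHOOK_SECRET is used to check the X-Hub-Signature-256 header that GitHub attaches to each delivery. The sketch below shows the standard HMAC-SHA256 check for that header. The function name `verify_github_signature` is hypothetical and is not taken from this repository; the actual handler in apps/gateway/webhook/github.py may be organized differently.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Return True if the X-Hub-Signature-256 value matches the raw request body.

    Hypothetical helper for illustration; `secret` is the value of
    GITHUB_WEBHOOK_SECRET and `body` must be the unparsed request bytes.
    """
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

The check has to run against the raw body bytes before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the digest.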
# Run all tests
pytest
# Run with coverage
pytest --cov=apps --cov-report=html
# Run specific test file
pytest tests/unit/test_event_normalization.py
# Lint
ruff check .
# Type check
mypy apps
# Format code
ruff format .
# Create new migration
alembic -c infra/migrations/alembic.ini revision --autogenerate -m "description"
# Apply migrations
alembic -c infra/migrations/alembic.ini upgrade head
# Rollback
alembic -c infra/migrations/alembic.ini downgrade -1
FastAPI application handling:
Background worker processing:
Background worker processing:
Background worker for RAG indexing:
Run indexer worker:
# Message queue mode (automatic)
python -m apps.indexer --mq
# Manual mode (index specific repository)
python -m apps.indexer <repo_id>
Google Chat integration:
Scheduled job for:
The system supports multiple message queue backends:
Configure via MQ_TYPE environment variable.
Configure via STORAGE_TYPE environment variable.
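As a rough illustration of how MQ_TYPE and STORAGE_TYPE could be validated at startup, the sketch below reads each variable and rejects values outside the sets listed in the configuration section. `ALLOWED_BACKENDS` and `resolve_backend` are hypothetical names; the project's own configuration lives in apps/shared/config.py (Pydantic Settings) and may behave differently, for example by supplying defaults.

```python
import os
from typing import Mapping

# Allowed values as listed in this README's configuration section.
ALLOWED_BACKENDS = {
    "MQ_TYPE": ("redis", "kafka", "rabbitmq"),
    "STORAGE_TYPE": ("s3", "minio"),
}


def resolve_backend(var: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the normalized backend name for `var`, or raise ValueError.

    Hypothetical helper: no default is assumed, so an unset variable is an error.
    """
    allowed = ALLOWED_BACKENDS[var]
    raw = env.get(var)
    if raw is None:
        raise ValueError(f"{var} is not set; expected one of {allowed}")
    value = raw.strip().lower()
    if value not in allowed:
        raise ValueError(f"{var}={raw!r} is not supported; expected one of {allowed}")
    return value
```

Failing fast on an unknown backend name keeps a typo in configs/.env from surfacing later as a confusing connection error.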
The RAG (Retrieval-Augmented Generation) system allows querying the codebase using natural language.
curl -X POST http://localhost:8080/rag/query \
-H "Content-Type: application/json" \
-d '{
"question": "How does user authentication work?",
"repo": "org/repo",
"top_k": 5,
"files": ["app/auth/**"]
}'
{
"answer": "Based on the codebase: ...",
"citations": [
{
"path": "app/auth/AuthService.java",
"commit": "abc123...",
"lines": [10, 20],
"score": 0.95
}
],
"chunks": [...]
}
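The same request can be issued from Python. The sketch below builds the POST request with the standard library and turns the `citations` array into readable lines. `build_rag_request` and `format_citations` are hypothetical helper names, and the code assumes that `lines` is a `[start, end]` pair and that `files` is optional, which the example above suggests but does not state.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_rag_request(
    base_url: str,
    question: str,
    repo: str,
    top_k: int = 5,
    files: Optional[List[str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /rag/query."""
    payload: Dict[str, Any] = {"question": question, "repo": repo, "top_k": top_k}
    if files:
        payload["files"] = files  # assumed optional path filter
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rag/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_citations(response: Dict[str, Any]) -> List[str]:
    """Render each citation as 'path:start-end @ commit (score x.xx)'."""
    out = []
    for c in response.get("citations", []):
        start, end = c["lines"]  # assumed to be a [start, end] pair
        out.append(f'{c["path"]}:{start}-{end} @ {c["commit"]} (score {c["score"]:.2f})')
    return out
```

To send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`.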
The system uses hybrid retrieval (BM25 + vector search) to find relevant code snippets and provides citations for traceability.
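This README does not say how the BM25 and vector results are merged. One common approach is reciprocal rank fusion (RRF), sketched below purely as an illustration; it is an assumption, not a description of this project's retriever. The function name and the constant `k = 60` (a conventional default for RRF) are likewise illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk IDs into one list.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by chunk ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Example: chunk IDs as ranked by a BM25 search and by a vector search.
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF uses only ranks, so it needs no normalization between BM25 scores and embedding similarities, which are on different scales.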
Prometheus metrics available at /metrics:
webhook_received_total: Total webhooks received by event type and status
webhook_processing_seconds: Time spent processing webhooks

Structured logging with Loguru:
Configuration in configs/logging.yaml.
Run retention job manually:
# Dry run
python -m apps.ingestion.retention.worker --dry-run
# Actual cleanup
python -m apps.ingestion.retention.worker
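The difference between the dry run and the actual cleanup can be sketched as follows, assuming the job compares each stored diff's creation time against a cutoff of RETENTION_DAYS. `select_expired` and `run_retention` are hypothetical names, and the `(key, created_at)` tuples stand in for whatever the real storage layer returns.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional, Tuple


def select_expired(
    objects: Iterable[Tuple[str, datetime]],
    retention_days: int = 90,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return keys of objects created strictly before the retention cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [key for key, created_at in objects if created_at < cutoff]


def run_retention(
    objects: Iterable[Tuple[str, datetime]],
    delete: Callable[[str], None],
    retention_days: int = 90,
    dry_run: bool = False,
    now: Optional[datetime] = None,
) -> List[str]:
    """Delete expired objects, or only report them when dry_run is True."""
    expired = select_expired(objects, retention_days, now)
    if not dry_run:
        for key in expired:
            delete(key)
    return expired
```

In this sketch an object created exactly at the cutoff is kept; whether the real job treats that boundary the same way is not stated here.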
X-Hub-Signature-256

Internal use only.