Architecture

Tech Stack

A modern, open-source stack designed for complete data sovereignty. Every component is self-hosted and containerized.

Data Sources
🗂
CSV
Letters
💻
Web/Phone
Scaleway Hosted on Scaleway
2 Orchestration Prefect
7 Catalog & Tests dbt Docs
11 Monitoring Prometheus + Grafana
3 Ingestion Upload + LLM + Nominatim
4 Landing Zone CH Bronze + RustFS
6 Transformations dbt
5 Data Warehouse CH Silver / Gold
10 API Platform Tyk + FastAPI
12 GIS Layer GeoServer + PostGIS
8 Reporting Grafana + CH Play
1 Infrastructure Docker + Caddy
13 UI Layer Upload + CH-UI + Hop
9 LLM & AI Pixtral (Scaleway)

Component Details

1

Infrastructure

Docker Docker
Caddy Caddy

17+ services running in Docker Compose with Caddy reverse proxy providing automatic HTTPS and clean URL routing to all services.

2

Data Orchestration

Prefect Prefect

Orchestrates the full pipeline: upload processing, LLM extraction, geocoding, ClickHouse loading, dbt builds, and GIS sync. Includes scheduled polling every 5 minutes.

Launch Prefect →
3

Ingestion

FastAPI FastAPI Upload
Pixtral Pixtral LLM
Nominatim Nominatim

Multi-channel data collection: CSV ingestion, drag-and-drop letter upload with AI-powered address extraction (Pixtral vision LLM), and self-hosted geocoding via Nominatim.

Launch Upload Portal →
4

Landing Zone

ClickHouse ClickHouse Bronze
RustFS RustFS

Raw data lands in the ClickHouse bronze layer. Uploaded letter files are stored in RustFS, a self-hosted S3-compatible object store.

Launch ClickHouse Play →
5

Data Warehouse

ClickHouse ClickHouse Silver / Gold

Silver layer provides cleaned and combined staging tables. Gold layer delivers analytics-ready dimensions, facts, and marts with 13+ models for coverage analysis, proximity scoring, and more.

Launch ClickHouse Play →
6

Transformations

dbt dbt

23+ SQL models transforming raw bronze data through silver staging to gold analytics. Includes 40+ automated data quality tests, incremental models, and custom generic tests.

Launch dbt Docs →
7

Data Catalog & Tests

dbt dbt Docs

Auto-generated data catalog with full lineage graph from bronze to gold. Shows table schemas, column descriptions, test coverage, and model dependencies.

Launch dbt Docs →
8

Reporting & Dashboards

Grafana Grafana
ClickHouse ClickHouse Play
CH-UI CH-UI

Five pre-built Grafana dashboards for API traffic, upload tracking, LLM extraction stats, geocoding health, and GIS monitoring. ClickHouse Play and CH-UI for ad-hoc SQL queries.

Launch Grafana →
9

LLM & AI

Pixtral Pixtral 12B
Scaleway Scaleway AI

Mistral's Pixtral 12B vision model hosted on Scaleway extracts street addresses from scanned handwritten letters as structured JSON with confidence scoring.

10

API Platform

Tyk Tyk Gateway
FastAPI FastAPI
Redis Redis

REST API with Swagger docs, authenticated via Tyk gateway with API key management and rate limiting (100 req/60s). Serves analytics, GIS endpoints, and letter uploads.

Launch API Docs →
11

Monitoring & Observability

Prometheus Prometheus
Grafana Grafana

Prometheus scrapes FastAPI metrics every 15 seconds. Grafana provides the visualization layer with five pre-built dashboards monitoring the entire platform.

Launch Prometheus →
12

GIS Layer

GeoServer GeoServer
GeoNode GeoNode
PostGIS PostGIS
Nominatim Nominatim

GeoServer publishes WMS/WFS layers for QGIS and ArcGIS. GeoNode provides a data catalog portal. PostGIS stores spatial data synced from ClickHouse. Nominatim handles self-hosted geocoding.

Launch GeoServer →
13

UI Layer

Upload Upload Portal
CH-UI CH-UI
Hop Apache Hop

Drag-and-drop letter upload portal, full-featured ClickHouse web interface (CH-UI), and Apache Hop visual ETL tool for building pipelines without code.

Launch Upload Portal →