A data story

Cloudy: estimating cloud and lightning over Sweden

Cloudy answers a practical question for Swedish locations: what is normal here, what has happened recently, and how much should that change the next few weeks? This deck follows the product as it is: the data we ingest, the cleanup rules, the read path, the normals view, the weekly outlook, the spatial estimate, and the deploy path.

SMHI metobs · SMHI lightning · PostgreSQL · FastAPI · Pydantic · Backtested outlook · React + TypeScript

PART 1

Where we are going

What the data gives us, and the two gaps we close: space between stations, and recency.

Roadmap

What the data can and cannot say

SMHI gives us cloud cover at 109 weather stations across Sweden. That tells us how cloudy it is at each station, but nothing about the air in between. A seasonal normal tells us how cloudy a place usually is in a given week of the year, but it is flat year over year and has no sense of what the last few weeks actually did. We close those two gaps with two kinds of model: spatial models fill in space, and a damped persistence model adds recency in time.

[Figure: two panels. Left: cloud is known at stations and unknown in between ("?" marks a value unknown between stations); spatial models fill space. Right: the normal shows how it usually is, a flat seasonal climatology that is the same every year with no recency; the damped model adds time.]
Two gaps in the raw data, and the model family that fills each.

Roadmap

The map: space on one axis, time on the other

This grid is the map for the deck. The horizontal axis is time: on the left, the seasonal normal; on the right, recent conditions. The vertical axis is space: at the top, values at stations; at the bottom, estimates at any point. The shipped product fills the grid with small, explicit pieces: normals, damped persistence, kNN spatial estimates, and their composition. Lightning is area-based, so it uses its own simpler shape: count strike-days in a circle and compare them to the seasonal normal.

[Figure: 2×2 grid. Columns (time): normal (how it usually is) and now (recency). Rows (space): at a station and at any point. Cells: normals (we already have this); damped persistence (prediction part, adds time); spatial kNN / nearest (spatial part, adds space); combined (done, kNN × damped). Lightning is area-based and sits outside this grid: normals only so far.]
Each piece improves one axis. The point-and-recency case is the composition of kNN spatial normals and the damped weekly outlook.

PART 2

The data

Two live SMHI sources: what each contains, what it does not, and how it is served.

Data

Two live sources, each isolated

The system serves two SMHI observation feeds. Each has its own ingest module and carries a (source, source_version) tag inside its natural key, so feeds can never collide. Cloud is a station time series. Lightning is a country-wide event stream. Everything else in the app is derived from those two sources.

Source | What it is | Provider | source tag
SMHI cloud cover (param 16) | hourly cloud % at stations | SMHI metobs | smhi-metobs
SMHI lightning | per-discharge strike events | SMHI | smhi-lightning
Live ingest modules: ingest/cloud.py and ingest/lightning.py. The read path and model views are built from these tables and their rollups; no proxy weather source is served to users.

Data

SMHI cloud cover: the served base

This is the only cloud source served to users. It gives hourly total cloud cover at each station, but only at the station: there is nothing in between, nothing sub-hourly, and no wind, temperature, or humidity to explain the value.

SMHI cloud (param 16)
WHY: served ground truth
HAS: hourly cloud_pct 0–100 per station, quality flag G/Y kept
NOT: nothing between stations, no sub-hourly, no wind/temp/humidity, no cloud type or base height
SIZE: 10.18M rows / 2.2 GB · 2015-01-01 → 2026-06-13 · 109 active of 459 stations · 4.8% of hours NULL (gaps kept)
Two feeds: immutable corrected-archive + rolling latest-months (upserted)
The card states why we keep the source, what each row holds, what it cannot tell us, and its size on disk.

Data

SMHI lightning: point events everywhere

Unlike cloud, lightning covers the whole country, not just stations: one row per discharge. It is the storm-activity signal. It carries no cloud cover, no stored density raster, and no altitude.

SMHI lightning (CSV)
WHY: storm signal, high-resolution point events country-wide
HAS: per-discharge ts_utc (µs), lat/lon, signed peak_current_ka, multiplicity, cloud_indicator (0 = cloud-to-ground), raw quality geometry
NOT: no cloud %, no stored density field (re-aggregated on query), no altitude
SIZE: 4.10M rows / 1.0 GB · 2015-01-01 → 2026-06-11 · 1.74M cloud-to-ground (~42%)
Density is re-aggregated per query, never stored as a raster
The card states why we keep lightning, what each discharge row holds, what it lacks, and its size on disk.

Data

Exploration is the raw-data workbench

The app has two exploration pages beside Normals and Predictions. Exploration shows cloud and lightning as synchronized time-series charts for the selected location. Map shows lightning events in the current viewport and time window. These are the pages that make the raw feeds inspectable before they become seasonal normals or weekly outlooks.

[Screens. Exploration (/app/?view=explore&location=Stockholm): cloud + lightning chart; same window, same aggregation, two feeds; resolution Auto / Week / Month / Year. Map (/app/?view=map&location=Stockholm): lightning in the viewport; the map query follows pan, zoom, and time range. Events are fetched by map bounds and visible time window, not pre-rendered as a raster.]
The exploration pages are deliberately direct: one chart view for cloud and lightning over time, one map view for lightning events in space.

Data

Rollups: the chart page never scans raw rows

The Exploration chart asks for cloud at a chosen resolution — anywhere from hourly to yearly. Aggregating cloud_hourly on every request would be slow, so on each ingest we precompute per-station buckets at six fixed resolutions and store them in cloud_rollups. The read path then serves those rows directly: a chart is a lookup, not an aggregation. Each bucket also keeps its counts — observed, expected, missing — so a gappy week is reported as gappy instead of silently averaging over holes. Because it materializes six resolutions across every station and every bucket, this derived table is the largest on disk, which the next slide shows.

# ingest/cloud.py — rebuilt from cloud_hourly on every ingest
ROLLUP_RESOLUTIONS = ("hour", "6h", "day", "week", "month", "year")


class CloudRollup(SQLModel, table=True):
    station_id: int
    resolution: str  # one of ROLLUP_RESOLUTIONS
    bucket_start: datetime
    observed_count: int
    expected_count: int
    missing_count: int
    mean_cloud_pct: float | None
    p05_cloud_pct: float | None
    p50_cloud_pct: float | None
    p95_cloud_pct: float | None
One row per (station, resolution, bucket): counts plus mean and percentiles, so the chart reads a summary instead of scanning hours.

Data

Where the bytes actually go

The live database holds two raw sources plus the precomputed serving rollups. The chart below is a snapshot of table size on disk, in megabytes; longer bars are larger tables. The top bar, cloud_rollups, is derived data, not raw input, and is the single largest table at 5.7 GB because it stores every station at six serving resolutions.

Table | Size on disk (MB)
cloud_rollups | 5729
cloud_hourly | 2205
lightning_events | 1022
Each bar is one live Postgres table. The orange top bar is precomputed rollups, not raw data; the derived serving table outweighs the raw inputs because it is optimized for the chart read path.

PART 3

Cleaning the data

Where raw values are normalized, where they are preserved, and how writes stay repeatable.

Cleanup

One place turns raw values into cloud percent

Every raw cloud reading passes through a single function before it becomes a percentage. Sentinels (113, 9999, -9999), negatives, anything over 100, and octa readings above 8 all return None. The 113 code means the sky was obscured or not observable, so we treat it as unknown rather than as overcast. Octas convert with value / 8 * 100. Because the fix lives in one place, there is no second code path to keep in sync.

MISSING_SENTINELS = frozenset({113, 9999, -9999})


def normalize_cloud_pct(raw, *, octas=False):
    if raw is None or raw == "":
        return None
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    if value in MISSING_SENTINELS or value < 0:
        return None
    if octas:
        if value > 8:
            return None
        return value / 8.0 * 100.0
    if value > 100:
        return None
    return value
core/units.py — the only place raw SMHI values become cloud percent.

Cleanup

Lightning is kept raw

Strike events get no value sanitizer. We do not filter peak current or any other field against a physical range, because the raw measurements are the signal we want to keep. The only thing we drop is structurally malformed CSV lines: a row that fails to parse is caught, counted, and skipped. Everything that parses is stored verbatim, including the raw quality geometry, so it stays available for later use.

for line in csv.DictReader(f, delimiter=";"):
    try:
        ts = datetime(int(line["year"]), ...)
        row = {"peak_current_ka": float(line["peakCurrent"]), ...}
        rows.append(row)
    except (KeyError, ValueError, TypeError):
        skipped += 1
ingest/lightning.py parse_rows — malformed lines are counted, not cleaned.
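
A self-contained sketch of the count-and-skip rule follows. Only the `year` and `peakCurrent` column names come from the excerpt above; the `month`, `day`, and `hour` columns and the sample lines are assumptions made so the example can run.

```python
import csv
import io
from datetime import datetime, timezone


def parse_rows(text: str) -> tuple[list[dict], int]:
    """Parse semicolon-delimited strike lines; count, do not repair, bad ones."""
    rows, skipped = [], 0
    for line in csv.DictReader(io.StringIO(text), delimiter=";"):
        try:
            ts = datetime(
                int(line["year"]), int(line["month"]), int(line["day"]),
                int(line["hour"]), tzinfo=timezone.utc,
            )
            rows.append({"ts_utc": ts, "peak_current_ka": float(line["peakCurrent"])})
        except (KeyError, ValueError, TypeError):
            skipped += 1
    return rows, skipped


sample = (
    "year;month;day;hour;peakCurrent\n"
    "2024;7;14;15;-12.5\n"  # valid: a negative peak current is kept as-is
    "2024;7;14;xx;8.0\n"    # malformed hour raises ValueError and is skipped
    "2024;13;1;0;3.0\n"     # month 13 is rejected by datetime and is skipped
)
rows, skipped = parse_rows(sample)
print(len(rows), skipped)  # 1 2
```

The signed peak current passes through untouched; only lines that cannot be parsed at all are dropped.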

Cleanup

Ingest is idempotent and incremental

Each ingest unit is replaced or upserted inside one transaction, so re-running a day or a station does not duplicate rows or leave a half-written state. The (source, source_version) pair is part of every natural key. Corrected cloud archives use delete-then-insert; rolling cloud updates use upsert; lightning replaces one day and refreshes its rollup. New data lands without rewriting the full archive.

with engine.begin() as conn:  # one transaction: replace the whole day
    conn.execute(
        delete(LightningEvent).where(
            LightningEvent.day == day,
            LightningEvent.source_version == SOURCE_VERSION,
        )
    )
    if rows:
        conn.execute(insert(LightningEvent), rows)
    refresh_sweden_daily_rollups(conn, day, day)
ingest/lightning.py — delete, insert, and rollup refresh in one transaction.
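
The same replace-one-day pattern can be tried without Postgres. This sketch uses the standard-library `sqlite3` module as a stand-in; the two-column table and the `replace_day` helper are simplifications for illustration, not the project's schema.

```python
import sqlite3


def replace_day(conn: sqlite3.Connection, day: str, rows: list[tuple[str, float]]) -> None:
    """Delete and re-insert one day's events atomically."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute("DELETE FROM lightning_events WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO lightning_events (day, peak_current_ka) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lightning_events (day TEXT, peak_current_ka REAL)")
day_rows = [("2024-07-14", -12.5), ("2024-07-14", 8.0)]
replace_day(conn, "2024-07-14", day_rows)
replace_day(conn, "2024-07-14", day_rows)  # re-run: same end state
count = conn.execute("SELECT COUNT(*) FROM lightning_events").fetchone()[0]
print(count)  # 2
```

Running the ingest twice leaves two rows, not four, which is the property that makes a retry safe.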

PART 4

Architecture

The stack and the decisions that keep the read path fast and the Python/TypeScript contract mechanical.

Architecture

Typed requests, typed responses

Pydantic validates every request against bounds and enums before any query runs. A latitude outside the Sweden envelope, a radius that is not 50 or 100, or a longitude without its latitude all return a 422 instead of a bad result. Responses are plain TypedDicts, so they describe shape at type-check time and cost nothing at runtime.

class PredictionsCloudQuery(BaseModel):
    model_config = {"populate_by_name": True}

    lat: Annotated[float | None, Field(ge=54.0, le=70.0)] = None
    lon: Annotated[float | None, Field(ge=9.0, le=26.0)] = None
    radius_km: Literal[50, 100] = 50

    @model_validator(mode="after")
    def _location_is_a_pair(self) -> Self:
        if (self.lat is None) ^ (self.lon is None):
            raise ValueError("lat and lon must be provided together")
        return self
predictions/query.py — Sweden envelope (lat 54–70, lon 9–26) and paired-coordinate check.

Architecture

TypeScript types generated from Python

The frontend does not hand-write API types. FastAPI emits an OpenAPI schema from the same response models, and openapi-typescript turns that schema into schema.gen.ts, which the frontend imports. The generation runs offline — no server, no database — so it works the same in CI and on a laptop. Rename a field in Python and the dependent TypeScript stops compiling.

# frontend/scripts/gen-api.sh
( cd "${BACKEND_DIR}" && uv run python -c \
    "import json, cloudy.api as a; print(json.dumps(a.create_app().openapi(), indent=2))" \
) > "${SCHEMA_JSON}"

node -e "JSON.parse(require('fs').readFileSync(process.argv[1],'utf8'))" "${SCHEMA_JSON}"

pnpm exec openapi-typescript "${SCHEMA_JSON}" -o "${SCHEMA_TS}"
create_app().openapi() → openapi.json → validate → openapi-typescript → schema.gen.ts (1606 lines).

Architecture

Precompute at ingestion

The Sweden-wide normal is a percentile scan over about 10 million rows, which takes roughly 10 seconds when run live — too slow for a request. So we run that scan once per ingest, off the request path, and write the result to a table. The read path then serves the materialized rows. Located queries touch only a few stations, so those stay live and sub-second.

def refresh_sweden_normals(
    engine, source="smhi-metobs", source_version="1.0"
) -> int:
    written = 0
    with engine.begin() as conn:
        for period in ("day", "month", "year"):
            bucket = f"EXTRACT({_PERIOD_FIELD[period]} FROM ts_utc)::int"
            rows = conn.execute(
                text(_NORMAL_SQL.format(bucket=bucket, station_filter=_SWEDEN_FILTER))
            ).all()
            conn.execute(delete(CloudNormal).where(...))
            if rows:
                conn.execute(insert(CloudNormal), [... for row in rows])
                written += len(rows)
    return written
cloud_normals: one row per (scope, period, bucket) with mean, p10/p50/p90, and clear/partial/overcast shares.
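
To make the row shape concrete, here is a small pure-Python sketch of the statistics one bucket holds. The clear and overcast cut-offs (20% and 80%) are assumptions chosen for the example; the source does not state the thresholds, and the real computation runs in SQL.

```python
from statistics import fmean, quantiles


def month_normal(values: list[float], clear_max: float = 20.0, overcast_min: float = 80.0) -> dict:
    """Summarize every hourly reading that falls in one calendar month."""
    deciles = quantiles(values, n=10, method="inclusive")  # 9 cut points
    n = len(values)
    clear = sum(v <= clear_max for v in values) / n
    overcast = sum(v >= overcast_min for v in values) / n
    return {
        "mean": fmean(values),
        "p10": deciles[0],
        "p50": deciles[4],
        "p90": deciles[8],
        "clear_share": clear,
        "overcast_share": overcast,
        "partial_share": 1.0 - clear - overcast,
    }


# Hypothetical July readings pooled across years.
july = [0.0, 12.5, 25.0, 50.0, 62.5, 75.0, 87.5, 100.0, 100.0, 100.0]
normal = month_normal(july)
print(normal["mean"], normal["clear_share"], normal["overcast_share"])
```

The three shares sum to one by construction, so a reader can see at a glance whether a month is mostly clear, mostly overcast, or mixed.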

Architecture

Cache behind a Protocol

A real deployment would use a shared cache like Redis. Here it is an in-memory LRU. The route code never names either one: it talks to a Cache Protocol whose values are JSON strings only. Swapping the backend is a new implementation plus a config value, not a change to any route.

class Cache(Protocol):
    def get(self, key: str) -> str | None: ...

    def set(self, key: str, value: str, ttl_s: int) -> None: ...


@lru_cache  # one cache instance per process
def get_cache() -> Cache:
    backend = get_settings().cache_backend
    if backend == "memory":
        return MemoryCache()
    raise ValueError(f"unknown cache backend: {backend!r} (supported: memory)")
core/cache.py — MemoryCache is an OrderedDict LRU with lazy TTL, maxsize 1024. Routes cache under composed keys, e.g. clim:cloud:{lat}:{lon}:{radius}:{period}, with CACHE_TTL_S = 3600.
Values are JSON strings only, so a shared backend like Redis drops in without touching the contract.

Goals

Two goals, two different problems

Cloud and lightning need different shapes of answer. Cloud is a value at a point, and the nearest stations may be far away, so the work is estimating across distance. Lightning is regional — a strike lands somewhere in an area, and how far that is from a station does not matter — so the work is counting in a circle. Recency (the damped model) is explored for both, but cloud is the clean case.

[Figure: two panels. Left, "Cloud at a point": a query point that must be estimated from far stations. Right, "Lightning in an area": count strikes in the area; station distance is irrelevant.]
Left: a query point with lines to its few distant stations; cloud must be inferred across distance. Right: a circle over scattered strikes; lightning is just a count inside the area.

PART 5

Prediction: adding recency

A 1-2 week outlook that nudges the seasonal normal toward what the recent weeks actually did.

Prediction

The damped model in one number

A seasonal normal says how cloudy a week usually is, but it has no idea what just happened. The fix is small: take the normal for the week and add a fraction of how far the recent weeks have run above or below it. That fraction is alpha, the lag-h autocorrelation of weekly anomalies, where h is the lead in weeks. Weekly anomalies persist (lag-1 is about 0.3 across Sweden, higher in some places); monthly ones barely do. We clamp alpha to [0, 1]: floored at 0 so a noisy negative value cannot flip the signal, capped at 1 so we never amplify an anomaly. At alpha = 0 the forecast collapses back to the normal.

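recent gap: $a = \text{recent} - \text{normal}$
persistence: $\alpha_h = \operatorname{clamp}\!\left( \dfrac{\sum_t a_t \, a_{t+h}}{\sum_t a_t^2},\ 0,\ 1 \right)$
forecast: $\hat{y} = \text{normal} + \alpha_h \cdot a$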
The whole model is one fitted number per lead, α (alpha): the share of a recent anomaly that still holds h weeks out, measured as the lag-h autocorrelation of weekly anomalies and clamped to [0, 1]. At α = 0 the forecast is exactly the normal — the floor that stops it ever scoring worse than climatology.
normal (this week) | recent week ran | gap a | α | forecast = normal + α·a
65% | 80% (cloudier) | +15 | 0.30 (lead 1) | 65 + 0.30×15 = 69.5%
65% | 50% (clearer) | −15 | 0.30 | 65 − 0.30×15 = 60.5%
65% | 80% (cloudier) | +15 | 0.10 (lead 2) | 65 + 0.10×15 = 66.5%
65% | 80% (cloudier) | +15 | 0.00 (no persistence) | 65 + 0 = 65% (the normal)
One week and a 15-point surprise, read four ways: a bigger α leans harder on the surprise; a clearer-than-normal week pulls the forecast down by the same share; a longer lead carries a smaller α, so the forecast melts back toward the normal; and where history shows no persistence (α = 0) it returns the normal unchanged.
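from statistics import fmean


def fit_alpha(anomalies, lead):
    # Lag-`lead` autocorrelation of weekly anomalies, clamped to [0, 1].
    present = [a for a in anomalies if a is not None]
    if len(present) <= lead:
        return 0.0
    mean = fmean(present)
    var = sum((a - mean) ** 2 for a in present) / len(present)
    if var == 0:
        return 0.0
    # Pair week t with week t + lead; skip any pair that touches a missing week.
    pairs = [
        (a, b)
        for t in range(len(anomalies) - lead)
        if (a := anomalies[t]) is not None and (b := anomalies[t + lead]) is not None
    ]
    if not pairs:  # gaps can leave no usable pair even when enough weeks exist
        return 0.0
    cov = sum((a - mean) * (b - mean) for a, b in pairs) / len(pairs)
    return max(0.0, min(1.0, cov / var))  # floor 0, cap 1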
predictions/persistence.py: forecast = normal + alpha x recent anomaly. The series sits on a gap-free weekly grid, so a missing week is None and a lag never steps across a hole.
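
The four table rows can be reproduced in a few lines. `damped_forecast` is a hypothetical helper name used here for the arithmetic only; it applies the same floor and cap as the fitted alpha.

```python
def damped_forecast(normal: float, recent: float, alpha: float) -> float:
    """Nudge the normal toward the recent value by a clamped fraction alpha."""
    alpha = max(0.0, min(1.0, alpha))  # same floor and cap as the fitted value
    return normal + alpha * (recent - normal)


# (normal, recent, alpha) for the four table rows.
cases = [(65.0, 80.0, 0.30), (65.0, 50.0, 0.30), (65.0, 80.0, 0.10), (65.0, 80.0, 0.00)]
forecasts = [round(damped_forecast(*case), 1) for case in cases]
print(forecasts)  # [69.5, 60.5, 66.5, 65.0]
```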

Prediction

Tested with a causal backtest

To trust the outlook we score it the way it would actually run. At each weekly origin we rebuild the normal from only the weeks up to that origin, so a past forecast is never measured against data from its own future. We let about two years of weeks accumulate first as warm-up, then start scoring. The baseline is the normal itself, which always predicts an anomaly of zero. Skill is the fraction by which the model cuts the baseline's error: 1 minus model MAE over baseline MAE.

climatology = {woy: total[woy] / count[woy] for woy in total}
causal = [
    None if v is None else v - climatology[woys[i]]
    for i, v in enumerate(values[: origin + 1])
]
prediction = predict(causal, woys, origin, lead)

target = actual - climatology[target_woy]
model_err.append(abs(target - prediction))
base_err.append(abs(target))  # the normal predicts zero anomaly

base_mae = fmean(base_err)  # MAE of always predicting the normal
skill = 1.0 - fmean(model_err) / base_mae
predictions/outlook.py rolling-origin backtest. MIN_TRAIN_WEEKS = 104 (~2 years) of warm-up; leads 1 and 2, beyond which alpha falls to ~0.
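
A runnable, simplified version of the rolling-origin loop may help. This sketch assumes a gap-free series, uses `i % 52` as the week of year, and takes a fixed alpha instead of refitting it at each origin; the synthetic series is generated only for the demonstration and is not SMHI data.

```python
import random
from statistics import fmean

WEEKS_PER_YEAR = 52
MIN_TRAIN_WEEKS = 104  # warm-up before scoring, as in the text


def backtest(values: list[float], alpha: float, lead: int = 1) -> float:
    """Return skill = 1 - model MAE / baseline MAE over all scored origins."""
    model_err, base_err = [], []
    for origin in range(MIN_TRAIN_WEEKS, len(values) - lead):
        # Climatology from weeks up to and including the origin only (causal).
        total, count = {}, {}
        for i in range(origin + 1):
            woy = i % WEEKS_PER_YEAR
            total[woy] = total.get(woy, 0.0) + values[i]
            count[woy] = count.get(woy, 0) + 1
        clim = {woy: total[woy] / count[woy] for woy in total}

        recent_anomaly = values[origin] - clim[origin % WEEKS_PER_YEAR]
        prediction = alpha * recent_anomaly  # predicted anomaly at origin + lead
        target = values[origin + lead] - clim[(origin + lead) % WEEKS_PER_YEAR]

        model_err.append(abs(target - prediction))
        base_err.append(abs(target))  # the normal predicts zero anomaly
    return 1.0 - fmean(model_err) / fmean(base_err)


# Synthetic weekly cloud series: a two-level season plus a persistent anomaly.
rng = random.Random(0)
anomaly, series = 0.0, []
for week in range(300):
    anomaly = 0.6 * anomaly + rng.gauss(0.0, 10.0)
    seasonal = 60.0 + 15.0 * (1 if week % WEEKS_PER_YEAR < 26 else -1)
    series.append(seasonal + anomaly)

print(backtest(series, alpha=0.0), backtest(series, alpha=0.5))
```

With alpha set to 0 the model and the baseline make identical predictions, so the skill is exactly 0. Any positive skill at a non-zero alpha is error removed relative to the normal.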

Prediction

What the backtest shows

Averaged over Sweden the gain is small but consistent: lead-1 median skill is +1.9%, and the model beats the normal at 98.2% of stations. Stockholm is a clearer example. The chart plots rolling 52-week mean absolute error, not raw cloud: gray is the seasonal normal, blue is the damped outlook, and lower is better. Across the backtest the normal is off by 23.0 points on average; the damped outlook is off by 16.6.

[Chart: rolling mean abs error (cloud %), y-axis 0 to 40, x-axis 2019 to 2026; series: seasonal normal error and damped model error.]
Stockholm, lead-1: rolling 52-week mean absolute error of the seasonal normal (gray) versus the damped forecast (blue); lower is better. The shaded band is the error the model removes. Over 474 weeks the normal is off by 23.0 cloud-% points on average, the model by 16.6 — a 28% cut for this station.
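
The 28% figure follows directly from the two error values in the caption. A quick check, using a hypothetical `skill` helper:

```python
def skill(model_mae: float, baseline_mae: float) -> float:
    """Fraction of the baseline's error that the model removes."""
    return 1.0 - model_mae / baseline_mae


stockholm = skill(model_mae=16.6, baseline_mae=23.0)
print(f"{stockholm:.1%}")  # 27.8%
```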

PART 6

Spatial: cloud at a point

Estimate the cloud normal anywhere in Sweden from the nearest stations, climbing three rungs of precision.

Spatial

kNN is the shipped spatial estimate

For a location without a station, the useful signal is nearby stations. The product estimates the local cloud normal from the 5 nearest stations: nearest-station normal as the simple floor, kNN average as the shipped estimate, and a learned model as a benchmark check. The first two are direct statistics on real station observations. The benchmark exists to answer one question: does learning beat the average?

[Figure: left, a query point and its 5 nearest stations labeled by distance and bearing (31 km / 318°, 47 km / 41°, 52 km / 74°, 66 km / 153°, 71 km / 226°), plus a proxy label that is not served. Right, three rungs on the same neighbours, in increasing precision: 1) nearest station normal, 2) kNN average of 5, 3) learned benchmark.]
Left: a query point and its 5 nearest stations with distance and bearing. Right: the same neighbours feed the simple baseline, the shipped kNN estimate, and a benchmark model.
DEFAULT_NEIGHBOURS = 5, chosen so the point is triangulated and a single missing station does not break the estimate.

Spatial

A location never sees itself

The evaluation uses the same neighbour rule as serving: when the origin is a station, that station is excluded from its own neighbour list. That gives leave-station-out scoring directly from the data shape. Whole stations go to disjoint folds, and serving reuses the same feature writers, so the benchmark is measured on the same inputs the shipped estimate uses.

def nearest_neighbours(points, k=DEFAULT_NEIGHBOURS):
    neighbours = {}
    for origin in points:
        ranked = sorted(
            (
                (other.id, haversine_km(origin.lat, origin.lon, other.lat, other.lon))
                for other in points
                if other.id != origin.id
            ),
            key=lambda pair: pair[1],
        )
        neighbours[origin.id] = ranked[:k]
    return neighbours
features.py: the origin station is filtered out, so a location can never use itself as a neighbour.
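
For a query point that is not a station, the estimate is an average over the nearest neighbours. The sketch below assumes an unweighted mean and a standard haversine distance on a 6371 km sphere; the source does not say whether the shipped average is distance-weighted, and the station coordinates and normals here are made up.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import fmean

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlam = radians(lat2 - lat1), radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def knn_normal(lat: float, lon: float, stations: list[dict], k: int = 5) -> float:
    """Unweighted mean of the k nearest stations' normals (an assumption)."""
    ranked = sorted(stations, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))
    return fmean(s["normal"] for s in ranked[:k])


# Hypothetical stations around Stockholm, plus one far to the north.
stations = [
    {"lat": 59.3, "lon": 18.0, "normal": 60.0},
    {"lat": 59.5, "lon": 17.8, "normal": 62.0},
    {"lat": 59.1, "lon": 18.3, "normal": 64.0},
    {"lat": 59.6, "lon": 18.4, "normal": 66.0},
    {"lat": 59.0, "lon": 17.6, "normal": 68.0},
    {"lat": 67.8, "lon": 20.2, "normal": 90.0},  # too far: excluded when k=5
]
estimate = knn_normal(59.33, 18.07, stations)
print(estimate)  # 64.0
```

The distant northern station never enters the average, which is the behaviour a nearest-5 rule is meant to give.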

The benchmark model was LightGBM with 400 trees, learning rate 0.05, 31 leaves, fit on MAE (regression_l1). Features are the 5 nearest stations' cloud values plus distance, bearing, lat/lon, and seasonal sin/cos. It was a path we explored and then discarded: the result on the next slide is kept as evidence, but the learned model itself is not in the codebase — only the kNN estimate and the shared feature writers it was measured against ship.

Spatial

The benchmark did not beat kNN

The deciding score is against held-out station observations, because that is the data the product serves. On that score the learned benchmark and kNN are a near-tie, and kNN is slightly better: 6.20 pp median weekly MAE versus 6.36. Each bar is weekly median MAE in cloud percentage points; lower is better.

Method | Weekly median MAE (cloud percentage points, lower is better)
kNN average (shipped) | 6.20
learned GBM | 6.36
nearest station | 7.51
regional climatology | 15.65
Station-graded, leave-station-out evaluation over 109 stations. The kNN average (6.20) and learned benchmark (6.36) are close, but the benchmark wins at only 38% of stations. Both beat nearest-station (7.51); regional climatology sits far back at 15.65.
Grading against a proxy label made the learned model look better than it was. Grading against station truth changed the decision: ship kNN and discard the learned model. The benchmark number is kept as evidence for that choice, but the model is not part of the codebase — no extra dependency for a worse estimate.

PART 7

Lightning: area, not point

Strike chance is regional, so we count strikes in a circle; only normals exist so far.

Lightning

Strike-day probability over observed days

Lightning is regional, not tied to a station: a discharge lands somewhere in an area, and how far it is from a weather station does not matter. So we count strikes inside a circle (default radius 10 km, secondary 25 km) and work in lightning-days — calendar days with at least one strike in the circle. The probability for a month is lightning-days divided by the days actually observed, not by the days on the calendar, so missing coverage cannot inflate the number.

[Chart: P(any strike day) by month, Jan to Dec, y-axis 0 to 0.6; legend: observed so far, climatology tail.]
Each bar is the chance of at least one strike day in the circle that month, peaking in summer. The current month (June) is split: the solid lower segment is lightning-days already observed, the faded upper segment is the climatology estimate for the days left. The denominator is real observed days, so gaps in coverage do not inflate the probability.

The current-month figure is a linear extrapolation expressed in expected lightning-days: the days observed so far plus a climatology tail, where the tail is the monthly lightning-day rate times the days remaining. It is an expected count, not a compounded probability. The damped-persistence machinery does run on weekly lightning-days, but lightning is bursty and seasonal, so that output is shown only as indicative.

Lightning is served as climatology only. The archive contains observed strike events, not historical thunderstorm forecasts, so there are no forecast/outcome pairs for a near-term lightning model. A count model on the events alone did not show Brier skill over climatology, so the product keeps the honest baseline.

PART 8

Deploy: three services, one shape

The deployment is deliberately plain: static frontend, container API, serverless Postgres, and a raw archive cache for ingest.

Deploy

Terraform describes shape; Actions ships code

The deployed system has four moving parts. Cloudflare Pages serves the deck and React app. Fly.io runs the FastAPI container. Neon stores Postgres. Cloudflare R2 stores the gitignored SMHI raw archive so scheduled ingest jobs can replay files before downloading anything missing. Terraform creates and wires the infrastructure; GitHub Actions deploys app code only after tests pass.

[Diagram: Cloudflare Pages (deck + React app) → /api/v1 → Fly.io (FastAPI container) → DATABASE_URL → Neon (Postgres). An ingest job uses R2 (raw archive, SMHI replay cache). Terraform wires service URLs and secrets. Deploy and ingest are separate GitHub Actions workflows.]
Browser traffic goes Pages → Fly → Neon. Ingest jobs also use R2 so the raw archive survives outside local disk.
# infra/terraform/main.tf
module "neon" {
    source = "./modules/neon"
}

module "backend_fly" {
    source       = "./modules/backend_fly"
    database_url = module.neon.database_url
}

module "frontend_pages" {
    source  = "./modules/frontend_pages"
    api_url = module.backend_fly.backend_url
}
The root module keeps the edges explicit: Neon connection string into Fly, Fly public URL into the Pages build.

PART 9

What's next

Where the work goes next, now that space and time are both covered.

Next

What's actually next

Space and time are already combined: the kNN average gives the normal at a point, and the damped step adds recency. The bottom-right corner is filled by composition, so the next work is operational freshness, lightning, and denser cloud inputs.

[Figure: the 2×2 grid with every cell done. Columns (time): normal (how it usually is) and now (recency). Rows (space): at a station and at any point. Normals: done. Damped persistence: done. kNN spatial: done. Point + recency: done (kNN × damped). Real next: auto-refresh as data lands (every 24 h, no retraining); push lightning past climatology; sharper cloud = denser data (satellite / high-res, not a model).]
The 2×2 map of space (rows: at a station vs at any point) against time (columns: the usual normal vs recency now). All four corners are covered — the bottom-right by composing the kNN spatial normal with the damped recency step, rather than by a separate joint model.

Three directions are open.

Keep it current automatically. Every model here is a cheap recomputation, not a trained artifact: the normals are averages, α is one autocorrelation, the lightning rate is a count. A scheduled job can rebuild affected rollups, refit α, and regenerate the backtest as new SMHI data lands.

Push lightning past climatology. It is the thinnest corner, still normals plus an indicative damped nudge.

Sharpen the cloud estimate with denser inputs. kNN × damped is a good fit for sparse station data; materially sharper cloud needs satellite cloud or a high-resolution analysis served directly, not more complexity on the same station set.