A data story

Cloudy: estimating cloud and lightning over Sweden

Cloudy answers a practical question for Swedish locations: what is normal here, what has happened recently, and how much should that change the next few weeks? This deck follows the product as it is: the data we ingest, the cleanup rules, the read path, the normals view, the weekly outlook, the spatial estimate, and the deploy path.

SMHI metobs · SMHI lightning · PostgreSQL · FastAPI · Pydantic · Backtested outlook · React + TypeScript



PART 1

Where we are going

What the data gives us, and the two gaps we close: space between stations, and recency.

Roadmap

What the data can and cannot say

SMHI gives us cloud cover at 109 weather stations across Sweden. That tells us how cloudy it is at each station, but nothing about the air in between. A seasonal normal tells us how cloudy a place usually is in a given week of the year, but it is flat year over year and has no sense of what the last few weeks actually did. We close those two gaps with two kinds of model: spatial models fill in space, and a damped persistence model adds recency in time.

Two gaps in the raw data, and the model family that fills each.

Roadmap

The map: space on one axis, time on the other

This grid is the map for the deck. The horizontal axis is time: on the left, the seasonal normal; on the right, recent conditions. The vertical axis is space: at the top, values at stations; at the bottom, estimates at any point. The shipped product fills the grid with small, explicit pieces: normals, damped persistence, kNN spatial estimates, and their composition. Lightning is area-based, so it uses its own simpler shape: count strike-days in a circle and compare them to the seasonal normal.

Each piece improves one axis. The point-and-recency case is the composition of kNN spatial normals and the damped weekly outlook.

PART 2

The data

Two live SMHI sources: what each contains, what it does not, and how it is served.

Data

Two live sources, each isolated

The system serves two SMHI observation feeds. Each has its own ingest module and carries a (source, source_version) tag inside its natural key, so feeds can never collide. Cloud is a station time series. Lightning is a country-wide event stream. Everything else in the app is derived from those two sources.

Source	What it is	Provider	source tag
SMHI cloud cover (param 16)	hourly cloud % at stations	SMHI metobs	`smhi-metobs`
SMHI lightning	per-discharge strike events	SMHI	`smhi-lightning`

Live ingest modules: ingest/cloud.py and ingest/lightning.py. The read path and model views are built from these tables and their rollups; no proxy weather source is served to users.

Data

SMHI cloud cover: the served base

This is the only cloud source served to users. It gives hourly total cloud cover at each station, but only at the station: there is nothing in between, nothing sub-hourly, and no wind, temperature, or humidity to explain the value.

The card states why we keep the source, what each row holds, what it cannot tell us, and its size on disk.

Data

SMHI lightning: point events everywhere

Unlike cloud, lightning covers the whole country, not just stations: one row per discharge. It is the storm-activity signal. It carries no cloud cover, no stored density raster, and no altitude.

The card states why we keep lightning, what each discharge row holds, what it lacks, and its size on disk.

Data

Exploration is the raw-data workbench

The app has two exploration pages beside Normals and Predictions. Exploration shows cloud and lightning as synchronized time-series charts for the selected location. Map shows lightning events in the current viewport and time window. These are the pages that make the raw feeds inspectable before they become seasonal normals or weekly outlooks.

The exploration pages are deliberately direct: one chart view for cloud and lightning over time, one map view for lightning events in space.

Data

Rollups: the chart page never scans raw rows

The Exploration chart asks for cloud at a chosen resolution, anywhere from hourly to yearly. Aggregating cloud_hourly on every request would be slow, so on each ingest we precompute per-station buckets at six fixed resolutions and store them in cloud_rollups. The read path then serves those rows directly: a chart is a lookup, not an aggregation. Each bucket also keeps its counts (observed, expected, missing), so a gappy week is reported as gappy instead of silently averaging over holes. Because it materializes six resolutions for every station and every bucket, this derived table is the largest on disk, as the next slide shows.

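# ingest/cloud.py: rebuilt from cloud_hourly on every ingest
from datetime import datetime

from sqlmodel import Field, SQLModel

ROLLUP_RESOLUTIONS = ("hour", "6h", "day", "week", "month", "year")


class CloudRollup(SQLModel, table=True):
    __tablename__ = "cloud_rollups"

    # Primary key: one row per (station, resolution, bucket).
    station_id: int = Field(primary_key=True)
    resolution: str = Field(primary_key=True)  # one of ROLLUP_RESOLUTIONS
    bucket_start: datetime = Field(primary_key=True)
    # Coverage counts, so a gappy bucket is reported as gappy.
    observed_count: int
    expected_count: int
    missing_count: int
    # Summaries; None when the bucket has no observed hours.
    mean_cloud_pct: float | None = None
    p05_cloud_pct: float | None = None
    p50_cloud_pct: float | None = None
    p95_cloud_pct: float | None = None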

One row per (station, resolution, bucket): counts plus mean and percentiles, so the chart reads a summary instead of scanning hours.

Data

Where the bytes actually go

The live database holds two raw sources plus the precomputed serving rollups. The chart below is a snapshot of table size on disk, in megabytes; longer bars are larger tables. The top bar, cloud_rollups, is derived data, not raw input, and is the single largest table at 5.7 GB because it stores every station at six serving resolutions.

Each bar is one live Postgres table. The orange top bar is precomputed rollups, not raw data; the derived serving table outweighs the raw inputs because it is optimized for the chart read path.

PART 3

Cleaning the data

Where raw values are normalized, where they are preserved, and how writes stay repeatable.

Cleanup

One place turns raw values into cloud percent

Every raw cloud reading passes through a single function before it becomes a percentage. Sentinels (113, 9999, -9999), negatives, anything over 100, and octa readings above 8 all return None. The 113 code means the sky was obscured or not observable, so we treat it as unknown rather than as overcast. Octas convert with value / 8 * 100. Because the fix lives in one place, there is no second code path to keep in sync.

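import math

# Sentinel codes treated as missing; 113 = sky obscured or not observable.
MISSING_SENTINELS = frozenset({113, 9999, -9999})


def normalize_cloud_pct(raw: str | float | int | None, *, octas: bool = False) -> float | None:
    """Return cloud cover in percent, or None when the raw value is unusable."""
    if raw is None or raw == "":
        return None
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    # NaN/inf, sentinels and negatives are unknown, never clear or overcast.
    if not math.isfinite(value) or value in MISSING_SENTINELS or value < 0:
        return None
    if octas:
        if value > 8:
            return None
        return value / 8.0 * 100.0  # 0-8 octas -> 0-100 %
    if value > 100:
        return None
    return value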

core/units.py — the only place raw SMHI values become cloud percent.

Cleanup

Lightning is kept raw

Strike events get no value sanitizer. We do not filter peak current or any other field against a physical range, because the raw measurements are the signal we want to keep. The only thing we drop is structurally malformed CSV lines: a row that fails to parse is caught, counted, and skipped. Everything that parses is stored verbatim, including the raw quality geometry, so it stays available for later use.

for line in csv.DictReader(f, delimiter=";"):
    try:
        ts = datetime(int(line["year"]), ...)
        row = {"peak_current_ka": float(line["peakCurrent"]), ...}
        rows.append(row)
    except (KeyError, ValueError, TypeError):
        skipped += 1

ingest/lightning.py parse_rows — malformed lines are counted, not cleaned.
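A self-contained version of the same skip-and-count pattern is sketched below. The column set (`year`, `month`, `day`, `lat`, `lon`, `peakCurrent`) and the name `parse_rows_sketch` are hypothetical stand-ins; the real parser reads more fields, which are elided above and not reconstructed here.

```python
import csv
import io
from datetime import date


def parse_rows_sketch(text: str) -> tuple[list[dict], int]:
    """Parse ';'-separated strike rows; count, rather than repair, malformed lines."""
    rows: list[dict] = []
    skipped = 0
    for line in csv.DictReader(io.StringIO(text), delimiter=";"):
        try:
            rows.append(
                {
                    "day": date(int(line["year"]), int(line["month"]), int(line["day"])),
                    "lat": float(line["lat"]),
                    "lon": float(line["lon"]),
                    # Stored as-is: no physical-range filter on peak current.
                    "peak_current_ka": float(line["peakCurrent"]),
                }
            )
        except (KeyError, ValueError, TypeError):
            # Short rows yield None fields (TypeError); bad numbers or dates raise ValueError.
            skipped += 1
    return rows, skipped
```

An extreme but well-formed value, such as a peak current of -512 kA, passes through untouched, while a truncated line or an impossible date only increments `skipped`.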

Cleanup

Ingest is idempotent and incremental

Each ingest unit is replaced or upserted inside one transaction, so re-running a day or a station does not duplicate rows or leave a half-written state. The (source, source_version) pair is part of every natural key. Corrected cloud archives use delete-then-insert; rolling cloud updates use upsert; lightning replaces one day and refreshes its rollup. New data lands without rewriting the full archive.

with engine.begin() as conn:  # one transaction: replace the whole day
    conn.execute(
        delete(LightningEvent).where(
            LightningEvent.day == day,
            LightningEvent.source_version == SOURCE_VERSION,
        )
    )
    if rows:
        conn.execute(insert(LightningEvent), rows)
    refresh_sweden_daily_rollups(conn, day, day)

ingest/lightning.py — delete, insert, and rollup refresh in one transaction.
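The replace-a-day pattern can be checked end to end with the standard-library `sqlite3` module. This is a sketch under simplifying assumptions: a single `strikes` table with hypothetical columns, a hypothetical `replace_day` helper, and no rollup refresh. The point is that re-running the same day leaves the same rows, and a failure mid-write rolls the delete back.

```python
import sqlite3

SOURCE_VERSION = "1.0"  # hypothetical version tag for this sketch


def replace_day(conn: sqlite3.Connection, day: str, rows: list[tuple[float, float]]) -> None:
    """Delete and re-insert one day's strikes atomically (sketch, not the real ingest)."""
    with conn:  # commits on success, rolls back if any statement or step raises
        conn.execute(
            "DELETE FROM strikes WHERE day = ? AND source_version = ?",
            (day, SOURCE_VERSION),
        )
        conn.executemany(
            "INSERT INTO strikes (day, source_version, lat, lon) VALUES (?, ?, ?, ?)",
            [(day, SOURCE_VERSION, lat, lon) for lat, lon in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strikes (day TEXT, source_version TEXT, lat REAL, lon REAL)")
day_rows = [(59.3, 18.1), (57.7, 12.0)]
replace_day(conn, "2024-06-15", day_rows)
replace_day(conn, "2024-06-15", day_rows)  # re-run: replaces, does not duplicate
count = conn.execute("SELECT COUNT(*) FROM strikes").fetchone()[0]
```

After the two runs `count` is 2, not 4. Because the delete and the insert share one transaction, a batch that fails halfway leaves the previous copy of the day in place rather than an empty day.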

PART 4

Architecture

The stack and the decisions that keep the read path fast and the Python/TypeScript contract mechanical.

Architecture

Typed requests, typed responses

Pydantic validates every request against bounds and enums before any query runs. A latitude outside the Sweden envelope, a radius that is not 50 or 100, or a longitude without its latitude all return a 422 instead of a bad result. Responses are plain TypedDicts, so they describe shape at type-check time and cost nothing at runtime.

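from typing import Annotated, Literal, Self

from pydantic import BaseModel, ConfigDict, Field, model_validator


class PredictionsCloudQuery(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    # Sweden envelope: values outside fail validation, which FastAPI returns as 422.
    lat: Annotated[float | None, Field(ge=54.0, le=70.0)] = None
    lon: Annotated[float | None, Field(ge=9.0, le=26.0)] = None
    radius_km: Literal[50, 100] = 50

    @model_validator(mode="after")
    def _location_is_a_pair(self) -> Self:
        # XOR: exactly one coordinate given is an error; both or neither is fine.
        if (self.lat is None) ^ (self.lon is None):
            raise ValueError("lat and lon must be provided together")
        return self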

predictions/query.py — Sweden envelope (lat 54–70, lon 9–26) and paired-coordinate check.

Architecture

TypeScript types generated from Python

The frontend does not hand-write API types. FastAPI emits an OpenAPI schema from the same response models, and openapi-typescript turns that schema into schema.gen.ts, which the frontend imports. The generation runs offline — no server, no database — so it works the same in CI and on a laptop. Rename a field in Python and the dependent TypeScript stops compiling.

# frontend/scripts/gen-api.sh
( cd "${BACKEND_DIR}" && uv run python -c \
    "import json, cloudy.api as a; print(json.dumps(a.create_app().openapi(), indent=2))" \
) > "${SCHEMA_JSON}"

node -e "JSON.parse(require('fs').readFileSync(process.argv[1],'utf8'))" "${SCHEMA_JSON}"

pnpm exec openapi-typescript "${SCHEMA_JSON}" -o "${SCHEMA_TS}"

create_app().openapi() → openapi.json → validate → openapi-typescript → schema.gen.ts (1606 lines).

Architecture

Precompute at ingestion

The Sweden-wide normal is a percentile scan over about 10 million rows, which takes roughly 10 seconds when run live: too slow for a request. So we run that scan once per ingest, off the request path, and write the result to a table. The read path then serves the materialized rows. Queries for a specific location touch only a few stations, so those stay live and sub-second.

def refresh_sweden_normals(
    engine, source="smhi-metobs", source_version="1.0"
) -> int:
    written = 0
    with engine.begin() as conn:
        for period in ("day", "month", "year"):
            bucket = f"EXTRACT({_PERIOD_FIELD[period]} FROM ts_utc)::int"
            rows = conn.execute(
                text(_NORMAL_SQL.format(bucket=bucket, station_filter=_SWEDEN_FILTER))
            ).all()
            conn.execute(delete(CloudNormal).where(...))
            if rows:
                conn.execute(insert(CloudNormal), [... for row in rows])
                written += len(rows)
    return written

cloud_normals: one row per (scope, period, bucket) with mean, p10/p50/p90, and clear/partial/overcast shares.
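For one (scope, period, bucket) group, the per-row statistics can be sketched in plain Python; the shipped version computes them in SQL. The cut-offs for clear, partial, and overcast are not given here, so this sketch assumes clear at or below 25% and overcast at or above 75% (roughly 2 and 6 octas). The names `normal_row` and `percentile_cont` are hypothetical.

```python
from statistics import fmean

CLEAR_MAX_PCT = 25.0     # assumption: <= 2 octas counts as clear
OVERCAST_MIN_PCT = 75.0  # assumption: >= 6 octas counts as overcast


def percentile_cont(sorted_vals: list[float], q: float) -> float:
    """Linear-interpolation percentile, the same rule as PostgreSQL percentile_cont."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def normal_row(values: list[float]) -> dict:
    """Mean, p10/p50/p90 and sky-state shares for one non-empty group of readings."""
    vals = sorted(values)
    n = len(vals)
    clear = sum(v <= CLEAR_MAX_PCT for v in vals)
    overcast = sum(v >= OVERCAST_MIN_PCT for v in vals)
    return {
        "mean": fmean(vals),
        "p10": percentile_cont(vals, 0.10),
        "p50": percentile_cont(vals, 0.50),
        "p90": percentile_cont(vals, 0.90),
        "clear_share": clear / n,
        "partial_share": (n - clear - overcast) / n,
        "overcast_share": overcast / n,
    }
```

Reporting shares alongside percentiles matters for cloud because the distribution is often bimodal (many clear and many overcast hours), which a mean alone hides.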

Architecture

Cache behind a Protocol

A real deployment would use a shared cache like Redis. Here it is an in-memory LRU. The route code never names either one: it talks to a Cache Protocol whose values are JSON strings only. Swapping the backend is a new implementation plus a config value, not a change to any route.

class Cache(Protocol):
    def get(self, key: str) -> str | None: ...

    def set(self, key: str, value: str, ttl_s: int) -> None: ...


@lru_cache  # one cache instance per process
def get_cache() -> Cache:
    backend = get_settings().cache_backend
    if backend == "memory":
        return MemoryCache()
    raise ValueError(f"unknown cache backend: {backend!r} (supported: memory)")

core/cache.py — MemoryCache is an OrderedDict LRU with lazy TTL, maxsize 1024. Routes cache under composed keys, e.g. clim:cloud:{lat}:{lon}:{radius}:{period}, with CACHE_TTL_S = 3600.

Values are JSON strings only, so a shared backend like Redis drops in without touching the contract.

Goals

Two goals, two different problems

Cloud and lightning need different shapes of answer. Cloud is a value at a point, and the nearest stations may be far away, so the work is estimating across distance. Lightning is regional — a strike lands somewhere in an area, and how far that is from a station does not matter — so the work is counting in a circle. Recency (the damped model) is explored for both, but cloud is the clean case.

Left: a query point with lines to its few distant stations — cloud must be inferred across distance. Right: a circle over scattered strikes — lightning is just a count inside the area. Recency (the damped model) is explored for both; cloud is the clean case.

PART 5

Prediction: adding recency

A 1-2 week outlook that nudges the seasonal normal toward what the recent weeks actually did.

Prediction

The damped model in one number

A seasonal normal says how cloudy a week usually is, but it has no idea what just happened. The fix is small: take the normal for the week and add a fraction of how far the recent weeks have run above or below it. That fraction is alpha (α_h), the lag-h autocorrelation of weekly anomalies, where h is the lead in weeks. Weekly anomalies persist (lag-1 is about 0.3 across Sweden, higher in some places); monthly ones barely do. We clamp alpha to [0, 1]: floored at 0 so a noisy negative value cannot flip the signal, capped at 1 so we never amplify an anomaly. At alpha = 0 the forecast collapses back to the normal.

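recent gap: $a = \text{recent} - \text{normal}$

persistence: $\alpha_h = \operatorname{clamp}\left( \dfrac{\sum_t a_t \, a_{t+h}}{\sum_t a_t^2},\ 0,\ 1 \right)$

forecast: $\hat{y} = \text{normal} + \alpha_h \cdot a$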

The whole model is one fitted number per lead, α (alpha): the share of a recent anomaly that still holds h weeks out, measured as the lag-h autocorrelation of weekly anomalies and clamped to [0, 1]. At α = 0 the forecast is exactly the normal, so where history shows no persistence the model falls back to climatology instead of chasing noise.

normal (this week)	recent week ran	gap a	α	forecast = normal + α·a
65%	80% — cloudier	+15	0.30 (lead 1)	65 + 0.30×15 = 69.5%
65%	50% — clearer	−15	0.30	65 − 0.30×15 = 60.5%
65%	80% — cloudier	+15	0.10 (lead 2)	65 + 0.10×15 = 66.5%
65%	80% — cloudier	+15	0.00 (no persistence)	65 + 0 = 65% — the normal

One week, one +15-point surprise, read four ways: a bigger α leans harder on the surprise; a longer lead carries a smaller α, so the forecast melts back toward the normal; and where history shows no persistence (α = 0) it returns the normal unchanged.

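from statistics import fmean


def fit_alpha(anomalies, lead):
    """Lag-`lead` autocorrelation of weekly anomalies, clamped to [0, 1]."""
    present = [a for a in anomalies if a is not None]
    if len(present) <= lead:
        return 0.0
    mean = fmean(present)
    var = sum((a - mean) ** 2 for a in present) / len(present)
    if var == 0:
        return 0.0
    # Pair only weeks observed at both ends, so a lag never steps across a hole.
    pairs = [
        (a, b)
        for t in range(len(anomalies) - lead)
        if (a := anomalies[t]) is not None and (b := anomalies[t + lead]) is not None
    ]
    if not pairs:  # gaps can leave no complete pair even with enough weeks
        return 0.0
    cov = sum((a - mean) * (b - mean) for a, b in pairs) / len(pairs)
    return max(0.0, min(1.0, cov / var))  # floor 0, cap 1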

predictions/persistence.py: forecast = normal + alpha x recent anomaly. The series sits on a gap-free weekly grid, so a missing week is None and a lag never steps across a hole.
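Putting the three formulas together, a minimal sketch of the forecast step reproduces the worked table above. `damped_forecast` and `clamp` are hypothetical names; in the product, α would come from `fit_alpha` on the station's weekly anomaly history rather than being passed in by hand.

```python
def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def damped_forecast(normal: float, recent: float, alpha: float) -> float:
    """Seasonal normal plus a clamped share of the recent anomaly (cloud %)."""
    gap = recent - normal               # a = recent - normal
    return normal + clamp(alpha) * gap  # y_hat = normal + alpha_h * a


# The four rows of the worked table: (normal, recent, alpha)
for normal, recent, alpha in [(65, 80, 0.30), (65, 50, 0.30), (65, 80, 0.10), (65, 80, 0.0)]:
    print(f"{damped_forecast(normal, recent, alpha):.1f}")  # 69.5, 60.5, 66.5, 65.0
```

Clamping inside the forecast as well as in the fit is a belt-and-braces choice in this sketch: even a hand-supplied α of 1.7 cannot push the forecast past the recent value.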

Prediction

Tested with a causal backtest

To trust the outlook we score it the way it would actually run. At each weekly origin we rebuild the normal from only the weeks up to that origin, so a past forecast is never measured against data from its own future. We let about two years of weeks accumulate first as warm-up, then start scoring. The baseline is the normal itself, which always predicts an anomaly of zero. Skill is the fraction by which the model cuts the baseline's error: 1 minus model MAE over baseline MAE.

climatology = {woy: total[woy] / count[woy] for woy in total}
causal = [
    None if v is None else v - climatology[woys[i]]
    for i, v in enumerate(values[: origin + 1])
]
prediction = predict(causal, woys, origin, lead)

target = actual - climatology[target_woy]
model_err.append(abs(target - prediction))
base_err.append(abs(target))  # the normal predicts zero anomaly

skill = 1.0 - fmean(model_err) / fmean(base_err)  # 1 - model MAE / baseline MAE

predictions/outlook.py rolling-origin backtest. MIN_TRAIN_WEEKS = 104 (~2 years) of warm-up; leads 1 and 2, beyond which alpha falls to ~0.
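For readers who want to run the scoring loop end to end, here is a self-contained sketch of a rolling-origin backtest. It assumes a gap-aware weekly series (`None` for missing weeks) with a week-of-year label per entry, refits α causally at every origin, and scores in anomaly space. `backtest_skill` and the plain-list inputs are assumptions; the scoring rule (skill = 1 − model MAE / baseline MAE) and the 104-week warm-up come from the text.

```python
from statistics import fmean

MIN_TRAIN_WEEKS = 104  # ~2 years of warm-up, as in the text


def fit_alpha(anomalies, lead):
    """Lag-`lead` autocorrelation of a gap-aware anomaly series, clamped to [0, 1]."""
    present = [a for a in anomalies if a is not None]
    if len(present) <= lead:
        return 0.0
    mean = fmean(present)
    var = fmean((a - mean) ** 2 for a in present)
    pairs = [
        (anomalies[t], anomalies[t + lead])
        for t in range(len(anomalies) - lead)
        if anomalies[t] is not None and anomalies[t + lead] is not None
    ]
    if var == 0 or not pairs:
        return 0.0
    cov = fmean((a - mean) * (b - mean) for a, b in pairs)
    return max(0.0, min(1.0, cov / var))


def backtest_skill(values, woys, lead, min_train_weeks=MIN_TRAIN_WEEKS):
    """Rolling-origin skill of damped persistence vs the causal seasonal normal."""
    model_err, base_err = [], []
    for origin in range(min_train_weeks, len(values) - lead):
        actual, target_woy = values[origin + lead], woys[origin + lead]
        if actual is None or values[origin] is None:
            continue
        # Causal normal: week-of-year means from data up to the origin only.
        total, count = {}, {}
        for v, w in zip(values[: origin + 1], woys[: origin + 1]):
            if v is not None:
                total[w] = total.get(w, 0.0) + v
                count[w] = count.get(w, 0) + 1
        climatology = {w: total[w] / count[w] for w in total}
        if target_woy not in climatology:
            continue  # no history for that week of year yet
        causal = [
            None if v is None else v - climatology[woys[i]]
            for i, v in enumerate(values[: origin + 1])
        ]
        # Damped persistence in anomaly space: alpha is refit on causal data only.
        prediction = fit_alpha(causal, lead) * causal[origin]
        target = actual - climatology[target_woy]
        model_err.append(abs(target - prediction))
        base_err.append(abs(target))  # the normal predicts zero anomaly
    if not base_err or fmean(base_err) == 0:
        raise ValueError("no scorable origins, or the normal is already perfect")
    return 1.0 - fmean(model_err) / fmean(base_err)
```

On a synthetic six-year series whose anomaly flips between +8 and -8 every ten weeks, this sketch reports positive lead-1 skill, because each week's anomaly mostly carries into the next. A series with no anomalies at all raises instead of returning a meaningless ratio.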

Prediction

What the backtest shows

Across Sweden the gain is small but consistent: lead-1 median skill is +1.9%, and the model beats the normal at 98.2% of stations. Stockholm shows the effect more clearly. The chart plots rolling 52-week mean absolute error, not raw cloud: gray is the seasonal normal, blue is the damped outlook, and lower is better. Across the backtest the normal is off by 23.0 points on average; the damped outlook is off by 16.6.

Stockholm, lead-1: rolling 52-week mean absolute error of the seasonal normal (gray) versus the damped forecast (blue); lower is better. The shaded band is the error the model removes. Over 474 weeks the normal is off by 23.0 cloud-% points on average, the model by 16.6 — a 28% cut for this station.

PART 6

Spatial: cloud at a point

Estimate the cloud normal anywhere in Sweden from the nearest stations, climbing three rungs of precision.

Spatial

kNN is the shipped spatial estimate

For a location without a station, the useful signal is nearby stations. The product estimates the local cloud normal from the 5 nearest stations: nearest-station normal as the simple floor, kNN average as the shipped estimate, and a learned model as a benchmark check. The first two are direct statistics on real station observations. The benchmark exists to answer one question: does learning beat the average?

Left: a query point and its 5 nearest stations with distance and bearing. Right: the same neighbours feed the simple baseline, the shipped kNN estimate, and a benchmark model.

DEFAULT_NEIGHBOURS = 5, chosen so the point is triangulated and a single missing station does not break the estimate.

Spatial

A location never sees itself

The evaluation uses the same neighbour rule as serving: when the origin is a station, that station is excluded from its own neighbour list. That gives leave-station-out scoring directly from the data shape. Whole stations go to disjoint folds, and serving reuses the same feature writers, so the benchmark is measured on the same inputs the shipped estimate uses.

def nearest_neighbours(points, k=DEFAULT_NEIGHBOURS):
    neighbours = {}
    for origin in points:
        ranked = sorted(
            (
                (other.id, haversine_km(origin.lat, origin.lon, other.lat, other.lon))
                for other in points
                if other.id != origin.id
            ),
            key=lambda pair: pair[1],
        )
        neighbours[origin.id] = ranked[:k]
    return neighbours

features.py: the origin station is filtered out, so a location can never use itself as a neighbour.

The benchmark model was LightGBM with 400 trees, learning rate 0.05, 31 leaves, fit on MAE (regression_l1). Features are the 5 nearest stations' cloud values plus distance, bearing, lat/lon, and seasonal sin/cos. It was a path we explored and then discarded: the result on the next slide is kept as evidence, but the learned model itself is not in the codebase. Only the kNN estimate and the shared feature writers it was measured with ship.

Spatial

The benchmark did not beat kNN

The deciding score is against held-out station observations, because that is the data the product serves. On that score the learned benchmark and kNN are a near-tie, and kNN is slightly better: 6.20 pp median weekly MAE versus 6.36. Each bar is weekly median MAE in cloud percentage points; lower is better.

Station-graded, leave-station-out evaluation over 109 stations. The kNN average (6.20) and learned benchmark (6.36) are close, but the benchmark wins at only 38% of stations. Both beat nearest-station (7.51); regional climatology sits far back at 15.65.

Grading against a proxy label made the learned model look better than it was. Grading against station truth changed the decision: ship kNN and discard the learned model. The benchmark number is kept as evidence for that choice, but the model is not part of the codebase — no extra dependency for a worse estimate.
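The two summary numbers in this comparison, a median of per-station MAE and the share of stations a model wins, can be computed as below. The sketch assumes per-station lists of absolute weekly errors already produced by the leave-station-out run; `compare_models` is a hypothetical name, and counting ties as non-wins is an assumption.

```python
from statistics import fmean, median


def compare_models(errors_a: dict[str, list[float]], errors_b: dict[str, list[float]]) -> dict:
    """Median per-station MAE for two models and the share of stations where B beats A."""
    stations = sorted(errors_a.keys() & errors_b.keys())
    if not stations:
        raise ValueError("no stations scored by both models")
    mae_a = {s: fmean(errors_a[s]) for s in stations}
    mae_b = {s: fmean(errors_b[s]) for s in stations}
    wins_b = sum(mae_b[s] < mae_a[s] for s in stations)  # ties are not wins
    return {
        "median_mae_a": median(mae_a.values()),
        "median_mae_b": median(mae_b.values()),
        "b_win_share": wins_b / len(stations),
    }
```

Reporting both numbers guards against a model that wins on the median through a few large gains while losing at most stations, which is why a 38% win share weighs against the benchmark even with medians this close.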

PART 7

Lightning: area, not point

Strike chance is regional, so we count strikes in a circle; only normals exist so far.

Lightning

Strike-day probability over observed days

Lightning is regional, not tied to a station: a discharge lands somewhere in an area, and how far it is from a weather station does not matter. So we count strikes inside a circle (default radius 10 km, secondary 25 km) and work in lightning-days — calendar days with at least one strike in the circle. The probability for a month is lightning-days divided by the days actually observed, not by the days on the calendar, so missing coverage cannot inflate the number.

Each bar is the share of observed days in that month with at least one strike in the circle, peaking in summer. The current month (June) is split: the solid lower segment is lightning-days already observed, the faded upper segment is the climatology estimate for the days left. The denominator is real observed days, so gaps in coverage do not inflate the probability.

The current-month figure is a linear extrapolation expressed in expected lightning-days: the days observed so far plus a climatology tail, where the tail is the monthly lightning-day rate times the days remaining. It is an expected count, not a compounded probability. The damped-persistence machinery does run on weekly lightning-days, but lightning is bursty and seasonal, so that output is shown only as indicative.

Lightning probabilities are served as climatology; the damped output stays indicative. The archive contains observed strike events, not historical thunderstorm forecasts, so there are no forecast/outcome pairs for a near-term lightning model. A count model on the events alone did not show Brier skill over climatology, so the product keeps the honest baseline.

PART 8

Deploy: four parts, one shape

The deployment is deliberately plain: static frontend, container API, serverless Postgres, and a raw archive cache for ingest.

Deploy

Terraform describes shape; Actions ships code

The deployed system has four moving parts. Cloudflare Pages serves the deck and React app. Fly.io runs the FastAPI container. Neon stores Postgres. Cloudflare R2 stores the gitignored SMHI raw archive so scheduled ingest jobs can replay files before downloading anything missing. Terraform creates and wires the infrastructure; GitHub Actions deploys app code only after tests pass.

Browser traffic goes Pages → Fly → Neon. Ingest jobs also use R2 so the raw archive survives outside local disk.

# infra/terraform/main.tf
module "neon" {
    source = "./modules/neon"
}

module "backend_fly" {
    source       = "./modules/backend_fly"
    database_url = module.neon.database_url
}

module "frontend_pages" {
    source  = "./modules/frontend_pages"
    api_url = module.backend_fly.backend_url
}

The root module keeps the edges explicit: Neon connection string into Fly, Fly public URL into the Pages build.

PART 9

What's next

Where the work goes next, now that space and time are both covered.

What's actually next

Space and time are already combined: the kNN average gives the normal at a point, and the damped step adds recency. The bottom-right corner is filled by composition, so the next work is operational freshness, lightning, and denser cloud inputs.

The 2×2 map of space (rows: at a station vs at any point) against time (columns: the usual normal vs recency now). All four corners are covered — the bottom-right by composing the kNN spatial normal with the damped recency step, rather than by a separate joint model.

Three directions are open.

Keep it current automatically. Every model here is a cheap recomputation, not a trained artifact: the normals are averages, α is one autocorrelation, the lightning rate is a count. A scheduled job can rebuild affected rollups, refit α, and regenerate the backtest as new SMHI data lands.

Push lightning past climatology. It is the thinnest corner, still normals plus an indicative damped nudge.

Sharpen the cloud estimate with denser inputs. kNN×damped is a good fit for sparse station data; materially sharper cloud needs satellite cloud or a high-resolution analysis served directly, not more complexity on the same station set.