A data story

Cloudy: estimating cloud and lightning over Sweden

Cloudy answers a practical question for Swedish locations: what is normal here, what has happened recently, and how much should that change the next few weeks? This deck follows the product as it is: the data I ingest, the cleanup rules, the read path, the normals view, the weekly outlook, the spatial estimate, and the deploy path.

SMHI metobs · SMHI lightning · PostgreSQL · FastAPI · Pydantic · Backtested outlook · React + TypeScript

Overview

What I built, end to end

The whole project answers one question for any spot in Sweden: what is normal here, what just happened, and how much should that change the next few weeks? Six steps get there. Two of them carry a model I built, measured, and then threw away — kept here only as the evidence behind a decision.

  1. Backend architecture

    A FastAPI service over Postgres. The request path only reads; the expensive Sweden-wide scan runs once, at ingestion.

    Typed end to end · Precompute at ingest · Cache behind a Protocol
  2. Data ingestion

    Two live SMHI feeds (hourly cloud cover at stations, and country-wide lightning strikes), cleaned in one place and rolled up for fast reads.

    Idempotent + incremental · One cleanup function · Rollups at six resolutions
  3. Exploration screens

    Synchronized cloud and lightning charts for a location, plus a lightning map that queries only what is in the current viewport.

    Viewport-scoped queries · No pre-rendered raster
  4. Normals

    The seasonal baseline and, for a place with no station, the cloud normal estimated from its nearest stations.

    kNN of 5 nearest (shipped)
    Alternative tried: a learned GBM was benchmarked against the kNN estimate. On held-out stations it only tied it, with no significant per-station gain, so the simpler kNN ships and the model is dropped.
  5. Predictions

    A one-to-two week outlook that nudges the normal toward whatever the last few weeks actually did, backtested causally.

    Damped persistence (shipped) · Causal backtest
    Alternative tried: analog forecasting was tested against damped persistence. It scored negative skill (−1.5% vs +2.1%, beating the normal at 24% of stations vs 98%), so it is dropped.
  6. Deployment

    Terraform describes the infrastructure; GitHub Actions ships the code after tests pass. Serving and ingestion stay separate processes.

    Pages + Fly + Neon + R2 · Deploy after tests

PART 1

Where I'm going

What the data gives me, and the two gaps I close: space between stations, and recency.

Roadmap

What the data can and cannot say

SMHI gives me cloud cover at 109 weather stations across Sweden. That tells me how cloudy it is at each station, but nothing about the air in between. A seasonal normal tells me how cloudy a place usually is in a given week of the year, but it is flat year over year and has no sense of what the last few weeks actually did. I close those two gaps with two kinds of model: spatial models fill in space, and a damped persistence model adds recency in time.

[Figure: two panels. Left: cloud is known at stations, unknown in between. Right: the normal is a flat seasonal climatology, the same every year, with no recency. Spatial models fill space; the damped model adds time.]
Two gaps in the raw data, and the model family that fills each.

Roadmap

The map: space on one axis, time on the other

This grid is the map for the deck. The horizontal axis is time: on the left, the seasonal normal; on the right, recent conditions. The vertical axis is space: at the top, values at stations; at the bottom, estimates at any point. The shipped product fills the grid with small, explicit pieces: normals, damped persistence, kNN spatial estimates, and their composition. Lightning is area-based, so it uses its own simpler shape: count strike-days in a circle and compare them to the seasonal normal.

[Figure: 2×2 grid. Columns (time): normal, how it usually is; now, recency. Rows (space): at a station; at any point. Cells: normals (I already have this); damped persistence (prediction part, adds time); spatial kNN / nearest (spatial part, adds space); combined (done, kNN × damped). Lightning is area-based and sits outside this grid, normals only so far.]
Each piece improves one axis. The point-and-recency case is the composition of kNN spatial normals and the damped weekly outlook.

PART 2

The data

Two live SMHI sources: what each contains, what it does not, and how it is served.

Data

Two live sources, each isolated

The system serves two SMHI observation feeds. Each has its own ingest module and carries a (source, source_version) tag inside its natural key, so feeds can never collide. Cloud is a station time series. Lightning is a country-wide event stream. Everything else in the app is derived from those two sources.

Source | What it is | Provider | source tag
SMHI cloud cover (param 16) | hourly cloud % at stations | SMHI metobs | smhi-metobs
SMHI lightning | per-discharge strike events | SMHI | smhi-lightning
Live ingest modules: ingest/cloud.py and ingest/lightning.py. The read path and model views are built from these tables and their rollups.

Data

SMHI cloud cover: the served base

This is the ground-truth cloud source served to users. It gives hourly total cloud cover at each station, but only at the station: there is nothing in between, nothing sub-hourly, and no wind, temperature, or humidity to explain the value.

SMHI cloud (param 16)
WHY: served ground truth
HAS: hourly cloud_pct 0–100 per station, quality flag G/Y kept
NOT: nothing between stations, no sub-hourly, no wind/temp/humidity, no cloud type or base height
SIZE: 9.92M rows / 2.2 GB · 2015-01-01 → 2026-03-01 · 109 active of 459 stations · 4.8% of hours NULL (gaps kept)
Two feeds: immutable corrected-archive + rolling latest-months (upserted)
The card states why I keep the source, what each row holds, what it cannot tell me, and its size on disk.

Data

SMHI lightning: point events everywhere

Unlike cloud, lightning covers the whole country, not just stations: one row per discharge. It is the storm-activity signal. It carries no cloud cover, no stored density raster, and no altitude.

SMHI lightning (CSV)
WHY: storm signal, high-resolution point events country-wide
HAS: per-discharge ts_utc (µs), lat/lon, signed peak_current_ka, multiplicity, cloud_indicator (0 = cloud-to-ground), raw quality geometry
NOT: no cloud %, no stored density field (re-aggregated on query), no altitude
SIZE: 4.11M rows / 1.4 GB · 2015-01-01 → 2026-06-16 · 1.74M cloud-to-ground (~42%)
Density is re-aggregated per query, never stored as a raster
The card states why I keep lightning, what each discharge row holds, what it lacks, and its size on disk.

Data

Exploration is the raw-data workbench

The app has two exploration pages beside Normals and Predictions. Exploration shows cloud and lightning as synchronized time-series charts for the selected location. Map shows lightning events in the current viewport and time window. These are the pages that make the raw feeds inspectable before they become seasonal normals or weekly outlooks.

[Screens: Exploration (/app/?view=explore&location=Stockholm): cloud + lightning chart; same window, same aggregation, two feeds; resolution Auto / Week / Month / Year. Map (/app/?view=map&location=Stockholm): lightning in the viewport; the map query follows pan, zoom, and time range. Events are fetched by map bounds and visible time window, not pre-rendered as a raster.]
The exploration pages are deliberately direct: one chart view for cloud and lightning over time, one map view for lightning events in space.

Data

Rollups: the chart page never scans raw rows

The Exploration chart asks for cloud at a chosen resolution — anywhere from hourly to yearly. Aggregating cloud_hourly on every request would be slow, so on each ingest I precompute per-station buckets at six fixed resolutions and store them in cloud_rollups. The read path then serves those rows directly: a chart is a lookup, not an aggregation. Each bucket also keeps its counts — observed, expected, missing — so a gappy week is reported as gappy instead of silently averaging over holes. Because it materializes six resolutions across every station and every bucket, this derived table is the largest on disk, which the next slide shows.

# ingest/cloud.py — rebuilt from cloud_hourly on every ingest
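from datetime import datetime

from sqlmodel import SQLModel

ROLLUP_RESOLUTIONS = ("hour", "6h", "day", "week", "month", "year")


class CloudRollup(SQLModel, table=True):
    station_id: int
    resolution: str  # one of ROLLUP_RESOLUTIONS
    bucket_start: datetime
    # coverage counts, so a gappy bucket is reported as gappy
    observed_count: int
    expected_count: int
    missing_count: int
    # summary statistics; None when the bucket has no observed hours
    mean_cloud_pct: float | None
    p05_cloud_pct: float | None
    p50_cloud_pct: float | None
    p95_cloud_pct: float | None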
One row per (station, resolution, bucket): counts plus mean and percentiles, so the chart reads a summary instead of scanning hours.
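
As a rough illustration of what one rollup row holds, the sketch below summarises a single bucket of hourly readings. The helper name summarise_bucket, the dict return shape, and the nearest-rank percentile rule are assumptions for illustration; the real rollup in ingest/cloud.py may compute percentiles differently.

```python
from statistics import fmean


def summarise_bucket(values, expected_count):
    """Summarise one bucket of hourly cloud readings (None = missing hour).

    Hypothetical helper: percentiles use the nearest-rank rule, which may
    differ from what the real rollup uses.
    """
    observed = sorted(v for v in values if v is not None)
    n = len(observed)

    def pct(p):
        # Nearest-rank percentile on the sorted observed values.
        if n == 0:
            return None
        rank = max(1, -(-p * n // 100))  # ceil(p * n / 100), at least 1
        return observed[rank - 1]

    return {
        "observed_count": n,
        "expected_count": expected_count,
        "missing_count": expected_count - n,
        "mean_cloud_pct": fmean(observed) if n else None,
        "p05_cloud_pct": pct(5),
        "p50_cloud_pct": pct(50),
        "p95_cloud_pct": pct(95),
    }


# A 6-hour bucket with two missing hours keeps its gap count next to the mean.
row = summarise_bucket([100.0, None, 50.0, 0.0, None, 50.0], expected_count=6)
```

Keeping the counts next to the statistics lets the chart decide how to display a thin bucket instead of trusting a mean built from a handful of hours.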

Data

Where the bytes actually go

The live database holds two raw sources plus the precomputed serving rollups. The chart below is a snapshot of table size on disk, in megabytes; longer bars are larger tables. The top bar, cloud_rollups, is derived data, not raw input, and is the single largest table at 4.0 GB because it stores every station at six serving resolutions.

[Chart: size on disk (MB). cloud_rollups 4047 · cloud_hourly 2246 · lightning_events 1388]
Each bar is one live Postgres table. The orange top bar is precomputed rollups, not raw data; the derived serving table outweighs the raw inputs because it is optimized for the chart read path.

PART 3

Cleaning the data

Where raw values are normalized, where they are preserved, and how writes stay repeatable.

Cleanup

One place turns raw values into cloud percent

Every raw cloud reading passes through a single function before it becomes a percentage. Sentinels (113, 9999, -9999), negatives, anything over 100, and octa readings above 8 all return None. The 113 code means the sky was obscured or not observable, so I treat it as unknown rather than as overcast. Octas convert with value / 8 * 100. Because the fix lives in one place, there is no second code path to keep in sync.

MISSING_SENTINELS = frozenset({113, 9999, -9999})


def normalize_cloud_pct(raw, *, octas=False):
    if raw is None or raw == "":
        return None
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    if value in MISSING_SENTINELS or value < 0:
        return None
    if octas:
        if value > 8:
            return None
        return value / 8.0 * 100.0
    if value > 100:
        return None
    return value
core/units.py — the only place raw SMHI values become cloud percent.

Cleanup

Lightning is kept raw

Strike events get no value sanitizer. I do not filter peak current or any other field against a physical range, because the raw measurements are the signal I want to keep. The only thing I drop is structurally malformed CSV lines: a row that fails to parse is caught, counted, and skipped. Everything that parses is stored verbatim, including the raw quality geometry, so it stays available for later use.

for line in csv.DictReader(f, delimiter=";"):
    try:
        ts = datetime(int(line["year"]), ...)
        row = {"peak_current_ka": float(line["peakCurrent"]), ...}
        rows.append(row)
    except (KeyError, ValueError, TypeError):
        skipped += 1
ingest/lightning.py parse_rows — malformed lines are counted, not cleaned.

Cleanup

Ingest is idempotent and incremental

Each ingest unit is replaced or upserted inside one transaction, so re-running a day or a station does not duplicate rows or leave a half-written state. The (source, source_version) pair is part of every natural key. Corrected cloud archives use delete-then-insert; rolling cloud updates use upsert; lightning replaces one day and refreshes its rollup. New data lands without rewriting the full archive.

with engine.begin() as conn:  # one transaction: replace the whole day
    conn.execute(
        delete(LightningEvent).where(
            LightningEvent.day == day,
            LightningEvent.source_version == SOURCE_VERSION,
        )
    )
    if rows:
        conn.execute(insert(LightningEvent), rows)
    refresh_sweden_daily_rollups(conn, day, day)
ingest/lightning.py — delete, insert, and rollup refresh in one transaction.
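
The replace-a-day pattern can be tried without Postgres. The sketch below uses the standard-library sqlite3 module and a simplified, hypothetical three-column table; it only demonstrates that delete plus insert inside one transaction leaves the same end state however many times the day is re-run.

```python
import sqlite3


def replace_day(conn, day, rows):
    """Replace every event for one day inside a single transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM lightning_events WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO lightning_events (day, lat, lon) VALUES (?, ?, ?)",
            [(day, lat, lon) for lat, lon in rows],
        )


conn = sqlite3.connect(":memory:")
# Simplified schema for illustration only; the real table has more columns.
conn.execute("CREATE TABLE lightning_events (day TEXT, lat REAL, lon REAL)")

strikes = [(59.3, 18.1), (59.4, 18.0)]
replace_day(conn, "2024-07-01", strikes)
replace_day(conn, "2024-07-01", strikes)  # re-run: same end state, no duplicates

count = conn.execute("SELECT COUNT(*) FROM lightning_events").fetchone()[0]
```

Because the delete and the insert commit together, a crash between them cannot leave the day empty or half-written.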

PART 4

Architecture

The stack and the decisions that keep the read path fast and the Python/TypeScript contract mechanical.

Architecture

Typed requests, typed responses

Pydantic validates every request against bounds and enums before any query runs. A latitude outside the Sweden envelope, a radius that is not 50 or 100, or a longitude without its latitude all return a 422 instead of a bad result. Responses are plain TypedDicts, so they describe shape at type-check time and cost nothing at runtime.

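from typing import Annotated, Literal, Self

from pydantic import BaseModel, Field, model_validator


class PredictionsCloudQuery(BaseModel):
    model_config = {"populate_by_name": True}

    # Sweden envelope: out-of-range coordinates fail validation before any query
    lat: Annotated[float | None, Field(ge=54.0, le=70.0)] = None
    lon: Annotated[float | None, Field(ge=9.0, le=26.0)] = None
    radius_km: Literal[50, 100] = 50

    @model_validator(mode="after")
    def _location_is_a_pair(self) -> Self:
        # XOR: exactly one of lat/lon set means an incomplete location
        if (self.lat is None) ^ (self.lon is None):
            raise ValueError("lat and lon must be provided together")
        return self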
predictions/query.py — Sweden envelope (lat 54–70, lon 9–26) and paired-coordinate check.

Architecture

TypeScript types generated from Python

The frontend does not hand-write API types. FastAPI emits an OpenAPI schema from the same response models, and openapi-typescript turns that schema into schema.gen.ts, which the frontend imports. The generation runs offline — no server, no database — so it works the same in CI and on a laptop. Rename a field in Python and the dependent TypeScript stops compiling.

# frontend/scripts/gen-api.sh
( cd "${BACKEND_DIR}" && uv run python -c \
    "import json, cloudy.api as a; print(json.dumps(a.create_app().openapi(), indent=2))" \
) > "${SCHEMA_JSON}"

node -e "JSON.parse(require('fs').readFileSync(process.argv[1],'utf8'))" "${SCHEMA_JSON}"

pnpm exec openapi-typescript "${SCHEMA_JSON}" -o "${SCHEMA_TS}"
create_app().openapi() → openapi.json → validate → openapi-typescript → schema.gen.ts (1333 lines).

Architecture

Precompute at ingestion

The Sweden-wide normal is a percentile scan over about 10 million rows, which takes roughly 10 seconds when run live — too slow for a request. So I run that scan once per ingest, off the request path, and write the result to a table. The read path then serves the materialized rows. Located queries touch only a few stations, so those stay live and sub-second.

def refresh_sweden_normals(
    engine, source="smhi-metobs", source_version="1.0"
) -> int:
    written = 0
    with engine.begin() as conn:
        for period in ("day", "month", "year"):
            bucket = f"EXTRACT({_PERIOD_FIELD[period]} FROM ts_utc)::int"
            rows = conn.execute(
                text(_NORMAL_SQL.format(bucket=bucket, station_filter=_SWEDEN_FILTER))
            ).all()
            conn.execute(delete(CloudNormal).where(...))
            if rows:
                conn.execute(insert(CloudNormal), [... for row in rows])
                written += len(rows)
    return written
cloud_normals: one row per (scope, period, bucket) with mean, p10/p50/p90, and clear/partial/overcast shares.

Architecture

Cache behind a Protocol

A real deployment would use a shared cache like Redis. Here it is an in-memory LRU. The route code never names either one: it talks to a Cache Protocol whose values are JSON strings only. Swapping the backend is a new implementation plus a config value, not a change to any route.

from functools import lru_cache
from typing import Protocol

# MemoryCache and get_settings are not shown in this excerpt.


class Cache(Protocol):
    def get(self, key: str) -> str | None: ...

    def set(self, key: str, value: str, ttl_s: int) -> None: ...


@lru_cache  # one cache instance per process
def get_cache() -> Cache:
    backend = get_settings().cache_backend
    if backend == "memory":
        return MemoryCache()
    raise ValueError(f"unknown cache backend: {backend!r} (supported: memory)")
core/cache.py — MemoryCache is an OrderedDict LRU with lazy TTL, maxsize 1024. Routes cache under composed keys, e.g. clim:cloud:{lat}:{lon}:{radius}:{period}, with CACHE_TTL_S = 3600.
Values are JSON strings only, so a shared backend like Redis drops in without touching the contract.
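
For readers who want to see what sits behind the Protocol, here is a minimal sketch of an OrderedDict LRU with lazy TTL, matching the description of MemoryCache above. It is not the code in core/cache.py: the injectable clock parameter and the internal names are assumptions added so expiry can be exercised without sleeping.

```python
import time
from collections import OrderedDict


class MemoryCache:
    """In-memory LRU with lazy TTL: expired entries are dropped when read."""

    def __init__(self, maxsize=1024, clock=time.monotonic):
        self._maxsize = maxsize
        self._clock = clock  # assumption: injectable clock, handy for tests
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazy expiry: removed only when touched
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value, ttl_s):
        self._items[key] = (self._clock() + ttl_s, value)
        self._items.move_to_end(key)
        while len(self._items) > self._maxsize:
            self._items.popitem(last=False)  # evict least recently used


now = [0.0]  # fake clock so the example is deterministic
cache = MemoryCache(maxsize=2, clock=lambda: now[0])
cache.set("a", '{"v": 1}', ttl_s=10)
cache.set("b", '{"v": 2}', ttl_s=10)
cache.get("a")  # touch "a" so "b" becomes least recently used
cache.set("c", '{"v": 3}', ttl_s=10)  # over capacity: evicts "b"
```

The class satisfies the Cache Protocol structurally, with no inheritance, which is what lets a different backend replace it without touching route code.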

Goals

Two goals, two different problems

Cloud and lightning need different shapes of answer. Cloud is a value at a point, and the nearest stations may be far away, so the work is estimating across distance. Lightning is regional — a strike lands somewhere in an area, and how far that is from a station does not matter — so the work is counting in a circle. Recency (the damped model) is explored for both, but cloud is the clean case.

[Figure: two panels. Cloud at a point: the query point must be estimated from far stations. Lightning in an area: count strikes in the area; station distance is irrelevant.]
Left: a query point with lines to its few distant stations; cloud must be inferred across distance. Right: a circle over scattered strikes; lightning is just a count inside the area.

PART 5

Normals: the cloud at a point

Estimate the seasonal cloud normal anywhere in Sweden from the nearest stations — kNN as the shipped estimate, with a learned model tried as a benchmark.

Normals

kNN is the shipped spatial estimate

For a location without a station, the useful signal is nearby stations. Two rungs ship, both direct statistics on the 5 nearest stations' real observations: the nearest-station normal as the simple floor, and an inverse-distance kNN of those stations as the shipped estimate. A third rung, a learned model, was tried as a benchmark to answer one question — does learning beat the kNN? — but it did not pan out, so it never shipped.

[Figure: left, a query point and its 5 nearest stations at 31 km / 318°, 47 km / 41°, 52 km / 74°, 66 km / 153°, and 71 km / 226°. Right, three rungs tried, two shipped, in increasing complexity: 1 · nearest station normal; 2 · inverse-distance kNN; 3 · learned model (dropped).]
Left: a query point and its 5 nearest stations with distance and bearing. Right: the same neighbours feed the two shipped rungs — the simple baseline and the kNN estimate — plus a learned model that was tried as a benchmark and dropped.
DEFAULT_NEIGHBOURS = 5, chosen so the point is triangulated and a single missing station does not break the estimate.
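
A minimal sketch of an inverse-distance estimate over the nearest stations, under stated assumptions: the function name idw_normal is hypothetical, and the distance exponent of 1 is a guess, since the deck does not say which power the shipped kNN uses.

```python
def idw_normal(neighbours, power=1.0):
    """Inverse-distance weighted mean of neighbour normals.

    neighbours: list of (distance_km, normal_cloud_pct) pairs.
    power: distance exponent; 1.0 is an assumption, not a documented value.
    """
    if not neighbours:
        return None  # nothing to estimate from
    for distance_km, value in neighbours:
        if distance_km == 0:
            return value  # guard against dividing by zero
    weights = [1.0 / distance_km**power for distance_km, _ in neighbours]
    weighted = sum(w * value for w, (_, value) in zip(weights, neighbours))
    return weighted / sum(weights)


# Distances from the figure; the normals are made-up values for illustration.
neighbours = [(31, 62.0), (47, 66.0), (52, 70.0), (66, 64.0), (71, 68.0)]
estimate = idw_normal(neighbours)
```

The estimate is always a convex combination of the neighbour values, so it cannot leave the range the stations actually observed.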

Normals

A location never sees itself

The evaluation uses the same neighbour rule as serving: when the origin is a station, that station is excluded from its own neighbour list. That gives leave-station-out scoring directly from the data shape. Whole stations go to disjoint folds, and serving reuses the same feature writers, so the benchmark is measured on the same inputs the shipped estimate uses.

def nearest_neighbours(points, k=DEFAULT_NEIGHBOURS):
    neighbours = {}
    for origin in points:
        ranked = sorted(
            (
                (other.id, haversine_km(origin.lat, origin.lon, other.lat, other.lon))
                for other in points
                if other.id != origin.id
            ),
            key=lambda pair: pair[1],
        )
        neighbours[origin.id] = ranked[:k]
    return neighbours
features.py: the origin station is filtered out, so a location can never use itself as a neighbour.
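
The excerpt calls haversine_km without showing it. A standard great-circle implementation looks like the sketch below; the Earth-radius constant and the clamp against rounding are assumptions, and the project's own helper may differ in detail.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # min() guards against h creeping above 1 through floating-point rounding.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))


# One degree of latitude is roughly 111 km anywhere on the globe.
one_degree_north = haversine_km(59.0, 18.0, 60.0, 18.0)
```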

The benchmark model was LightGBM with 400 trees, learning rate 0.05, 31 leaves, fit on MAE (regression_l1). Features are the 5 nearest stations' cloud values plus distance, bearing, lat/lon, and seasonal sin/cos. It was a path I explored and then discarded: the result on the next slide is kept as evidence, but the learned model itself is not in the codebase — only the kNN estimate and the shared feature writers it was measured against ship.

Normals

For the seasonal normal, the benchmark only tied kNN

The spatial estimate guesses a point’s seasonal cloud normal — the typical cloud each week of the year — from the nearest stations. A learned model (LightGBM), benchmarked on held-out stations, only ties the shipped kNN.

[Chart: median error per station (cloud %-points), axis 0 to 6. The learned GBM bar is annotated: lower median, worse on average.]
Estimator | median error | beats shipped at | verdict
nearest station | 4.64 | 41% | worse
kNN equal-weight | 4.30 | 33% | worse
kNN inverse-distance (shipped) | 4.17 | |
learned GBM (dropped) | 3.95 | 44% | tie, not a win
Bars are median error per station (cloud %-points; lower is better). The GBM’s 3.95 looks lower than the shipped 4.17, but that’s skew, not skill: it loses at 56% of stations, its losses are bigger than its wins (1.86 vs 1.21 pp; one station off by 14.6), and its average error is actually worse (5.76 vs 5.25) — no real gain, a tie. I ship inverse-distance because it does beat the plain equal-weight average, at two-thirds of stations. Numbers from cloudy spatial-backtest.

Normals

For one week’s cloud, still no gain — so I dropped it

I also tried a tougher spatial test: estimate a station’s cloud for a single week from its neighbours’ readings that same week. Here too the learned model is no better than a one-line inverse-distance average.

[Chart: one week's cloud, from the neighbours' same-week readings; median error per station (pp), lower is better. nearest station 7.46 · kNN of same-week readings 6.43 · kNN inverse-distance 6.19 · learned GBM (dropped) 6.29, no gain]
Same 109 stations, one week at a time; lower is better. The shipped seasonal-normal estimate isn’t shown — it doesn’t try to track a single week. The GBM (6.29) doesn’t beat inverse-distance (6.19).
Why drop a model that looks competitive? On honest station data it isn’t better — a tie on the seasonal normal, no gain on a single week — so LightGBM isn’t worth its weight. The one real free win, inverse-distance weighting, I did take: it’s now the shipped kNN. The learned model adds nothing on top.

PART 6

Prediction: adding recency

A 1-2 week outlook that nudges the seasonal normal toward what the recent weeks actually did.

Prediction

The damped model in one number

A seasonal normal says how cloudy a week usually is, but it has no idea what just happened. The fix is small: take the normal for the week and add a fraction of how far the recent weeks have run above or below it. That fraction is alpha, the lag-h autocorrelation of weekly anomalies, where h is the lead in weeks. Weekly anomalies persist (lag-1 is about 0.3 across Sweden, higher in some places); monthly ones barely do. I clamp alpha to [0, 1]: floored at 0 so a noisy negative value cannot flip the signal, capped at 1 so I never amplify an anomaly. At alpha = 0 the forecast collapses back to the normal.

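recent gap: $a = \text{recent} - \text{normal}$
persistence: $\alpha_h = \operatorname{clamp}\left( \dfrac{\sum_t a_t \, a_{t+h}}{\sum_t a_t^2},\ 0,\ 1 \right)$
forecast: $\hat{y} = \text{normal} + \alpha_h \cdot a$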
The whole model is one fitted number per lead, α (alpha): the share of a recent anomaly that still holds h weeks out, measured as the lag-h autocorrelation of weekly anomalies and clamped to [0, 1]. At α = 0 the forecast is exactly the normal — the floor that stops it scoring worse than climatology on average.
normal (this week) | recent week ran | gap a | α | forecast = normal + α·a
65% | 80% (cloudier) | +15 | 0.30 (lead 1) | 65 + 0.30×15 = 69.5%
65% | 50% (clearer) | −15 | 0.30 | 65 − 0.30×15 = 60.5%
65% | 80% (cloudier) | +15 | 0.10 (lead 2) | 65 + 0.10×15 = 66.5%
65% | 80% (cloudier) | +15 | 0.00 (no persistence) | 65 + 0 = 65% (the normal)
One week, one 15-point surprise, read four ways: the sign of the surprise sets the direction of the nudge; a bigger α leans harder on the surprise; a longer lead carries a smaller α, so the forecast melts back toward the normal; and where history shows no persistence (α = 0) it returns the normal unchanged.
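from statistics import fmean


def fit_alpha(anomalies, lead):
    # Lag-`lead` autocorrelation of weekly anomalies, clamped to [0, 1].
    present = [a for a in anomalies if a is not None]
    if len(present) <= lead:
        return 0.0
    mean = fmean(present)
    var = sum((a - mean) ** 2 for a in present) / len(present)
    if var == 0:
        return 0.0
    # Pair week t with week t + lead; skip any pair that touches a missing week.
    pairs = [
        (a, b)
        for t in range(len(anomalies) - lead)
        if (a := anomalies[t]) is not None and (b := anomalies[t + lead]) is not None
    ]
    if not pairs:  # gaps can leave no complete pair; avoid dividing by zero
        return 0.0
    cov = sum((a - mean) * (b - mean) for a, b in pairs) / len(pairs)
    return max(0.0, min(1.0, cov / var))  # floor 0, cap 1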
predictions/persistence.py: forecast = normal + alpha x recent anomaly. The series sits on a gap-free weekly grid, so a missing week is None and a lag never steps across a hole.
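
The forecast step itself is one line. The sketch below reproduces the four rows of the worked table; the function name damped_forecast is hypothetical, and the clamp is repeated here only so the helper is safe to call on its own.

```python
def damped_forecast(normal, recent, alpha):
    """Forecast = normal + alpha * (recent - normal), with alpha in [0, 1]."""
    alpha = max(0.0, min(1.0, alpha))
    return normal + alpha * (recent - normal)


# The four rows of the worked table: (normal, recent week, alpha).
rows = [
    (65.0, 80.0, 0.30),  # cloudier week, lead 1
    (65.0, 50.0, 0.30),  # clearer week, lead 1
    (65.0, 80.0, 0.10),  # cloudier week, lead 2
    (65.0, 80.0, 0.00),  # no persistence: returns the normal
]
forecasts = [round(damped_forecast(n, r, a), 1) for n, r, a in rows]
# forecasts == [69.5, 60.5, 66.5, 65.0]
```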

Prediction

Tested with a causal backtest

To trust the outlook I score it the way it would actually run. At each weekly origin I rebuild the normal from only the weeks up to that origin, so a past forecast is never measured against data from its own future. I let about two years of weeks accumulate first as warm-up, then start scoring. The baseline is the normal itself, which always predicts an anomaly of zero. Skill is the fraction by which the model cuts the baseline's error: 1 minus model MAE over baseline MAE.

climatology = {woy: total[woy] / count[woy] for woy in total}
causal = [
    None if v is None else v - climatology[woys[i]]
    for i, v in enumerate(values[: origin + 1])
]
prediction = predict(causal, woys, origin, lead)

target = actual - climatology[target_woy]
model_err.append(abs(target - prediction))
base_err.append(abs(target))  # the normal predicts zero anomaly

skill = 1.0 - fmean(model_err) / base_mae
predictions/outlook.py rolling-origin backtest. MIN_TRAIN_WEEKS = 104 (~2 years) of warm-up; leads 1 and 2, beyond which alpha falls to ~0.

Prediction

What the backtest shows

Across Sweden the gain is small but consistent: lead-1 median skill is +2.1%, and the model beats the normal at 98.2% of stations. The nearest station to Stockholm (Berga, 29 km south) is a clearer and unusually favorable example. The chart plots rolling 52-week mean absolute error, not raw cloud: gray is the seasonal normal, blue is the damped outlook, and lower is better. Across the backtest the normal is off by 23.2 points on average; the damped outlook is off by 16.5.

[Chart: rolling mean abs error (cloud %), y-axis 0 to 40, x-axis 2017 to 2026. Series: seasonal normal error, damped model error.]
Berga (nearest station to Stockholm, 29 km), lead-1: rolling 52-week mean absolute error of the seasonal normal (gray) versus the damped forecast (blue); lower is better. The shaded band is the error the model removes. Over 459 weeks the normal is off by 23.2 cloud-% points on average, the model by 16.5 — a 29% cut. Berga is one of the country's most persistent sites; the Sweden-wide median cut is ~2%.
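
The 29% figure follows directly from the skill definition, 1 minus model MAE over baseline MAE. A small sketch, with a hypothetical helper name and an assumed convention for a zero-error baseline:

```python
from statistics import fmean


def skill(model_err, base_err):
    """Fraction by which the model cuts the baseline's mean absolute error."""
    base_mae = fmean(base_err)
    if base_mae == 0:
        return 0.0  # assumed convention: a perfect baseline leaves no skill to gain
    return 1.0 - fmean(model_err) / base_mae


# Berga's reported averages: the normal is off by 23.2 points, the model by 16.5.
berga_skill = skill([16.5], [23.2])  # about 0.29
```

A model that matches the normal scores 0, and one that is worse than the normal scores below 0.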

Prediction

Tried and dropped: analog forecasting

Damped persistence has a floor built in: at α = 0 it returns the normal, so on average it cannot do worse than climatology. Analog forecasting has no such floor. It works on anomalies (weekly cloud minus the seasonal normal), so it matches the recent departure from normal rather than the season itself. It takes the last four weeks of anomalies, ranks every earlier four-week window in the same season by how closely its week-by-week values line up (smallest squared difference), and averages what followed the few closest matches, never peeking past the forecast date. It is a reasonable idea, so I held it to the same causal rolling-origin backtest as the damped model, scored against real station observations.

model | median skill (lead 1) | beats the normal at
damped persistence (shipped) | +2.1% | 98.2% of stations
analog forecasting (dropped) | −1.5% | 23.9% of stations
Skill is the fraction by which a model cuts the seasonal normal's mean absolute error, so the normal itself scores 0%. Analog lands below zero — on most stations it is worse than just predicting the normal — while damped persistence stays consistently above it. Damped is re-scored on current data (+2.1%); analog's −1.5% is from the original benchmark run, before it was removed.
A model that loses to the baseline only adds code and risk, so analog was measured and then removed rather than shipped. The number is kept as evidence for the decision; the model itself is not in the codebase. Only damped persistence ships.
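
For reference, the matching rule described above can be sketched as follows. This is a reconstruction from the prose, not the removed code: the function name, the number of matches k, and the season tolerance season_weeks are illustrative guesses; only the four-week window and the squared-difference ranking come from the description.

```python
def analog_forecast(anomalies, woys, origin, lead, window=4, k=3, season_weeks=4):
    """Average what followed the k past windows most similar to the latest one.

    anomalies: weekly anomalies (None = missing); woys: week-of-year per index.
    k and season_weeks are illustrative guesses, not documented values.
    """
    start = origin - window + 1
    if start < 0:
        return 0.0  # not enough history: fall back to the normal
    recent = anomalies[start : origin + 1]
    if any(a is None for a in recent):
        return 0.0

    scored = []
    # A candidate window ends at `end`; its outcome sits at end + lead, which
    # never exceeds origin, so nothing after the forecast date is used.
    for end in range(window - 1, origin - lead + 1):
        gap = abs(woys[end] - woys[origin])
        if min(gap, 52 - gap) > season_weeks:
            continue  # different season
        past = anomalies[end - window + 1 : end + 1]
        outcome = anomalies[end + lead]
        if outcome is None or any(a is None for a in past):
            continue
        distance = sum((p - r) ** 2 for p, r in zip(past, recent))
        scored.append((distance, outcome))

    if not scored:
        return 0.0
    scored.sort(key=lambda pair: pair[0])
    best = scored[:k]
    return sum(outcome for _, outcome in best) / len(best)


# Toy series: the last four weeks repeat an earlier pattern that was followed by 5.
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0]
prediction = analog_forecast(toy, [10] * len(toy), origin=8, lead=1, k=1)
```

Unlike the damped model, nothing here pulls the prediction back toward zero anomaly, which is consistent with the missing floor noted above.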

PART 7

Lightning: area, not point

Strike chance is regional, so I count strikes in a circle; only normals exist so far.

Lightning

Strike-day probability over observed days

Lightning is regional, not tied to a station: a discharge lands somewhere in an area, and how far it is from a weather station does not matter. So I count strikes inside a circle (default radius 10 km, secondary 25 km) and work in lightning-days — calendar days with at least one strike in the circle. The probability for a month is lightning-days divided by the days actually observed, not by the days on the calendar, so missing coverage cannot inflate the number.

[Chart: P(any strike day) by month, Jan to Dec, y-axis 0 to 0.6. Legend: observed so far, climatology tail.]
Each bar is the chance of at least one strike day in the circle that month, peaking in summer. The current month (June) is split: the solid lower segment is lightning-days already observed, the faded upper segment is the climatology estimate for the days left. The denominator is real observed days, so gaps in coverage do not inflate the probability.

The current-month figure is a linear extrapolation expressed in expected lightning-days: the days observed so far plus a climatology tail, where the tail is the monthly lightning-day rate times the days remaining. It is an expected count, not a compounded probability. The damped-persistence machinery does run on weekly lightning-days — pooled over a wider 25 km by default — but lightning is bursty and seasonal, so that output is shown only as indicative.
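
Both calculations are simple arithmetic. The sketch below uses hypothetical function names and made-up counts; it assumes the monthly rate is expressed as lightning-days per observed day, which is how the paragraph above reads.

```python
def strike_day_probability(lightning_days, observed_days):
    """Share of observed days with at least one strike in the circle."""
    if observed_days == 0:
        return None  # no coverage: unknown, not zero
    return lightning_days / observed_days


def expected_month_lightning_days(observed_so_far, daily_rate, days_remaining):
    """Lightning-days seen so far plus a climatology tail for the days left."""
    return observed_so_far + daily_rate * days_remaining


# Made-up numbers: 150 observed June days across past years, 30 with a strike.
rate = strike_day_probability(30, 150)
# 16 days into the current June with 3 lightning-days seen; 14 days remain.
expected = expected_month_lightning_days(3, rate, 14)  # about 5.8 days
```

Dividing by observed days rather than calendar days means a month with half its coverage missing is scored on the half that was seen.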

Lightning is served as climatology only. The archive contains observed strike events, not historical thunderstorm forecasts, so there are no forecast/outcome pairs for a near-term lightning model. A count model on the events alone did not show Brier skill over climatology, so the product keeps the honest baseline.

PART 8

Deploy: four moving parts, one shape

The deployment is deliberately plain: static frontend, container API, serverless Postgres, and a raw archive cache for ingest.

Deploy

Terraform describes shape; Actions ships code

The deployed system has four moving parts. Cloudflare Pages serves the deck and React app. Fly.io runs the FastAPI container. Neon stores Postgres. Cloudflare R2 stores the gitignored SMHI raw archive so scheduled ingest jobs can replay files before downloading anything missing. Terraform creates and wires the infrastructure; GitHub Actions deploys app code only after tests pass.

[Diagram: Cloudflare Pages (deck + React app) → /api/v1 → Fly.io (FastAPI container) → DATABASE_URL → Neon (Postgres). The ingest job uses R2 (raw archive, SMHI replay cache). Terraform wires service URLs and secrets. Deploy and ingest are separate GitHub Actions workflows.]
Browser traffic goes Pages → Fly → Neon. Ingest jobs also use R2 so the raw archive survives outside local disk.
# infra/terraform/main.tf
module "neon" {
    source = "./modules/neon"
}

module "backend_fly" {
    source       = "./modules/backend_fly"
    database_url = module.neon.database_url
}

module "frontend_pages" {
    source  = "./modules/frontend_pages"
    api_url = module.backend_fly.backend_url
}
The root module keeps the edges explicit: Neon connection string into Fly, Fly public URL into the Pages build.

PART 9

What's next

Where the work goes next, now that space and time are both covered.

Next

What's next

Space and time are already combined: the inverse-distance kNN gives the normal at a point, and the damped step adds recency. The bottom-right corner is filled by composition, so the next work is operational freshness, lightning, and denser cloud inputs.

[Figure: the 2×2 grid with every cell done. Columns (time): normal, how it usually is; now, recency. Rows (space): at a station; at any point. Cells: normals (done); damped persistence (done); kNN spatial (done); point + recency (done, kNN × damped). Real next: auto-refresh as data lands (every 24 h, no retraining); push lightning past climatology; sharper cloud = denser data (satellite / high-res, not a model).]
The 2×2 map of space (rows: at a station vs at any point) against time (columns: the usual normal vs recency now). All four corners are covered — the bottom-right by composing the kNN spatial normal with the damped recency step, rather than by a separate joint model.

Three directions are open.

  • Keep it current automatically. Every model here is a cheap recomputation, not a trained artifact — the normals are averages, α is one autocorrelation, the lightning rate is a count. A scheduled job can rebuild affected rollups, refit α, and regenerate the backtest as new SMHI data lands.
  • Push lightning past climatology. It is the thinnest corner, still normals plus an indicative damped nudge.
  • Sharpen the cloud estimate with denser inputs. kNN×damped is a good fit for sparse station data; materially sharper cloud needs satellite cloud or a high-resolution analysis served directly, not more complexity on the same station set.