Version d5e76692a2ef9aba127025ed93cea54ba8e5f5871124614f23ed7938e92eb16e · LIVE

Methodology: Anomaly Detection (SWPC)

Slug: anomaly-detection

This methodology covers three publishable frames on NOAA SWPC time series (X-ray flux, Kp, solar wind speed, proton flux): single-observation spike, sustained exceedance, and high-side record-breaking value in the available coverage window. All three are high-side anomalies, measured relative to the trailing window actually in the warehouse, not to an imagined multi-decade baseline.

Low-side anomalies (prolonged quiet, pattern breaks in the absence direction) do NOT belong under this slug; use calm-anomaly for those. Routing rule: if the interesting fact is “the metric is higher / spiking / sustained-above”, that’s this slug. If the fact is “nothing has happened for N days”, it’s calm-anomaly.

Everything here is built to fail honestly when the coverage window is too short or has gaps. If the metric doesn’t clear min_trailing_days of continuous samples (defined below), the frame is unavailable — fall back to agent-edited or log no-publish.

When to use this slug

Set methodology_slug = "anomaly-detection" in the publish payload when all of:

  • The source is NOAA SWPC (X-ray long/short, Kp, solar wind speed, proton flux).
  • The claim fits one of the three sub-frames (spike / sustained / record), high-side direction only.
  • The trailing coverage window for the metric has ≥ min_trailing_days of samples, with no gap exceeding max_gap_cycles × cycle_interval_seconds.
  • The claim is framed relative to our trailing window, not to any baseline we don’t have.
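
The coverage and continuity condition can be sketched in Python as follows. This is a minimal illustration, not the warehouse's actual check: the function name coverage_ok and its return shape are hypothetical, the timestamps are assumed to be sorted and timezone-aware, and a gap is treated as failing only when it strictly exceeds max_gap_cycles × cycle_interval_seconds.

```python
from datetime import datetime, timedelta, timezone


def coverage_ok(timestamps, cycle_interval_seconds, min_trailing_days, max_gap_cycles):
    """Return (ok, reason) for a sorted list of sample times (hypothetical helper)."""
    if len(timestamps) < 2:
        return False, "fewer than two samples"

    # Coverage: the first-to-last span must reach min_trailing_days.
    span = timestamps[-1] - timestamps[0]
    if span < timedelta(days=min_trailing_days):
        return False, f"span {span} is below {min_trailing_days} days"

    # Continuity: no interval between consecutive samples may exceed the limit.
    max_gap = timedelta(seconds=max_gap_cycles * cycle_interval_seconds)
    for earlier, later in zip(timestamps, timestamps[1:]):
        if later - earlier > max_gap:
            return False, f"gap of {later - earlier} starting at {earlier.isoformat()}"
    return True, "ok"


# Synthetic 7-day window at 60-second cadence (illustrative data only).
start = datetime(2026, 4, 8, tzinfo=timezone.utc)
full_window = [start + timedelta(seconds=60 * i) for i in range(7 * 1440 + 1)]
print(coverage_ok(full_window, 60, 7, 30))
```

A window whose first and last samples are 7 days apart but which contains a 36-hour hole passes the span test and fails the continuity loop, which is the case the gap rule exists to catch.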

Fall back to agent-edited for richer arguments (CME/flare cross-reference via DONKI, multi-metric correlation, cluster-of-activity framing when the per-metric σ isn’t the story).

Parameters

{
  "schema_version": 3,
  "source": "noaa-swpc",
  "metrics": {
    "xray_flux_long": {
      "live_endpoint": "https://services.swpc.noaa.gov/json/goes/primary/xrays-1-day.json",
      "archive_root": "https://services.swpc.noaa.gov/json/goes/primary/",
      "field": "flux",
      "qualifier": "0.1-0.8nm",
      "cycle_interval_seconds": 60,
      "min_trailing_days": 7,
      "max_gap_cycles": 30,
      "sigma_spike": 3.0,
      "sigma_sustained_threshold": 2.0,
      "sustained_duration_cycles": 30,
      "baseline_contamination_exclusion_cycles": 60,
      "record_min_exceedance_ratio": 1.25,
      "record_min_samples": 10080
    },
    "xray_flux_short": {
      "live_endpoint": "https://services.swpc.noaa.gov/json/goes/primary/xrays-1-day.json",
      "archive_root": "https://services.swpc.noaa.gov/json/goes/primary/",
      "field": "flux",
      "qualifier": "0.05-0.4nm",
      "cycle_interval_seconds": 60,
      "min_trailing_days": 7,
      "max_gap_cycles": 30,
      "sigma_spike": 3.0,
      "sigma_sustained_threshold": 2.0,
      "sustained_duration_cycles": 30,
      "baseline_contamination_exclusion_cycles": 60,
      "record_min_exceedance_ratio": 1.25,
      "record_min_samples": 10080
    },
    "kp_index": {
      "live_endpoint": "https://services.swpc.noaa.gov/products/noaa-planetary-k-index.json",
      "archive_root": "https://www.ngdc.noaa.gov/stp/geomag/kp_ap.html",
      "cycle_interval_seconds": 10800,
      "min_trailing_days": 14,
      "max_gap_cycles": 2,
      "sigma_spike": 3.5,
      "sigma_sustained_threshold": 2.5,
      "sustained_duration_cycles": 6,
      "baseline_contamination_exclusion_cycles": 8,
      "record_min_exceedance_kp": 1,
      "record_min_samples": 112
    },
    "solar_wind_speed": {
      "live_endpoint": "https://services.swpc.noaa.gov/products/solar-wind/plasma-1-day.json",
      "archive_root": "https://www.ngdc.noaa.gov/stp/satellite/ace/",
      "cycle_interval_seconds": 60,
      "min_trailing_days": 7,
      "max_gap_cycles": 120,
      "sigma_spike": 3.0,
      "sigma_sustained_threshold": 2.0,
      "sustained_duration_cycles": 120,
      "baseline_contamination_exclusion_cycles": 240,
      "record_min_exceedance_ratio": 1.15,
      "record_min_samples": 10080
    },
    "proton_flux_gt_10mev": {
      "live_endpoint": "https://services.swpc.noaa.gov/json/goes/primary/integral-protons-1-day.json",
      "archive_root": "https://www.ngdc.noaa.gov/stp/satellite/goes-r.html",
      "field": "flux",
      "cycle_interval_seconds": 60,
      "min_trailing_days": 7,
      "max_gap_cycles": 30,
      "sigma_spike": 3.0,
      "sigma_sustained_threshold": 2.0,
      "sustained_duration_cycles": 60,
      "baseline_contamination_exclusion_cycles": 120,
      "record_min_exceedance_ratio": 1.25,
      "record_min_samples": 10080
    }
  }
}
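
The per-metric numbers above are related by simple arithmetic, and a short Python sketch makes the relationships explicit. The dictionary below copies only the cadence-related values from the parameter block; the helper name derived is hypothetical and exists only for illustration.

```python
# Cadence-related values copied from the parameter block above.
PARAMS = {
    "xray_flux_long": {"cycle_interval_seconds": 60, "min_trailing_days": 7,
                       "max_gap_cycles": 30, "record_min_samples": 10080},
    "xray_flux_short": {"cycle_interval_seconds": 60, "min_trailing_days": 7,
                        "max_gap_cycles": 30, "record_min_samples": 10080},
    "kp_index": {"cycle_interval_seconds": 10800, "min_trailing_days": 14,
                 "max_gap_cycles": 2, "record_min_samples": 112},
    "solar_wind_speed": {"cycle_interval_seconds": 60, "min_trailing_days": 7,
                         "max_gap_cycles": 120, "record_min_samples": 10080},
    "proton_flux_gt_10mev": {"cycle_interval_seconds": 60, "min_trailing_days": 7,
                             "max_gap_cycles": 30, "record_min_samples": 10080},
}


def derived(p):
    """Derive the full-window sample count and the longest tolerated gap."""
    samples_per_day = 86400 // p["cycle_interval_seconds"]
    return {
        "expected_samples": samples_per_day * p["min_trailing_days"],
        "max_gap_seconds": p["max_gap_cycles"] * p["cycle_interval_seconds"],
    }


for name, p in PARAMS.items():
    print(name, derived(p))
```

For every metric, record_min_samples equals one full min_trailing_days window at the metric's cadence (7 × 1440 = 10080 for the 60-second metrics, 14 × 8 = 112 for Kp). The longest tolerated gap works out to 30 minutes for the X-ray and proton metrics, 2 hours for solar wind speed, and 6 hours for Kp.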

Editorial obligations

  1. Run the coverage + continuity check first. Run SELECT min(observed_at), max(observed_at), count() FROM swpc_observations WHERE metric = <m> over the span, and run a gap-detection query to confirm that no interval between consecutive samples exceeds max_gap_cycles × cycle_interval_seconds. A 7-day window with a 36-hour gap is not 7 days of continuous data. If the continuity check fails, the frame is unavailable.
  2. Compute trailing mean and σ excluding the most recent baseline_contamination_exclusion_cycles cycles. This stops the current event from silently inflating σ. Show both the contaminated and excluded-baseline numbers in the transcript so an editor can see the difference.
  3. State the actual window in the narrative. “Over the trailing 7 days of GOES-16 long-band X-ray flux, 2026-04-08T00:00Z to 2026-04-15T00:00Z” — not “historically.”
  4. Record-breaking requires an exceedance floor. The new value must be at least record_min_exceedance_ratio times the prior in-window maximum (X-ray flux, proton flux, solar wind speed) or at least record_min_exceedance_kp Kp units above it (Kp). A trivial local max that beats the prior peak by 2% is not publishable under this frame. The window must also hold at least record_min_samples continuous samples before the claim qualifies.
  5. For Kp, the spike threshold is 3.5σ and the sustained threshold is 2.5σ (lower-cadence metric, heavier tails). Don’t silently use the 3.0σ / 2.0σ values that apply to the other metrics.
  6. Direction check. Every claim under this slug is high-side. If the story is “unusually low”, use calm-anomaly.
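
Obligation 2 and the spike test can be sketched in Python as below. This is an illustration under stated assumptions rather than the production computation: the function names are hypothetical, values are assumed to be in chronological order with the newest sample last, and the sample standard deviation (statistics.stdev) is an assumption, since the methodology does not say whether sample or population σ is intended.

```python
import statistics


def baseline_stats(values, exclusion_cycles):
    """Return (mean, sigma) pairs with and without the most recent cycles.

    `values` is chronological, newest last. Hypothetical helper.
    """
    if exclusion_cycles < 1 or len(values) - exclusion_cycles < 2:
        raise ValueError("not enough samples left after the exclusion")
    baseline = values[:-exclusion_cycles]  # drop the most recent cycles
    return {
        "contaminated": (statistics.fmean(values), statistics.stdev(values)),
        "excluded": (statistics.fmean(baseline), statistics.stdev(baseline)),
    }


def spike_sigma(observation, mean, sigma):
    """Number of sigmas the observation sits above the baseline mean."""
    if sigma == 0:
        raise ValueError("flat baseline: sigma is zero")
    return (observation - mean) / sigma


# Synthetic series: a quiet baseline followed by a two-cycle event.
series = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 10.0, 12.0, 50.0, 60.0]
stats = baseline_stats(series, exclusion_cycles=2)
mean, sigma = stats["excluded"]
print(stats)
print(spike_sigma(series[-1], mean, sigma) >= 3.0)  # compare against sigma_spike
```

Returning both pairs keeps the contaminated and excluded-baseline numbers side by side, so the transcript can show how much the current event would have inflated σ.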

Allowed claims

  • “Over the trailing 7 days (2026-04-08T00:00Z–2026-04-15T00:00Z), GOES-16 long-band X-ray flux peaked at X W/m² at HH:MM UTC, Yσ above the trailing mean (mean = …, σ = …, excluding the most recent 60 cycles per the contamination-exclusion rule).”
  • “Kp exceeded 5 for six consecutive 3-hour cycles between HH:MM and HH:MM UTC: 18 hours of sustained exceedance versus a trailing-14d median of Z.”
  • “This is the highest value in the current 7-day window, exceeding the prior in-window peak (W₀) by a factor of 1.4.”
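
The sustained-exceedance frame amounts to finding a long enough run of consecutive cycles above mean + sigma_sustained_threshold × σ. A minimal Python sketch follows; the function names are hypothetical, and it assumes the series has already passed the continuity check, so that neighbouring list entries are adjacent cycles.

```python
def longest_run_above(values, threshold):
    """Return (start_index, length) of the longest run strictly above threshold."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, value in enumerate(values):
        if value > threshold:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # the run is broken; start counting again
    return best_start, best_len


def is_sustained(values, mean, sigma, sigma_threshold, duration_cycles):
    """Check the run length against sustained_duration_cycles."""
    threshold = mean + sigma_threshold * sigma
    start, length = longest_run_above(values, threshold)
    return length >= duration_cycles, start, length


# Synthetic Kp-like series with the kp_index settings (2.5 sigma, 6 cycles).
kp_values = [2, 3, 5, 5, 6, 5, 5, 5, 3, 5]
print(is_sustained(kp_values, mean=2.0, sigma=1.0,
                   sigma_threshold=2.5, duration_cycles=6))
```

With the synthetic baseline above (mean 2.0, σ 1.0), the threshold is 4.5, and the six cycles from index 2 to index 7 form the qualifying run; at the 3-hour Kp cadence that is 18 hours.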

Fail modes (fall back to agent-edited or log no-publish)

  • Coverage window below min_trailing_days OR a continuity gap exceeds max_gap_cycles × cycle_interval_seconds.
  • σ is being dominated by the current event (contaminated σ > 2× excluded-baseline σ — that’s a sign the baseline can’t support the claim).
  • Record-breaking value is under the exceedance floor.
  • The claim is really a multi-metric correlation (flare + CME + geomagnetic response) where each metric individually isn’t anomalous but the joint observation is the story.
  • The interesting thing is a comparison to a named historical event (“strongest since the Halloween storm of 2003”) — the warehouse doesn’t have 2003 data.
  • The frame is low-side / quiet — use calm-anomaly.
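
Two of the fail modes above are purely numeric and can be sketched as guard functions in Python. The names are hypothetical, and the ratio test reads “exceeds by at least record_min_exceedance_ratio” as new value ÷ prior peak ≥ ratio, which is an interpretation consistent with the “by a factor of 1.4” wording in the allowed claims.

```python
def sigma_dominated(contaminated_sigma, excluded_sigma):
    """True when the current event dominates sigma (contaminated > 2x excluded)."""
    return contaminated_sigma > 2 * excluded_sigma


def record_clears_floor(new_value, prior_peak, *, min_ratio=None, min_step=None):
    """Check a candidate record against the exceedance floor.

    Pass min_ratio for ratio-based metrics (record_min_exceedance_ratio)
    or min_step for Kp (record_min_exceedance_kp). Hypothetical helper.
    """
    if (min_ratio is None) == (min_step is None):
        raise ValueError("pass exactly one of min_ratio or min_step")
    if min_ratio is not None:
        return new_value >= prior_peak * min_ratio
    return new_value - prior_peak >= min_step


# A peak that beats the prior maximum by only 2% stays under a 1.25 floor.
print(record_clears_floor(1.02e-5, 1.0e-5, min_ratio=1.25))
# A Kp record must sit at least one full unit above the prior peak.
print(record_clears_floor(7.0, 6.0, min_step=1))
```

Either guard returning an unfavourable result means the frame is unavailable; the remaining fail modes (multi-metric stories, historical comparisons, low-side framing) are editorial judgments that code like this does not capture.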

Primary sources required

  1. The live_endpoint URL for the metric (so a reader can see the current state).
  2. The archive_root URL for the metric (so a reader can recover the trailing window from the public archive at a later time, not from our rolling cache).
  3. The exact observation timestamps and numeric values used in the claim — embed them in the narrative so the detection is self-contained even if the archive link later returns 404.
  4. Trailing mean, σ, sample count, and the contamination-exclusion settings used.

The methodology page itself is not a “primary source” — it’s the rule. Readers get the rule at /methodology/anomaly-detection/<version> automatically via the detection’s pin.

Reproducibility

Fetch the metric’s historical data from archive_root for the pinned observation window. For rolling-archive metrics, the numeric values in the narrative are the canonical record of what was observed — SWPC’s own rolling cache may have moved on, and the historical archive may have revised values (which is itself publishable as a correction detection).
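
A reproduction run can compare the values embedded in the narrative against whatever the archive returns today. The Python sketch below is illustrative only: the function name, the timestamp-keyed dictionaries, and the relative tolerance are all assumptions, and fetching and parsing the archive files is left out because the archive formats differ per metric.

```python
import math


def compare_with_archive(narrative, archive, rel_tol=1e-6):
    """Split narrative observations into revised and missing ones.

    Both arguments map an ISO timestamp string to a numeric value.
    """
    revised, missing = [], []
    for timestamp, recorded in sorted(narrative.items()):
        if timestamp not in archive:
            missing.append(timestamp)
        elif not math.isclose(recorded, archive[timestamp], rel_tol=rel_tol):
            revised.append((timestamp, recorded, archive[timestamp]))
    return revised, missing


# Illustrative values only; not real observations.
narrative = {"2026-04-14T12:00Z": 1.0e-5, "2026-04-14T12:01Z": 1.2e-5}
archive = {"2026-04-14T12:00Z": 1.0e-5, "2026-04-14T12:01Z": 1.3e-5}
print(compare_with_archive(narrative, archive))
```

Entries in the revised list are candidates for a correction detection, while entries in the missing list simply mean the narrative remains the only record of that observation.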