METHODOLOGY · milestone-first-of-kind

Version 0072cb865d1f4d7a313d57f631158eba003d219aeba99946dd9e2e3d35373fff · LIVE

Methodology: Milestone and First-of-Kind

Slug: milestone-first-of-kind

This methodology covers two publishable frames on the NASA Exoplanet Archive:

  1. Round-number milestone crossings — the distinct-planet count in the archive crosses a preconfigured threshold between one polling cycle and the next.
  2. First-of-kind attribute combinations — a planet appears whose combination of (discoverymethod, pl_rade_bucket, st_teff_bucket) has never appeared in the archive before.

Both frames are set-membership checks against the archive’s own data at a specific observation timestamp. The NASA Exoplanet Archive’s ps table is live: rows are added, revised, and (rarely) removed as upstream references update. Reproducing a detection therefore requires three things together: the ADQL query, the observation timestamp, and the specific row values captured in the detection narrative. A later query alone will not necessarily reach the same verdict, and that discrepancy is itself worth noting.
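A minimal sketch of the milestone frame as a pure function of two consecutive counts. It assumes a crossing means the distinct-planet count moved upward past a threshold, with landing exactly on the threshold counted as a crossing; that inclusive reading and the name crossed_thresholds are illustrative assumptions. The thresholds are copied from the Parameters block.

```python
from typing import List, Sequence

# Copied from "round_number_thresholds" in the Parameters block.
ROUND_NUMBER_THRESHOLDS = [6000, 7000, 8000, 9000, 10000,
                           12500, 15000, 20000, 25000, 30000]


def crossed_thresholds(prev_count: int, curr_count: int,
                       thresholds: Sequence[int] = ROUND_NUMBER_THRESHOLDS) -> List[int]:
    """Return every threshold t with prev_count < t <= curr_count, ascending.

    Assumption: only upward movement counts. A count that drops (archive
    removals) never produces a crossing in this sketch.
    """
    return sorted(t for t in thresholds if prev_count < t <= curr_count)
```

A jump that spans several thresholds returns all of them; choosing which one to publish is an editorial decision this sketch does not make.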

When to use this slug

Set methodology_slug = "milestone-first-of-kind" in the publish payload when all of:

  • The source is NASA Exoplanet Archive.
  • The claim is either a round-number crossing or a first-of-kind combination in the buckets defined below.
  • The exoplanet_first_seen materialization is populated for the planet in question (i.e. first_seen_at is not bootstrap-only). If freshness is still bootstrap-only, the claim cannot be grounded — log no-publish and move on.

Fall back to agent-edited when the story doesn’t fit these shapes (e.g. a host-star outlier reasoned from multiple ps columns, or a parameter-near-sensitivity-limit argument).

Parameters

The JSON block below is canonical. Editing any value produces a new methodology version hash; detections under the old version keep their old pin.

{
  "schema_version": 3,
  "source": {
    "name": "nasa-exoplanet-archive",
    "tap_endpoint": "https://exoplanetarchive.ipac.caltech.edu/TAP/sync",
    "table": "ps",
    "distinct_entity_column": "pl_name",
    "freshness_table": "exoplanet_first_seen"
  },
  "milestones": {
    "round_number_thresholds": [6000, 7000, 8000, 9000, 10000, 12500, 15000, 20000, 25000, 30000],
    "count_query": "SELECT COUNT(DISTINCT pl_name) FROM ps"
  },
  "first_of_kind": {
    "attribute_space": ["discoverymethod", "pl_rade_bucket", "st_teff_bucket"],
    "pl_rade_buckets": {
      "sub_earth": "pl_rade <= 1.0",
      "earth_like": "pl_rade > 1.0 AND pl_rade <= 1.5",
      "super_earth": "pl_rade > 1.5 AND pl_rade <= 2.0",
      "mini_neptune": "pl_rade > 2.0 AND pl_rade <= 4.0",
      "neptune": "pl_rade > 4.0 AND pl_rade <= 8.0",
      "sub_saturn": "pl_rade > 8.0 AND pl_rade <= 10.0",
      "jupiter": "pl_rade > 10.0"
    },
    "st_teff_buckets": {
      "m_dwarf": "st_teff <= 3700",
      "k_dwarf": "st_teff > 3700 AND st_teff <= 5200",
      "sun_like": "st_teff > 5200 AND st_teff <= 6000",
      "f_type": "st_teff > 6000 AND st_teff <= 7500",
      "a_or_hotter": "st_teff > 7500"
    }
  }
}
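The bucket expressions above translate directly into a lookup over inclusive upper edges. The sketch below is one possible Python rendering, not the ingester's actual code: the function names are hypothetical, and treating NaN the same as NULL is an assumption.

```python
from typing import List, Optional, Tuple

# Inclusive upper edges copied from the canonical JSON; the final bucket in
# each family is open-ended and passed separately as `top`.
PL_RADE_EDGES = [(1.0, "sub_earth"), (1.5, "earth_like"), (2.0, "super_earth"),
                 (4.0, "mini_neptune"), (8.0, "neptune"), (10.0, "sub_saturn")]
ST_TEFF_EDGES = [(3700.0, "m_dwarf"), (5200.0, "k_dwarf"),
                 (6000.0, "sun_like"), (7500.0, "f_type")]


def _bucket(value: Optional[float], edges: List[Tuple[float, str]],
            top: str) -> Optional[str]:
    # NULL (None) cannot be bucketed; NaN is treated the same way (assumption).
    if value is None or value != value:
        return None
    for upper, name in edges:
        if value <= upper:  # upper edge is inclusive, so 1.0 -> sub_earth
            return name
    return top


def pl_rade_bucket(pl_rade: Optional[float]) -> Optional[str]:
    return _bucket(pl_rade, PL_RADE_EDGES, "jupiter")


def st_teff_bucket(st_teff: Optional[float]) -> Optional[str]:
    return _bucket(st_teff, ST_TEFF_EDGES, "a_or_hotter")


def attribute_tuple(discoverymethod: Optional[str], pl_rade: Optional[float],
                    st_teff: Optional[float]):
    """Build the (discoverymethod, pl_rade_bucket, st_teff_bucket) tuple.

    Members may be None when the archive value is missing; the caller decides
    whether that blocks the claim.
    """
    return (discoverymethod, pl_rade_bucket(pl_rade), st_teff_bucket(st_teff))
```

Because the edges are compared with <= and never rounded, a value that sits exactly on a boundary always falls into the lower bucket.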

Editorial obligations

  1. Verify the coverage window. Run the distinct-count query at the current observation time. For first-of-kind claims, verify that no prior row in ps has the same (discoverymethod, pl_rade_bucket, st_teff_bucket) tuple. A first-of-kind detection is only valid if the entire upstream archive has been scanned for prior members.
  2. Pair the fact with the asymmetry that makes it surprising. “First sub-Earth discovered by radial velocity around a Sun-like host” only lands because there are 225 transit-discovered sub-Earths around Sun-like hosts, i.e. in the same (pl_rade_bucket, st_teff_bucket) pair. Cite that count.
  3. Link the exact ADQL. Include the query text in the publish payload’s primary sources so a reader can paste it into TAP and reach the same row.
  4. Don’t fabricate bucket edges. The buckets above are canonical. If a planet sits on a boundary (e.g. pl_rade = 1.0), the canonical expression (pl_rade <= 1.0) places it in “sub_earth”. Don’t round.
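Obligations 1 and 2 can be sketched as a single pass over the prior tuples: one membership check for the first-of-kind verdict, plus the per-method counts that supply the asymmetry. This is an illustration under stated assumptions, not the production check. It assumes the caller has already scanned the whole archive and passes exactly one bucketed tuple per distinct prior planet; first_of_kind_report and its result keys are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (discoverymethod, pl_rade_bucket, st_teff_bucket)
Combo = Tuple[str, str, str]


def first_of_kind_report(prior: Iterable[Combo], candidate: Combo) -> Dict[str, object]:
    """Check the candidate tuple against every prior tuple.

    Assumption: `prior` holds one tuple per distinct planet seen before the
    candidate, covering the entire archive.
    """
    counts = Counter(prior)
    method, rade, teff = candidate
    # Other discovery methods that share the candidate's radius/temperature
    # buckets: these counts are the asymmetry that makes the claim land.
    same_cell = {m: n for (m, r, t), n in counts.items()
                 if (r, t) == (rade, teff) and m != method}
    return {
        "is_first": counts[candidate] == 0,
        "prior_entries": counts[candidate],
        "same_cell_by_method": same_cell,
    }
```

An empty same_cell_by_method means there is no asymmetry to cite, which is a signal to reconsider whether the claim is worth publishing.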

Allowed claims

  • “The NASA Exoplanet Archive now lists N confirmed planets” (round-number milestone).
  • “Planet X is the first (attribute-combo) entry in the archive” (first-of-kind, with the asymmetric-count pair).
  • “This combination had N prior entries” (contextual count, always cite the query).

Fail modes (fall back to agent-edited)

  • The planet’s first_seen_at is bootstrap-only — cannot distinguish a new addition from an archive update.
  • All of the combination’s attribute values are NULL in the planet’s archive row, so the planet can’t be placed in a bucket.
  • The claim involves rates, projections, or doubling thresholds — those branches of the parent methodology are deferred; use agent-edited with explicit caveats instead.

Primary sources required

  1. Exoplanet Archive TAP endpoint URL with the exact ADQL query.
  2. Observation timestamp (when our ingester saw the row, down to the second).
  3. The full row values for the planet that triggered the frame — pl_name, discoverymethod, pl_rade, st_teff, disc_year, and any discovery-reference column. Embed these in the narrative so the detection is self-contained.
  4. For milestone crossings: the distinct-count value at the observation timestamp AND the value from the prior polling cycle, both explicit in the narrative.
  5. The upstream archive’s own discovery reference (disc_refname), if present in the row.
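A sketch of how items 1 to 3 might be assembled into one record. The endpoint is the tap_endpoint from the Parameters block; the query and format URL parameter names reflect the archive's published TAP examples as understood here and should be verified before use. The helper names and record keys are hypothetical, not the actual publish-payload schema.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Mapping
from urllib.parse import urlencode

TAP_SYNC = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"


def tap_url(adql: str, fmt: str = "csv") -> str:
    """Build a pasteable URL for an ADQL query.

    Assumption: the sync endpoint accepts `query` and `format` parameters.
    """
    return f"{TAP_SYNC}?{urlencode({'query': adql, 'format': fmt})}"


def primary_sources(adql: str, observed_at: datetime,
                    row: Mapping[str, Any]) -> Dict[str, Any]:
    """Bundle the query, the observation time, and the triggering row.

    `observed_at` must be timezone-aware; it is normalized to UTC and
    truncated to whole seconds.
    """
    stamp = observed_at.astimezone(timezone.utc).replace(microsecond=0)
    return {
        "tap_url": tap_url(adql),
        "adql": adql,
        "observed_at": stamp.isoformat(),
        "row": dict(row),  # copy, so later mutation cannot alter the record
    }
```

Keeping the raw ADQL text alongside the encoded URL lets a reader paste either form into TAP.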

Reproducibility

The ps table is live — a reader’s re-query at a later time may return a different count (archive revisions), a different attribute value (refit), or a different first-appearance verdict (another planet in the same bucket published between our observation and theirs). The narrative’s embedded row values and the observation timestamp are the canonical record; divergence from a later query is expected and, if substantive, publishable as a new correction detection per invariant #2.
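The comparison between the embedded record and a later re-query can be sketched as a field-by-field diff. The field names below are hypothetical stand-ins for whatever the narrative actually embeds, and the sketch only reports differences; whether a difference is substantive enough to publish as a correction remains an editorial judgment.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple

# Hypothetical field names covering the three kinds of drift described above:
# count, attribute values, and the first-appearance verdict.
DEFAULT_FIELDS = ("distinct_count", "discoverymethod", "pl_rade",
                  "st_teff", "is_first")


def divergences(recorded: Mapping[str, Any], requeried: Mapping[str, Any],
                fields: Sequence[str] = DEFAULT_FIELDS) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (recorded_value, requeried_value)} for fields that differ.

    The recorded values stay canonical; this only surfaces what changed.
    """
    return {f: (recorded.get(f), requeried.get(f))
            for f in fields
            if recorded.get(f) != requeried.get(f)}
```

An empty result means the later query agrees with the record on every compared field.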