<!-- Canonical HTML: /methodology -->

# Methodology — Hamer Intelligence Services

This page documents the ingestion, extraction, geolocation, and classification pipeline behind every event, briefing, and forecast on this site. It exists so you can audit our outputs before relying on them.

## 1. Inputs (where the data comes from)

HIS ingests from open, publicly accessible feeds. Every source is listed on /sources with its category, refresh interval, and current operational status.

- **Wire and news feeds**: Reuters, AP, BBC, Al Jazeera, AFP, France 24, NHK, plus regional outlets via RSS.
- **Humanitarian situation reports**: ReliefWeb (UN OCHA), ICRC, MSF, OCHA HDX.
- **Government and defense releases**: US DoD, UK MoD, Ukrainian General Staff, Israeli IDF, NATO, plus national press services.
- **Defense and security publications**: Janes, IISS, SIPRI, RUSI, CSIS, Chatham House, ICG (cited and linked, not republished).
- **OSINT investigation outlets**: Bellingcat, Conflict Intelligence Team, Oryx (cited and linked).
- **Telegram OSINT channels**: conflict-zone monitoring channels covering Russia-Ukraine, Israel-Gaza, Syria, Yemen, Sudan, Sahel, Myanmar, and others.
- **ADS-B transponder data**: Live military aircraft tracking via OpenSky Network and ADS-B Exchange.
- **AIS maritime data**: Vessel tracking via AISStream (Pro tier).
- **Reference datasets**: ACLED for historical event classification baselines, SIPRI Arms Transfers Database, World Bank conflict indicators.
- **Financial filings**: SEC EDGAR (10-K, 10-Q, 8-K) for the corporate exposure analysis surface.

Refresh intervals range from 30 seconds (ADS-B/AIS streams) to 24 hours (annual filings). The live status of every source is published at /sources.
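The polling cadence described above can be sketched as a simple source registry. The category names and exact intervals below are illustrative assumptions for the mid-range feeds, not HIS's actual configuration; only the 30-second and 24-hour endpoints come from the text.

```python
# Illustrative source registry. Only the 30 s (ADS-B/AIS) and 24 h (filings)
# intervals are stated on this page; the rest are assumed for illustration.
REFRESH_INTERVALS_SECONDS = {
    "adsb_stream": 30,        # ADS-B transponder data
    "ais_stream": 30,         # AIS maritime data
    "wire_rss": 300,          # wire and news feeds (assumed)
    "telegram_osint": 300,    # Telegram OSINT channels (assumed)
    "sitreps": 3600,          # humanitarian situation reports (assumed)
    "gov_releases": 3600,     # government and defense releases (assumed)
    "sec_edgar": 86400,       # SEC EDGAR filings
}

def next_fetch_due(last_fetch_epoch: float, source: str) -> float:
    """Return the epoch time at which `source` should next be polled."""
    return last_fetch_epoch + REFRESH_INTERVALS_SECONDS[source]
```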

## 2. Extraction pipeline

Every incoming item passes through a multi-stage AI pipeline running on OpenAI's GPT-5.1 model family. The pipeline is deterministic in structure (the prompts, schema, and validation rules are fixed in code), with the model handling natural-language extraction.

1. **Filtering.** Items are first triaged for relevance. Off-topic items are dropped. Roughly 60–80% of raw inputs are filtered out at this stage.
2. **Structured extraction.** Surviving items are passed to GPT-5.1 with a strict JSON schema asking for: event type, actors involved, location, date and time, casualties or damage, weapons or platforms involved, and a confidence rating for each field.
3. **Geolocation resolution.** Free-text locations are resolved against a gazetteer (Nominatim/OSM with manual overrides for disputed names and recent renamings). Items that cannot be resolved to coordinates with at least settlement-level precision are kept as text-only events and excluded from the map.
4. **Severity rating.** Each event is rated low / medium / high / critical using the rubric documented below.
5. **Deduplication.** New events are compared against the prior 72 hours of events using location proximity, event-type match, actor overlap, and embedding similarity on the description.
6. **Storage and indexing.** Final events are written to the database with full source attribution: original URL, source name, fetch timestamp, and the raw text the extraction was based on.
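The deduplication step (5) combines four signals: location proximity, event-type match, actor overlap, and embedding similarity. A minimal sketch of that comparison follows; the field names (`type`, `lat`, `lon`, `actors`, `embedding`) and the 25 km / 0.85 thresholds are assumptions for illustration, not HIS's production values.

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two coordinates."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def cosine(u: list, v: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def is_duplicate(new: dict, prior: dict,
                 max_km: float = 25.0, min_sim: float = 0.85) -> bool:
    """Heuristic match: same event type, nearby location, at least one
    shared actor, and similar description embeddings. Thresholds assumed."""
    if new["type"] != prior["type"]:
        return False
    if haversine_km(new["lat"], new["lon"], prior["lat"], prior["lon"]) > max_km:
        return False
    if not set(new["actors"]) & set(prior["actors"]):
        return False
    return cosine(new["embedding"], prior["embedding"]) >= min_sim
```

In practice this check would run only against the prior 72 hours of stored events, as described in step 5.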

## 3. Severity rubric

- **Low**: Reported activity with no confirmed casualties or significant materiel damage.
- **Medium**: Confirmed casualties under 10, localised infrastructure damage, single munition strike with confirmed impact, vessel boarded/seized without sinking. Most daily conflict events fall here.
- **High**: Casualties between 10 and 100, multi-platform strike package, sustained engagement over hours, start or end of a named operation, large-scale displacement event, ship sunk.
- **Critical**: Casualties above 100, strategic-weapon use, capital-city strike, major infrastructure (port, airport, power grid) destroyed, cross-border escalation, WMD-related event.

The rubric is calibrated against ACLED's event classification scheme so historical comparisons remain coherent.
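The rubric above can be read as a simple decision cascade: check the critical criteria first, then high, then medium, defaulting to low. The sketch below illustrates that ordering; the field names (`casualties`, `flags`) and flag vocabulary are assumptions for illustration, not HIS's internal schema.

```python
def rate_severity(event: dict) -> str:
    """Map an extracted event onto the four-level rubric.
    `casualties` and `flags` are assumed field names, not HIS's schema."""
    casualties = event.get("casualties", 0)
    flags = set(event.get("flags", []))

    critical_flags = {"strategic_weapon", "capital_city_strike",
                      "major_infrastructure_destroyed",
                      "cross_border_escalation", "wmd_related"}
    high_flags = {"multi_platform_strike", "sustained_engagement",
                  "named_operation_boundary", "mass_displacement", "ship_sunk"}
    medium_flags = {"infrastructure_damage", "confirmed_strike",
                    "vessel_seized"}

    # Evaluate from most to least severe so the highest tier wins.
    if casualties > 100 or flags & critical_flags:
        return "critical"
    if casualties >= 10 or flags & high_flags:
        return "high"
    if casualties > 0 or flags & medium_flags:
        return "medium"
    return "low"
```

Evaluating the tiers from most to least severe means an event matching several criteria always receives the highest applicable rating.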

## 4. What we do not do

- We do not perform original on-the-ground reporting. Every event traces to an open-source publication.
- We do not handle or republish classified material.
- We do not synthesise events. If a source did not report it, it is not in our database.
- We do not assign severity ratings that cannot be justified against the rubric above.
- We do not silently retract events. Corrections are versioned and visible in the per-event change log.

## 5. Known limitations

- **Source-language coverage skew.** English, Russian, Ukrainian, Hebrew, Arabic, French, and Spanish are well-covered. Burmese, Amharic, Tigrinya, Pashto, and several other conflict-relevant languages are under-covered relative to conflict intensity.
- **Telegram-channel selection bias.** Channels are selected for breadth; every channel has a perspective. Multi-source corroboration is required before publishing high-severity events, but bias remains.
- **First-report latency vs. correction latency.** The map prioritises freshness; high-severity events that later prove inaccurate may stay visible for up to 4 hours before the next briefing cycle re-evaluates them.
- **ADS-B / AIS gaps.** Coverage is dense over Europe, North America, and major shipping lanes; sparse over Africa, central Asia, and parts of the Pacific. Absence of a track is not evidence of absence.
- **Forecast calibration.** Strategic forecasts are model-generated and have not been back-tested against held-out events at publication time.

## 6. Reproducibility and audit

- Every event page exposes its source URL, fetch timestamp, raw extracted text, and structured fields.
- The full source list with operational status is at /sources.
- The site index for AI ingestion is at /llms.txt; the full content dump is at /llms-full.txt.
- The XML sitemap is at /sitemap.xml.

URL: https://hamerintel.com/methodology
