ENSOscope

ENSO Index Skill - Hindcast Verification

Temporal correlation between the hindcast ensemble-mean Niño index and observations, as a function of start month and lead time, over the 1993-2026 hindcast period.

What am I looking at?

Each heatmap below grades how skilful a forecast model has been at predicting ENSO by replaying its past forecasts (hindcasts) and comparing them with what was actually observed. Higher values (redder cells) mean more reliable forecasts; lower values (paler cells) mean less reliable ones.

1 · Replay past forecasts: the model's past hindcasts
2 · Compare to reality: what actually happened
3 · Score how close: correlation, 0 (none) to 1 (perfect)

That score is the skill: 1.0 is perfect, about 0.6 or above is useful, and near 0 means no skill. The heatmap below shows it for every combination of start month and lead time.
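A minimal sketch of how a grid like this could be filled in, assuming the hindcasts sit in a NumPy array indexed by hindcast year, start month and lead (a hypothetical layout; the page does not describe its data format) and that the score is the Pearson correlation returned by `np.corrcoef`. Each cell correlates the ensemble mean with the observations across all hindcast years for one start month and one lead.

```python
import numpy as np

def skill_heatmap(hindcast_mean, obs):
    """Correlation skill for every (lead, start month) cell.

    hindcast_mean: array of shape (n_years, 12, n_leads); entry [y, m, l] is
        the ensemble-mean index for the forecast launched in start month m of
        hindcast year y, at lead l + 1 (hypothetical layout).
    obs: array of the same shape holding the observed index for the month
        each forecast verifies against.
    Returns an array of shape (n_leads, 12): rows are leads L1..Ln and
    columns are start months, matching the heatmap layout.
    """
    _, n_months, n_leads = hindcast_mean.shape
    skill = np.full((n_leads, n_months), np.nan)
    for lead in range(n_leads):
        for month in range(n_months):
            f = hindcast_mean[:, month, lead]
            o = obs[:, month, lead]
            ok = ~(np.isnan(f) | np.isnan(o))  # drop years with missing data
            if ok.sum() > 2:  # need at least three years for a correlation
                skill[lead, month] = np.corrcoef(f[ok], o[ok])[0, 1]
    return skill
```

A perfect hindcast (identical to the observations) gives 1.0 in every cell; adding synthetic noise that grows with lead makes the lower rows fade, mimicking the loss of skill at longer leads.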

SST = sea-surface temperature, the ocean signal that defines El Niño and La Niña.

Niño 3.4 / Niño 3 / Niño 4 = standard boxes in the tropical Pacific where that SST is averaged. Niño 3.4 is the headline ENSO index; Niño 3 also carries the rainfall signal we use to flag extreme events.

Lead time (L1-L6, the rows) = how many months ahead the forecast looks. L1 = next month, L6 = six months ahead. Skill naturally fades at longer leads.

Start month (the columns) = the calendar month the forecast was launched from. The dip around boreal spring is the well-known "spring predictability barrier."

Correlation (plotted here on a 0 to 1 scale) = how closely the model's forecasts tracked reality across all hindcast years. 1.0 = perfect, about 0.6+ = useful, near 0 = no skill. For example, 0.85 at L3 means the model reliably anticipated ENSO three months ahead.
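Written out, the score for start month m and lead ℓ is sketched below, assuming the standard Pearson coefficient (the usual choice for "temporal correlation", although the page does not name its estimator). Here f_{y,m,ℓ} is the ensemble-mean hindcast launched in month m of hindcast year y at lead ℓ, o_{y,m,ℓ} is the observed index for the month it verifies against, and the bars denote averages over hindcast years.

```latex
r_{m,\ell} =
\frac{\sum_{y}\left(f_{y,m,\ell}-\bar{f}_{m,\ell}\right)\left(o_{y,m,\ell}-\bar{o}_{m,\ell}\right)}
     {\sqrt{\sum_{y}\left(f_{y,m,\ell}-\bar{f}_{m,\ell}\right)^{2}}\;
      \sqrt{\sum_{y}\left(o_{y,m,\ell}-\bar{o}_{m,\ell}\right)^{2}}}
```

The coefficient can run from -1 to 1. Because the means and spreads are divided out, a high value says the hindcasts and observations rise and fall together, not that they agree in amplitude.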

Colour scale: 0.0 to 1.0, temporal correlation (hindcast mean vs obs)


The actual track - hindcast vs observations

A correlation number is abstract, so here is the track behind it: the observed index (black) versus the hindcast ensemble mean at a 3-month lead (orange), 1993-2026. The model catches every major El Niño and La Niña. Switch models with the toolbar above and pick a region below.
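A lead-3 track like this could be assembled roughly as follows. This is a sketch, assuming hindcasts are held in a dict keyed by (start year, start month) with an array of ensemble members by lead (a hypothetical structure), and that L1 verifies in the month after launch, as the lead-time definition states. Each forecast is placed at its verifying month so it lines up with the observed index.

```python
import numpy as np

def add_months(year, month, n):
    """Shift a (year, month) pair by n months; months run 1..12."""
    idx = year * 12 + (month - 1) + n
    return idx // 12, idx % 12 + 1

def lead_track(hindcasts, lead=3):
    """Ensemble-mean hindcast track at a fixed lead, keyed by verifying month.

    hindcasts: dict mapping (start_year, start_month) to an array of shape
        (n_members, n_leads) whose column l holds lead l + 1 (hypothetical
        layout; L1 = the month after launch).
    Returns ((target_year, target_month), ensemble_mean) pairs in
    chronological order, ready to plot against observations.
    """
    track = {}
    for (year, month), members in hindcasts.items():
        # e.g. a January start at L3 verifies in April
        target = add_months(year, month, lead)
        track[target] = float(np.mean(members[:, lead - 1]))
    return sorted(track.items())
```

Keying by verifying month rather than start month is the design choice that matters here: the observed curve is indexed by calendar month, so the two series can be plotted on one time axis.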

Observations: SST indices from COBE-SST 2 (Hirahara et al. 2014); the Niño 3 rainfall index from GPCP (Adler et al. 2003). Full citations with DOIs are on the Methodology and data sources page.

Intensity discrimination - does the model get the strength right?

Correlation shows that the model tracks the index, but for acting on a forecast what matters is the intensity class. Only a handful of strong or extreme events are on record, so a separate reliability score for each class would be statistically meaningless. Instead we pool events into two classes, moderate versus strong + extreme, and ask: is the model's forecast probability of the stronger class higher when the observed event really was stronger? A large gap between the paired bars means yes. Results are pooled over all lead times and start months.
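A minimal sketch of this pooled check, under loudly stated assumptions: each case is summarised by a peak |Niño 3.4| anomaly, the boundary between moderate and strong + extreme is a hypothetical 1.5 °C (the page does not give its thresholds), and the forecast probability of the stronger class is the fraction of ensemble members at or above that boundary. Cases are assumed to be already restricted to observed events of at least moderate strength.

```python
import numpy as np

STRONG = 1.5  # hypothetical °C boundary between moderate and strong + extreme

def discrimination(member_peaks, obs_peaks, strong=STRONG):
    """Mean forecast probability of 'strong + extreme', split by outcome.

    member_peaks: (n_cases, n_members) forecast peak |Niño 3.4| anomalies,
        one row per event case, all leads and start months pooled.
    obs_peaks: (n_cases,) observed peak |anomaly| for the same cases.
    Returns (p_when_strong, p_when_moderate).
    """
    member_peaks = np.asarray(member_peaks, dtype=float)
    obs_peaks = np.asarray(obs_peaks, dtype=float)
    # Forecast probability = share of ensemble members in the stronger class
    prob_strong = (member_peaks >= strong).mean(axis=1)
    was_strong = obs_peaks >= strong
    # Average that probability separately over observed strong and moderate cases
    return prob_strong[was_strong].mean(), prob_strong[~was_strong].mean()
```

The two returned numbers plausibly correspond to the paired bars: if the first is well above the second, the model's probabilities separate the two classes. If either observed class is empty, NumPy returns nan for that mean (with a warning), which is one reason pooling is needed when strong events are rare.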


How to read this