ENSO Index Skill - Hindcast Verification
Temporal correlation between the hindcast ensemble-mean Niño index and observations, as a function of start month and lead time, over the 1993-2026 hindcast period.
What am I looking at?
Each heatmap below grades how skilful a forecast model has been at predicting ENSO, by replaying its past forecasts (hindcasts) and comparing them against what was actually observed. Higher, redder numbers mean more reliable; pale, lower numbers mean less reliable.
The number in each cell is the skill score: 1.0 is perfect, about 0.6 or higher is useful, and near 0 is no skill. The heatmap below shows it for every start month and lead time.
SST = sea-surface temperature, the ocean signal that defines El Niño and La Niña.
Niño 3.4 / Niño 3 / Niño 4 = standard boxes in the tropical Pacific where that SST is averaged. Niño 3.4 is the headline ENSO index; Niño 3 also carries the rainfall signal we use to flag extreme events.
Lead time (L1-L6, the rows) = how many months ahead the forecast looks. L1 = next month, L6 = six months ahead. Skill naturally fades at longer leads.
Start month (the columns) = the calendar month the forecast was launched from. The dip around boreal spring is the well-known "spring predictability barrier."
Correlation (0 to 1) = how closely the model's forecasts tracked reality across all hindcast years. 1.0 = perfect, about 0.6+ = useful, near 0 = no skill. For example, 0.85 at L3 means the model reliably anticipated ENSO three months ahead.
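To make the definitions above concrete, here is a minimal sketch of how a skill grid like this could be computed. It assumes the hindcast ensemble means and the matching observations are already arranged as arrays of shape (years, leads, start months); the function name `skill_matrix` and that array layout are illustrative assumptions, not the site's actual code.

```python
import numpy as np


def skill_matrix(hindcast, observed):
    """Pearson correlation across hindcast years for each (lead, start month) cell.

    Both inputs are assumed to have shape (n_years, n_leads, n_starts), with
    observed[y, l, s] being the observation valid at the time that
    hindcast[y, l, s] was forecasting. Returns shape (n_leads, n_starts).
    """
    hindcast = np.asarray(hindcast, dtype=float)
    observed = np.asarray(observed, dtype=float)

    # Remove the across-year mean separately for every cell.
    h = hindcast - hindcast.mean(axis=0)
    o = observed - observed.mean(axis=0)

    # Covariance over years divided by the product of the spreads.
    cov = (h * o).sum(axis=0)
    spread = np.sqrt((h ** 2).sum(axis=0) * (o ** 2).sum(axis=0))
    return cov / spread
```

With six leads and twelve start months the result is a 6 x 12 grid, one row per lead time and one column per start month, matching the layout described above.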
[Skill heatmaps: correlation by start month (columns) and lead time (rows)]
The actual track - hindcast vs observations
A correlation number is abstract, so here is the track behind it: observed (black) versus the hindcast ensemble mean at 3-month lead (orange), 1993-2026. The model catches every major El Niño and La Niña. Switch the model in the toolbar above; pick a region below.
Observations: SST indices from COBE-SST 2 (Hirahara et al. 2014); the Niño 3 rainfall index from GPCP (Adler et al. 2003). Full citations with DOIs on the Methodology and data sources page.
Intensity discrimination - does the model get the strength right?
Correlation says the model tracks the index, but for action what matters is the intensity class. Only a handful of strong or extreme events are on record, so a per-class reliability score would be meaningless. Instead we pool events into two groups, moderate versus strong plus extreme, and ask: is the model's forecast probability of the stronger category higher when the event truly was stronger? A large gap between the paired bars means yes. Results are pooled over all leads and start months.
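The pooled comparison can be sketched as follows, assuming each paired bar is the mean forecast probability of the strong-or-extreme category within one observed group. The function name `discrimination_gap`, the class labels as strings, and the use of a plain mean are all assumptions made for illustration.

```python
import numpy as np


def discrimination_gap(p_stronger, observed_class):
    """Compare forecast probabilities between the two pooled groups.

    p_stronger: forecast probability of the strong-or-extreme category,
        one value per verified event (any lead, any start month).
    observed_class: observed intensity per event, one of
        "moderate", "strong" or "extreme" (labels are hypothetical).
    Returns (mean when moderate, mean when stronger, gap).
    """
    p = np.asarray(p_stronger, dtype=float)
    cls = np.asarray(observed_class)

    was_stronger = np.isin(cls, ["strong", "extreme"])
    was_moderate = cls == "moderate"

    mean_moderate = p[was_moderate].mean()
    mean_stronger = p[was_stronger].mean()
    return mean_moderate, mean_stronger, mean_stronger - mean_moderate
```

A positive gap means the model assigned more probability to the stronger category when the event really was stronger. A gap near zero means it could not tell the two groups apart.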
How to read this
- Each cell is the correlation, across hindcast years, between the ensemble-mean Niño index and the observed index for that start month (column) and lead time (row).
- Higher (red) = the model reliably tracks observed ENSO at that start/lead. Skill drops at longer leads and across the boreal-spring predictability barrier.
- Forecast probabilities elsewhere on the site use the raw ensemble fraction in each ENSO class, with each member bias- and amplitude-corrected to the hindcast (z = (anomaly − μ_hindcast) / σ_hindcast); no statistical calibration is applied.
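As a rough illustration of the last point, the sketch below standardizes each ensemble member against the hindcast mean and spread, then counts the fraction of members in each class. The class edges of ±0.5 in standardized units and the three class names are hypothetical placeholders, since the thresholds are not stated here; only the z formula comes from the text above.

```python
import numpy as np

# Hypothetical class edges in standardized units (assumed, not from the source).
EDGES = np.array([-0.5, 0.5])
LABELS = ["la_nina", "neutral", "el_nino"]


def standardize(anomaly, mu_hindcast, sigma_hindcast):
    """z = (anomaly - mu_hindcast) / sigma_hindcast, applied per member."""
    return (np.asarray(anomaly, dtype=float) - mu_hindcast) / sigma_hindcast


def class_fractions(member_z):
    """Raw ensemble fraction in each class, with no further calibration."""
    # digitize maps z < -0.5 to 0, -0.5 <= z < 0.5 to 1, and z >= 0.5 to 2.
    idx = np.digitize(member_z, EDGES)
    counts = np.bincount(idx, minlength=len(LABELS))
    return dict(zip(LABELS, counts / counts.sum()))


# Five made-up member anomalies with a made-up hindcast mean and spread.
z = standardize([1.2, 0.9, 0.1, 1.6, -1.0], mu_hindcast=0.2, sigma_hindcast=0.8)
fractions = class_fractions(z)
```

Because the probability is just a member count, its resolution is limited by ensemble size: with five members, the only possible values are multiples of 0.2.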