How it works
This tool scrapes the GOV.UK Fuel Finder API — the same data source behind the government's consumer fuel price finder — and stores it in a PostgreSQL database to build a historical price record that the API itself doesn't provide.
The GOV.UK API only serves live snapshots: current prices at the time of the request. There is no way to retrieve yesterday's prices or see how prices have changed over time. By scraping regularly and storing every price change, we build a time-series dataset that enables trend analysis, regional comparisons, and anomaly detection.
Data source
The Fuel Finder API is a government service that aggregates fuel prices reported by petrol station operators. It covers approximately 7,400 stations across England, Wales, Scotland and Northern Ireland, reporting prices for up to six fuel types.
| Code | Fuel type | Category |
|---|---|---|
| E10 | Unleaded (standard, up to 10% ethanol) | Petrol |
| E5 | Super Unleaded (up to 5% ethanol) | Petrol |
| B7_STANDARD | Diesel (standard, up to 7% biodiesel) | Diesel |
| B7_PREMIUM | Premium Diesel | Diesel |
| B10 | Diesel (up to 10% biodiesel) | Diesel |
| HVO | Hydrotreated Vegetable Oil (renewable) | Diesel |
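The table above can be expressed as a simple lookup. This is an illustrative sketch, not the project's actual module — the names `FUEL_TYPES` and `fuel_category` are assumptions:

```python
# Hypothetical mapping of API fuel-type codes to (label, category),
# mirroring the fuel types table above.
FUEL_TYPES = {
    "E10": ("Unleaded (standard, up to 10% ethanol)", "Petrol"),
    "E5": ("Super Unleaded (up to 5% ethanol)", "Petrol"),
    "B7_STANDARD": ("Diesel (standard, up to 7% biodiesel)", "Diesel"),
    "B7_PREMIUM": ("Premium Diesel", "Diesel"),
    "B10": ("Diesel (up to 10% biodiesel)", "Diesel"),
    "HVO": ("Hydrotreated Vegetable Oil (renewable)", "Diesel"),
}

def fuel_category(code: str) -> str:
    """Return the broad category (Petrol/Diesel) for an API fuel code."""
    entry = FUEL_TYPES.get(code)
    return entry[1] if entry else "Unknown"
```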
The scraper
Fuel Finder API (GOV.UK)
│
│ OAuth2 client credentials → bearer token (1h TTL)
│ GET /api/v1/pfs/fuel-prices?batch-number=N (500 stations/batch, ~15 batches)
│ GET /api/v1/pfs?batch-number=N (station details)
▼
┌─────────────────────┐
│ Scraper │ Python — api_client.py + scrape.py
│ │
│ 1. Authenticate │ OAuth2 client credentials
│ 2. Fetch batches │ Paginate through all stations + prices
│ 3. Upsert stations │ Insert/update station records
│ 4. Insert prices │ Append-only, deduplicated
│ 5. Detect anomalies│ Flag suspicious prices (not filter)
│ 6. Refresh view │ Rebuild current_prices snapshot
│ 7. Backup to S3 │ Raw JSON (optional)
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ PostgreSQL │
│ │
│ stations │ ~7,400 fuel stations with lat/lng
│ fuel_prices │ append-only price change events
│ brand_aliases │ raw → canonical brand mapping
│ brand_categories │ brand → forecourt type
│ postcode_regions │ postcode → ONS region
│ postcode_lookups │ postcodes.io enrichment cache
│ current_prices │ materialised view (live snapshot)
└─────────────────────┘
Scrape modes
Full scrape
Fetches all stations and all prices from scratch (~15 batches of 500 stations each). Used for the initial load and daily refreshes. Updates station metadata (addresses, amenities, opening times) and inserts any new prices.
Incremental scrape
Uses the effective-start-timestamp API parameter to fetch only prices that have changed since the last successful scrape. Much faster — typically returns a few hundred changes instead of ~24,000 prices. Ideal for frequent polling (every 30 minutes).
Auto mode
Checks the database for the most recent successful scrape. If one exists, runs incremental; otherwise runs a full scrape. This is the default — set it up on a schedule and forget about it.
How prices are stored
The fuel_prices table is append-only, but with deduplication. When a scrape runs:
- For each (station, fuel type) pair, the scraper looks up the most recently stored price
- If the new price is the same as the stored price, it's skipped
- If the price has changed, a new row is inserted with the current timestamp
- The new price is checked against anomaly rules and flagged if suspicious
This means every row in fuel_prices represents a genuine price change event, keeping storage lean and making time-series analysis straightforward.
Anomaly detection
On insert, each price is checked against four rules. Suspicious prices are flagged, not filtered — the data is preserved but marked for review.
| Flag | Trigger |
|---|---|
| price_below_floor | Price below 80p/litre |
| price_above_ceiling | Price above 300p/litre |
| likely_decimal_error | Price looks like pounds instead of pence (e.g. 1.45 instead of 145.0) |
| large_price_jump | Price changed by more than 30% from previous |
Brand normalisation
The API provides brand names inconsistently — ESSO, Esso, esso all appear for the same company. Three layers of normalisation clean this up:
- Brand aliases — bulk mapping of raw API strings to canonical names (e.g. "TESCO" → "Tesco"). Managed via the Data tab.
- Station overrides — per-station corrections for edge cases where the API brand is wrong or the alias isn't granular enough.
- Resolution order — station_override > brand_alias > raw_brand_name. Overrides always win.
Raw brand values are never modified in the database. Normalisation is applied in the materialised view, so any change can be reversed.
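The resolution order can be sketched as follows. The case-insensitive alias lookup is an assumption (the API sends ESSO, Esso and esso for the same company), and the function names are illustrative:

```python
def canonical_brand(raw: str, aliases: dict[str, str],
                    overrides: dict[int, str], station_id: int) -> str:
    """Resolve a display brand: station_override > brand_alias > raw value.
    Runs at view-build time; the raw value in the database is untouched."""
    if station_id in overrides:
        return overrides[station_id]
    # Assumed: alias keys are stored upper-cased so casing variants collapse.
    return aliases.get(raw.strip().upper(), raw)
```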
Forecourt categories
Stations are classified into categories based on their canonical brand name, not the API's is_supermarket_service_station flag (which is unreliable — it flags BP, Texaco, and Maxol as supermarkets).
| Category | Examples | Description |
|---|---|---|
| Supermarket | Tesco, Asda, Sainsburys, Morrisons, Waitrose | Supermarket-operated forecourts |
| Major Oil | Shell, BP, Esso, Texaco, Jet, Gulf | Major oil company brands |
| Motorway Operator | Welcome Break, EG On The Move, Applegreen | Motorway service area operators |
| Fuel Group | Motor Fuel Group, Rontec, Harvest Energy | Fuel wholesalers / groups |
| Convenience | Spar, Circle K, Maxol | Convenience store forecourts |
| Independent | (any unmapped brand) | Independent operators, default |
| Motorway | (any station with motorway flag) | Motorway flag always takes priority |
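The classification logic follows directly from the table: motorway flag first, then canonical brand, then the Independent default. A sketch with an abbreviated brand map:

```python
# Abbreviated from the table above; brand_categories holds the full mapping.
BRAND_CATEGORIES = {
    "Tesco": "Supermarket", "Asda": "Supermarket",
    "Shell": "Major Oil", "BP": "Major Oil",
    "Welcome Break": "Motorway Operator",
    "Motor Fuel Group": "Fuel Group",
    "Spar": "Convenience",
}

def forecourt_category(brand: str, is_motorway: bool = False) -> str:
    """Classify by canonical brand rather than the unreliable API
    supermarket flag. The motorway flag always takes priority;
    unmapped brands default to Independent."""
    if is_motorway:
        return "Motorway"
    return BRAND_CATEGORIES.get(brand, "Independent")
```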
Regional mapping
Each station's postcode is mapped to an ONS-style region using the first 1–2 letters of the postcode (the postcode area). For example, SW1A → SW → London, M1 → M → North West.
This enables regional price comparisons (e.g. "London is 3p/litre more expensive than the North East") without requiring geocoding.
Postcodes.io enrichment
Each unique station postcode is looked up via postcodes.io, a free and open API for UK postcode data. The results are cached in a postcode_lookups table, providing:
- Authoritative coordinates — fixes ~85 stations where the Fuel Finder API reports incorrect lat/lng (e.g. sign errors placing stations in the North Sea)
- Administrative geography — local authority district, county, ward, parish
- Parliamentary constituency — for political analysis
- Rural/urban classification — ONS RUC 2021 categories (e.g. "Urban major conurbation", "Rural village in a sparse setting")
- Statistical areas — LSOA, MSOA, built-up area
Postcodes that postcodes.io doesn't recognise are recorded (so they aren't retried) and flagged in the Data tab as potential data quality issues. Coordinates for these can be manually corrected.
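Caching a lookup amounts to flattening the postcodes.io JSON into a row. The field names below follow the public postcodes.io response format; the cache-row shape (`found`, column names) is an assumption about the `postcode_lookups` schema:

```python
def cache_entry(postcode: str, response: dict) -> dict:
    """Turn a postcodes.io lookup response into a cache-row dict.
    Failed lookups are recorded with found=False so they aren't retried."""
    result = response.get("result")
    if response.get("status") != 200 or not result:
        return {"postcode": postcode, "found": False}
    return {
        "postcode": postcode,
        "found": True,
        "latitude": result.get("latitude"),
        "longitude": result.get("longitude"),
        "admin_district": result.get("admin_district"),
        "parliamentary_constituency": result.get("parliamentary_constituency"),
        "lsoa": result.get("lsoa"),
    }
```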
Licensing
Postcodes.io source code is available under the MIT Licence. The underlying postcode data is used under the following terms:
- Great Britain postcode data is used under the OS OpenData licence
- Northern Ireland postcode data (BT prefix) is used under the ONSPD licence
Contains Ordnance Survey data © Crown copyright and database right 2025.
Contains Royal Mail data © Royal Mail copyright and database right 2025.
Contains National Statistics data © Crown copyright and database right 2025.
Contains NRS data © Crown copyright and database right 2025.
The materialised view
current_prices is a PostgreSQL materialised view that provides the latest price per station per fuel type. It joins together:
- The most recent price from fuel_prices
- Station details from stations
- Canonical brand name (via aliases and overrides)
- Forecourt type (via brand_categories)
- Region (via postcode_regions)
- Human-friendly fuel names (via fuel_type_labels)
- Authoritative coordinates, admin district, constituency, rural/urban (via postcode_lookups)
The view is refreshed after each scrape run and when you press Refresh View on the Data tab. All dashboard, map, search and API queries read from this view for fast, consistent results.
Running on a schedule
For production use, the scraper is designed to run on AWS Lambda with EventBridge Scheduler:
- Every 30 minutes: incremental scrape (fetch changed prices only)
- Daily at 03:00 UTC: full scrape (refresh all station metadata + prices)
Raw JSON responses from each scrape are optionally backed up to S3 for audit and replay purposes.