⛽ Fuel Finder

How it works

This tool scrapes the GOV.UK Fuel Finder API — the same data source behind the government's consumer fuel price finder — and stores it in a PostgreSQL database to build a historical price record that the API itself doesn't provide.

The GOV.UK API only serves live snapshots: current prices at the time of the request. There is no way to retrieve yesterday's prices or see how prices have changed over time. By scraping regularly and storing every price change, we build a time-series dataset that enables trend analysis, regional comparisons, and anomaly detection.

Data source

The Fuel Finder API is a government service that aggregates fuel prices reported by petrol station operators. It covers approximately 7,400 stations across England, Wales, Scotland and Northern Ireland, reporting prices for up to six fuel types.

Code         Fuel type                               Category
E10          Unleaded (standard, up to 10% ethanol)  Petrol
E5           Super Unleaded (up to 5% ethanol)       Petrol
B7_STANDARD  Diesel (standard, up to 7% biodiesel)   Diesel
B7_PREMIUM   Premium Diesel                          Diesel
B10          Diesel (up to 10% biodiesel)            Diesel
HVO          Hydrotreated Vegetable Oil (renewable)  Diesel
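One way to hold this table in the scraper is a small lookup that can also validate the fuel codes returned by the API. This is a sketch that assumes the Code column matches the strings the API reports; FuelCategory, FUEL_TYPES, category_of and codes_in are illustrative names, not taken from the scraper's code.

```python
from enum import Enum


class FuelCategory(Enum):
    PETROL = "Petrol"
    DIESEL = "Diesel"


# Hypothetical lookup mirroring the fuel-type table: API code -> (name, category).
FUEL_TYPES: dict[str, tuple[str, FuelCategory]] = {
    "E10": ("Unleaded (standard, up to 10% ethanol)", FuelCategory.PETROL),
    "E5": ("Super Unleaded (up to 5% ethanol)", FuelCategory.PETROL),
    "B7_STANDARD": ("Diesel (standard, up to 7% biodiesel)", FuelCategory.DIESEL),
    "B7_PREMIUM": ("Premium Diesel", FuelCategory.DIESEL),
    "B10": ("Diesel (up to 10% biodiesel)", FuelCategory.DIESEL),
    "HVO": ("Hydrotreated Vegetable Oil (renewable)", FuelCategory.DIESEL),
}


def category_of(code: str) -> FuelCategory:
    """Return the category for an API fuel code; raises KeyError for unknown codes."""
    return FUEL_TYPES[code][1]


def codes_in(category: FuelCategory) -> list[str]:
    """All fuel codes in a category, in table order."""
    return [code for code, (_, cat) in FUEL_TYPES.items() if cat is category]
```

Raising on an unknown code (rather than defaulting) makes a new fuel type in the API visible immediately instead of silently miscategorising it.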

The scraper

Fuel Finder API (GOV.UK)
    │
    │  OAuth2 client credentials → bearer token (1h TTL)
    │  GET /api/v1/pfs/fuel-prices?batch-number=N  (500 stations/batch, ~15 batches)
    │  GET /api/v1/pfs?batch-number=N               (station details)
    ▼
┌─────────────────────┐
│    Scraper          │  Python (api_client.py + scrape.py)
│                     │
│  1. Authenticate    │  OAuth2 client credentials
│  2. Fetch batches   │  Paginate through all stations + prices
│  3. Upsert stations │  Insert/update station records
│  4. Insert prices   │  Append-only, deduplicated
│  5. Detect anomalies│  Flag suspicious prices (not filter)
│  6. Refresh view    │  Rebuild current_prices snapshot
│  7. Backup to S3    │  Raw JSON (optional)
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│    PostgreSQL       │
│                     │
│  stations           │  ~7,400 fuel stations with lat/lng
│  fuel_prices        │  append-only price change events
│  brand_aliases      │  raw → canonical brand mapping
│  brand_categories   │  brand → forecourt type
│  postcode_regions   │  postcode → ONS region
│  postcode_lookups   │  postcodes.io enrichment cache
│  current_prices     │  materialised view (live snapshot)
└─────────────────────┘
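The authentication and batching steps in the diagram can be sketched as below. This is a minimal illustration, not the scraper's api_client.py: request_token and fetch_batch are hypothetical callables standing in for the real HTTP calls (the token endpoint is not documented here), and the sketch assumes batch numbers start at 1 and that an empty batch marks the end of the data.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional, Tuple


@dataclass
class TokenCache:
    """Caches an OAuth2 bearer token and refreshes it shortly before it expires."""

    # Hypothetical: performs the client-credentials request and returns
    # (access_token, expires_in_seconds), e.g. 3600 for a 1h TTL.
    request_token: Callable[[], Tuple[str, int]]
    clock: Callable[[], float] = time.monotonic
    margin: float = 60.0  # refresh this many seconds before expiry
    _token: Optional[str] = field(default=None, init=False)
    _expires_at: float = field(default=0.0, init=False)

    def get(self) -> str:
        now = self.clock()
        if self._token is None or now >= self._expires_at - self.margin:
            token, expires_in = self.request_token()
            self._token = token
            self._expires_at = now + expires_in
        return self._token


def fetch_all_batches(
    fetch_batch: Callable[[int], list], max_batches: int = 100
) -> Iterator[dict]:
    """Yield every record from batch 1 upwards until a batch comes back empty.

    Assumes batch numbering starts at 1 and an empty list marks the end;
    max_batches guards against a pagination loop that never terminates.
    """
    for batch_number in range(1, max_batches + 1):
        records = fetch_batch(batch_number)
        if not records:
            return
        yield from records
    raise RuntimeError(f"no empty batch after {max_batches} requests")
```

Refreshing a minute early means a long full scrape (~15 batches per endpoint) never sends a token that expires mid-request.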

Scrape modes

Full scrape

Fetches all stations and all prices from scratch (~15 batches of 500 stations each). Used for the initial load and daily refreshes. Updates station metadata (addresses, amenities, opening times) and inserts any new prices.

Incremental scrape

Uses the effective-start-timestamp API parameter to fetch only prices that have changed since the last successful scrape. Much faster — typically returns a few hundred changes instead of ~24,000 prices. Ideal for frequent polling (every 30 minutes).

Auto mode

Checks the database for the most recent successful scrape. If one exists, runs incremental; otherwise runs a full scrape. This is the default — set it up on a schedule and forget about it.
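The auto-mode decision reduces to a small function. In this sketch, plan_scrape is a hypothetical name, the last successful scrape time is assumed to come from the database as a timezone-aware datetime, and the ISO 8601 UTC format for effective-start-timestamp is an assumption about what the API accepts.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


def plan_scrape(last_successful_scrape: Optional[datetime]) -> Tuple[str, Dict[str, str]]:
    """Choose full or incremental mode and the extra query parameters to send."""
    if last_successful_scrape is None:
        # No previous successful run: fetch everything.
        return "full", {}
    if last_successful_scrape.tzinfo is None:
        raise ValueError("last_successful_scrape must be timezone-aware")
    since = last_successful_scrape.astimezone(timezone.utc)
    # Assumed timestamp format; check against the API documentation.
    return "incremental", {
        "effective-start-timestamp": since.strftime("%Y-%m-%dT%H:%M:%SZ")
    }
```

Rejecting naive datetimes avoids silently shifting the window by the server's local UTC offset.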

How prices are stored

The fuel_prices table is append-only, but with deduplication. When a scrape runs:

  1. For each (station, fuel type) pair, the scraper looks up the most recently stored price
  2. If the new price is the same as the stored price, it's skipped
  3. If the price has changed, a new row is inserted with the current timestamp
  4. The new price is checked against anomaly rules and flagged if suspicious

This means every row in fuel_prices represents a genuine price change event, keeping storage lean and making time-series analysis straightforward.
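Steps 1 to 3 can be sketched in memory as below (the anomaly check in step 4 is left out). rows_to_insert and PriceRow are hypothetical names; the latest dict stands in for the "most recent stored price" lookup, which in PostgreSQL would be a query against fuel_prices. Prices are compared with ==, which assumes both sides use the same representation (both floats parsed from JSON, or both Decimal).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PriceRow:
    station_id: str
    fuel_type: str
    price: float  # pence per litre
    recorded_at: datetime


def rows_to_insert(
    latest: Dict[Tuple[str, str], float],
    incoming: Iterable[Tuple[str, str, float]],
    now: datetime,
) -> List[PriceRow]:
    """Return only the prices that differ from the latest stored price."""
    new_rows = []
    for station_id, fuel_type, price in incoming:
        key = (station_id, fuel_type)
        if latest.get(key) == price:
            continue  # unchanged: skip
        new_rows.append(PriceRow(station_id, fuel_type, price, now))
        latest[key] = price  # also skips repeats within the same scrape
    return new_rows
```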

Anomaly detection

On insert, each price is checked against four rules. Suspicious prices are flagged, not filtered: the data is preserved but marked for review.

Flag                  Trigger
price_below_floor     Price below 80p/litre
price_above_ceiling   Price above 300p/litre
likely_decimal_error  Price looks like pounds instead of pence (e.g. 1.45 instead of 145.0)
large_price_jump      Price changed by more than 30% from previous
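A minimal sketch of the four rules as a pure function. The 80p floor, 300p ceiling and 30% jump come from the table above; the cut-off used for likely_decimal_error (any positive price under 10) is an assumption, since the exact rule isn't specified. A single price can raise more than one flag.

```python
from typing import List, Optional

FLOOR_PENCE = 80.0
CEILING_PENCE = 300.0
MAX_JUMP = 0.30  # 30% relative change
DECIMAL_ERROR_BELOW = 10.0  # assumption: values this small are probably pounds


def anomaly_flags(price: float, previous: Optional[float] = None) -> List[str]:
    """Return the anomaly flags for a price in pence per litre (empty list if clean)."""
    flags = []
    if 0 < price < DECIMAL_ERROR_BELOW:
        flags.append("likely_decimal_error")
    if price < FLOOR_PENCE:
        flags.append("price_below_floor")
    if price > CEILING_PENCE:
        flags.append("price_above_ceiling")
    if previous is not None and previous > 0:
        if abs(price - previous) / previous > MAX_JUMP:
            flags.append("large_price_jump")
    return flags
```

For example, 1.45 is flagged both as a likely decimal error and as below the floor, while a move from 145.0 to 150.0 raises nothing.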

Brand normalisation

The API provides brand names inconsistently — ESSO, Esso, esso all appear for the same company. Three layers of normalisation clean this up:

  1. Brand aliases: bulk mapping of raw API strings to canonical names (e.g. "TESCO" → "Tesco"). Managed via the Data tab.
  2. Station overrides: per-station corrections for edge cases where the API brand is wrong or the alias isn't granular enough.
  3. Resolution order: station_override > brand_alias > raw_brand_name. Overrides always win.

Raw brand values are never modified in the database. Normalisation is applied in the materialised view, so any change can be reversed.
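The resolution order can be expressed as a short function. This is a sketch: resolve_brand is a hypothetical name, and the two mappings stand in for the station overrides and brand_aliases tables. In the materialised view the same rule would typically be a COALESCE over the override, the alias and the raw name.

```python
from typing import Mapping


def resolve_brand(
    station_id: str,
    raw_brand: str,
    overrides: Mapping[str, str],  # station_id -> canonical brand
    aliases: Mapping[str, str],  # raw API string -> canonical brand
) -> str:
    """Apply station_override > brand_alias > raw_brand_name."""
    if station_id in overrides:
        return overrides[station_id]
    if raw_brand in aliases:
        return aliases[raw_brand]
    return raw_brand  # unmapped: fall back to the raw value
```

Because the raw value is only read, never rewritten, deleting an alias or override restores the original brand on the next refresh.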

Forecourt categories

Stations are classified into categories based on their canonical brand name, not the API's is_supermarket_service_station flag (which is unreliable — it flags BP, Texaco, and Maxol as supermarkets).

Category           Examples                                      Description
Supermarket        Tesco, Asda, Sainsburys, Morrisons, Waitrose  Supermarket-operated forecourts
Major Oil          Shell, BP, Esso, Texaco, Jet, Gulf            Major oil company brands
Motorway Operator  Welcome Break, EG On The Move, Applegreen     Motorway service area operators
Fuel Group         Motor Fuel Group, Rontec, Harvest Energy      Fuel wholesalers / groups
Convenience        Spar, Circle K, Maxol                         Convenience store forecourts
Independent        (any unmapped brand)                          Independent operators, default
Motorway           (any station with motorway flag)              Motorway flag always takes priority

Categories are managed on the Data tab. After changes, hit Refresh View to rebuild the snapshot.
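The classification rule can be sketched as follows, assuming brand_categories is loaded from the brand_categories table (canonical brand → category) and that each station record carries a boolean motorway flag. forecourt_category is a hypothetical name; the API's is_supermarket_service_station flag is deliberately not consulted.

```python
from typing import Mapping


def forecourt_category(
    canonical_brand: str, is_motorway: bool, brand_categories: Mapping[str, str]
) -> str:
    """Motorway flag first, then the brand mapping, else Independent."""
    if is_motorway:
        return "Motorway"
    return brand_categories.get(canonical_brand, "Independent")
```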

Regional mapping

Each station's postcode is mapped to an ONS-style region using the first 1–2 letters of the postcode (the postcode area). For example, SW1A → SW → London, and M1 → M → North West.

This enables regional price comparisons (e.g. "London is 3p/litre more expensive than the North East") without requiring geocoding.
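A sketch of the area extraction and lookup. It assumes the area is the leading one or two letters immediately followed by a digit (true of UK outward codes such as SW1A, M1 or EC1A); postcode_area and region_for are hypothetical names, and the dictionary in the test is a tiny illustrative excerpt of postcode_regions using only the two examples above.

```python
import re
from typing import Mapping, Optional

# Leading 1-2 letters followed by a digit, e.g. "SW" in "SW1A 1AA".
_AREA = re.compile(r"^\s*([A-Z]{1,2})\d", re.IGNORECASE)


def postcode_area(postcode: str) -> Optional[str]:
    """Return the postcode area in upper case, or None if it doesn't look like a postcode."""
    match = _AREA.match(postcode)
    return match.group(1).upper() if match else None


def region_for(postcode: str, area_regions: Mapping[str, str]) -> Optional[str]:
    """Map a postcode to a region via its area, or None if unknown."""
    area = postcode_area(postcode)
    return area_regions.get(area) if area else None
```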

Postcodes.io enrichment

Each unique station postcode is looked up via postcodes.io, a free and open API for UK postcode data. The results are cached in a postcode_lookups table, providing:

Postcodes that postcodes.io doesn't recognise are recorded (so they aren't retried) and flagged in the Data tab as potential data quality issues. Coordinates for these can be manually corrected.
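The caching behaviour, including recording unrecognised postcodes so they aren't retried, can be sketched as below. fetch_postcodes_io calls the public https://api.postcodes.io/postcodes/{postcode} endpoint, which returns a JSON body with a result object and HTTP 404 for postcodes it cannot find. PostcodeCache is a hypothetical class: its in-memory dict stands in for the postcode_lookups table, and normalising keys to upper case without spaces is an assumption.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Optional


def fetch_postcodes_io(postcode: str) -> Optional[dict]:
    """Look up one postcode; return None if postcodes.io doesn't recognise it."""
    url = "https://api.postcodes.io/postcodes/" + urllib.parse.quote(postcode)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)["result"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # not recognised
        raise  # other errors are not cached, so they will be retried


class PostcodeCache:
    def __init__(self, fetch: Callable[[str], Optional[dict]] = fetch_postcodes_io):
        self._fetch = fetch
        self._rows: Dict[str, Optional[dict]] = {}  # stand-in for postcode_lookups

    def lookup(self, postcode: str) -> Optional[dict]:
        key = postcode.replace(" ", "").upper()
        if key not in self._rows:
            # None is stored too, so unrecognised postcodes are never re-requested.
            self._rows[key] = self._fetch(key)
        return self._rows[key]

    def unrecognised(self) -> List[str]:
        """Postcodes to surface as potential data quality issues."""
        return sorted(k for k, v in self._rows.items() if v is None)
```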

Licensing

Postcodes.io source code is available under the MIT Licence. The underlying postcode data is used under the following terms:

Contains Ordnance Survey data © Crown copyright and database right 2025.
Contains Royal Mail data © Royal Mail copyright and database right 2025.
Contains National Statistics data © Crown copyright and database right 2025.
Contains NRS data © Crown copyright and database right 2025.

The materialised view

current_prices is a PostgreSQL materialised view that provides the latest price per station per fuel type. It joins together:

The view is refreshed after each scrape run and when you press Refresh View on the Data tab. All dashboard, map, search and API queries read from this view for fast, consistent results.
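The "latest price per station per fuel type" part of the view can be mirrored in Python to make its semantics concrete. This sketch covers only that core: the real view also joins the normalisation and lookup tables, which are omitted here. Price and latest_prices are hypothetical names, and the column names in the SQL comment are illustrative.

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Tuple


class Price(NamedTuple):
    station_id: str
    fuel_type: str
    price: float
    recorded_at: datetime


# Roughly what a DISTINCT ON query in the view computes (hypothetical columns):
#   SELECT DISTINCT ON (station_id, fuel_type) *
#   FROM fuel_prices
#   ORDER BY station_id, fuel_type, recorded_at DESC;
def latest_prices(rows: Iterable[Price]) -> Dict[Tuple[str, str], Price]:
    """Keep the most recent row for each (station, fuel type) pair."""
    latest: Dict[Tuple[str, str], Price] = {}
    for row in rows:
        key = (row.station_id, row.fuel_type)
        if key not in latest or row.recorded_at > latest[key].recorded_at:
            latest[key] = row
    return latest
```

In PostgreSQL the snapshot is rebuilt with REFRESH MATERIALIZED VIEW current_prices; the CONCURRENTLY variant lets readers keep querying during the rebuild but requires a unique index on the view.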

Running on a schedule

For production use, the scraper is designed to run on AWS Lambda with EventBridge Scheduler:

Raw JSON responses from each scrape are optionally backed up to S3 for audit and replay purposes.
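A minimal sketch of a Lambda entry point with the optional S3 backup, assuming the EventBridge Scheduler schedule passes a JSON input such as {"mode": "auto"}. make_handler, run_scrape and backup_key are hypothetical, and the S3 key layout is an illustrative choice. s3_client is expected to behave like a boto3 S3 client, whose put_object call takes Bucket, Key and Body keyword arguments; it is injected so the handler can be exercised without AWS.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional, Tuple

# (endpoint name, batch number, decoded JSON payload) for each raw API response
RawResponse = Tuple[str, int, Any]


def backup_key(started_at: datetime, endpoint: str, batch_number: int) -> str:
    """Hypothetical layout: raw/YYYY/MM/DD/HHMMSS/<endpoint>-<batch>.json"""
    return (
        f"raw/{started_at:%Y/%m/%d}/{started_at:%H%M%S}/"
        f"{endpoint}-{batch_number:03d}.json"
    )


def make_handler(
    run_scrape: Callable[[str], List[RawResponse]],
    s3_client: Optional[Any] = None,
    bucket: Optional[str] = None,
) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        started_at = datetime.now(timezone.utc)
        mode = event.get("mode", "auto")
        responses = run_scrape(mode)
        backed_up = 0
        if s3_client is not None and bucket:  # backup is optional
            for endpoint, batch_number, payload in responses:
                s3_client.put_object(
                    Bucket=bucket,
                    Key=backup_key(started_at, endpoint, batch_number),
                    Body=json.dumps(payload).encode("utf-8"),
                    ContentType="application/json",
                )
                backed_up += 1
        return {"mode": mode, "batches": len(responses), "backed_up": backed_up}

    return lambda_handler
```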