Scotland’s Energy Performance Certificate Data

Energy Performance Certificates (EPCs) detail the:

“energy efficiency of a property, and includes recommendations for improving energy efficiency and reducing energy bills. An EPC is required when a property is constructed, sold or rented out to a new tenant.”

Scottish domestic EPC (DEPC) and non-domestic EPC (NDEPC) data is available for search and download from the Scottish Energy Performance Certificate Register hosted by the Energy Saving Trust.

Using PostgreSQL, the March 2023 extract of the Scottish EPC data, covering the first 10 years of EPC data from 2013 to 2022, was cleaned, enriched and explored.

Initial data exploration

Totals

The total number of Scottish EPC records was found to be:

DEPC: 1,562,252
– inc. 10 records cited as being in England
– inc. 12 duplicate records
NDEPC: 59,453

Unique identifiers

Two forms of numeric unique building reference number are used in EPC records:

– building_reference_number (or property_uprn): specific to EPCs only
– osg_reference_number (or osg_uprn): see UPRN

Attribution

EPC records have varying numbers of attributes, and each attribute has both a database-friendly and a people-friendly name.

DEPC – 104 attributes
NDEPC – 35 attributes

Common attribution

Geographic attributes are common attributes across DEPC and NDEPC records.

Different attribution

The differences between the DEPC and NDEPC attribution can be grouped into the following broad areas:

Ratings
DEPC:
– energy ratings
– environment ratings
NDEPC:
– energy_performance_bands
– asset ratings

Building data
DEPC:
– construction age band
– tenure
– built form
– extensions
– rooms and corridors
– lighting and fireplaces
– glazing
– floor area, height and level
NDEPC:
– building environment
– floor area

Energy data
DEPC:
– consumption
– costs, savings and tariffs
– efficiencies
– environment impact
– emissions
– demand
– insulation impact
– fuel, solar and wind
– heating
NDEPC:
– emissions
– energy values
– standards
– energy sources

Only DEPC records differentiate the type of EPC assessment employed.

Geographies

The EPC address1, address2 and post_town attributes are free-text fields, so entries mix upper and lower case and contain spelling mistakes.

The word cloud below shows the variation in incorrect spelling within the post_town attribute for 5 of the 8 Scottish cities.

Perth and Inverness were spelt correctly but sometimes replaced with the name of the respective shire. Dunfermline was spelt incorrectly in all instances, most often appearing as ‘DUNFERMLIME’ (2336 times).
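One possible way to clean post_town values like these is fuzzy matching against a list of canonical names. The sketch below is illustrative only and is not the method used for this analysis: the function name normalise_post_town and the 0.85 similarity cutoff are assumptions, and the list holds the eight Scottish cities in upper case.

```python
import difflib

# The eight Scottish cities, upper-cased for case-insensitive matching.
CITIES = [
    "ABERDEEN", "DUNDEE", "DUNFERMLINE", "EDINBURGH",
    "GLASGOW", "INVERNESS", "PERTH", "STIRLING",
]


def normalise_post_town(raw, cutoff=0.85):
    """Return the closest city name, or the cleaned value if none is close.

    The cutoff is an assumed threshold, not a value taken from the EPC data.
    """
    # Upper-case and collapse repeated whitespace before comparing.
    cleaned = " ".join(raw.upper().split())
    matches = difflib.get_close_matches(cleaned, CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned
```

With this cutoff, a value such as 'DUNFERMLIME' maps to 'DUNFERMLINE', while towns outside the list are only upper-cased and trimmed. Shire names used in place of a city would still need a separate lookup.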

As well as postcodes, Government Statistical Service (GSS) codes for some administrative geographies have been included in EPC records:

data_zone
– 2001 data zones (concatenated code and intermediate zone name)
– Small number of pre-GSS codes used
data_zone_2011
– 2011 data zones (concatenated code and intermediate zone name)
constituency
– Pre-GSS codes used
local_authority
– No GSS codes used with DEPCs
– Pre-GSS codes used with NDEPCs

Controlled vocabularies

All categorical building and energy attributes in EPC records use controlled vocabularies, which restrict the choices available at data entry.

Time

The charts below show the number of DEPC and NDEPC inspections per month between 2013 and 2022. The impact of COVID-19 can be clearly seen. A similar pattern was also seen with lodgement dates.

A very small number of DEPC and NDEPC records were found to have:

– incomplete or invalid dates
– inspection dates prior to 2013
– lodgement dates occurring before inspection dates
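The three date problems listed above could be flagged per record with a check along these lines. This is a minimal sketch, assuming dates arrive as ISO strings (YYYY-MM-DD); the function names parse_date and date_issues are hypothetical and the actual extract may store dates differently.

```python
from datetime import date, datetime


def parse_date(text):
    """Parse a YYYY-MM-DD string (assumed format); return None if it fails."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        # TypeError covers None, ValueError covers blank or impossible dates.
        return None


def date_issues(inspection, lodgement):
    """Return a list of date problems found for one EPC record."""
    issues = []
    insp = parse_date(inspection)
    lodg = parse_date(lodgement)
    if insp is None or lodg is None:
        issues.append("incomplete or invalid date")
    if insp is not None and insp < date(2013, 1, 1):
        issues.append("inspection before 2013")
    if insp is not None and lodg is not None and lodg < insp:
        issues.append("lodgement before inspection")
    return issues
```

A record with no problems returns an empty list, so the result can be used directly as a filter or stored as a data quality flag.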

Null values

The chart below shows DEPC and NDEPC attributes with null values and their percentages of null values.

Across both DEPC and NDEPC records, the shared geographic attributes are among those containing null values.
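Percentages of this kind can be computed per attribute as sketched below. The sketch assumes records are held as dictionaries sharing the same keys, and that blank strings count as null alongside None; both are assumptions for illustration, as is the function name null_percentages.

```python
def null_percentages(records):
    """Return {attribute: percent null} for attributes with at least one null.

    Assumes every record has the same keys. Blank strings are treated
    as null, which is an assumption rather than a rule from the EPC data.
    """
    total = len(records)
    counts = {}
    for record in records:
        for name, value in record.items():
            is_blank = isinstance(value, str) and value.strip() == ""
            if value is None or is_blank:
                counts[name] = counts.get(name, 0) + 1
    # Attributes with no nulls never enter counts, so they are omitted.
    return {name: round(100 * n / total, 1) for name, n in counts.items()}
```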

Geographic data enrichment

The first geographic data enrichment was the addition of coordinates to the EPC data by joining with the Ordnance Survey UPRN data via the osg_reference_number/osg_uprn attribute.

Of the EPC records with UPRNs, more than 99% had a valid UPRN.
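The join and the validity check can be sketched as a simple lookup. This is an illustration under stated assumptions, not the PostgreSQL join used for the analysis: the lookup is a dictionary from UPRN to an (x, y) pair, a UPRN is treated as valid when it appears in that lookup, and the names add_coordinates, valid_uprn_share, x, y and uprn_valid are hypothetical.

```python
def add_coordinates(epc_records, uprn_coords):
    """Attach coordinates to each record whose UPRN is in the lookup.

    uprn_coords maps UPRN -> (x, y). Treating "present in the lookup"
    as "valid" is an assumption made for this sketch.
    """
    enriched = []
    for record in epc_records:
        uprn = record.get("osg_reference_number")
        coords = uprn_coords.get(uprn)
        x, y = coords if coords is not None else (None, None)
        enriched.append({**record, "x": x, "y": y, "uprn_valid": coords is not None})
    return enriched


def valid_uprn_share(enriched):
    """Percentage of records with a UPRN whose UPRN matched the lookup."""
    with_uprn = [r for r in enriched if r.get("osg_reference_number") is not None]
    if not with_uprn:
        return None
    return 100 * sum(r["uprn_valid"] for r in with_uprn) / len(with_uprn)
```

Records without a UPRN are kept but excluded from the percentage, which mirrors the "of the EPC records with UPRNs" wording above.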

Further geographic enrichment was then possible for those EPC records with valid UPRNs and their associated coordinates:

– addition of GSS codes and names via spatial joins and lookup tables of other boundary GIS data
– creation of a clean concatenated full address field
– addition of data quality flags for comparisons between derived/enriched and collected boundary geographies
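A data quality flag of the kind described in the last item might compare the code collected on the EPC record with the code derived from the spatial join. The sketch below is one possible shape for such a flag; the function name geography_flag and the three flag values are hypothetical, not the ones used in the enriched data.

```python
def geography_flag(collected, derived):
    """Compare a collected geography code with the derived one.

    Returns "missing" if either side is absent, otherwise "match" or
    "mismatch". The flag values are illustrative assumptions.
    """
    if collected is None or derived is None:
        return "missing"
    # Compare case-insensitively and ignore surrounding whitespace.
    same = collected.strip().upper() == derived.strip().upper()
    return "match" if same else "mismatch"
```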
