Skip to contents

Introduction

The price quote functions will generally access and download the price quote data, but changes to the variables over the years and the nature of the data means that you’ll probably have to do some wrangling to get what you need - assuming what you’re interested in is the average retail price of stuff. The relevant functions are:

The data is accessed from the url stored in cpi_price_quotes_url(). The addresses of data 2010-2019 are stored in archive_urls().

The data is returned as is - in order to look at prices we need to exclude invalid records - pre February 2025 this is ones with a validity not 3 or 4, post February 2025 `validity is FALSE.

Sample code

Historic data

Lets look at some old bread prices.

library(ggplot2)

y <- 2013:2014

bread <- purrr::map(y, \(y) {
  get_cpi_price_quotes_archive(y, foodonly = TRUE) |> 
    dplyr::filter(item_id %in% c(210102, 210111))
}) |> dplyr::bind_rows()
#> Acquiring pricequote2013.zip
#> Reading price_quote_2013_q1.csv
#> Reading price_quote_2013_q2.csv
#> Reading price_quote_2013_q3.csv
#> Reading price_quote_2013_q4.csv
#> Acquiring pricequote2014.zip
#> Reading price_quote_2014_q1.csv
#> Reading price_quote_2014_q2.csv
#> Reading price_quote_2014_q3.csv
#> Reading price_quote_2014_q4.csv

bread |> 
  dplyr::filter(validity %in% c(3,4)) |> 
  dplyr::group_by(lubridate::year(quote_date), item_desc) |>
  dplyr::summarise(avg = mean(price)) |> 
  knitr::kable()
#> `summarise()` has grouped output by 'lubridate::year(quote_date)'. You can
#> override using the `.groups` argument.
lubridate::year(quote_date) item_desc avg
2013 LARGE LOAF-WHITE-UNSLICED-800G 1.396095
2013 WHITE SLICED LOAF BRANDED 750G 1.249862
2013 WHITE SLICED LOAF BRANDED 800G 1.240884
2014 LARGE LOAF-WHITE-UNSLICED-800G 1.370320
2014 WHITE SLICED LOAF BRANDED 800G 1.187626

Recent data

We can compare with some more recent data:


# If you want to get more than 1 year you can do eg rep(2024:2025, 12) and
# rep(1:12, 2). But you don't necessarily need purrr::map2
y <- rep(2024)
m <- rep(1:12)

more_bread <- purrr::map2(y, m, \(y,m) {
  get_cpi_price_quotes(y, m, foodonly = FALSE) |> 
    dplyr::filter(item_id %in% c(210102, 210111))
}) |> dplyr::bind_rows()
#> Reading  2024 1
#> Reading  2024 2
#> Reading  2024 3
#> Reading  2024 4
#> Reading  2024 5
#> Reading  2024 6
#> Reading  2024 7
#> Reading  2024 8
#> Reading  2024 9
#> Reading  2024 10
#> Reading  2024 11
#> Reading  2024 12

more_bread |> 
    dplyr::filter(validity %in% c(3,4)) |> 
  dplyr::group_by(quote_date, item_desc) |>
  dplyr::summarise(avg = mean(price)) |> 
  ggplot2::ggplot() +
  ggplot2::geom_line(ggplot2::aes(x = quote_date, y = avg, colour = item_desc))
#> `summarise()` has grouped output by 'quote_date'. You can override using the
#> `.groups` argument.

The validity variable changed in Feb 2025:


# Doing the same as more_bread, but without purrr::map2
latest_bread <- get_cpi_price_quotes(2025, 1:4, foodonly = FALSE) |> 
  dplyr::filter(item_id %in% c(210102, 210111)) |> 
  dplyr::mutate(valid = dplyr::case_when(quote_date <= "2025-01-01" & validity %in% c(3,4) ~ "Y",
                                  quote_date >= "2025-02-01" & validity == 1 ~ "Y",
                                  .default = "N"))
#> Reading  2025 1
#> Reading  2025 2
#> Reading  2025 3
#> Reading  2025 4

latest_bread |> 
  dplyr::filter(valid == "Y", quote_date >= "2013-01-01") |>
  dplyr::group_by(quote_date, item_desc) |> 
  dplyr::summarise(avg_price = mean(price)) |> 
  knitr::kable()
#> `summarise()` has grouped output by 'quote_date'. You can override using the
#> `.groups` argument.
quote_date item_desc avg_price
2025-01-01 LGE LOAF-WHTE-UNSLED-750-800G 1.721831
2025-01-01 WHITE SLICED LOAF BRANDED 750G 1.304189
2025-02-01 LGE LOAF-WHTE-UNSLED-750-800G 1.716102
2025-02-01 WHITE SLICED LOAF BRANDED 750G 1.324653
2025-03-01 LGE LOAF-WHTE-UNSLED-750-800G 1.747910
2025-03-01 WHITE SLICED LOAF BRANDED 750G 1.314762
2025-04-01 LGE LOAF-WHTE-UNSLED-750-800G 1.768971
2025-04-01 WHITE SLICED LOAF BRANDED 750G 1.314459

Sample data

Latest (post Jan 2025) data

mm23::get_cpi_price_quotes(2025, 3) |> dplyr::slice_sample(n = 5) |> t() |> knitr::kable()
#> Reading  2025 3
cs_id CP0119101 CP0116201 CP0112202 CP0118101 CP0112303
cs_desc PRE-COOKED DISHES BASED ON MEAT, FISH, VEGETABLES, PASTA OR OTHER CEREALS, CHILLED OR FROZEN ORANGES, TANGERINES AND SIMILAR CITRUS FRUITS, FRESH MEAT OF PIGS, FRESH, CHILLED OR FROZEN SUGAR PORK, DRIED, SALTED OR SMOKED
quote_date 2025-03-01 2025-03-01 2025-03-01 2025-03-01 2025-03-01
item_id 212918 212725 210707 212101 210808
item_desc FRZN READY-COOKD MEAL SERVES 1 SMALL TYPE ORANGES PER PACK/KG HOME KILLED PORK CHOP/STEAK SUGAR -GRANULATED-WHITE-PER KG BACON-BACK-PER KG
validity FALSE TRUE TRUE TRUE TRUE
shop_code 9808 807 46 2 60
price 0.00 2.50 16.65 1.09 10.76
indicator_box M Q NA NA NA
region 13 9 12 3 6
shop_type 3 1 1 3 2
shop_weight 1 7 1 1 1

Pre Feb 2025 data

mm23::get_cpi_price_quotes(2021, 6) |> dplyr::slice_sample(n = 5) |> t() |> knitr::kable()
#> Reading  2021 6
cs_id NA NA NA NA NA
cs_desc NA NA NA NA NA
quote_date 2021-06-01 2021-06-01 2021-06-01 2021-06-01 2021-06-01
item_id 211808 210403 220326 211808 310316
item_desc CHILLED POT DESSERT 125-200G HOME KILLED BEEF-LEAN MINCE KG TAKEAWAY CHICKEN & CHIPS CHILLED POT DESSERT 125-200G GIN PER NIP
validity 3 3 3 1 3
shop_code 814 814 59 941 70
price 1.85 6.50 5.80 0.00 4.50
indicator_box NA NA NA T NA
region 9 11 3 10 3
shop_type 1 1 2 1 1
shop_weight 1 1 1 1 1

Raw data formats

Raw data format (Feb 2025 onwards)

COLUMN HEADING DESCRIPTION
QUOTE_DATE The year and month in which the individual price observation was collected.
CS_ID Unique identification number of the Consumption Segment or Item collected. Value is equal to ITEM_ID if no CS alternative is available.
CS_DESC Description of the Consumption Segment or Item collected.
ITEM_ID Unique identification number of the Item collected.
ITEM_DESC Description of the Item collected.
VALIDITY The status of the individual price quote:
TRUE = Quote included in the current months index.
FALSE = Quote not included in the current months index.
SHOP_CODE Shop code from which the individual price quote was obtained.
PRICE Observed price on date of collection (in £GBP).
INDICATOR_BOX Indicator used to highlight any item or price change:
C = Comparable – change in product being priced, new product is similar to the previous product.
M = Missing – not sold at shop.
N = Non-comparable – change in product being priced, new product is not comparable to the previous product.
P = Current price unavailable for the period.
Q = Supplementary message has been provided (the message is not published due to ONS disclosure rules).
R = Recovery of price at end of sale or special offer.
S = Sale price or special offer.
T = Temporarily out of stock.
W = Size change.
PRICE_RELATIVE_CPI The quote index for the individual price observation. (PRICE/BASE_PRICE_CPI).
PRICE_RELATIVE_RPI The quote index for the individual price observation. (PRICE/BASE_PRICE_RPI).
LOG_PRICE_RELATIVE_CPI The natural logarithm of the individual price relative (PRICE_RELATIVE_CPI).
LOG_PRICE_RELATIVE_RPI The natural logarithm of the individual price relative (PRICE_RELATIVE_RPI).
STRATUM_WEIGHT The stratum weight applicable to the item/stratum combination for the period.
STRATUM_TYPE Describes the items stratification method:
0 = Not stratified.
1 = Stratified by REGION.
2 = Stratified by REGION and SHOP_TYPE.
3 = Stratified by SHOP_TYPE.
REGION Approximation to Government Office Regions (GOR):
1 = Catalogue collections (head office).
2 = London.
3 = SE.
4 = SW.
5 = East Anglia.
6 = East Midlands.
7 = West Midlands.
8 = Yorks & Humber.
9 = NW.
10 = North.
11 = Wales.
12 = Scotland.
13 = Northern Ireland.
SHOP_TYPE Shop type indicator:
1 = Multiple (10 or more outlets).
2 = Independents (less than 10 outlets).
3 = Item not stratified by retailer.
SHOP_WEIGHT The relative weight (replication factor) for the shop/item combination.
BASE_PRICE_CPI Observed or imputed base period price (CPI).
BASE_PRICE_RPI Observed or imputed base period price (RPI).
STRATUM_CELL Stratum cell of the individual price quote. This is dependent on STRATUM_TYPE (see above):
If STRATUM_TYPE = 0, STRATUM_CELL = 0
If STRATUM_TYPE = 1, STRATUM_CELL = REGION (as derived above)
If STRATUM_TYPE = 3, STRATUM_CELL = SHOP_TYPE (as derived above)
If STRATUM_TYPE = 2 and SHOP_TYPE = 1, then STRATUM_CELL = REGION
If STRATUM_TYPE = 2 and SHOP_TYPE = 2, then STRATUM_CELL = REGION +13
TEMPORAL Quotes collected on the Friday before the main index day (usually Tuesday) and excluded from the calculation of RPI.
TRUE = Temporal quotes
FALSE = Non-temporal quote
Use VALIDITY = TRUE and TEMPORAL = FALSE to filter quotes that have contributed to the construction of the RPI index.
INDEX_ALGORITHM_RPI The formula method used for RPI calculations (as set out in section 3.4 of the CPIH Technical Manual, 2019 edition):
1 = Dutot (RA).
2 = Carli (AR).

Raw data format (Pre Feb 2025)

Column Heading Description
quote_date year and month of data collection (yyyymm)
item_id identification of the item collected
validity validity of the quote (only codes 3 and 4 used, except where indicator_box = ‘N’)
3 = Validated by system
4 = Accepted internally by ONS staff
shop_code code of shop that the price was collected from
price price of the item collected in the quote month
indicator_box indicator used to highlight any item or price change
S = Sale
R = Recovery
M = Missing
C = Comparable
N = Non comparable
T = Temporarily out of stock
X = Comparable item on sale
Z = Non comparable item on sale
price_relative collected price (current month) divided by base price
stratum_weight the weight of the stratum that the price quote contributes to
stratum_type type of stratification used
0 = not stratified
1 = region.
2 = region and shop
3 =shop
end_date where end = 999999 price contributed to index for period otherwise price was omitted
region where the location is
1 = Catalogue collections
2 = London
3 = SE
4 = SW
5 = East Anglia
6 = East Midlands
7 = West Midlands
8 = Yorks & Humber
9 = NW
10 = North
11 = Wales
12 = Scotland
13 = NI
shop_type either multiple or independents.
1 = Multiple (10 or more outlets)
2 = Independents (less than 10 outlets)
shop_weight estimated number of a specific supermarket within a region
base_price price of an item at the base date (January of that year).
base_validity validity in the base month (see validity above)
stratum_cell type of stratification used
There are 3 types of stratum,
by shop, by region and by region and shop.
The stratum types and cells are as follows,
Not stratified = 0
By shop;
1 = Multiple (10 or more outlets)
2 = Independents (less than 10 outlets)
By region,
2 = London
3 = SE
4 = SW
5 = East Anglia
6 = East Midlands
7 = West Midlands
8 = Yorks & Humber
9 = NW
10 = North
11 = Wales
12 = Scotland
13 = NI
By region and shop type,
If the shop type is a multiple, then the stratum cell is
equal to the regional stratum code.
If the shop type is an independent,
then the stratum cell is equal to the multiple code + 13