# Introducing the pct package

### Obtaining and reproducing data from the Propensity to Cycle Tool (PCT)

#### 2023-02-16

Set eval=TRUE to run this code when knitting:

knitr::opts_chunk$set(eval = FALSE) ## Introduction The goal of the pct package is to increase the accessibility and reproducibility of the outputs from the Propensity to Cycle Tool (PCT), a research project and web application hosted at www.pct.bike. The tool is one of just ~300 central government websites exempt from the requirement to transition to the .gov.uk domain name, and is a recommended source of evidence in the preparation of Local Walking and Cycling plans (LCWIPs), as outlined in technical guidance1, supporting the Cycling and Walking Infrastructure Strategy (CWIS), an amendment of the Infrastructure Act 2015. For an overview of the data provided by the PCT, clicking on the previous link and trying it out is a great place to start. An academic paper on the PCT provides detail on the motivations and methods underlying the project2. Since work on the package began in 2015, for example during the ODI Leeds Hack my Route hackathon (see an early prototype of the tool here), the features and demand for the PCT have evolved substantially. In early 2019, for example, the School travel layer was added to the main PCT site to provide evidence nationwide on the potential benefits of scenarios of cycling uptake, and where safe routes to school should be prioritised3. In fact, a major aim of the PCT was to enable people to extend the tool2: We envision stakeholders in local government modifying scenarios for their own purposes, and that academics in relevant fields may add new features and develop new use cases of the PCT. Motivated by this vision of adaptable transport planning tools, this introductory vignette demonstrates how the package works with an example from the Isle of Wight, an island just off the southern coast of Britain, with a population of ~140,000 people. Before demonstrating some of the package’s key functions, it’s worth providing a little context. ## Why this package? The Propensity to Cycle Tool was commissioned by the UK’s Department for Transport to help planners and others prioritise investment and policies to get people cycling, as outlined in the Government report National propensity to cycle: full report with annexes4. However, the academic team leading the project had a wider sub-aim: of making transport evidence more accessible, encouraging evidence-based transport policies, and encouraging a more democratic transport planning process, and that means open transport data and open source transport modelling tools2. The code base underlying the PCT is publicly available (see github.com/npct). However, the code hosted there is not easy to run or reproduce, which is where this package comes in: it provides quick access to the data underlying the PCT and enables some of the key results to be reproduced quickly. It was developed primarily for educational purposes (including for upcoming PCT training courses) but it may be useful for people to build on the the methods, for example to create a scenario of cycling uptake in their town/city/region. In summary, if you want to know how PCT works, be able to reproduce some of its results, and build scenarios of cycling uptake to inform transport policies enabling cycling in cities worldwide, this package is for you! ## Installation You can install the development version of the package as follows: remotes::install_github("ITSLeeds/pct") Load the package as follows: library(pct) We will also use the following packages in this tutorial: library(sf) library(dplyr) library(stplanr) library(leaflet) library(ggplot2) library(pbapply) # Get PCT data From feedback, we hear that the use of the data is critical in decision making. Therefore, one area where the package could be useful is making the data “easily” available to be processed. To download the data within www.pct.bike, we have added a suite of functions: • get_pct() • get_pct_rnet() • get_pct_zones() • get_pct_lines() • get_pct_centroids() • get_pct_routes_fast() • get_pct_routes_quiet() There are other get_() functions that get official data underlying the PCT, as we will see in a later section. For now, let’s see how the functions work. To get the centroids in Isle of Wight at the lower-resolution (smaller files) MSOA level (LSOA level data is returned by default or by replacing msoa with lsoa in the code below) you would run: wight_centroids = get_pct_centroids(region = "isle-of-wight", geography = "msoa") wight_zones = get_pct_zones(region = "isle-of-wight", geography = "msoa") Let’s verify that the data gave us what we would expect to see: plot(wight_centroids[, "bicycle"]) plot(wight_zones[, "bicycle"]) The results are indeed as we would expect, with the centroid data showing points and the zone data showing zones. The zones with higher cycling levels are in the more densely populated south of the island, as we would expect. Likewise, the following command downloads the desire lines for the Isle of Wight: wight_lines_pct = get_pct_lines(region = "isle-of-wight", geography = "msoa") The rest of the get_pct_ functions are similar to the above two examples and download data from www.pct.bike. However, the base of these functions is get_pct(), which takes the following arguments: • base_url = "https://github.com/npct/pct-outputs-regional-R/raw/master": just in case if you wanted to download the data from a similar server • purpose = "commute": soon there will be “schools” and maybe other modes, but currently commute is the only option. • geography = "msoa": MSOA or LSOA • region = NULL: regions within pct::pct_regions • layer = NULL: one of z (zones), c (centroids), l (desire lines), rf (routes fast), rq (routes quiet) or rnet. • even extension = ".Rds" as PCT data is available in various formats. For the purpose of this package we have made the default option of “Rds”. # Comparing downloaded data with the PCT web app To compare the downloaded data with data in the PCT web app, we will take a subset of the wight_lines_pct dataset. The top 30 travelled desire lines by number of commuters who use cycling as their main mode is taken in the following code chunk. The reason for selecting the top 30 will become apparent (the wight_lines_30 object is provided in the PCT package): wight_lines_30 = wight_lines_pct %>% top_n(30, bicycle) The resulting wight_lines_pct and wight_lines_30 datasets are available in the package. We’ll use the smaller one for speed. Note: these contain many variables, three of which (the number of people cycling, driving and walking along the desire lines from the 2011 Census) are shown below for the Isle of Wight: lwd = wight_lines_30$all / mean(wight_lines_30$all) * 5 plot(wight_lines_30[c("bicycle", "car_driver", "foot")], lwd = lwd) To provide another view of the data, focus on cycling, let’s create a leaflet map: pal = colorNumeric(palette = "RdYlBu", domain = wight_lines_30$bicycle)
leaflet(data = wight_lines_30) %>%
color = ~ pal(bicycle)) %>%
addLegend(pal = pal, values = ~bicycle)

There was a reason for selecting the top 30 lines: it mirrors the view of the desire lines available from the PCT web application for the island, available at www.pct.bike/m/?r=isle-of-wight (note that Straight Lines is selected from the Cycling Flows dropdown menu in the image below, and by default shows the top 30 flows by number of bicycle trips).

## Getting Census data

The previous section showed that data downloaded with get_pct*() functions get the results generated by the PCT. However, they do not reproduce the results generated by the PCT, starting from first principles and publicly available, official data. Underlying the PCT is origin-destination data from the 2011 Census. The MSOA-level data is open access, so we only provide access to this dataset. The following command gets the origin-destination data for the Isle of Wight:

wight_od_all = get_od(region = "wight")
summary(wight_od_all$geo_code1 %in% wight_centroids$geo_code)
summary(wight_od_all$geo_code2 %in% wight_centroids$geo_code)

Note that all the origin codes match the Isle of Wight centroid codes, but most of the destination zones do not. This is because many people on the island work outside the island. get_od() by default returns only OD pairs in which the commute trips originate from the area entered.

To make the dataset smaller and simpler, let’s subset it so it only contains OD pairs in which the origin and destination are in the island (the resulting wight_od data is provided in the package):

wight_od = wight_od_all %>%
filter(geo_code2 %in% wight_centroids$geo_code) To convert the results to geographic desire lines, we can use the function od2line() from the stplanr package: wight_lines = od2line(wight_od, wight_centroids) nrow(wight_lines) sum(wight_lines$all)

The previous code chunk downloads and processes 324 origin-destination pairs, representing inter-zonal commuting trips made by 42,139 people on the island (population: 140,000). By default, the function includes intra-zonal flows, but these can be omitted as follows (the argument omit_intrazonal in get_od() does the same thing):

wight_lines_census = wight_lines %>%
filter(geo_code1 != geo_code2)
nrow(wight_lines_census)
sum(wight_lines_census$all) Another OD data processing step developed for the PCT was converting oneway lines into 2 way lines. This can be done as follows: wight_lines_census1 = od_oneway( wight_lines_census, attrib = c("all", "bicycle") ) nrow(wight_lines_census1) / nrow(wight_lines_census) sum(wight_lines_census1$all) / sum(wight_lines_census$all) Note that the resulting lines contain 50% of the number of lines, but the same number of trips: this is because 2 separate lines between the same zones have been converted into 1 line representing the combined number of trips in both directions, for each OD pair. This step is not essential but it has a couple of advantages: it was used in the PCT to make the routing more computationally efficient (less work computing the same route twice); and it makes visualising the lines and routes simpler. Now that the lines data contains data on 2 way trips between zones, we can estimate routes (note: the results on the PCT website contain estimated uptake levels from intrazonal flow) Visually, this involves converting the straight desire lines shown in the previous map into routes that can be cycled, as shown in the next code chunk. Note: this code does not run dynamically, because you need an CycleStreets.net API key for this, and it takes some time: wight_routes_fast = route( l = wight_lines_census1, route_fun = cyclestreets::journey, plan = "fastest") You can download these routes as follows: u = "https://github.com/ITSLeeds/pct/releases/download/0.5.0/wight_routes_fast.Rds" wight_routes_fast = readRDS(url(u)) A sample of these is provided in the package as wight_routes_30_cs, which was generated as follows: wight_lines_census_30 = wight_lines_census1 %>% top_n(30, bicycle) wight_routes_30_cs = wight_routes_fast %>% group_by(geo_code1, geo_code2) %>% summarise( all = mean(all), bicycle = mean(bicycle), av_incline = weighted.mean(gradient_smooth, w = distances), length = sum(distances), time = sum(time) ) %>% ungroup() %>% top_n(30, bicycle)  A simple verification that we have the right desire lines matched to the routes involves plotting the Euclidean vs Route distance, e.g. as follows: d = as.numeric(st_length(wight_lines_census_30)) / 1000 plot(d, wight_routes_30_cs$length / 1000, xlim = c(0, 10))
abline(a = c(0, 1))

How well does that match the route distance data downloaded from the PCT?

plot(wight_lines_30$rf_dist_km, wight_routes_30_cs$length)

Almost perfectly for most of the routes. Differences can be explained by changes in infrastructure since the PCT results were first generated (these will be updated in the Propensity to Cycle Tool on-line data later in 2019).

We now have everything needed to estimate cycling uptake for each desire lines on the Isle of Wight (we’ll do the calculation on the top 30 by current cycling levels).

## Estimate cycling uptake

Functions named with uptake_*() estimate cycling uptake:

• uptake_pct_godutch(): generates the “GoDutch” scenario level of cycling based on a particular route’s hilliness percentage and length.
• uptake_pct_govtarget(): generates the UK government target again based on the hilliness and length parameters.

We will estimate cycling potential with uptake_pct_godutch(), using the length and av_incline from the wight_routes_30_cs object.

pcycle_govtarget = uptake_pct_govtarget(
distance = wight_routes_30_cs$length, gradient = wight_routes_30_cs$av_incline * 100
)

In terms of cycling uptake, the results are shown below:

wight_routes_30_cs$govtarget = wight_lines_census_30$bicycle +
pcycle_govtarget * wight_lines_census_30$all wight_routes_30_cs$govtarget_pct = wight_lines_30$govtarget_slc ggplot(wight_routes_30_cs) + geom_point(aes(length, govtarget), colour = "red") + geom_point(aes(length, govtarget_pct), colour = "blue") cor(wight_routes_30_cs$govtarget, wight_routes_30_cs$govtarget_pct) The final computational stage is also one of the most important from a policy perspective: estimating cycling potential down to the street level, to help prioritise investment where it is most needed. This work is done by the overline2() function, as follows: wight_routes_30_ls = sf::st_cast(wight_routes_30_cs, "LINESTRING") rnet = overline(wight_routes_30_ls, "govtarget") plot(rnet) Running the same function for all routes in wight_routes_fast, generates the packaged data object wight_rnet, which was created as follows: wight_routes_fast_gt = wight_routes_fast %>% group_by(geo_code1, geo_code2) %>% mutate( govtarget = uptake_pct_govtarget(sum(distances), mean(gradient_smooth)) * (sum(all) + sum(bicycle)) ) wight_routes_fast_gt = sf::st_cast(wight_routes_fast_gt, "LINESTRING") wight_rnet = overline(wight_routes_fast_gt, "govtarget") pal = colorNumeric(palette = "RdYlBu", domain = wight_rnet$govtarget)
leaflet(data = wight_rnet) %>%
addLegend(pal = pal, values = ~govtarget)