31  Loading and Visualising the Historical Boundaries

Let’s imagine we’ve been told that the historical boundaries have been based on groups of LSOAs. We’ve been given a list of the LSOAs that belong to each dispatcher, but we only have a standard file of all LSOAs in our region. How can we turn our lists of LSOAs and this file into something to help us visualise the boundaries, and later start to alter the boundaries to try out different solutions?

Note

There are other ways in which you may get the boundaries.

  • You may be able to find a shapefile, geojson or geopackage file defining the boundaries.
  • You may draw the boundaries in another tool, like

31.1 Our region

We’re going to work with our Birmingham region again. We first worked with this in the section of the book covering travel times; we may return to this travel time data in a later chapter of this part.

To start with, let’s just visualise our wider region.

import geopandas

lsoa_geojson_path = 'https://github.com/hsma-programme/h6_3c_interactive_plots_travel/raw/main/h6_3c_interactive_plots_travel/example_code/LSOA_2011_Boundaries_Super_Generalised_Clipped_BSC_EW_V4.geojson'

lsoa_boundaries = geopandas.read_file(lsoa_geojson_path)

Let’s now filter this down to just our region.

xmin, xmax = 370000, 420000
ymin, ymax = 250000, 310000

bham_region = lsoa_boundaries.cx[xmin:xmax, ymin:ymax]

bham_region.plot(
    figsize=(10,7),
    edgecolor='black',
    color="cyan"
    )

Tip

There are other ways you could conduct this initial filtering - for example, using a list of LSOAs you need to cover, or a filter

Straight away we can see that there are a very large number of LSOAs in our problem, with a lot of variation in the area of each LSOA.

We’ll start with the split of our LSOAs by their overarching dispatch center.

We’ve been told that our dispatch centres are split based on the wider region name. Here, we’d visualised these names.

import matplotlib.pyplot as plt

bham_region.head()

bham_region["region"] = bham_region["LSOA11NM"].str[:-5]

fig, ax = plt.subplots(figsize=(10,7))

ax = bham_region.plot(
    ax=ax,
    edgecolor='black',
    column="region",
    legend=True,
    cmap="tab20",
    legend_kwds={'loc':'center left', 'bbox_to_anchor':(1.2, 0.5)}
    )

# Add a dark background to our plotted region
ax.set_facecolor('black')
fig.patch.set_facecolor('black')

# Add a label per region for easier crossreferencing
# particularly where colours aren't very distinct
one_row_per_region = bham_region.groupby('region').head(1)

for x, y, label in zip(one_row_per_region.geometry.centroid.x, one_row_per_region.geometry.centroid.y, one_row_per_region.region):
    ax.annotate(label, xy=(x, y), xytext=(0, 0), textcoords="offset points", color="white",
    fontweight="bold")

ax.axis('off')
/home/sammi/.local/lib/python3.10/site-packages/geopandas/geodataframe.py:1528: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

We’ve been told that the regions are as follows:

Centre 1

  • Dispatcher 1: Telford and Wrekin; Shropshire
  • Dispatcher 2: Cannock Chase; Lichfield; Tamworth
  • Dispatcher 3: South Staffordshire; Walsall
  • Dispatcher 4: Wolverhampton
  • Dispatcher 5: Dudley; Sandwell
  • Dispatcher 6: Birmingham
  • Dispatcher 7: Solihull; North Warwickshire

Centre 2

  • Dispatcher 1: Bromsgrove
  • Dispatcher 2: Wyre Forest
  • Dispatcher 3: Wychavon; Worcester
  • Dispatcher 4: Malvern Hills; Herefordshire
  • Dispatcher 5: Redditch; Stratford-on-Avon; Warwick

This information has been given to us in a dataframe; let’s import and inspect this, then join it to our other data.

import pandas as pd

boundary_allocations_df = pd.read_csv("boundary_allocations.csv")
boundary_allocations_df.head()
Region Centre Dispatcher
0 Telford and Wrekin Centre 1 1
1 Shropshire Centre 1 1
2 Cannock Chase Centre 1 2
3 Lichfield Centre 1 2
4 Tamworth Centre 1 2

If we join this to our boundary data, with the geodataframe on the left of the merge, we can then access the data about the dispatchers and centers as variables we can plot.

First, let’s do the join and view the resulting dataframer which will have the additional columns. As the original dataframe only contains separate columns for center and dispatcher, we will first create an additional column that joins the two together.

bham_region = pd.merge(
    bham_region,
    boundary_allocations_df,
    left_on="region",
    right_on="Region",
    how="left"
)

bham_region["centre_dispatcher"] = bham_region["Centre"].astype("str") + '-' + bham_region["Dispatcher"].astype("str")

bham_region.head()
FID LSOA11CD LSOA11NM LSOA11NMW BNG_E BNG_N LONG LAT GlobalID geometry region Region Centre Dispatcher centre_dispatcher
0 8649 E01008881 Birmingham 067A Birmingham 067A 412690 285438 -1.81463 52.46675 f79cd8fb-0cf2-4c84-b07c-55bffcf379bb POLYGON ((413040.413 285809.794, 413145.440 28... Birmingham Birmingham Centre 1 6 Centre 1-6
1 8650 E01008882 Birmingham 066A Birmingham 066A 411772 285390 -1.82814 52.46634 e7835311-f105-4c92-acae-ecd5255ce534 POLYGON ((412233.657 285480.231, 412255.373 28... Birmingham Birmingham Centre 1 6 Centre 1-6
2 8651 E01008883 Birmingham 078A Birmingham 078A 412121 285001 -1.82302 52.46283 29dbcbc2-4a29-4bb1-addb-42b37d91c647 POLYGON ((412775.838 285204.808, 412575.845 28... Birmingham Birmingham Centre 1 6 Centre 1-6
3 8652 E01008884 Birmingham 078B Birmingham 078B 411241 284712 -1.83598 52.46025 df5a2350-a102-4ecc-b028-5a0bbeded198 POLYGON ((411803.096 284902.674, 411563.054 28... Birmingham Birmingham Centre 1 6 Centre 1-6
4 8653 E01008885 Birmingham 076A Birmingham 076A 413207 284818 -1.80704 52.46116 13a855fc-fa09-42a5-a05e-947fcd141ce4 POLYGON ((413417.353 284803.764, 413034.282 28... Birmingham Birmingham Centre 1 6 Centre 1-6

Let’s first just visualise the split across our dispatching centres by plotting the center column.

ax = bham_region.plot(
    edgecolor='black',
    column="Centre",
    legend=True,
    legend_kwds={'loc':'center left', 'bbox_to_anchor':(1.2, 0.5)}
    )

ax.axis('off')

Then we’ll visualise the split per dispatcher, using our new column to avoid issues with dispatcher numbers being duplicated across centres.

ax = bham_region.plot(
    edgecolor='black',
    column="centre_dispatcher",
    legend=True,
    cmap="tab20",
    legend_kwds={'loc':'center left', 'bbox_to_anchor':(1.2, 0.5)}
    )

ax.axis('off')

Finally, let’s demonstrate how we could overlay the plot by dispatcher with something that highlights the per-centre boundary as well.

To do this, we will have to merge all of the polygons into a single large polygon. This will give us a brand new dataframe with just two rows referencing the two large polygons; we will view this at the end.

GenAI Alert - This code was modified from a suggested approach provided by ChatGPT

# Group by the specified column
grouped_centre_gdf = bham_region.groupby('Centre')

# Create a new GeoDataFrame for the boundaries of each group
boundary_list = []

for group_name, group in grouped_centre_gdf:
    # Combine the polygons in each group into one geometry
    combined_geometry = group.unary_union

    # Get the boundary of the combined geometry
    boundary = combined_geometry.boundary

    # Add the boundary geometry and the group name to the list
    boundary_list.append({'group': group_name, 'boundary': boundary})

# Create a GeoDataFrame from the list of boundaries
grouped_centre_gdf_boundary = geopandas.GeoDataFrame(boundary_list, geometry='boundary', crs=bham_region.crs)

grouped_centre_gdf_boundary.head()
group boundary
0 Centre 1 MULTILINESTRING ((366838.095 273147.999, 36728...
1 Centre 2 MULTILINESTRING ((376328.508 242672.351, 37647...

Finally, we’ll repeat our per-dispatcher

ax = bham_region.plot(
    edgecolor='black',
    column="centre_dispatcher",
    legend=True,
    cmap="tab20",
    legend_kwds={'loc':'center left', 'bbox_to_anchor':(1.2, 0.5)}
    )

grouped_centre_gdf_boundary.plot(
    edgecolor='red',
    ax=ax,
    linewidth=2
)

ax.axis('off')

31.2 Bringing in historical demand data

Finally, let’s bring in some historical demand data to see if we can spot any obvious issues.

First, so we can overlay both the centre boundaries and the per-dispatcher boundaries, we’ll repeat our earlier process to create a geodataframe with a single row and a single polygon per dispatcher.

# Group by the specified column
grouped_dispatcher_gdf = bham_region.groupby("centre_dispatcher")

# Create a new GeoDataFrame for the boundaries of each group
boundary_list = []

for group_name, group in grouped_dispatcher_gdf:
    # Combine the polygons in each group into one geometry
    combined_geometry = group.unary_union

    # Get the boundary of the combined geometry
    boundary = combined_geometry.boundary

    # Add the boundary geometry and the group name to the list
    boundary_list.append({'group': group_name, 'boundary': boundary})

# Create a GeoDataFrame from the list of boundaries
grouped_dispatcher_gdf_boundary = geopandas.GeoDataFrame(boundary_list, geometry='boundary', crs=bham_region.crs)

grouped_dispatcher_gdf_boundary.head()
group boundary
0 Centre 1-1 MULTILINESTRING ((368935.969 274516.594, 36824...
1 Centre 1-2 LINESTRING (412258.535 300984.197, 411756.668 ...
2 Centre 1-3 LINESTRING (381775.908 282326.500, 380577.001 ...
3 Centre 1-4 LINESTRING (388885.341 295185.385, 388630.406 ...
4 Centre 1-5 LINESTRING (388258.095 285110.427, 388666.258 ...
demand = pd.read_csv("demand_pop_bham.csv")

bham_region = bham_region.merge(demand, on="LSOA11CD")

ax = bham_region.plot(
    edgecolor='black',
    column="demand",
    cmap="Blues",
    legend=True,
    linewidth=0.3
    )

grouped_dispatcher_gdf_boundary.plot(
    edgecolor='purple',
    ax=ax,
    linewidth=1.5
)

grouped_centre_gdf_boundary.plot(
    edgecolor='red',
    ax=ax,
    linewidth=2.5
)

ax.axis('off')

31.3 Looking at the current balance of incidents across the region

The demand dataset provided gives us the total number of incidents our dispatchers have dealt with per region per year. Let’s do a bit of merging and exploratory data analysis to see how this varied historically, and so we can start to think about what would be a target level of demand for each region.

First let’s look at the per-centre rates.

grouped_by_centre = bham_region.groupby('Centre')[['demand']].sum()
grouped_by_centre['difference_from_average'] = (grouped_by_centre['demand'] - grouped_by_centre['demand'].mean()).astype('int')
grouped_by_centre
demand difference_from_average
Centre
Centre 1 277146 110663
Centre 2 55820 -110663

And now by dispatcher

grouped_by_dispatcher = bham_region.groupby('centre_dispatcher')[['demand']].sum()
mean_demand = grouped_by_dispatcher['demand'].mean()
print(f"The demand, averaged across {len(bham_region['centre_dispatcher'].unique())} regions, is {mean_demand:.0f} incidents per year")
The demand, averaged across 12 regions, is 27747 incidents per year
grouped_by_dispatcher['difference_from_mean'] = (grouped_by_dispatcher['demand'] - mean_demand).astype('int')
grouped_by_dispatcher
demand difference_from_mean
centre_dispatcher
Centre 1-1 7600 -20147
Centre 1-2 12825 -14922
Centre 1-3 36084 8336
Centre 1-4 26690 -1057
Centre 1-5 63379 35631
Centre 1-6 109387 81639
Centre 1-7 21181 -6566
Centre 2-1 9630 -18117
Centre 2-2 9429 -18318
Centre 2-3 16699 -11048
Centre 2-4 4217 -23530
Centre 2-5 15845 -11902
print(f"The average difference from the mean across all regions is {abs(grouped_by_dispatcher['difference_from_mean']).mean():.1f}")
The average difference from the mean across all regions is 20934.4
import plotly.express as px

px.bar(grouped_by_dispatcher.reset_index(), x="centre_dispatcher", y="difference_from_mean")

No wonder our dispatchers are struggling - there are huge discrepancies in the demand they have been responsible for.

So what are we going to do about it?

Unlike in our travel time location work in the location allocation chapter, where we initially focussed on minimize the average travel time or population-weighted travel time, here we instead want to minimize the difference between the demand for each region going forward - so let’s move onto the next chapter and explore how we will do that.