Construction Years Analysis
In this notebook, I thought it would be fun to try to visualize which sections of towns are the oldest, and which were constructed more recently. It’s also an exercise in generating and displaying raster (grid) data from vector data.
The assessor’s database, which is contained in the residential tax parcels file, contains a construction yaer field. Our goal is to generate a heatmap visualization showing the average year of construction in the neighborhood of a given map point.
We’ll approach this as follows:
Create a bounding box for Milton
Generate a regular grid of points filling the Milton bounding box on which to calculate the for construction year heatmap
For each point, calculate average construction date of N nearest neighbor properties, and add to image. If no properties within x meters, black.
Display construction year heatmap
[1]:
import geopandas as gpd
import numpy as np
import folium
import joblib
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from shapely.geometry import Point
import milton_maps as mm
sns.set_style("darkgrid", {"axes.facecolor": ".9"})
sns.set(rc = {'figure.figsize':(15,8)})
sns.set_context("notebook")
/Users/alexhasha/Library/Caches/pypoetry/virtualenvs/milton-maps-gfMaDXEA-py3.10/lib/python3.10/site-packages/geopandas/_compat.py:123: UserWarning: The Shapely GEOS version (3.10.3-CAPI-1.16.1) is incompatible with the GEOS version PyGEOS was compiled with (3.10.4-CAPI-1.16.2). Conversions between both will be slow.
warnings.warn(
[2]:
town_boundaries = gpd.read_file("../data/processed/town_boundaries.shp.zip").set_index("TOWN_ID")
milton_quincy_boundaries = town_boundaries[town_boundaries.TOWN.isin(["MILTON", "QUINCY"])]
milton_quincy_boundaries.shape
[2]:
(2, 19)
[3]:
residential_tax_parcels = joblib.load("../data/processed/residential_tax_parcels.pkl")
residential_tax_parcels.shape
[3]:
(32387, 28)
[4]:
town_boundaries.crs
[4]:
<Projected CRS: EPSG:26986>
Name: NAD83 / Massachusetts Mainland
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: United States (USA) - Massachusetts onshore - counties of Barnstable; Berkshire; Bristol; Essex; Franklin; Hampden; Hampshire; Middlesex; Norfolk; Plymouth; Suffolk; Worcester.
- bounds: (-73.5, 41.46, -69.86, 42.89)
Coordinate Operation:
- name: SPCS83 Massachusetts Mainland zone (meters)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
Calculate the Milton bounding box.
[5]:
(xmin, ymin, xmax, ymax) = town_boundaries.loc[189].geometry.bounds
These coordinates are in the “EPSG:26986” coordinate reference system, which is a cartesian projection measured in meters. The absolute values can safely be ignored, but we need to convert to latitude and longitude when we want to display on a folium map.
[6]:
(xmin, ymin, xmax, ymax)
[6]:
(230482.1424999982, 883438.2446000017, 238662.33250000328, 892056.4745999984)
Generate a regular x,y grid with points spaced 50 meters apart within the Milton bounding box, and reshape to a list of shapely Point
objects.
[7]:
# Generate grid and point list
xticks = np.arange(xmin, xmax, 50.0)
yticks = np.arange(ymin, ymax, 50.0)
X, Y = np.meshgrid(xticks, yticks, indexing='xy')
x = X.ravel()
y = Y.ravel()
p = [Point(xx, yy) for xx,yy in zip(x,y)]
[8]:
milton_grid = gpd.GeoDataFrame(data={'x': x, 'y': y}, geometry=p, crs="EPSG:26986")
milton_grid.head()
[8]:
x | y | geometry | |
---|---|---|---|
0 | 230482.1425 | 883438.2446 | POINT (230482.142 883438.245) |
1 | 230532.1425 | 883438.2446 | POINT (230532.142 883438.245) |
2 | 230582.1425 | 883438.2446 | POINT (230582.142 883438.245) |
3 | 230632.1425 | 883438.2446 | POINT (230632.142 883438.245) |
4 | 230682.1425 | 883438.2446 | POINT (230682.142 883438.245) |
For each point in the grid, we want to identify the \(k\)-nearest neighbor tax parcels within a specified radius and calculate the average construction year of those neighboring parcels.
We start by generating a 150 meter buffer circle around each point in the grid:
[9]:
# Generate a buffer circle of radius 150 meters for each grid point.
milton_grid_buffered = milton_grid.copy()
milton_grid_buffered['geometry'] = milton_grid_buffered.geometry.buffer(150)
milton_grid_buffered.head().plot()
[9]:
<Axes: >
And convert the residential tax parcels dataframe geometries from boundary polygons to their centroids:
[10]:
residential_parcel_centers = residential_tax_parcels.copy()
residential_parcel_centers['geometry'] = residential_parcel_centers['geometry'].centroid
Now, perform a spatial join between the grid buffer circles and tax parcel centroids, finding all tax parcels within the 150 meter buffer radius of each grid point:
[11]:
grid_neighbors = residential_parcel_centers.sjoin(milton_grid_buffered, predicate="within")
grid_neighbors.shape
[11]:
(302948, 31)
For each grid point / tax parcel pair, calculate the distance from the grid point to the tax parcel centroid
[12]:
def grid_distance(row):
dist = row['geometry'].distance(Point(row['x'], row['y']))
return dist
grid_neighbors['neighbor_dist'] = grid_neighbors.apply(grid_distance, axis=1)
[13]:
grid_neighbors.head()
[13]:
SHAPE_Leng | SHAPE_Area | MAP_PAR_ID | POLY_TYPE | MAP_NO | SOURCE | PLAN_ID | LAST_EDIT | BND_CHK | NO_MATCH | ... | STYLE | LOT_SIZE | TOTAL_VAL | LAND_VAL | SITE_ADDR | IS_RESIDENTIAL | index_right | x | y | neighbor_dist | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LOC_ID | |||||||||||||||||||||
F_757603_2907021 | 718.018773 | 21213.803766 | A 10 1B | FEE | A | ASSESS | A-9-10 | 20180101.0 | NaN | N | ... | CONTEMPORY | 2.19 | 1459500.0 | 656200.0 | 1259 BRUSH HILL RD | True | 8208 | 230882.1425 | 885938.2446 | 128.588769 |
F_757603_2907021 | 718.018773 | 21213.803766 | A 10 1B | FEE | A | ASSESS | A-9-10 | 20180101.0 | NaN | N | ... | CONTEMPORY | 2.19 | 1459500.0 | 656200.0 | 1259 BRUSH HILL RD | True | 8209 | 230932.1425 | 885938.2446 | 124.374803 |
F_757603_2907021 | 718.018773 | 21213.803766 | A 10 1B | FEE | A | ASSESS | A-9-10 | 20180101.0 | NaN | N | ... | CONTEMPORY | 2.19 | 1459500.0 | 656200.0 | 1259 BRUSH HILL RD | True | 8210 | 230982.1425 | 885938.2446 | 139.295053 |
F_758074_2906873 | 816.535998 | 27069.928790 | A 10 12 | FEE | A | OTHER | A-9-10 | 20120120.0 | NaN | N | ... | COLONIAL | 6.54 | 2079700.0 | 754100.0 | 1425 BRUSH HILL RD | True | 8210 | 230982.1425 | 885938.2446 | 111.604083 |
F_758204_2906549 | 641.513221 | 21533.836193 | A 10 7 | FEE | A | OTHER | A-9-10 | 20120120.0 | NaN | N | ... | COLONIAL | 5.21 | 2664100.0 | 739900.0 | 1451 BRUSH HILL RD | True | 8210 | 230982.1425 | 885938.2446 | 120.765580 |
5 rows × 32 columns
[14]:
gn = grid_neighbors.groupby(["x", "y"]).neighbor_dist.count()
gn.head()
[14]:
x y
230782.1425 886038.2446 1
886088.2446 1
230832.1425 885988.2446 1
886038.2446 1
886088.2446 1
Name: neighbor_dist, dtype: int64
Now, we implement the “k-nearest neighbor” average by grouping the data on grid points (x,y), sorting by increasing distance, and truncating after up to \(k\) neighbors. We’ll use \(k=5\) for the calculation.
[15]:
def knearest_avg(df, k, field):
knearest_avg = df.sort_values(by="neighbor_dist", ascending=True).iloc[:k][field].mean()
return knearest_avg
avg_year_built = grid_neighbors.groupby(["x", "y"]).apply(lambda g: knearest_avg(g, 5, "YEAR_BUILT")).reset_index()
avg_year_built.head()
[15]:
x | y | 0 | |
---|---|---|---|
0 | 230782.1425 | 886038.2446 | 1983.0 |
1 | 230782.1425 | 886088.2446 | 1983.0 |
2 | 230832.1425 | 885988.2446 | 1983.0 |
3 | 230832.1425 | 886038.2446 | 1983.0 |
4 | 230832.1425 | 886088.2446 | 1983.0 |
[16]:
avg_year_built.shape
[16]:
(10746, 3)
Construct a GeoDataFrame from the results:
[17]:
geom = avg_year_built.apply(lambda row: Point(row.x, row.y), axis=1)
avg_year_built_df = gpd.GeoDataFrame(data={"x": avg_year_built.x,
"y": avg_year_built.y,
"avg_year": avg_year_built[0]},
geometry=geom,
crs="EPSG:26986",
)
/Users/alexhasha/Library/Caches/pypoetry/virtualenvs/milton-maps-gfMaDXEA-py3.10/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:130: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
arr = construct_1d_object_array_from_listlike(values)
[18]:
df_crs = avg_year_built_df.crs.to_string()
[19]:
fig, ax = plt.subplots(1, figsize=(20,20))
ax = mm.plot_map(avg_year_built_df,
column="avg_year",
categorical=False,
markersize=30,
cmap='gnuplot',
alpha=0.5,
ax=ax
)
import contextily as ctx
ctx.add_basemap(ax, crs=df_crs)
We can see pockets of very old construction from the 19th century in the following areas:
Along Hillside Street
In East Milton Square around Adams Street
In Lower Mills near the Baker Chocolate Factory
Along Canton Avenue near Rt 138.
[20]:
map_center = town_boundaries.to_crs("EPSG:4326").loc[189].geometry.centroid
Here it is again as an interactive map:
[21]:
import branca.colormap as cm
colormap = cm.LinearColormap(colors=['red', 'blue', 'white'], index=[1765, 1900, 2021],vmin=1765,vmax=2021)
[22]:
map_center = town_boundaries.to_crs("EPSG:4326").loc[189].geometry.centroid
milton_map = folium.Map(width=800,height=800, location=[map_center.y, map_center.x], zoom_start=13)
folium.GeoJson(milton_quincy_boundaries.to_crs("EPSG:3857").loc[[189]].geometry).add_to(milton_map)
for i, row in avg_year_built_df.fillna(0).to_crs("EPSG:4326").iterrows():
folium.CircleMarker(
(row.geometry.y, row.geometry.x),
radius=1,
color=colormap(row['avg_year'])
).add_to(milton_map)
folium.CircleMarker((map_center.y, map_center.x), radius=1, color="yellow").add_to(milton_map)
display(milton_map)
[ ]: