This is not an official City of Boston website. This is an academic project created by Harvard Kennedy School students.

Methodology

To conduct our analysis, we used a dataset of parking violations in the City of Boston in 2024. Each violator has a unique de-identified ID (DEID). To analyze one-time offenders, we filtered the dataset for violators whose DEID appears exactly once. Because the data cover only 2024, these violators are one-time offenders for that year, not necessarily first-time offenders, so we distinguish between the two terms. We assume that one-time violators differ from violators with two or more parking violations in a calendar year.
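
The one-time filter described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the project's actual code; the sample records and the field names other than DEID are hypothetical.

```python
from collections import Counter

# Hypothetical sample: each dict is one violation record.
# "DEID" is the de-identified violator ID named in the methodology;
# "lat" and "lon" are assumed field names for the ticket location.
violations = [
    {"DEID": "a1", "lat": 42.3601, "lon": -71.0589},
    {"DEID": "b2", "lat": 42.3499, "lon": -71.0780},
    {"DEID": "a1", "lat": 42.3555, "lon": -71.0602},
    {"DEID": "c3", "lat": 42.3370, "lon": -71.0892},
]


def one_time_violations(rows):
    """Return the violations whose DEID appears exactly once in rows."""
    counts = Counter(row["DEID"] for row in rows)
    return [row for row in rows if counts[row["DEID"]] == 1]


one_time = one_time_violations(violations)
print([row["DEID"] for row in one_time])  # ['b2', 'c3']
```

Here "a1" is dropped because it appears twice, while "b2" and "c3" each appear once and are kept.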

To generate the heatmap visuals, we used PyDeck to plot the latitude and longitude of each one-time violator's parking violation. The heatmaps illustrate the density of one-time parking violations across different areas of Boston.

To analyze repeat offenders, we grouped individual violations by DEID and counted the number of violations per DEID. We then generated a point plot in PyDeck to visualize the spatial distribution of the top repeat offenders, ranked by the number of violations they committed in 2024.
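
The grouping and counting step can be sketched as follows. This is an illustrative standard-library version under assumed inputs; the function name, the sample IDs, and the cutoff `n` are hypothetical, and the project may define "top" differently.

```python
from collections import Counter

# Hypothetical sample: one DEID per violation record.
deids = ["a1", "b2", "a1", "c3", "a1", "b2"]


def top_repeat_offenders(deid_list, n=10):
    """Count violations per DEID and return the n largest repeat offenders.

    A repeat offender is taken here to be any DEID with two or more violations.
    Ties are broken alphabetically by DEID so the result is deterministic.
    """
    counts = Counter(deid_list)
    repeat = [(deid, k) for deid, k in counts.items() if k >= 2]
    repeat.sort(key=lambda item: (-item[1], item[0]))
    return repeat[:n]


top = top_repeat_offenders(deids, n=10)
print(top)  # [('a1', 3), ('b2', 2)]
```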

Dashboard Methodology

To generate an interactive dashboard that replicates our findings, we used Streamlit to create a Python web application that allows users to explore the 2024 parking violation summons data. The dashboard includes various visualizations, such as heatmaps and point plots, to illustrate the distribution of one-time and repeat parking violators in Boston.

Cluster Map Methodology

The cluster map visualization uses spatial gridding (map bins): spatial patterns are summarized on a fixed latitude/longitude grid. Each ticket is assigned to a grid cell by rounding its latitude and longitude to the nearest 0.0005 degrees. Each grid cell is then colored by the number of tickets assigned to it, with a color gradient indicating higher or lower concentrations of tickets.
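
The gridding step can be illustrated with a short sketch. It assumes that "rounding to the nearest 0.0005 degrees" means snapping each coordinate to the nearest multiple of 0.0005; the function names and sample points are hypothetical. Cells are keyed by integer indices rather than by rounded floats to avoid floating-point mismatches when counting.

```python
from collections import Counter

CELL_SIZE = 0.0005  # grid resolution in degrees, as stated in the methodology


def grid_cell(lat, lon, cell=CELL_SIZE):
    """Return integer (row, column) indices of the nearest grid point."""
    return (round(lat / cell), round(lon / cell))


def cell_center(index, cell=CELL_SIZE):
    """Convert integer grid indices back to latitude/longitude in degrees."""
    i, j = index
    return (round(i * cell, 4), round(j * cell, 4))


# Hypothetical ticket locations: the first two fall in the same cell.
tickets = [
    (42.36010, -71.05890),
    (42.36012, -71.05893),
    (42.36060, -71.05890),
]

# Number of tickets per grid cell; this count drives the cell color.
cell_counts = Counter(grid_cell(lat, lon) for lat, lon in tickets)

for index, count in cell_counts.items():
    print(cell_center(index), count)
```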

Color scheme choice (map mode vs heatmap mode)

In “Map” mode, we use a discrete, high-contrast palette with semi-transparency so dense areas don’t fully obstruct the base map. In “Heatmap” mode, we switch to a kernel-density-style visualization (pydeck HeatmapLayer) where intensity is driven by the selected metric (or by point count for individual violations).

Quantile binning

Parking ticket counts are heavy-tailed, so a linear color scale would be dominated by a handful of very dense cells. Instead, the dashboard computes distribution cutoffs dynamically from the filtered data using the 20th, 40th, 60th, and 80th percentiles. Each cell is assigned to one of five bins (≤20%, 20–40%, 40–60%, 60–80%, >80%) and colored accordingly; the legend shows the numeric thresholds for the current filter state.
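
A minimal sketch of the quantile binning follows. It uses the standard library's `statistics.quantiles` with linear interpolation between data points; the interpolation method the dashboard actually uses is not stated, so that choice, the function names, and the sample counts are assumptions.

```python
import bisect
import statistics


def quantile_cutoffs(counts):
    """Return the 20th, 40th, 60th, and 80th percentile cutoffs of counts."""
    return statistics.quantiles(counts, n=5, method="inclusive")


def assign_bin(count, cutoffs):
    """Return a bin index from 0 (at or below the 20th percentile cutoff)
    to 4 (above the 80th percentile cutoff). Upper bounds are inclusive."""
    return bisect.bisect_left(cutoffs, count)


# Hypothetical per-cell ticket counts with a heavy right tail.
cell_counts = [1, 1, 2, 2, 3, 4, 5, 8, 20, 150]

cutoffs = quantile_cutoffs(cell_counts)
bins = [assign_bin(c, cutoffs) for c in cell_counts]
print(bins)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Because the cutoffs are recomputed from whatever data pass the current filters, each bin holds roughly a fifth of the cells even when one cell has far more tickets than the rest.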