Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.06.12 edition

Interest group positions, state tax revenues, NYC shelter exits, English women’s football, and Sudoku solves.

Interest group positions. Galen Hall et al. have compiled a dataset of “over 13 million policy positions stated by tens of thousands of interest groups and individuals on bills in 17 state legislatures over the past 25 years.” The authors collected and standardized the data, which span 1997 to 2022, from lobbying and testimony disclosures. For each of those positions, the dataset indicates the relevant bill, client or individual represented, representative name, position phrase (for, against, monitoring, undecided, etc.), the date the position was reported, and more. It also provides details about each bill from Legiscan and the National Conference of State Legislatures, as well as client-industry categorizations from FollowTheMoney.org.

State tax revenues. How much money do US states collect through different types of taxes? The Census Bureau’s Quarterly Summary of State and Local Government Tax Revenue provides these figures every three months, going back decades. The categories include taxes on property, income, general sales, sales of specific products (such as tobacco, alcohol, gas, and gambling), licenses, and more. For several years now, the agency has also published monthly data for a subset of those taxes. As seen in: “Which states make the most from sports betting? What about lotteries?” by the Washington Post’s Andrew Van Dam. Previously: The Census’s Annual Survey of State and Local Government Finances (DIP 2020.11.18).

NYC shelter exits. A 2022 law requires New York City to report the monthly number of individuals and families exiting the city’s shelter system. Unfortunately, the city publishes those reports only as PDFs and without a historical archive. Patrick Spauster has built a pipeline to download and preserve the reports, and to turn them into structured data. For each month since May 2023, each row indicates the number of exits for a particular city agency, family/person category, and destination type. The destination types include various kinds of permanent housing, transitional housing, and medical facilities, as well as “unknown.” Read more: Spauster’s analysis of the data for City Limits. Previously: NYC shelter counts (DIP 2023.12.13).
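For readers who want to work with a table shaped like the one described above (one row per month, agency, family/person category, and destination type), here is a minimal sketch of totaling exits by destination. The column names and the sample rows are hypothetical placeholders, not the dataset's actual headers or figures; check the published files for the real layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout and placeholder values, for illustration only.
# The real dataset's column names and numbers will differ.
SAMPLE_CSV = """month,agency,category,destination,exits
2023-05,Agency A,Families with children,Permanent housing,10
2023-05,Agency A,Single adults,Unknown,4
2023-06,Agency B,Single adults,Permanent housing,7
"""


def exits_by_destination(csv_text):
    """Sum the exits column for each destination type."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["destination"]] += int(row["exits"])
    return dict(totals)


totals = exits_by_destination(SAMPLE_CSV)
for destination, count in sorted(totals.items()):
    print(destination, count)
```

The same grouping works for any of the other dimensions (agency, category, or month) by swapping the key used in the loop.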

English women’s football. The English Women’s Football Database “covers all matches played since the 2011 season for the highest division (the Women’s Super League) and since the 2014 season for the second-highest division (the Women’s Championship).” The project, built by Rob Clapp, lists the date, teams, score, attendance, division, tier, and season of each match, as well as each season’s final standings. Previously: Josh Fjelstul’s English Football Database (DIP 2023.02.01), which Clapp cites as inspiration.

Sudoku solves. In the spirit of introspection, Sudoku enthusiast Vivek Rao has conducted a detailed analysis of his cell-by-cell performance on 100 puzzles from the New York Times’ daily offerings. The underlying data, collected via a custom browser extension that Rao built, indicate the order and timing of every cell he filled.
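One thing order-and-timing data of this kind makes possible is measuring how long each cell took relative to the one before it. The sketch below assumes a hypothetical record format (row, column, and seconds elapsed since the puzzle started); it is not Rao's actual schema or method, and the sample values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    """One filled cell. Hypothetical record format, for illustration only."""
    row: int
    col: int
    seconds: float  # time elapsed since the puzzle was started


def gaps_between_fills(fills):
    """Return ((row, col), seconds since the previous fill) in solve order."""
    ordered = sorted(fills, key=lambda f: f.seconds)
    gaps = []
    previous = 0.0
    for fill in ordered:
        gaps.append(((fill.row, fill.col), fill.seconds - previous))
        previous = fill.seconds
    return gaps


# Placeholder events, deliberately out of order to show the sort.
SAMPLE = [Fill(0, 0, 2.5), Fill(4, 7, 9.0), Fill(0, 1, 4.0)]
gaps = gaps_between_fills(SAMPLE)
for cell, gap in gaps:
    print(cell, gap)
```

Long gaps would point to the cells that required the most thought, which is the sort of question a cell-by-cell log is suited to answer.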