Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.10.23 edition

SBA disaster loans, US buildings, the cost of sustenance, German election results, and synthesizers.

SBA disaster loans. “Following a declared disaster,” the US Small Business Administration offers “disaster assistance in the form of low-interest, long-term disaster loans for damages not covered by insurance or other recoveries to businesses of all sizes, private nonprofit organizations, as well as homeowners and renters.” The SBA publishes anonymized data about each such loan in fiscal years 2000 to 2022, drawn directly from its Disaster Credit Management System. The records provide the relevant disaster declaration IDs, property type, ZIP code, city, county, state, verified losses (in real estate and in “content”), and approved loan amounts. Previously: SBA datasets on the Paycheck Protection Program (DIP 2020.07.08) and on the administration’s 7(a) and 504 loan programs (DIP 2023.01.11). [h/t Benjamin L. Collier et al.]

US buildings. “Leveraging high performance computing, remote sensing, geographic data science, machine learning, and computer vision,” Hsiuhan Lexie Yang et al., researchers at Oak Ridge National Laboratory, have “partnered with Federal Emergency Management Agency (FEMA) to build a baseline structure inventory covering the US and its territories to support disaster preparedness, response, and recovery.” The dataset and interactive map trace the outlines of 125 million buildings and, in many cases, contain the building’s address, occupancy class, usage type, height, elevation, and other attributes. They also provide information about the imagery used to identify each structure.

The cost of sustenance. The UN World Food Programme’s Fill the Nutrient Gap initiative conducted a series of analyses from 2015 through 2021 to “calculate the costs of energy-sufficient and nutrient-adequate diets and the percentage of households that were unable to afford each diet.” In a recent paper, Zuzanna Turowska et al. describe the analyses’ methodology and share their results as a dataset. For each of the 37 countries analyzed, the dataset contains one row per geographic unit, timeframe, and type of household member; each row provides the cost and unaffordability estimates for that category.

German election results. GERDA, a new project by Vincent Heddesheimer et al., “provides a comprehensive dataset of local, state, and federal election results in Germany.” The results go back to 1953 for federal elections, to 1990 for local elections, and to 1996 for state elections. The files indicate each geographic unit’s number of eligible voters, actual voters, valid votes, invalid votes, and vote shares by party. The authors have also created “geographically harmonized datasets that account for changes in municipal boundaries and mail-in voting districts.”

Synths. Iftah Gabbai is building a dataset of “hardware synthesizers, samplers, and drum machines” produced since 1896, “compiled through a mix of automated and manual processes, combined with extensive research.” For each of the 2,300+ devices identified, the dataset indicates its name, brand, release year, years in production, device type (synth, sampler, et cetera), form factor, architecture, synth engine used, number of keys, key type, oscillator count, and more. Learn more: Gabbai’s introductory video. [h/t Stefan Bohacek]