Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2022.08.31 edition

Billion-dollar disasters, business formations, attempted repairs, Luftwaffe locations, and a fern tree of life.

Billion-dollar disasters. As “the Nation’s Scorekeeper in terms of addressing severe weather and climate events in their historical perspective,” the US National Centers for Environmental Information maintains an inventory of the most costly such disasters in the US — those that have caused at least $1 billion in estimated direct losses. The quarterly-updated dataset contains more than 330 severe storms, floods, droughts, wildfires, freezes, and other extreme events since 1980. You can download, filter, and sort the list (by disaster type, start/end dates, inflation-adjusted cost, and total deaths), as well as map, chart, and summarize it. [h/t Gary Price]
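
If you download the list, the filtering and sorting described above can be approximated offline. The following is a minimal sketch, not NCEI's own tooling: the `Disaster` fields, the `filter_and_sort` helper, and the three sample rows are all hypothetical placeholders, and the real export's column names may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Disaster:
    # Hypothetical field names; check the actual download for its headers.
    name: str
    kind: str
    start: date
    end: date
    cost_billions: float  # inflation-adjusted estimate
    deaths: int


def filter_and_sort(
    events: Iterable[Disaster],
    kind: Optional[str] = None,
    since: Optional[date] = None,
    sort_key: str = "cost_billions",
    descending: bool = True,
) -> List[Disaster]:
    """Keep events matching the optional filters, then sort by one field."""
    selected = [
        e
        for e in events
        if (kind is None or e.kind == kind)
        and (since is None or e.start >= since)
    ]
    return sorted(selected, key=lambda e: getattr(e, sort_key), reverse=descending)


# Placeholder rows for illustration only; these are NOT real records.
SAMPLE = [
    Disaster("Placeholder flood", "Flooding", date(2001, 5, 1), date(2001, 5, 10), 1.2, 3),
    Disaster("Placeholder storm A", "Severe Storm", date(2010, 4, 2), date(2010, 4, 4), 2.5, 7),
    Disaster("Placeholder storm B", "Severe Storm", date(2015, 6, 1), date(2015, 6, 3), 1.1, 0),
]

if __name__ == "__main__":
    for event in filter_and_sort(SAMPLE, kind="Severe Storm"):
        print(event.name, event.cost_billions)
```

Swapping `sort_key` to `"deaths"` or `"start"` reorders the same selection without touching the filters.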

Business formations. To compile its Business Formation Statistics, the US Census Bureau analyzes several sources, including every application to the IRS for an Employer Identification Number (EIN) and those applicants’ first payroll tax filings. This allows the Bureau to provide monthly counts of business applications and formations by business type, industrial sector, and state. The Bureau also publishes weekly datasets of application counts by state and an annual dataset that drills down to individual counties; both, however, lack business formation counts and other details found only in the monthly files. [h/t John C. Haltiwanger]

Attempted repairs. The Open Repair Alliance, “an international group of organisations committed to working towards a world where electrical and electronic products are more durable and easier to repair,” is developing an open standard for sharing data about those repairs. So far, they’ve gathered 62,000+ records from five partners. Each entry represents a repair session: its date and country, the product’s brand and category, the repair status, a description of the problem, barriers to repair, and more.
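
One question records like these invite is how often repairs succeed for each kind of product. Here is a rough sketch of that tally, assuming each record has been loaded as a dictionary; the `category` and `status` keys, the `"Fixed"` label, and the sample rows are illustrative assumptions rather than the Alliance's actual field names or data.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def fix_rate_by_category(
    records: Iterable[Mapping[str, str]],
    fixed_label: str = "Fixed",  # assumed status value; adjust to the real data
) -> Dict[str, float]:
    """Return the share of repair sessions marked as fixed, per product category."""
    totals: Counter = Counter()
    fixed: Counter = Counter()
    for record in records:
        category = record["category"]
        totals[category] += 1
        if record["status"] == fixed_label:
            fixed[category] += 1
    return {category: fixed[category] / totals[category] for category in totals}


# Made-up sessions for illustration only; not drawn from the real dataset.
SAMPLE_SESSIONS = [
    {"category": "Toaster", "status": "Fixed"},
    {"category": "Toaster", "status": "End of life"},
    {"category": "Laptop", "status": "Fixed"},
]

if __name__ == "__main__":
    for category, rate in fix_rate_by_category(SAMPLE_SESSIONS).items():
        print(f"{category}: {rate:.0%} fixed")
```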

Luftwaffe locations. Data scientist Sam Weiss has constructed a dataset tracking the World War II movements of the Luftwaffe, Nazi Germany’s air force. The information, scraped and geocoded from the Luftwaffe-history website ww2.dk, includes monthly locations and aggregate statistics (total size, additions, losses) by aircraft type and unit. Read more: A blog post and Twitter thread from Weiss.

Fern phylogenetics. Joel H. Nitta et al.’s Fern Tree of Life uses “a mostly automated, reproducible, open pipeline” to convert fern DNA sequences from the National Institutes of Health’s GenBank into an interactive, browsable, and downloadable evolutionary tree. It currently covers 5,500+ species, from Abacopteris aspera to Zealandia vieillardii. [h/t Santiago Ramírez Barahona]