Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.11.13 edition

Tariffs, climate summit attendees, substance abuse treatment, waves, and NYC marathon finishers.

Tariffs. The United States International Trade Commission maintains annual datasets of US import tariffs going back to 1997. The datasets include each impacted product’s eight-digit Harmonized Tariff Schedule code, a brief description, the duty rate, rate type, effective and ending dates, and more. The commission also publishes a tariff search tool and data on upcoming tariff rates. More globally, the World Trade Organization provides tools to query and download data about its members’ tariffs, as well as databases of regional trade agreements and preferential trade agreements. Previously: Trade policy intervention data from Global Trade Alert (DIP 2022.01.19).

Climate summit attendees. Daria Blinova et al. have built a dataset of 310,000+ attendees of United Nations climate summits. The data, largely compiled from PDFs of attendance rosters, include each attendee’s year and meeting attended, name, job title, affiliation, delegation, delegation type (party, observer state, intergovernmental organization, NGO), gender, and more. In all, the attendees span 27,000+ delegations across three decades of COP and predecessor summits. Read more: “This Is 29 Years of International Climate Summits, Visualized,” by The New York Times’ Mira Rojanasakul.

Substance abuse treatment. The Substance Abuse and Mental Health Services Administration’s Treatment Episode Data Set records admissions to, and discharges from, substance abuse treatment centers in the US. The public-use datasets, which span several decades, are based on records collected by state agencies. Among other details, they include each patient’s demographic information, state, metro/micro area, referral source, treatment type, substances used, frequency of use, age at first use, and number of previous treatment episodes. Related: The administration’s National Survey of Substance Abuse Treatment Services, “an annual census of treatment facilities.” [h/t Conor Lennon et al.]

Waves. The Coastal Data Information Program, launched in the 1970s by a research group at the Scripps Institution of Oceanography, “is an extensive network for monitoring waves and beaches along the coastlines of the United States.” The program provides a map of its stations, a table of recent observations, a catalog of real-time and historical wave measurements, and an extreme wave tracker. As seen in: Dion Häfner et al.’s “FOWD: A Free Ocean Wave Dataset for Data Mining and Machine Learning.”

NYC marathon finishers. New York Road Runners publishes a searchable database of all races it has organized since 1970 (the year of NYC’s first marathon) and all finishers of those races. Data Is Plural reader Joe Hovde has scraped the results of the 2024 marathon into a downloadable spreadsheet. Each row represents one of the 55,000+ finishers and provides their name, bib number, age, gender, city, state, country, finishing time, and finishing place. Read more: “Marcelo & Karolina, the Fastest Names in the NYC Marathon,” by Hovde.