Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.07.31 edition

Ballots cast, hurricane evacuation orders, networked scholarship, people surveyed since 1979, and NPR’s weekly quiz show.

Ballots cast. “Electronic records of actual ballots cast (cast vote records) are available to the public in some jurisdictions,” Shiro Kuriwaki et al. write. “However, they have been released in a variety of formats and have not been independently evaluated.” So the researchers have constructed a standardized dataset representing 40.7 million (anonymous) ballots in the November 2020 general election, spanning 352 counties across 20 states. Each of the 160 million rows corresponds to a voter’s choice in a particular race and indicates the precinct, legislative district, office in question, candidate selected, and candidate’s party. The initial release, which the authors use to analyze ticket-splitting patterns, covers votes for president, Senate, House, governor, and state legislature. [h/t Derek Willis]
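To make the one-row-per-choice layout concrete, here is a minimal sketch of how ticket-splitting might be tallied from records shaped like these. The rows, the field order, and the `ballot_id` linking key are illustrative assumptions, not the dataset's actual schema, and this is not the authors' method.

```python
from collections import defaultdict

# Hypothetical sample rows: (ballot_id, office, party).
# These names and values are assumptions for illustration only.
rows = [
    ("b1", "president", "DEM"),
    ("b1", "us_house", "DEM"),
    ("b2", "president", "REP"),
    ("b2", "us_house", "DEM"),
    ("b3", "president", "REP"),  # no House choice recorded on this ballot
]

def split_ticket_share(rows, office_a="president", office_b="us_house"):
    """Share of ballots whose party choice differs between two offices.

    Only ballots with a recorded choice in both offices are counted.
    Returns None if no ballot has both.
    """
    choices = defaultdict(dict)
    for ballot_id, office, party in rows:
        choices[ballot_id][office] = party

    both = [c for c in choices.values() if office_a in c and office_b in c]
    if not both:
        return None
    split = sum(1 for c in both if c[office_a] != c[office_b])
    return split / len(both)

share = split_ticket_share(rows)
```

In this toy sample, two ballots include both races and one of them splits parties, so `share` comes out to one half.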

Hurricane evacuation orders. Harsh Anand et al.’s Hurricane Evacuation Order Database “is a comprehensive and standardized database of evacuation orders issued by state and local government officials in response to the hurricanes that impacted the United States between 2014 and 2023.” To build it, the authors combed through government websites, official social media, news reports, and other sources. The database covers 27 storms and several types of announcements: state-of-emergency declarations, mandatory evacuations, voluntary evacuations, and the lifting of those orders. For each announcement, the database indicates the order type, date/time announced, date/time effective, counties affected, and evacuation area.

Scholarship, networked. OpenAlex, “a free and open catalog of the global research system,” has compiled data on more than 250 million scholarly works and has linked those works to structured information about their authors, institutions, publishers, funders, and topics. The data are available to search online, to download in bulk, and via API. As seen in: Aliakbar Akbaritabar et al.’s “Bilateral flows and rates of international migration of scholars for 210 countries for the period 1998-2020” and Philippe Mongeon et al.’s dataset of scholars’ Twitter/X usernames.
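For readers curious about the API route, the sketch below builds a search URL for OpenAlex's works endpoint without sending a request. It assumes the endpoint `https://api.openalex.org/works` and its `search` and `per-page` query parameters; the helper name `works_search_url` is made up for this example, and the current API documentation should be checked before relying on it.

```python
from urllib.parse import urlencode

# Assumed base endpoint for scholarly works in the OpenAlex API.
BASE_URL = "https://api.openalex.org/works"

def works_search_url(query, per_page=5):
    """Build a works-search URL (hypothetical helper; makes no network call)."""
    params = {"search": query, "per-page": per_page}
    return f"{BASE_URL}?{urlencode(params)}"

url = works_search_url("international migration of scholars")
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, should return JSON describing matching works.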

People surveyed since 1979. The Bureau of Labor Statistics’ NLSY79 survey has interviewed the same people dozens of times since 1979. It began with a “nationally representative sample of 12,686 young men and women”; more than four decades later, 6,000+ interviewees are still responding to the project’s biennial inquiries. The survey asks about a range of topics, including education, employment, health, dating, marriage, children, attitudes, and substance abuse. Public-use data are available to download and through the agency’s NLS Investigator tool. Related: The agency’s other national longitudinal surveys. [h/t Prashant Bharadwaj et al.]

Wait Wait. Linh Pham considers himself “the unofficial scorekeeper” of Wait Wait… Don’t Tell Me!, NPR’s weekly quiz show. Since 2007, he’s been maintaining a structured database that describes Wait Wait’s episodes, venues, hosts, guests, panelists, and more. Pham provides the data via API, and also publishes charts and automated reports, such as this list of panelists who won their debut appearances. [h/t Cody Winchester]