Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2021.06.30 edition

Medicaid drug utilization, ranked-choice voting, police activity in 2002 Gujarat news coverage, Android permissions, and the cat’s meow.

Three decades of Medicaid prescriptions. US law requires state Medicaid agencies to report the quarterly number of outpatient prescriptions, total units, and reimbursement costs for each permutation of each drug they’ve covered. The federal Medicaid program’s State Drug Utilization Data makes those records — which spanned nearly 5 million rows in 2020 alone — available as state-level and national files going back to 1991. Related: The National Drug Code Directory, which “contains product listing data submitted for all finished drugs including prescription and over-the-counter drugs, approved and unapproved drugs and repackaged and relabeled drugs.” [h/t Michael Q. Maguire]

Ranked-choice voting, continued. FairVote, an organization that advocates for ranked-choice voting, has gathered the results of hundreds of elections that used those rules. The spreadsheets capture both single-winner and multi-winner elections in 26 jurisdictions between 2001 and 2021 — not yet including last week’s New York City primaries, whose results won’t be finalized until all absentee ballots are processed. Previously: ranked.vote, which provides detailed diagrams and data on a smaller number of elections (DIP 2020.05.27).

Police activity in 2002 Gujarat news coverage. More than 1,000 people died in the inter-communal violence that erupted in Gujarat, India, in early 2002. A team of political and computational scientists recently trained students to annotate 21,000+ sentences from 1,257 contemporaneous Times of India articles about the events, asking them to indicate whether police officers used force, killed someone, made arrests, failed to intervene, and/or took any other action. The resulting dataset includes the raw annotations as well as final sentence- and document-level classifications. [h/t Katherine A. Keith]

Android permissions. Developer Gautham Prakash has built a dataset of the device permissions requested by more than 1 million Android apps in the Google Play marketplace. The permissions include the ability to make calls, read the phone’s contacts, record audio, get the phone’s precise location, know what other apps are running, and dozens more.

The cat’s meow. For a 2019 study, University of Milan researchers collected and analyzed 440 recordings of “meows emitted by cats in different contexts”: when brushed by their owners, when isolated in an unfamiliar environment, and when waiting for food. [h/t Duncan Geere]