Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2021.11.03 edition

Internet shutdowns, concealed carry licensing, Amazon search results, FOIA reading rooms, and spiders.

Internet shutdowns. A coalition organized by Access Now, a nonprofit that “defends and extends the digital rights of users at risk around the world,” has been gathering data on intentional internet shutdowns. It has identified 155 shutdowns in 2020 and 50 in January–May 2021, based on information from a range of sources, including news reports and other trackers, such as the India-focused internetshutdowns.in. The datasets indicate the type of shutdown, start and end dates, geographic scope, who ordered it, public justifications, affected networks, and more. As seen in: “Internet shutdowns have become a weapon of repressive regimes” (The Economist).

Concealed carry licenses. To construct his Concealed Carry Weapons License Database, sociologist Trent Steidley spent “over a year collecting data from 28 states using public records requests and cleaning into a state and county-year format.” The published files include the raw data, cleaned data, and documentation. The details vary by state, but can include the number of licenses held, issued, denied, revoked, and/or suspended, among other statuses; in some instances, the numbers are also disaggregated by demographic. For most states, the records stretch back to the early 2000s, some even earlier.

Amazon search results. A recent Markup investigation “found that Amazon places products from its house brands and products exclusive to the site ahead of those from competitors — even competitors with higher customer ratings and more sales, judging from the volume of reviews.” Reporters Adrianne Jeffries and Leon Yin published their methodology, as well as the underlying code and data, which includes product-placement information relating to 12,000+ search queries, details about 157,000+ products, raw HTML, and more.

FOIA reading rooms. Data librarian Lisa DeLuca has compiled a spreadsheet of 300+ Freedom of Information Act libraries, the online reading rooms where federal agencies must post certain records, including those that “are likely to become the subject of subsequent requests for substantially the same records.” DeLuca’s spreadsheet, originally published in 2019 and updated last week, lists each agency’s name, its parent agency, and the portal’s name and URL. [h/t Mago Torres]

Spiders. The World Spider Catalog is “the first fully searchable online database covering spider taxonomy,” with a bulk dataset that lists 49,000+ species, their geographic distributions, and author-year citations. Stano Pekár et al.’s World Spider Trait database collates “individual measurements, observations, or composite characteristics,” such as body length, web diameter, and number of egg sacs produced.