Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.12.04 edition

100 million places, education policies around the world, China leaders’ foreign visits, nanosatellites, and Pixar films.

100 million places. Foursquare has released an open dataset describing more than 100 million points of interest across 200+ countries. For each place, the dataset includes its name, address, latitude/longitude, date entered, date updated, date marked closed, telephone number, website, email address, and relevant categories. Among the many possible labels: casino, comedy club, 300+ kinds of restaurants (e.g., deli, diner, Korean BBQ, “mac and cheese joint”), and 100+ types of retailers (e.g., candy store, used car dealership, shopping mall). Learn more: Some initial explorations from Tim Wallace and from Simon Willison. Previously: The Overture Maps Foundation’s datasets (DIP 2023.08.09), including information about 53 million places. [h/t Derek M. Jones + Sharon Machlis + Giuseppe Sollazzo]

Education policies around the world. Adrián del Río et al. “introduce a global dataset on education policies and systems across modern history,” with “measures on compulsory education, ideological guidance and content of education, governmental intervention and level of education centralization, and teacher training.” The dataset covers 157 countries annually from 1789 to 2020. The questions answered by the team’s evaluators include, for example, “How many years of schooling are required by compulsory education?”, “Are there any national laws in place that ban specific subjects or topics in school?”, and “Which entities operate secondary schools?”

China leaders’ foreign visits. Yu Wang and Randall W. Stone’s China Visits dataset records 400+ visits by China’s presidents and premiers to 100+ countries between 1998 and early 2020. To compile it, the authors consulted official reports, web search results, and relevant Wikipedia pages. For each visit, the dataset indicates its starting and ending dates, Chinese leader, foreign country, broader meeting (e.g., those of the Shanghai Cooperation Organisation), and source URL.

Nanosatellites. Space systems engineer Erik Kulu’s Nanosats Database tracks 4,000+ nanosatellites that have been launched into space, are planned for future launch, or have had their launches cancelled. The data for each satellite include its mission name and description, launching organization and country, mass/unit size, launch date, and status. Additional tables provide lists of CubeSat companies, launch providers, costs, and more. [h/t Ahmad Assem]

Pixar films. Software engineer Eric Leung built and maintains a dataset and R package providing structured information about every Pixar film — from 1995’s Toy Story to 2024’s Inside Out 2. It lists each film’s creators (storywriters, screenwriters, directors, composers, and producers), budget, box-office earnings, aggregate critic ratings, Oscar nominations and wins, and more. [h/t Josh Laurito]