Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2024.08.21 edition

H-1B lotteries, multinational corporations, California residential water supply, UK grantmakers, and Olympic medalists.

H-1B lotteries. A recent Bloomberg News investigation into the US government’s annual H-1B lottery, a key step in allocating the country’s skilled-worker visas, finds that “thousands of companies got an unfair advantage by helping themselves to extra lottery tickets.” To reach those conclusions, the team “obtained data on all H-1B lottery registrations, selections, and petitions for fiscal years 2021 through 2024 after bringing a lawsuit against the Department of Homeland Security under the Freedom of Information Act.” They’ve shared the records, which indicate each registration’s employer, as well as the proposed beneficiary’s gender, nationality, and birth year. For registrations that led to visa petitions, the data include additional details, such as the worksite, salary, job title, and beneficiary’s field of study. [h/t Eric Fan]

Multinational corporations. The Multinational Enterprise Information Platform, a collaboration between the OECD and the UN Statistics Division, provides publicly sourced data on the 500 multinational corporations with the largest market capitalization. Its “Global Register” dataset examines the companies’ structure, listing each subsidiary’s name, parent company, address, alternative names, and various unique identifiers. The “Digital Register” dataset lists all known web domains controlled by each company and assessments of those domains’ popularity. The platform’s “Media Monitor” feature, although not downloadable, links to news articles and other webpages mentioning the companies. [h/t Annie Burns-Pieper]

California residential water supply. Marie-Philine Gross et al.’s dataset of residential water demand and supply in California includes the monthly volumes of water produced/sold by 404 of the state’s water suppliers, covering 2013–2021. The researchers extracted, standardized, and cleaned the data from the state’s mandatory annual reports, which collect thousands of data points from each supplier. They also added contextual information, such as climatic data (monthly local precipitation, temperature, and drought severity) and each supplier’s hydrologic region.

UK grantmakers. The UK Grantmaking initiative “is a unique cross-sector collaboration between” several major organizations in the field. Their downloadable dataset provides information about 12,000+ trusts, foundations, charities, and other grantmakers for financial year 2022-23, based on records from government regulators. The dataset lists each organization’s name, government-assigned ID, location, category, registration date, income, spending totals, net assets, and more. Previously: UK grants via 360Giving (DIP 2018.12.05). [h/t Giuseppe Sollazzo]

Olympic medalists. The European Data Journalism Network’s Giorgio Comai has used Wikipedia and Wikidata to create a series of datasets listing the name, birth date, sex, and birthplace of Summer Olympic medalists. Comai has mapped the birthplace coordinates and, for Europe-born medalists, linked them to their NUTS (Nomenclature of Territorial Units for Statistics) regions. The project focuses on the 2024 and 2020 Summer Olympics but also provides provisional data for other recent iterations. [h/t Federico Caruso]
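
For readers curious how coordinates can be linked to regions in general, here is a minimal Python sketch of a point-in-polygon lookup. It is not Comai's method, which the blurb above does not describe. The region codes and rectangles (TOY_A, TOY_B) are invented for illustration and are not real NUTS boundaries, and the function names are hypothetical.

```python
from typing import Optional

Point = tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray going right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses the point's latitude.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_region(point: Point, regions: dict[str, list[Point]]) -> Optional[str]:
    """Return the code of the first region containing the point, else None."""
    for code, polygon in regions.items():
        if point_in_polygon(point, polygon):
            return code
    return None


# Hypothetical toy regions (simple rectangles), NOT real NUTS boundaries.
TOY_REGIONS = {
    "TOY_A": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "TOY_B": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}

if __name__ == "__main__":
    for birthplace in [(5.0, 5.0), (15.0, 5.0), (25.0, 5.0)]:
        print(birthplace, assign_region(birthplace, TOY_REGIONS))
```

In practice, a geospatial library and official boundary files would likely replace the hand-rolled test and toy rectangles; the sketch only shows the shape of the lookup.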