Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2015.12.02 edition

Historical climate data, mass shootings, faster open-data downloads, college sports financing, and celebrity faces.

Historical climate data. The National Centers for Environmental Information maintains more than 20 petabytes of data, it says. Among the most useful slices is the Global Historical Climatology Network’s data, which aggregates reports on temperature, precipitation, wind, and more from tens of thousands of climate-monitoring stations around the world. One tidbit: January 1995 was Death Valley’s wettest month since at least the 1960s, with a whopping 2.59 inches of precipitation.

Mass shootings in America. ShootingTracker.com provides datasets listing every U.S. mass shooting since 2013. It defines a mass shooting as “when four or more people are shot in an event, or related series of events.” So far in 2015, mass shootings have killed 447 people and wounded an additional 1,292.

A faster way to download open data. Socrata’s software powers open-data portals around the world. But downloading large datasets (e.g., this 2.8-gigabyte dataset of NYC parking tickets) from Socrata-powered portals can feel, well, sluggish. One solution: OpenDataCache.com, a free website that provides faster-to-download versions of virtually every dataset from 50+ Socrata portals. Related: Thomas Levine’s detailed analyses of Socrata-powered portals, published in 2013 and 2014. [h/t John Krauss and Steven Romalewski]

College sports financing. The Huffington Post and Chronicle of Higher Education teamed up to investigate how colleges bankroll their athletics. (Georgia State, for example, spent more than $100 million subsidizing sports between 2010 and 2014, mostly via student fees.) The report, published last week, draws on five years of revenue/expense reports from 234 Division I public universities. You can download the raw data or explore it online. Related: The Washington Post also tackled this topic — from a slightly different angle — last week, examining the profitability (or lack thereof) of athletic programs at 48 schools. [h/t Shane Shifflett]

Celebrity faces, annotated. The CelebA dataset, published in September, contains 200,000+ images of 10,000+ celebrities, each annotated with 40 yes/no variables. Some favorites: “5_o_Clock_Shadow,” “Bags_Under_Eyes,” and “Goatee.”