2019.09.04 edition
Malaria geography. The University of Oxford’s Malaria Atlas Project collects, models, and publishes a range of datasets related to the mosquito-borne disease, including localized incidence rates. You can explore and download the data, layer by layer, through the project’s interactive map. [h/t Clara Burgert-Brucker]
CAR refugees. The Central African Republic’s ongoing civil war has forced more than 600,000 people to flee the country. The violence has also internally displaced another 600,000 people, a phenomenon that the UN’s Humanitarian Data Exchange has been tracking. In addition to counts of internally displaced people by locality, the UN’s datasets include a listing of refugee sites and the country’s road network. Related: A multimedia presentation of one family’s 600-kilometer journey in search of safety. [h/t Becky Band Jain]
Publicly funded patents. The 3PFL dataset — Patents and Publications with a Public-Funding Linkage — lists more than 13,000 US patents that have acknowledged federal funding. The dataset, accompanied by a detailed methodology, also links the patents to details about the funding, as well as to scientific publications that stemmed from it. Previously: Patent geography (DIP 2019.07.31). [h/t Gaétan de Rassenfosse]
Drama. The Drama Corpora Project has collected and processed more than 800 plays in German, Greek, Spanish, Russian, Latin, and English. For each play, the project provides a structured-data version of the text, a network diagram, speech distribution metrics, plus several other files and features. [h/t Lynn Cherny]
Rah, rah, rah! Fight, fight, fight! FiveThirtyEight has built a dataset of 65 college football fight songs, listing each song’s name, authors, year written, tempo, duration, and whether it includes various tropes, such as spelling out words or mentioning the school’s colors. Related: FiveThirtyEight’s “Guide To The Exuberant Nonsense Of College Fight Songs,” where you can listen to the songs, read the lyrics, and explore an interactive chart of tempo versus duration.