Web Data
Entity Matching datasets
Ranking
Location data
Politics
Misc
US
India
Social economic data
Geolocation to census and other geolocation data
Religion, Ethnicity, Demographics
World Events
Personality prediction data
Economics data
Misc
Name datasets
Manually curated name gender list from multiple countries - Originally introduced here and here - Python package
Large name dataset python program
Name embeddings
NYC baby names by gender and mother's ethnicity - Also available here
Baby Names map (100,000 unique names from 14 different countries)
Name master file latest
Frequent first and last names in US census 1990
US first and last names list by gender and race
First name and last name data for Dutch, English, Portuguese, Italian, and Spanish
Names from different ethnicities used in bias evaluation
Gender by language wikidata
US congress members name and gender
Name pronunciation of US congress members
Name gender race data from DBPedia SPARQL
Name gender race data from Wikidata SPARQL
List of Caucasian actors
List of Caucasian actresses
List of African-American actors
List of African-American actresses
Wiki List of African-American actors
Wiki List of American actresses
Artists in MoMA collection and their names/nationality
Swedish names by gender/year with at least 10 individuals
Given names by culture and language from Wikipedia
Names of all politicians
IMDb artists labeled as actors or actresses
List of Olympic athlete names mapped with ethnicity and country
140k name/ethnicity associations
First names, gender, and year in Indian electoral roll data
Indian names with ethnicity, religion, gender extracted from SimplyMarry and CBSE data
Name nationalities from Wikipedia categories and a classifier to predict them
WikiTree Data Dump (24M genealogies)
List of names and surnames for Dutch, English, Portuguese and Spanish
Chinese Names
Details and relations between names: https://www.name-doctor.com/name-volodenka-meaning-of-volodenka-55917.html
NameGrapher - Explore the historical popularity of United States baby names
Baby Name Atlas: The Most Popular Names Around the World
French Baby Names
Russian names with gender
Family History Resources from Forebears.io
Name-Based Gender Classification (36 distinct sources—spanning over 150 countries and more than a century) - Github - Software
EthniColr: Predict Race and Ethnicity Based on the Sequence of Characters in a Name
World Gender Name Dictionary
Genni + Ethnea for the Author-ity 2009 dataset
demographicx: A Python package for estimating gender and ethnicity using deep learning transformers
List of first names, genders and country-specific frequencies
Validated Names for Experimental Studies on Race and Ethnicity
JRC-Names is a highly multilingual named entity resource for person and organisation names - RDF
A Brief History of Human Time - Cross-verified Dataset
- Includes name, year, gender
Images
Stream processing data
Controversial topics
Crowdtruth NLP datasets
Humor data
Audio
Driving data
Sports
Cricket stats
NFL
Food
Portal:Food - Wikipedia
Cookbook:Chiles - Wikibooks, open books for an open world
Category:Ingredients - Wikibooks, open books for an open world
Category:Recipes - Wikibooks, open books for an open world
Recipe - Schema.org Type
GitHub - cosylabiiit/recipe-knowledge-mining - NER for recipe - [2004.12184] A Named Entity Based Approach to Model Recipes
RecipeDB: a resource for exploring recipes - NER Dataset from RecipeDB - recipedb - A resource for exploring recipes
cosylabiiit/Recipedb-companion-data
GitHub - cosylabiiit/recipe-knowledge-mining
Training Recipe Ingredient NER with Transformers
Tasty.co - Each recipe page has LD+JSON data for recipe
Recipes from Tasty | Kaggle
RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation - ACL Anthology - RecipeNLG - HuggingFace
SHARE: a System for Hierarchical Assistive Recipe Editing
lishuyang/recipepairs - Datasets at Hugging Face
Open Recipes: https://huggingface.co/datasets/napsternxg/openrecipes-20170107-061401-recipeitems - Github
Wikibook - Cookbook:Table_of_Contents
Wikimedia Commons - Category:Nutrition
Wikimedia Commons - Category:Food_and_drink
Wikimedia Commons - Category:Beverages
Wikimedia Commons - Category:Food
🪐 spaCy Project: Analyzing how mentions of ingredients change over time (Named Entity Recognition)
TASTEset recipe NER model and dataset - TasteSet 2.0 - 1K annotated - Entities linked to https://foodon.org/ - spaCy NER Training
FINER: Food Ingredient NER Dataset - Paper: SMPT: A Semi-Supervised Multi-Model Prediction Technique for Food Ingredient Named Entity Recognition (FINER) Dataset Construction
NYTimes - CRF Ingredient Phrase Tagger
CulinaryDB - Data Analytics for World Cuisines
FoodData Central - Datasheet
Food and Nutrient Database for Dietary Studies - Food ingredients
LexMapr - A Lexicon and Rule-Based Tool for Translating Short Biomedical Specimen Descriptions into Semantic Web Ontology Terms
FoodBase corpus: A new resource of annotated food entities
OpenFoodFacts - Used by Wikidata - Datasets - Translations
Kaggle What's Cooking - recipe-ingredients-dataset
Recipe box - Structured recipes scraped from food websites - Code
KitchenScale - Paper
TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark - Paper
recipe-scrapers
Recipes5K
Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
Nutrition5k
Food 101
Assorted, Archetypal, and Annotated Two Million (3A2M) Cooking Recipe Dataset
FINER: Food Ingredient NER Dataset
A Named Entity Based Approach to Model Recipes
Common Crawl Web Dumps JSONLD - Includes Recipes - Common Crawl JSONLD Dumps
LanguaL - Thesaurus for food, used in FoodOn
Food.com Recipes and Interactions - Reviews - Github
TasteAtlas - Indexed by Wikidata - Gist
Food.com Recipes with Search Terms and Tags
Fruit and Vegetable Prices
Data Products - Economic Research Service - U.S. DEPARTMENT OF AGRICULTURE
Department of Agriculture - Data.gov
Food-a-pedia from HealthData.gov - Find the calorie content of any food or beverage using the Food-a-pedia
SuperTracker - source code and foods database
MyPlate - U.S. DEPARTMENT OF AGRICULTURE - Multilingual
MyPlate Kitchen
The Open Food Repo API
AICrowd: Food Recognition Challenge
MyFoodRepo: a full-stack system for professional nutrition tracking
Food Ingredient Lists: A list of 10,000 food products and their ingredients
What's On The Menu? - Dataset on historical menus, dishes, and dish prices - From New York Public Library
Epicurious - Recipes with Rating and Nutrition
AUSNUT 2011-13 was developed to enable food, dietary supplement and nutrient intake estimates from the 2011-13 Australian Health Survey (AHS)
Australian Food and nutrients databases
Australian Branded Food Database
The New Zealand Food Composition Database
Automated Cuisine Classification of Recipes
Food and Agriculture Organization of United Nations - Food and Agriculture Statistics
FAOSTAT Food and agriculture data provides free access to food and agriculture data for over 245 countries and territories and covers all FAO regional groupings from 1961 to the most recent year available
Food-101N Dataset - 310,009 images of food recipes classified in 101 classes (categories)
Food-101 – Mining Discriminative Components with Random Forests
FOODD: FOOD DETECTION DATASET FOR CALORIE MEASUREMENT USING FOOD IMAGES
ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network
UNIMIB2015 Food Database - Food Recognition
Food524DB is the largest publicly available food dataset with 524 food classes and 247,636 images by merging food classes from existing datasets in the state of the art
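Several entries above (Tasty.co, the Schema.org Recipe type, the Common Crawl JSON-LD dumps) expose recipes as JSON-LD embedded in web pages. A minimal sketch of pulling such blocks out of HTML with only the Python standard library might look like the following. The class name `JsonLdExtractor`, the helper `extract_recipes`, and the sample HTML are made up for illustration and are not taken from any of the listed sites; real pages may also nest recipes under `@graph`, which this sketch does not handle.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD blocks


def extract_recipes(html):
    """Return all top-level JSON-LD objects whose @type is (or includes) Recipe."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    recipes = []
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
                recipes.append(item)
    return recipes


# Hypothetical page content, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Toy Pancakes",
 "recipeIngredient": ["1 cup flour", "2 eggs"]}
</script>
</head><body></body></html>
"""

recipes = extract_recipes(SAMPLE_HTML)
print(recipes[0]["name"], recipes[0]["recipeIngredient"])
```

The `recipeIngredient` strings returned this way are free text, which is the kind of input the ingredient NER datasets listed above are meant to parse.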
E-Commerce