# Datasets for AI / ML Research
1. **UCI Machine Learning Repository**: A collection of databases, domain theories, and data generators widely used by the machine learning community.
Website: [UCI ML Repository](https://archive.ics.uci.edu/ml/index.php)
2. **Kaggle Datasets**: Offers a wide variety of datasets in different domains including economics, biology, computer vision, and natural language processing.
Website: [Kaggle](https://www.kaggle.com/datasets)
3. **AWS Public Datasets**: Amazon Web Services offers a variety of public datasets that anyone can access.
Website: [AWS Public Datasets](https://registry.opendata.aws/)
4. **Google Dataset Search**: A tool that enables the discovery of datasets stored across the web.
Website: [Google Dataset Search](https://datasetsearch.research.google.com/)
5. **Microsoft Research Open Data**: A collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain-specific sciences.
Website: [Microsoft Research Open Data](https://msropendata.com/)
6. **OpenML**: An online platform for collaborative machine learning where users can share data, models, and experiments.
Website: [OpenML](https://www.openml.org/)
7. **Data.gov**: The home of the U.S. Government's open data, providing data, tools, and resources.
Website: [Data.gov](https://www.data.gov/)
8. **EU Open Data Portal**: Provides access to an expanding range of data from the European Union institutions and other EU bodies.
Website: [EU Open Data Portal](https://data.europa.eu/euodp/en/home)
9. **Awesome Public Datasets on GitHub**: A topic-organized list of high-quality open datasets.
GitHub Repository: [Awesome Public Datasets](https://github.com/awesomedata/awesome-public-datasets)
10. **World Bank Open Data**: Free and open access to global development data.
Website: [World Bank Open Data](https://data.worldbank.org/)
11. **CERN Open Data Portal**: Provides access to data generated by the Large Hadron Collider and other CERN experiments.
Website: [CERN Open Data Portal](http://opendata.cern.ch/)
12. **National Aeronautics and Space Administration (NASA)**: Offers a wide range of datasets related to space and Earth sciences.
Website: [NASA](https://data.nasa.gov/)
13. **NOAA Data Sets**: Provides access to national and global data on climate, weather, oceans, and coasts.
Website: [NOAA](https://www.noaa.gov/data)
14. **ImageNet**: A dataset of more than 14 million labeled images organized into roughly 22,000 categories.
Website: [ImageNet](http://www.image-net.org/)
15. **COCO (Common Objects in Context)**: A dataset of more than 300,000 images showing objects in complex everyday scenes, annotated for object detection, segmentation, and captioning.
Website: [COCO Dataset](https://cocodataset.org/)
16. **Wikipedia: List of datasets for machine-learning research**: A Wikipedia article providing a comprehensive list of datasets for machine-learning research.
Website: [Wikipedia List](https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research)
17. **Natural Earth Data**: Offers free vector and raster map data at various scales.
Website: [Natural Earth Data](https://www.naturalearthdata.com/)
18. **Reddit Datasets**: A subreddit where members of the Reddit community share and request datasets.
Website: [Reddit Datasets](https://www.reddit.com/r/datasets/)
19. **Quandl** (now Nasdaq Data Link): Provides financial, economic, and alternative datasets.
Website: [Quandl](https://www.quandl.com/)
20. **Stanford Large Network Dataset Collection**: A collection of large network datasets including social networks, web graphs, etc.
Website: [Stanford Network Analysis Project](http://snap.stanford.edu/data/index.html)

These sources cover a wide range of domains. Choose among them based on the data type, scale, and licensing requirements of your machine learning project.