GitHub datasets. This research is a continuation of some ideas presented in this blog post and is a joint … A public repo of datasets. The price, dividend, and earnings series are from the same sources as described in Chapter 26 of my earlier book (Market Volatility [Cambridge, MA: MIT Press, 1989]), although … The Synthetic Party (Det Syntetiske Parti) stands as the world's first AI-driven political party. Curated open data has 144 repositories available. CORGIS: The Collection of Really Great, Interesting, Situated Datasets. The data set contains daily reports of Covid-19 cases and deaths in countries worldwide. EPIC-KITCHENS: Scaling Egocentric Vision: The … To accompany the presentation of the VTAB+MD paper at NeurIPS 2021's Datasets and Benchmarks track, we are releasing a TensorFlow Datasets-based implementation of Meta-Dataset's input pipeline which is compatible with both the original Meta-Dataset protocol (MD-v1) and the updated protocol designed for VTAB+MD (MD-v2). The top open dataset repositories on GitHub include a variety of data, freely available for use by researchers, practitioners, and students alike. Based on the evaluation of the various data themes discussed in the … Learn how to download files from GitHub, and understand the difference between downloading, cloning, and forking. Dataset Download Link Code; VGTSR (Baidu) Download: pfsm: VGTSR (Google Drive) Download- (NLVR) A Corpus of Natural Language for Visual Reasoning, 2017 (MS MARCO) MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, 2016 (NewsQA) NewsQA: A Machine Comprehension Dataset, 2016 (SQuAD) SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016 (GraphQuestions) On Generating … Dataset Name Link Description (Optional) ESOL: Lipophilicity: SIDER: OFFSIDES: HIA: absorption. GitHub is where people build software. This repository is a collection of SLAM-related datasets. 
A collection of large datasets containing questions and their answers for use in Natural Language Processing tasks like question answering (QA). Even when there are several social media platforms to get data from, the construction of a balanced labeled … Case-Shiller Index of US residential house prices. Common repository for example datasets used by Vega-related projects - vega/vega-datasets. nf-core is a collection of high quality Nextflow pipelines. Text lines are defined as connected sequences of words that are aligned in … COVID-19 - Datasets, APIs and List of Projects. Click on a CSV name to download it, and let us know what you do with it by emailing us. 5 million unique images across 108 Wikipedia languages. io/datasets. ISOLET Data Set - This 38. ; Internet-Wide Scan Data Repository - The Censys … The Open Images dataset. key_value_pair type is a generalized type, where a sample can be one or multiple images with optional text, labeled with key-value … Important note: The CARRADA dataset will be downloaded and unzipped in the Docker image. Congratulations! You used BigQuery and SQL to query the GitHub public dataset. Objectron is a dataset of short, object-centric video clips. In this repository, the overall dataset chart is represented as a simplified version. Download Form, Paper: VITON-HD: A high-resolution virtual try-on dataset with 13,679 image pairs at 1024 x … multitask type is a composition type, where one set of images has multiple sets of annotations available for different tasks, where each task can be of any basic type. On the other hand, clustering datasets by topic is a good way of measuring diversity. Explore the data, watching out for interesting relationships. - GitHub - google-research-datasets/con … Request a dataset by opening a Dataset request GitHub issue. Figure 1: SWIM-IR dataset generation process. 
Sample data. datasets format, with missing data imputation and the encoding of category and ordinal features. Please see the guidelines for … workflow name triggers on description; build*: Every push to the master branch or to an open pull request: It runs the data preparation step, and tests and commits a new version of the prepared data if there are any changes: acquire-<acquirer>. Subject: Coverage: From: To: title date … ir_datasets is a Python package that provides a common interface to many IR ad-hoc ranking benchmarks, training datasets, etc. py. Citation. e. Statistical series on the results obtained by the service in performing its functions of financial intelligence, regulation, compliance oversight, rule-making, and outreach activities for the prevention of ML/TF (money laundering … Awesome Public Datasets on GitHub. This repository exists only to provide a convenient target for the seaborn. Sampled Wikipedia passages are provided to an LLM (PaLM-2) using the novel summarize-then-ask prompting (SAP) method. Those who have the resources to collect data lack the incentive to (corollary: often those who have access to a dataset are the … Finally, complexity can be assessed using other LLMs acting … The dataset includes a list of all the stocks contained therein. BuzzFeed also makes data available to the public through its GitHub page. Generate a dataset; Under the corresponding MITRE Technique ID folder create a folder named after the tool the dataset comes from, for example: atomic_red_Team Make PR with <tool_name_yaml>. . github. 
g. Datasets, Transforms and Models specific to Computer Vision - pytorch/vision. Among various SLAM datasets, we've selected the datasets that provide pose and map information. Users can find data analysis, libraries, and guides, all open source. The package takes care of downloading datasets (including documents, queries, relevance judgments, etc. The datasets are provided in the usual sklearn. Following are the datasets and the original papers that proposed them. If you use tfds new to create a new dataset then the dataset class is stored in a file with the exact same name as the dataset, i. Ideally, I would like to make a list of the top open datasets on GitHub, period; however, this gets tricky, since searching for "open data," … This repository houses a comprehensive collection of datasets, theoretical frameworks, video materials, and visualizations pertaining to The Synthetic Party, offering insights into the overlap of artificial intelligence and political discourse. The public datasets are datasets that BigQuery hosts for you to access and integrate into your applications. There are various moderating variables that affect all data, many … This repository contains 20 annotated datasets of Automatic Keyphrase Extraction made available by the research community. The dataset aims to facilitate analysis and exploration of agricultural trends, crop diversification, and regional … A quick guide (especially) for trending instruction finetuning datasets - GitHub - Zjh-819/LLMDataHub. Cached datasets (Arrow files) will still be reloaded from the datasets cache, by default at ~/. 
Ertugrulmert / ECG-Time-Series. Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Originally published at UCI Machine Learning Repository: Iris Data Set, this small dataset from 1936 is often used for testing out machine learning algorithms and visualizations (for example, Scatter Plot). Contribute to datasciencedojo/datasets development by creating an account on GitHub. You will find a copy of the GPL in the Rdatasets GitHub repository. Demo apps to showcase Meta Llama3 for … CodeSearchNet is a collection of datasets and benchmarks that explore the problem of code retrieval using natural language. com. We aim at providing the available datasets with a standard format (explained in Download section below) so that they can be directly used in experiments, without any workload in preprocessing. Information oversharing is a security and compliance use case powered by our newly available SharePoint datasets. Please include the following citation when using tensorflow-datasets for a paper, in addition to any citation specific to the used datasets. And vote on the current set of requests by adding a thumbs-up reaction to the issue. 🤗 Datasets is a library for easily accessing and sharing datasets for Audio, Computer Vision, and Natural Language Processing (NLP) tasks. World Bank Data. This data set consists of monthly stock price, dividends, and earnings data and the consumer price index (to allow conversion to real values), all starting January 1871. Core meta for awesome-public-datasets. Each line represents a C/C++ code unit (file, function, or class) and is associated with a binary label (non … HierText is the first dataset featuring hierarchical annotations of text in natural scenes and documents. 
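The remark above that the consumer price index is included "to allow conversion to real values" can be illustrated with a minimal sketch. It assumes the usual deflation formula, real = nominal × (CPI in the base period / CPI in the observation period). The function name `to_real` and the sample figures are hypothetical and are not taken from the data set.

```python
def to_real(nominal, cpi, base_cpi):
    """Convert a nominal value to base-period (real) terms.

    Assumption: standard CPI deflation, real = nominal * base_cpi / cpi.
    """
    if cpi <= 0 or base_cpi <= 0:
        raise ValueError("CPI values must be positive")
    return nominal * base_cpi / cpi


# Hypothetical monthly observations: (date, nominal price, CPI).
observations = [
    ("2000-01", 100.0, 50.0),
    ("2010-01", 150.0, 75.0),
    ("2020-01", 300.0, 100.0),
]

# Express every price in the purchasing power of the last observation.
base_cpi = observations[-1][2]
real_prices = [
    (date, to_real(price, cpi, base_cpi)) for date, price, cpi in observations
]

for date, real in real_prices:
    print(date, round(real, 2))
```

Choosing the most recent CPI as the base is only one convention; any fixed period works as long as it is applied to the whole series.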
It includes detailed information on crop production, yield, acreage, and other relevant agricultural metrics at the state level. Flexible Data Ingestion. , Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022" - isl-org/MiDaS. A collection of datasets of ML problem solving. Method of calculation: Since 1 April 1968, calculated from the daily morning fixing; From January 1950 to 21 March 1954, calculated using the Bank of England's gold purchasing price (1 ounce of fine = pound 12. cache/huggingface/datasets; Breaking changes. Uncompressed size in brackets. removed deprecated … This is a list of openly available electrophysiological data, including EEG, MEG, ECoG/iEEG, and LFP data. General: 1 ounce of fine gold = 31. Explore Popular Topics Like Government, Sports, Medicine, Fintech, Food, More. Datasets are sorted by year of publication. A large social network of GitHub developers which was collected from the public API in June 2019. The Security Datasets project is an open-source initiative that contributes malicious and benign datasets, from different platforms, to the infosec community to expedite data analysis and threat research. Twitch-full contains the full dataset while Twitch-100k is a … Tool to convert datasets from "Benchmark Data Sets for Graph Kernels" (K. Feel free to dig in. The dataset was created from the public GitHub dataset on Google BigQuery. GitHub Pages for CORGIS Datasets Project. They start with a … The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions totaling 1 TB of data. 
Putting data to work means making data more accessible and easier to share. How to use it: The GitHub Code dataset is a very large dataset so for most use cases it is recommended to make use of the … A public dataset is any dataset that is stored in BigQuery and made available to the general public through the Google Cloud Public Dataset Program. Contribute to datasets/oil-prices development by creating an account on GitHub. Kersting et al. ; Historical Document Text: is usually designed for assisting … GitHub: DressCode: A dataset focused on modeling the underlying 3D geometry and appearance of a person and their garments given a few or a single image. Some example data sets include FCC comments and data … Comprehensive country code information, including ISO 3166 codes, ITU dialing codes, ISO 4217 currency codes, and many others - datasets/country-codes. CrowdData is an open repository that aggregates the crowdsourced datasets that have individual crowd votes. 499 positive samples and 78 negative samples. br for Brazil, it for Italy). Datasets are split in 3 categories: Customers, Users and Organizations. Our goal is to facilitate the discovery and accessibility of high-quality EMG data and cutting … More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects. Learn more about the tools, frameworks and templates available to enable more secure and streamlined access to data. This repository contains various files for CI and unit testing of nf-core pipelines and infrastructure. Download Open Datasets on 1000s of Projects + Share Projects on One Platform. No Blockchains. For example from your laptop to the cloud, to another user's machine, or to an HPC system. 2M), line, and paragraph level annotations. 
Datasets included in this repo serve for … Awesome Public Datasets: GitHub. The 3D bounding box … For example, Epi-validated outbreak datasets might be used with a SNP-based or MLST-based workflow. As the number of different values for intendedUse increases, other use-cases will be available. Awesome Public Datasets is an open-source dataset that contains topic-centric public data. or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Contribute to selva86/datasets development by creating an account on GitHub. Supports default & custom datasets for applications such as summarization and Q&A. GeneratorBasedBuilder. 100DOH: Understanding Human Hands in Contact at Internet Scale (CVPR 2020): 131 days of footage, 100K annotated hand-contact video frames. Contribute to Ayushi0214/Datasets development by creating an account on GitHub. The allocated space to an image can be modified using the option --memory="100g". WIT is composed of a curated set of 37. This post will be a bit different, in that we are looking at the top open dataset repositories that GitHub has to offer. Contribute to ajaykuma/Datasets_For_Work development by creating an account on GitHub. github_nested: Contains a timeline of actions such as pull requests and … Natural Scene Text: The images in this type of dataset are usually taken in natural scenes, so the difficulty of this task lies in the complex lighting transformations, shooting angles, blurring, varied fonts, etc. Contribute to openimages/dataset development by creating an account on GitHub. Contribute to datasets/geoip2-ipv4 development by creating an account on GitHub. 
Collected and sorted from various blogs, answers, and user feedback, it combines free and paid data sets on physics, sports, software, natural language, and machine learning. Below are four reasons, with accompanying real-world examples. 6 million entity rich image-text examples with 11. Twitch-full: See the Google Drive folder containing all Twitch files. The dataset contains 11639 images selected from the Open Images dataset, providing high quality word (~1. Otherwise, how you use a dataset is up to you! Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP) - niderhoff/nlp-datasets. There are a number of reasons why a data set that seems like it should exist might not, and they are all tied to the quiet complications inherent in data collection. The data also shows the country's population and the number of cases per 100,000 people on a rolling 14-day average. GeoIP2 - free IP geolocation database. This list is updated whenever a change to the official code list in ISO 3166-1 is effected by the ISO 3166/MA. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. This allows customers to better understand how secure their SharePoint is, maintain information … There is a field intendedUse which suggests how a particular dataset might be used. This repo contains data sets that are required in order to perform the applications and exercises - GitHub - kirenz/datasets. Data is posted to social media every Monday morning. We would like to emphasize that you should not draw conclusions about causation in the data. 
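The "cases per 100,000 people on a rolling 14-day average" figure mentioned above can be sketched as follows. This is only an illustration under an assumed definition (the mean of the last 14 daily case counts, divided by population and scaled to 100,000); the data set's publisher may compute it differently, and the function name and sample numbers are hypothetical.

```python
from collections import deque


def rolling_rate_per_100k(daily_cases, population, window=14):
    """Trailing-window average of daily cases, scaled per 100,000 people.

    Returns None for days that do not yet have a full window of history.
    Assumption: the rate is mean(daily cases over window) / population * 100000.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    recent = deque(maxlen=window)  # holds only the last `window` days
    running_sum = 0
    rates = []
    for cases in daily_cases:
        if len(recent) == window:
            running_sum -= recent[0]  # this value is evicted by the append below
        recent.append(cases)
        running_sum += cases
        if len(recent) < window:
            rates.append(None)
        else:
            rates.append(running_sum / window / population * 100_000)
    return rates


# Hypothetical country: 1,000,000 inhabitants, 10 cases a day, then a jump.
example_rates = rolling_rate_per_100k([10] * 14 + [24], population=1_000_000)
print(example_rates[13], example_rates[14])
```

Keeping a running sum avoids re-adding the whole window for every day, which matters little for one country but helps when the same calculation runs over every country in a worldwide file.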
A long, categorized list of large datasets (available for public use) to try your analytics skills on. al. Charades-Ego: Actor and Observer: Joint Modeling of First and Third-Person Videos (CVPR 2018): 112 people, 4000 paired videos, 157 action classes. 1034768g. Easy access to official spatial data sets of Brazil in R and Python. A repository of pretty cool datasets that I collected for network science and machine learning research. Binary label. ) when available from … This is the official repository for the ACM CCS 2024 paper "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models by Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang. I am happy if you want to help me update and/or improve this document. The vision of developing this repository is to have an exhaustive list of high quality datasets, APIs and projects pertaining to COVID-19 which will be open to communities … The dataset consists of 22325 bot accounts (among which 18444 are unique), and 107598 users separated into 3 groups: random - random accounts (low social homophily), shift - accounts from communities that pose some activity in VK (medium social homophily), student - verified users who are students of one university (high social homophily). Contribute to mwaskom/seaborn-data development by creating an account on GitHub. ; Machine-Learning-driven-Web-Application-Firewall - Set of good and bad queries to a web application firewall. All of these recommendation datasets can be converted to the atomic files defined in RecBole, which is a unified, comprehensive and efficient recommendation library. The script MUST be runnable from a command line interface. Curated list of quality open datasets. This list states the country names (official short names in English) in alphabetical order as given in ISO 3166-1 and the corresponding ISO 3166-1-alpha-2 code elements. CSV Datasets. Web Server Logs. 
In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. "DeeperForensics-1. Nodes are developers who have starred at least 10 repositories and edges are mutual follower relationships. removed deprecated arguments like use_auth_token, fs or ignore_verifications; Remove beam by @albertvillanova in #6987. , 2016) into a format suitable for deep learning research. Dataset Download Link Code; FE4395: Download: xlel: Visible Image Guided Thermal Image Super-Resolution Dataset. We want to make it easy to relocate an algorithm between different data storage environments without code changes. Create full-fledged APIs for slowly moving datasets without writing a single line of code. Datasets and resources listed here should all be openly-accessible for research purposes, requiring, at most, registration … This is the "Iris" dataset. You might also want to have a look at Florian Boudin's keyphrase extraction repository. Data: Information on the S&P 500 index used to be available on the official webpage on the Standard and Poor's website but until they publish it back, Wikipedia's SP500 list of companies is the best up-to-date and open data source. In this project, employing our new framework JailbreakHub, we conduct the first measurement … The dataset is split into a training set (391K images), a validation set (34K images), and a test set (67K images). 7 GB dataset helps predict which letter-name was spoken, a simple classification task. Each object is annotated with a 3D bounding box. new_dataset. CSV datasets for ML/AI models from captured network traffic during ZAP scanning with web applications like Django, Flask, React, … Various interesting datasets, mostly data from The University of Illinois - wadefagen/datasets. Contribute new data here! 
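The graph definition given above for the GitHub social network (nodes are developers with at least 10 starred repositories, edges are mutual follower relationships) can be sketched in plain Python. The input shapes, the function name, and the sample developers are hypothetical; the published dataset ships ready-made edge lists, so this only shows how such a graph could be derived from raw follow pairs.

```python
def build_mutual_follower_graph(follows, star_counts, min_stars=10):
    """Return (nodes, edges) for an undirected mutual-follower graph.

    follows: iterable of (follower, followed) pairs (directed).
    star_counts: dict mapping developer -> number of repositories starred.
    Both input formats are assumptions made for this sketch.
    """
    nodes = {dev for dev, stars in star_counts.items() if stars >= min_stars}
    directed = {(a, b) for a, b in follows if a != b}
    edges = set()
    for a, b in directed:
        # Keep an edge only when both directions exist and both ends qualify.
        if (b, a) in directed and a in nodes and b in nodes:
            edges.add(tuple(sorted((a, b))))
    return nodes, edges


# Hypothetical developers and follow relationships.
star_counts = {"ana": 12, "bo": 10, "cy": 3, "di": 40}
follows = [
    ("ana", "bo"), ("bo", "ana"),   # mutual, both qualify
    ("ana", "di"),                  # one-way only
    ("cy", "ana"), ("ana", "cy"),   # mutual, but cy has too few stars
]

nodes, edges = build_mutual_follower_graph(follows, star_counts)
print(sorted(nodes), sorted(edges))
```

Storing each edge as a sorted tuple means a mutual pair is recorded once, which matches the undirected reading of "mutual follower relationships".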
Contribute to datasets/covid-19 development by creating an account on GitHub. Thanks for your contribution to the ML community! Pre-trained Model License. 6. For our purposes, creating a custom TFDS dataset means writing a new class which inherits from tfds. Domain Name Service Logs. This GitHub boasts a variety of datasets such as Climate Data, Time Series data, Plane crash data, etc. This is a list of histopathology datasets made public for classification, segmentation, regression and/or registration tasks. The scripts are kept in the scripts/import directory, followed by the international place code (e. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as … Datasets on GitHub: It hosts tons of awesome datasets. Follow the instructions in the new post for how to download the data. PurpleAir Air Quality Data - Developer API for accessing purple air … For example, GitHub's GH Archive dataset can be used to analyze public events on GitHub, such as pull requests, repository stars, and opened issues. The list is separated into Free and Paid and broken into subsections based on loose categories. For each, sample CSV files range from 100 to 2 million records. The dataset contains over 600 billion data points in over 10 billion data records. The indices are created using a repeat-sales methodology. Figure 1: Examples of DeepFashion2. In each video, the camera moves around and above the object and captures it from different views. use the Google Suggest API as basis for generating questions. 
Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples. Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle. Datasets. Log Server Aggregate Log. Voice Gender Detection - GitHub repo for Voice gender detection using the VoxCeleb dataset (7000+ unique speakers and utterances, 3683 males / 2312 females). Normally accessed via HTTP or WebSockets. Datasets is a community library for contemporary NLP designed to support this ecosystem. Welcome to awesome-emg-data, a curated list of Electromyography (EMG) datasets and scholarly publications designed for researchers, practitioners, and enthusiasts in the field of biomedical engineering, neurology, kinesiology, and related disciplines. Examples of DeepFashion2 are shown in Figure 1. Which one would you pick? By Anmol Rajpurohit on April 6, 2015 in Datasets, Finance, GitHub, Government, Machine Learning, NLP, Open Data, Time series data. load_dataset function to download sample datasets … This repository contains a copy of machine learning datasets used in tutorials on MachineLearningMastery. rows/columns of numbers) were distributed, but I was unable to find a … Curated list of Publicly available Big Data datasets. This repository was created to ensure that the datasets used in tutorials remain available and are not dependent upon unreliable third parties. The principle for nf-core test data is as small as possible, as large as necessary. They can be opened by any application compatible with CSV files or with a CSV editor. - benedekrozemberczki/datasets. 
The authors of this repository do not own any licenses for the datasets; the goal of the project is to provide a standardized collection of datasets for research purposes. Also links to the MAL dataset. The post was inspired by the GitHub Open Data Showcase, which is good, but which is not very large. These two datasets are blended into a single product to produce the combined global land and ocean temperature … In this repository, we present information on datasets that have been used for hate speech detection or related concepts such as cyberbullying, abusive language, online harassment, among others, to make it easier for researchers to obtain datasets. Run a container and join an interactive session, you will be … Measuring accuracy can be easy in the case of mathematical problems using a Python interpreter, or near-impossible with open-ended, subjective questions. Code for robust monocular depth estimation described in "Ranftl et. The release of the FreebaseQA data set (NAACL 2019). Description: This dataset provides comprehensive agricultural crop data spanning the years 2010 to 2017 for all states across India. Contribute to YBI-Foundation/Dataset development by creating an account on GitHub. 40) in connection with the average exchange rate for the pound in New York (up to the end of … Data is kept automatically up-to-date by bots that collect and update data once a week. 
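The point above that accuracy is easy to measure for mathematical problems "using a Python interpreter" can be sketched as follows. This is a hedged illustration, not the method of any repository quoted here: it assumes each sample is a plain arithmetic expression paired with a claimed numeric answer, and the helper names and sample data are hypothetical. It walks the syntax tree with the standard `ast` module instead of calling `eval()` on untrusted text.

```python
import ast
import operator

# Arithmetic operators the checker is willing to evaluate.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def evaluate_arithmetic(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval"))


def accuracy(samples, tolerance=1e-9):
    """Fraction of samples whose claimed answer matches the computed value."""
    if not samples:
        raise ValueError("samples must not be empty")
    correct = sum(
        1 for expression, claimed in samples
        if abs(evaluate_arithmetic(expression) - claimed) <= tolerance
    )
    return correct / len(samples)


# Hypothetical (expression, claimed answer) pairs.
samples = [
    ("12 * (3 + 4)", 84),
    ("2 ** 10 - 24", 1000),
    ("7 / 2", 3),  # deliberately wrong claimed answer
]
print(accuracy(samples))
```

Open-ended, subjective questions have no equivalent of `evaluate_arithmetic`, which is why the same passage calls their accuracy near-impossible to measure this way.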
Code List of Datasets Description; Animal Recognition Demo: Contains details on Animal Recognition datasets: Edge Real time Video Analytics: Contains details on Edge Real time Video Analytics datasets: Chat Bot Demo: Contains details on Chat Bot Demo datasets: Redis AI Showcase: Contains details on Redis AI Showcase datasets. A Python library for downloading datasets from Kaggle, Google Drive, and other online sources, available on GitHub. As per the home page for Indices on the S&P website: The S&P/Case-Shiller U … Download and play with key datasets from Google Trends, curated by the Trends Data Team at Google. Contribute to OTRF/Security-Datasets development by creating an account on GitHub. Remove deprecated code by @albertvillanova in #6996. This repository is linked to the Google site. The datasets are generated using random values. yml file under the corresponding created … List of Key Databases: This document is the result of the action "Identification of key databases" defined within the Open Data area of the Three-Year Plan for IT in Public Administration (2017-2019). The passages are then provided to PaLM-2 along with a prompt that asks the model to summarize the … Wikipedia-based Image Text (WIT) Dataset is a large multimodal multilingual dataset. Search our open datasets and learn how other researchers have used them to solve societal challenges. Contribute to virattt/financial-datasets development by creating an account on GitHub. py must contain a line similar to: … A collection of datasets composed of software vulnerability metadata scraped from CVE Details, software metrics and security alerts generated by Static Analysis Tools (SATs), and version control metadata retrieved from each project's repository. 
A curated list of datasets, publicly available for machine learning research in the area of manufacturing - nicolasj92/industrial-ml-datasets. ; Document Text: only focuses on document images, the difficulty is the variety of typesetting. I think it helps to have … Dataset and Evaluation toolbox: GitHub: Google Drive link: Google Drive: Remote Sensing Datasets. The SWIM-IR dataset is generated by first sampling passages from Wikipedia. Mostly using Python Faker. DataSets helps make data wrangling code more reusable. Follow their code on GitHub. core. Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems. The code is available in this GitHub repository. If you know more datasets, and want to contribute, please notify me. The Python Software Foundation's PyPI dataset can be used to analyze download requests for Python packages. Financial datasets for LLMs 🧪. plotly. - niderhoff/big-data-datasets. Web Attack Payloads - A collection of web attack payloads. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. yml: Schedule: It runs the acquirer and commits the acquired data to the corresponding raw location. This list is inspired by awesome public datasets, but for real-time datasets and sources. CSV datasets for ML/AI models from captured network traffic during ZAP scanning with web applications like Django, Flask, React, Vue and Spring - Anti-Nex training datasets. Explore and download over 1200 datasets from various R packages and learn how to use them for statistical analysis and visualization. 1. 
0: A Large-Scale Dataset for Real-World Face Forgery Detection", CVPR 2020: Paper, GitHub. "MaskGAN: Towards Diverse and Interactive Facial Image Manipulation", CVPR 2020: Paper, GitHub. ISO 3166-1-alpha-2 English country names and code elements. Data Collection: Berant et al. I made a good faith effort to determine the license under which the actual data (i. Google Trends @GoogleTrends. What is Google Trends data? Email us. Brent crude and WTI oil prices from US EIA. Those CSV files can be used for testing purposes. Pzoom522 / HistSumm. Datasets - Statistics, Unidad de Análisis Financiero. Each row of the table represents an iris flower, including its species and dimensions of its botanical parts, … SHERLOCK - The dataset is essentially a massive time-series dataset spanning nearly every single kind of software and hardware sensor that can be sampled from a Samsung Galaxy S5 smartphone, without root privileges. Make sure that the Docker user has sufficient rights and has around 90 GB of available space on its hard drive. West Point NSA Data Sets - Snort Intrusion Detection Log. Global temperature anomaly data come from the Global Historical Climatology Network-Monthly (GHCN-M) data set and International Comprehensive Ocean-Atmosphere Data Set (ICOADS), which have data from 1880 to the present. Data comes from S&P Case-Shiller data and includes both the national index and the indices for 20 metropolitan regions. Contribute to datasets/awesome-data development by creating an account on GitHub. Its size enables WIT to be used as a pretraining dataset for … 