DATASET Marketplace Store: Free and Discounted Access to AI and Robotics Datasets

Overview

DATASET marketplace and directory: browse 40+ categories of AI, LLM, RL, text, and image datasets.

DATASET

Webscale-RL

salesforce/webscale-rl

DATASET - DATASET

5.0 star - 1 Reviews

Download 8.04k Like 73 Lineno 1.11M

# Webscale-RL Dataset | ## Dataset Description **Webscale-RL** is a large-scale reinforcement learning dataset designed to address the fundamental bottleneck in LLM RL training: the scarcity of high-quality, diverse RL data. While pretraining leverages **>1T diverse web tokens**, existing RL datasets remain limited to **<10B tokens** with constrained diversity. Webscale-RL bridges this gap
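
To put the stated gap in numbers: taking the card's bounds at face value (more than 1T pretraining tokens versus fewer than 10B RL tokens), the pretraining side is over two orders of magnitude larger.

$$\text{ratio} > \frac{1\,\text{T}}{10\,\text{B}} = \frac{10^{12}}{10^{10}} = 100$$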

Agent Reinforcement Learning Open Dataset

deepnlp/agent-reinforcement-learning-open-dataset

DATASET - DATASET

5.0 star - 1 Reviews

500 credits

rl agent reinforcement learning

# Open Agent RL Dataset: High-Quality AI Agent | Tool Use & Function Calls | Reinforcement Learning Datasets
The DeepNLP website provides **high-quality, genuine online user requests** as Agent & RL datasets to help LLM foundation training, SFT, and post-training produce models that are more capable at function calling, tool use, and planning. The datasets are collected and sampled from users' requests on our various clients

DATASET

ODA-Mixture-100k

opendataarena/oda-mixture-100k

DATASET - DATASET

0.0 star - 0 Reviews

Download 101k Like 9

# ODA-Mixture-100k
ODA-Mixture-100k is a compact general-purpose post-training dataset curated from top-performing open corpora (selected via the *OpenDataArena* leaderboard) and refined through deduplication and benchmark decontamination.
## Dataset Summary
- **Domain**: General-purpose (e.g., Math, Code, Reasoning, General).
- **Format**: Problem → Solution (reasoning trace) → Final answe
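
The card names deduplication and benchmark decontamination but does not say how they were performed. As a generic illustration only (not ODA's actual pipeline), here is a minimal Python sketch that assumes exact-match hashing after normalization for deduplication and word n-gram overlap for decontamination. The function names and the 13-word default window are hypothetical choices.

```python
import hashlib
import re
from typing import List, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams of the normalized text (empty if it has fewer than n words)."""
    words = normalize(text).split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def dedup_and_decontaminate(samples: List[str], benchmark: List[str], n: int = 13) -> List[str]:
    """Keep the first copy of each sample and drop any sample sharing an n-gram with a benchmark item."""
    # Hypothetical settings: exact-hash dedup, and a single shared n-gram counts as contamination.
    bench_grams: Set[Tuple[str, ...]] = set()
    for item in benchmark:
        bench_grams |= word_ngrams(item, n)

    seen: Set[str] = set()
    kept: List[str] = []
    for sample in samples:
        digest = hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        if word_ngrams(sample, n) & bench_grams:
            continue  # overlaps a benchmark item, so treat it as contaminated
        kept.append(sample)
    return kept


samples = [
    "What is 2 + 2?",
    "what is   2 + 2?",  # duplicate up to case and spacing
    "Compute the area of a circle of radius 3.",
    "Solve x^2 - 5x + 6 = 0 for x.",
]
benchmark = ["Solve x^2 - 5x + 6 = 0 for real x."]
print(dedup_and_decontaminate(samples, benchmark, n=4))
# ['What is 2 + 2?', 'Compute the area of a circle of radius 3.']
```

Samples shorter than `n` words produce no n-grams and are never flagged, so a real pipeline would need a separate rule for very short items.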

DATASET

imagenet-1k

ilsvrc/imagenet-1k

DATASET - DATASET

0.0 star - 0 Reviews

Download 65.9k Like 599 Lineno 1.43M

Access to dataset ILSVRC/imagenet-1k is restricted: you must be granted access and be authenticated to download it.

DATASET

CADS-dataset

mrmrx/cads-dataset

DATASET - DATASET

0.0 star - 0 Reviews

Download 62.3k Like 24 Lineno 22.1k

# CADS: A Comprehensive Anatomical Dataset and Segmentation for Whole-Body Anatomy in Computed Tomography
## Overview
CADS is a robust, fully automated framework for segmenting 167 anatomical structures in Computed Tomography (CT), spanning from head to knee regions across diverse anatomical systems. The framework consists of two main components:
1. **CADS-dataset**:
   - 22,022 CT volumes w

DATASET

MATH-500

huggingfaceh4/math-500

DATASET - DATASET

0.0 star - 0 Reviews

Download 60.7k Like 201 Lineno 500

# Dataset Card for MATH-500 This dataset contains a subset of 500 problems from the MATH benchmark that OpenAI created in their _Let's Verify Step by Step_ paper. See their GitHub repo for the source file: https://github.com/openai/prm800k/tree/main?tab=readme-ov-file#math-splits
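
Reference solutions in the MATH benchmark mark the final answer with `\boxed{...}`, so evaluation scripts usually extract that span before comparing answers. Below is a minimal sketch that assumes the last `\boxed{...}` in a solution holds the final answer. `last_boxed` is a hypothetical helper, not the grader from the prm800k repository, and it does no answer normalization.

```python
from typing import Optional


def last_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `solution`, or None if absent or unbalanced."""
    marker = "\\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    depth = 1  # already inside the opening brace of \boxed{
    chars = []
    for ch in solution[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)  # matching close brace of \boxed{
        chars.append(ch)
    return None  # ran out of text before the braces balanced


print(last_boxed(r"The answer is $\boxed{\frac{1}{2}}$."))  # \frac{1}{2}
```

Tracking brace depth, rather than stopping at the first `}`, keeps nested LaTeX such as `\frac{1}{2}` intact.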

DATASET

OpenVid-1M

nkp37/openvid-1m

DATASET - DATASET

0.0 star - 0 Reviews

Download 36.5k Like 231 Lineno 1.45M

# Summary This is the dataset proposed in our paper [**[ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation**](https://arxiv.org/abs/2407.02371). OpenVid-1M is a high-quality text-to-video dataset designed for research institutions to enhance video quality, featuring high aesthetics, clarity, and resolution. It can be used for direct training or as a qualit

DATASET

dolma3_dolmino_mix-100B-1025

allenai/dolma3_dolmino_mix-100b-1025

DATASET - DATASET

0.0 star - 0 Reviews

Download 31.8k Like 5 Lineno 14.1M

# Dolma 3 Dolmino Mix (100B)
The Dolma 3 Dolmino Mix (100B) is the mixture of high-quality data used for the second stage of training for Olmo 3 7B model.
### Dataset Sources
| Source | Category | Tokens | Documents |
|--------|----------|--------|-----------|
| TinyMATH Mind | Math (synth) | 898M (0.9%) | 1.52M |
| TinyMATH PoT | Math (synth) | 241M (0.24%) | 758K |
| CraneMath | Math (sy
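
The percentages in the table can be checked against the mix's nominal size. Assuming they are shares of the 100B-token total:

$$\frac{898\,\text{M}}{100\,\text{B}} = \frac{8.98 \times 10^{8}}{10^{11}} = 0.898\% \approx 0.9\%, \qquad \frac{241\,\text{M}}{100\,\text{B}} = \frac{2.41 \times 10^{8}}{10^{11}} = 0.241\% \approx 0.24\%$$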

DATASET

alpaca-cleaned

yahma/alpaca-cleaned

DATASET - DATASET

0.0 star - 0 Reviews

Download 28.2k Like 740 Lineno 51.8k

# Dataset Card for Alpaca-Cleaned
- **Repository:** https://github.com/gururise/AlpacaDataCleaned
## Dataset Description
This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:
1. **Hallucinations:** Many instructions in the original dataset had instructions referencing data on t

DATASET

UltraData-Math

openbmb/ultradata-math

DATASET - DATASET

0.0 star - 0 Reviews

Download 27.4k Like 202 Lineno 181M

# UltraData-Math Dataset | Source Code | Chinese README
***UltraData-Math*** is a large-scale, high-quality mathematical pre-training dataset totaling **290B+ tokens** across three progressive tiers: **L1** (170.5B tokens web corpus), **L2** (33.7B tokens quality-selected), and **L3** (88B tokens multi-format refined), designed to systematically enhance mathematical reasoning in LLMs. It has
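
As a quick consistency check on the tier sizes quoted above, and assuming the three tiers are counted separately (as the card's "totaling" phrasing suggests), they add up to the stated 290B+ total:

$$170.5\,\text{B} + 33.7\,\text{B} + 88\,\text{B} = 292.2\,\text{B tokens}$$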

DATASET

olmo-mix-1124

allenai/olmo-mix-1124

DATASET - DATASET

0.0 star - 0 Reviews

Download 20.3k Like 80 Lineno 621M

# OLMo 2 (November 2024) Pretraining set
Collection of data used to train OLMo-2-1124 models. The majority of this dataset comes from DCLM-Baseline with no additional filtering, but we provide the explicit breakdowns below.

| Name | Tokens | Bytes (uncompressed) | Documents | License |
|-----------------|--------|----------------------|-----------|-----------|
| DCLM-Baseline | 3.

DATASET

tiny-imagenet

zh-plus/tiny-imagenet

DATASET - DATASET

0.0 star - 0 Reviews

Download 20k Like 86 Lineno 110k

# Dataset Card for tiny-imagenet
## Dataset Description
- **Homepage:** https://www.kaggle.com/c/tiny-imagenet
- **Repository:** [Needs More Information]
- **Paper:** http://cs231n.stanford.edu/reports/2017/pdfs/930.pdf
- **Leaderboard:** https://paperswithcode.com/sota/image-classification-on-tiny-imagenet-1
### Dataset Summary
Tiny ImageNet contains 100000 images of 200 classes (500 for each
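
The summary is cut off mid-sentence, but assuming the parenthetical means 500 images per class, the counts are consistent:

$$200\ \text{classes} \times 500\ \tfrac{\text{images}}{\text{class}} = 100{,}000\ \text{images}$$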

DATASET

Alpaca-CoT

qingyisi/alpaca-cot

DATASET - DATASET

0.0 star - 0 Reviews

Download 19.1k Like 740

# Instruction-Finetuning Dataset Collection (Alpaca-CoT)
This repository will continuously collect various instruction-tuning datasets. We standardize the different datasets into the same format, which can be loaded directly by the Alpaca model's code. We have also conducted an empirical study of various instruction-tuning datasets based on the Alpaca model. If you think this dataset c
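
The card says the collected datasets are standardized into one format. Here is a minimal sketch of that idea, assuming the Alpaca-style `instruction` / `input` / `output` record layout as the target. The source names and their field names (`question`, `passage`, `rationale`, and so on) are hypothetical stand-ins for whatever each upstream dataset actually uses.

```python
from typing import Callable, Dict, List

AlpacaRecord = Dict[str, str]


def to_alpaca(instruction: str, output: str, input_text: str = "") -> AlpacaRecord:
    """Build one record in the Alpaca instruction/input/output layout."""
    return {"instruction": instruction, "input": input_text, "output": output}


# Hypothetical adapters: one per upstream source, each mapping its own keys onto the shared layout.
ADAPTERS: Dict[str, Callable[[dict], AlpacaRecord]] = {
    "plain_qa": lambda r: to_alpaca(r["question"], r["answer"]),
    "reading_qa": lambda r: to_alpaca(r["question"], r["answer"], r["passage"]),
    # Chain-of-thought sources keep the rationale in the output, before the final answer.
    "cot": lambda r: to_alpaca(r["question"], f"{r['rationale']}\nAnswer: {r['answer']}"),
}


def standardize(source: str, records: List[dict]) -> List[AlpacaRecord]:
    """Convert every record from `source` into the shared layout (KeyError for unknown sources)."""
    adapter = ADAPTERS[source]
    return [adapter(r) for r in records]


print(standardize("cot", [{"question": "2 + 3?", "rationale": "2 + 3 = 5.", "answer": "5"}]))
```

Keeping one small adapter per source means a newly collected dataset only needs a new entry in `ADAPTERS`, while downstream loading code sees a single format.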

DATASET

multilingual_librispeech

facebook/multilingual_librispeech

DATASET - DATASET

0.0 star - 0 Reviews

Download 18.7k Like 163 Lineno 1.49M

# Dataset Card for MultiLingual LibriSpeech
## Dataset Description
- **Homepage:**
- **Repository:** [Needs More Information]
- **Paper:**
- **Leaderboard:**
### Dataset Summary
This is a streamable version of the Multilingual LibriSpeech (MLS) dataset. The data ar

DATASET

BLIP3o-Pretrain-Long-Caption

blip3o/blip3o-pretrain-long-caption

DATASET - DATASET

0.0 star - 0 Reviews

Download 17.1k Like 53 Lineno 27.2M

## BLIP3o Pretrain Long-Caption Dataset
This collection contains **27 million images**, each paired with a long (~120 token) caption generated by **Qwen/Qwen2.5-VL-7B-Instruct**.
### Download
```python
from huggingface_hub import snapshot_download

# Download the full dataset repository from the Hugging Face Hub
snapshot_download(
    repo_id="BLIP3o/BLIP3o-Pretrain-Long-Caption",
    repo_type="dataset",
)
```
## Load Dataset without Extracting
You do

DATASET

boolq

google/boolq

DATASET - DATASET

0.0 star - 0 Reviews

Download 16.4k Like 89 Lineno 12.7k

# Dataset Card for Boolq
## Dataset Description
- **Homepage:**
- **Repository:** https://github.com/google-research-datasets/boolean-questions
- **Paper:** https://arxiv.org/abs/1905.10044
- **Point of Contact:**
- **Size of downloaded dataset files:** 8.77 MB
- **Size of

DATASET

AIME_2024

maxwell-jia/aime_2024

DATASET - DATASET

0.0 star - 0 Reviews

Download 13.3k Like 72 Lineno 30

# AIME 2024 Dataset
## Dataset Description
This dataset contains problems from the American Invitational Mathematics Examination (AIME) 2024. AIME is a prestigious high school mathematics competition known for its challenging mathematical problems.
## Dataset Details
- **Format**: JSONL
- **Size**: 30 records
- **Source**: AIME 2024 I & II
- **Language**: English
### Data Fields
Each record
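
Since the card lists the format as JSONL (one JSON object per line), here is a minimal standard-library reader. The file name in the comment and the field names in the demo are hypothetical, because the card's field list is cut off above.

```python
import io
import json
from typing import Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line of a JSON Lines stream."""
    for lineno, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {lineno}: invalid JSON ({err.msg})") from err


# With a local copy you would open the file instead (hypothetical file name):
# with open("aime_2024.jsonl", encoding="utf-8") as f:
#     records = list(read_jsonl(f))
# Demo on an in-memory sample with placeholder fields:
sample = io.StringIO('{"id": 1, "problem": "..."}\n\n{"id": 2, "problem": "..."}\n')
records = list(read_jsonl(sample))
print(len(records))  # 2
```

For the full file, the card's stated size implies 30 records.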

DATASET

Stable-Diffusion-Prompts

gustavosta/stable-diffusion-prompts

DATASET - DATASET

0.0 star - 0 Reviews

Download 13.2k Like 505 Lineno 81.9k

# Stable Diffusion Dataset
This is a set of about 80,000 prompts filtered and extracted from the image finder for Stable Diffusion. The data was somewhat difficult to extract, since the search engine still has no public API that is not protected by Cloudflare.

DATASET

MathVision

mathllms/mathvision

DATASET - DATASET

0.0 star - 0 Reviews

Download 13k Like 103 Lineno 3.34k

# Measuring Multimodal Mathematical Reasoning with the MATH-Vision Dataset
## Data Usage
```python
from datasets import load_dataset

# Load all available splits from the Hugging Face Hub
dataset = load_dataset("MathLLMs/MathVision")
print(dataset)
```
## Acknowledgments
We would like to thank the following contributors for helping improve the dataset quality:
- for correcting answers for ID 338 and ID 1826
## News

DATASET

MATH-lighteval

digitallearninggmbh/math-lighteval

DATASET - DATASET

0.0 star - 0 Reviews

Download 12.8k Like 49 Lineno 25k

# Dataset Card for Mathematics Aptitude Test of Heuristics (MATH) dataset in lighteval format
## Dataset Description
- **Homepage:** https://github.com/hendrycks/math
- **Repository:** https://github.com/hendrycks/math
- **Paper:** https://arxiv.org/pdf/2103.03874.pdf
- **Leaderboard:** N/A
- **Point of Contact:** Dan

Community
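What GOOD CASEs and BAD CASEs have you encountered with the text-to-video feature of Kling AI (可灵)?

community helper 2024-10-31 12:23 #AI Store #Video Generator #Image Generator

What good and problematic experiences have you had when generating videos with Kling AI? Please be sure to include the prompt text you entered, plus a video screenshot or a short video clip.
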
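What GOOD CASEs and BAD CASEs have you encountered with the text-to-image and text-to-video features of Dreamina (即梦)?

community helper 2024-10-31 12:24 #AI Store #Video Generator #Image Generator

What good and problematic experiences have you had when generating videos with Douyin's Dreamina AI? Please be sure to include the prompt text you entered, plus a video screenshot or a short video clip.
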
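What GOOD CASEs and BAD CASEs have you encountered with the content search and recommendation features of Kuaishou (快手, Kwai)?

community helper 2024-10-31 12:26 #AI Store #Search #Recommendation

What good and problematic experiences have you had with the search and recommendation features of Kuaishou (Kwai) short videos? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
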
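What GOOD CASEs and BAD CASEs have you encountered with the content search and recommendation features of Xiaohongshu (小红书)?

community helper 2024-10-31 12:27 #AI Store #Search #Recommendation

What good and problematic experiences have you had with the search and recommendation features of the Xiaohongshu app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
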
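What GOOD CASEs and BAD CASEs have you encountered with the content search and recommendation features of WeChat (微信)?

community helper 2024-10-31 12:29 #AI Store #Search #Recommendation

What good and problematic experiences have you had with the search and recommendation features of the WeChat app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
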
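What GOOD CASEs and BAD CASEs have you encountered with the AI Q&A feature of WeChat (微信)?

community helper 2024-10-31 12:30 #AI Store #AI Search Engine

What good and problematic experiences have you had with the AI Q&A feature of the WeChat app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
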
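What GOOD CASEs and BAD CASEs have you encountered with the search and recommendation features of Zhihu (知乎)?

community helper 2024-10-31 12:31 #AI Store #Search #Recommendation

What good and problematic experiences have you had with the search and recommendation features of the Zhihu app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
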
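What GOOD CASEs and BAD CASEs have you encountered with the search and recommendation features of JD (京东)?

community helper 2024-10-31 12:32 #AI Store #Search #Recommendation

What good and problematic experiences have you had with the search and recommendation features of the JD app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
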
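What GOOD CASEs and BAD CASEs have you encountered with the search and recommendation features of Taobao (淘宝)?

community helper 2024-10-31 12:33 #AI Store #Search #Recommendation

What good and problematic experiences have you had with the search and recommendation features of the Taobao app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
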
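What GOOD CASEs and BAD CASEs have you encountered with the search and recommendation features of Alipay (支付宝)?

community helper 2024-10-31 12:34 #AI Store #Search #Recommendation

What good and problematic experiences have you had with the search and recommendation features of the Alipay app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
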
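What GOOD CASEs and BAD CASEs have you encountered with the search and recommendation features of PDD Temu (拼多多)?

community helper 2024-10-31 12:35 #AI Store #Search #Recommendation

What good and problematic experiences have you had with the search and recommendation features of the PDD Temu app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
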
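What GOOD CASEs and BAD CASEs have you encountered with the AI search feature of Zhihu Zhida (知乎直答)?

community helper 2024-11-07 08:43 #AI Store #AI Search Engine

What good and problematic experiences have you had with the Zhihu Zhida AI search feature? Please describe what you entered at the time, for example the prompt text, or upload screenshots.
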
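What GOOD CASEs and BAD CASEs have you encountered with the AI search feature of Baidu AI Search (百度)?

community helper 2024-11-07 08:46 #AI Store #AI Search Engine

What good and problematic experiences have you had with Baidu's AI search feature? Please describe what you entered at the time, for example the prompt text, or upload screenshots.
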
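What GOOD CASEs and BAD CASEs have you encountered with the AI search feature of Kuaishou (快手)?

community helper 2024-11-07 08:47 #AI Store #AI Search Engine

What good and problematic experiences have you had with Kuaishou's AI search feature? Please describe what you entered at the time, for example the prompt text, or upload screenshots.
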
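What GOOD CASEs and BAD CASEs have you encountered with the AI search feature of Douyin (抖音, TikTok)?

community helper 2024-11-07 08:48 #AI Store #AI Search Engine

What good and problematic experiences have you had with Douyin (TikTok)'s AI search feature? Please describe what you entered at the time, for example the prompt text, or upload screenshots.
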
What are the Best and Coolest AI Generated Images you have ever seen?

community_assistant 2024-08-25 23:05 #AI Store #Image Generator

Please leave your thoughts on the best and coolest AI Generated Images.

Are there any free alternatives to Midjourney, Stable Diffusion, and other AI image generators?

community_assistant 2024-08-25 23:05 #AI Store #Image Generator

Please leave your thoughts on free alternatives to Midjourney, Stable Diffusion, and other AI image generators.

What are the worst, scariest, or creepiest AI-generated images you have ever seen?

community_assistant 2024-08-25 23:05 #AI Store #Image Generator

Please leave your thoughts on the scariest or creepiest AI-generated images.

Generative AI Search Engine Optimization: How to Improve Your Content

rockingdingo 2024-08-25 23:05 #AI Search Engine #SEO #GEO #Generative Engine Optimization #AI Store

We are witnessing great success in the recent development of generative artificial intelligence across many fields, such as AI assistants, chatbots, and AI writers. Among all AI-native products, AI search engines such as Perplexity, Gemini, and SearchGPT are the most attractive to website owners, bloggers, and web content publishers. An AI search engine is a new kind of tool that answers users' questions (queries) directly. In this blog, we give a brief introduction to the basic concepts behind AI search engines, including Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), citations, and sources. We then highlight some major differences between traditional Search Engine Optimization (SEO) and Generative Engine Optimization (GEO). Finally, we cover recent research and strategies that help website owners and content publishers better optimize their content for generative AI search engines.
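
The RAG and citation concepts the post introduces can be illustrated with a toy pipeline: retrieve the most relevant pages for a query, then number them in the prompt so the model can cite them as [1], [2]. This is a minimal sketch under loud assumptions: word-overlap scoring stands in for a real search index or embedding retriever, the prompt wording and example URLs are hypothetical, and this is not how Perplexity, Gemini, or SearchGPT work internally.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: str) -> int:
    """Toy relevance score: how many query words also occur in the document (with multiplicity)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(count, d[word]) for word, count in q.items())


def build_prompt(query: str, corpus: Dict[str, str], k: int = 2) -> str:
    """Retrieve the top-k pages and number them so the model can cite sources as [1], [2], ..."""
    # sorted() is stable, so pages with equal scores keep their corpus order.
    ranked = sorted(corpus.items(), key=lambda item: score(query, item[1]), reverse=True)[:k]
    sources = "\n".join(f"[{i}] ({url}) {text}" for i, (url, text) in enumerate(ranked, start=1))
    return (
        "Answer using only the sources below and cite them like [1].\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


corpus = {
    "https://example.com/seo": "SEO ranks web pages for traditional search engines.",
    "https://example.com/rag": "Retrieval augmented generation passes retrieved pages to an LLM.",
    "https://example.com/cats": "Cats sleep most of the day.",
}
print(build_prompt("How does retrieval augmented generation work?", corpus))
```

Swapping `score` for a real retriever leaves the citation-numbering step unchanged; the URL printed next to each number is what lets an answer point back to its source.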

Do you think human drivers will no longer be needed because of AI-driven robotaxis and self-driving vehicles?

community_assistant 2024-08-25 23:05 #AI Store #unemployment #self-driving vehicles #robotaxi

We are seeing more applications of robotaxis and self-driving vehicles worldwide. Many large companies, such as Waymo, Tesla, and Baidu, are accelerating robotaxi deployment in multiple cities. Some human drivers, especially cab drivers, worry that they will lose their jobs to AI. They argue that lower operating costs, and the fact that AI can technically work 24 hours a day without rest, will give robotaxis a competitive advantage over human drivers. What do you think?
