Real humans make real data. Real data makes real AI.
How people communicate, work, and move - in the wild.
Mission
Our mission is to bring real human behavior into AI - the conversations, decisions, and movements.
Featured Datasets
Conversations
148B messages · 9.4T tokens · the largest natively multimodal dataset
Texts, Images, Voices, Videos - all entangled
50+ languages · reactions, forwards, threads, replies
Available datasets
Video
7.9B+ clips · 99M+ hours combined · 70 PB
Short-form + long-form. Category subsets available.
Delivered in 4-12 weeks
Speech
300M voice messages · 8M+ hours of audio
50+ languages, diverse accents. Category subsets available.
Delivered in 2 weeks
Images
6B images · 800 TB
JPEG, PSD, RAW/DNG, HEIC, WebP & more
Delivered in 4-5 weeks
Documents
~40M files · ~420 TB
PDF, DOCX, XLSX, PPTX & more. Category subsets available.
Delivered in 4-5 weeks
Music
150M tracks · 1.5 PB
Full-length tracks shared between real users. 50+ genres, global origin.
Delivered in 4-5 weeks
Corporate Data
Messenger chats, task trackers, meeting transcriptions, emails - how work actually happens inside companies. Sourced through direct enterprise partnerships with consenting organizations.
Delivered on demand
Robotics
Video + IMU sensors · 3 tiers from GoPro to full sensor rig.
Delivered on demand
Licensed Code
Proprietary enterprise codebases licensed directly from IP holders, including full private repository histories - production-grade and not publicly available.
Delivered on demand
Browse more datasets or design one with us
We offer additional proprietary datasets not listed here. Contact us to request a sample, explore more options, or collaborate on a new dataset.
Access
How to access our datasets
1. Request samples
We will set up a quick call to understand your use case and then send you relevant data samples.
2. Purchase access
Enter into a data license agreement covering the datasets and use cases your team needs.
3. Receive data
For off-the-shelf datasets, we will grant your team access within one to two days.
✦ Experiment with us
We frequently partner with research teams to design new shapes of data for any use case. Contact us for more information.