Real humans make
real data. Real data
makes real AI.

How people communicate, work, and move - in the wild.

Explore datasets
Mission

Our mission is to bring
real human behavior into AI -
the conversations, decisions,
and movements.

Conversations
148B messages · 9.4T tokens · the largest natively multimodal dataset
Text, images, voice, and video - all entangled
50+ languages · reactions, forwards, threads, replies
Video
7.9B+ clips · 99M+ hours combined · 70 PB
Short-form + long-form. Category subsets available.
Delivered in 4-12 weeks
Speech
300M voice messages · 8M+ hours of audio
50+ languages, diverse accents. Category subsets available.
Delivered in 2 weeks
Images
6B images · 800 TB
JPEG, PSD, RAW/DNG, HEIC, WebP & more
Delivered in 4-5 weeks
Documents
~40M files · ~420 TB
PDF, DOCX, XLSX, PPTX & more. Category subsets available.
Delivered in 4-5 weeks
Music
150M tracks · 1.5 PB
Full-length tracks shared between real users. 50+ genres, global origin.
Delivered in 4-5 weeks
Corporate Data
Messengers, task trackers, meeting transcriptions, emails - how work actually happens inside companies. Sourced through direct enterprise partnerships with consenting organizations.
Delivered on demand
Robotics
Video + IMU sensors · 3 tiers from GoPro to full sensor rig.
Delivered on demand
Licensed Code
Proprietary enterprise codebases licensed directly from IP holders, including full private repository histories - production-grade and not publicly available.
Delivered on demand

Browse more datasets or design one with us

We offer additional proprietary datasets not listed here. Contact us to request a sample, explore more options, or collaborate on a new dataset.

Contact us →
Access

How to access
our datasets

  1. Request samples
    We will set up a quick call to understand your use case and then send you relevant data samples.
  2. Purchase access
    Enter into a data license agreement for the dataset and use cases your team needs.
  3. Receive data
    For off-the-shelf datasets, we will grant your team access within one to two days.
  4. Experiment with us
    We frequently partner with research teams to design new shapes of data for any use case. Contact us for more information.