Real humans make real data. Real data makes real AI.
How people communicate, work, and move in the wild.
Mission
Our mission is to bring real human behavior into AI: the conversations, decisions, and movements.
Featured datasets
Conversations
148B messages · 9.4T tokens · the largest natively multimodal dataset
Text, images, voice messages, and video, all interleaved
50+ languages · reactions, forwards, threads, replies
Available datasets
Video
7.9B+ clips · 99M+ hours combined · 70 PB
Short-form + long-form. Category subsets available.
Delivered in 4-12 weeks
Speech
300M voice messages · 8M+ hours of audio
50+ languages, diverse accents. Category subsets available.
Delivered in 2 weeks
Images
6B images · 800 TB
JPEG, PSD, RAW/DNG, HEIC, WebP & more
Delivered in 4-5 weeks
Documents
~40M files · ~420 TB
PDF, DOCX, XLSX, PPTX & more. Category subsets available.
Delivered in 4-5 weeks
Music
150M tracks · 1.5 PB
Full-length tracks shared between real users. 50+ genres, global origin.
Delivered in 4-5 weeks
Corporate Data
Messenger chats, task-tracker records, meeting transcriptions, and emails: how work actually happens inside companies. Sourced through direct enterprise partnerships with consenting organizations.
Delivered on demand
Robotics
Video + IMU sensor data · 3 tiers, from GoPro to a full sensor rig.
Delivered on demand
Licensed Code
Proprietary enterprise codebases licensed directly from IP holders, including full private repository histories. Production-grade and not publicly available.
Delivered on demand
Browse more datasets or design one with us
We offer additional proprietary datasets not listed here. Contact us to request a sample, explore more options, or collaborate on a new dataset.
Access
How to access our datasets
1. Request samples
We will set up a quick call to understand your use case and then send you relevant data samples.
2. Purchase access
Enter a data license agreement covering the datasets and use cases your team needs.
3. Receive data
For off-the-shelf datasets, we will grant your team access within one to two days.
✦ Experiment with us
We frequently partner with research teams to design new kinds of data for any use case. Contact us for more information.
Team
Founding team
Artem Brustovetskii
CTO
Konstantin Kolesniak
Founding Engineer