Real humans make real data. Real data makes real AI.
How people communicate, work, and move - in the wild.
Mission
Our mission is to bring real human behavior into AI - the conversations, decisions, and movements.
Featured Datasets
Conversations
148B messages · 10.8T tokens · the largest natively multimodal dataset
Texts, Images, Voices, Videos - all entangled
89+ languages · reactions, forwards, threads, replies
Available datasets
Video
2.7B+ clips · 141M+ hours · 70+ PB
Short-form + long-form. Category subsets available.
Delivered in 4-12 weeks
Speech
390M voice messages · 3.2M hours of audio
65+ topic categories, diverse accents. Category subsets available.
Delivered in 2 weeks
Images
8.2B+ images · 99 formats
JPEG, PSD, RAW/DNG, HEIC, WebP & more
Delivered in 4-5 weeks
Documents
35M+ files · 334+ TB
PDF, DOCX, XLSX, PPTX & more. Category subsets available.
Delivered in 4-5 weeks
Code
25M+ files · 2B+ lines of code
C/C++ (61%), Python (15%), JavaScript, Lua, Shell - game engine source, SDKs.
Delivered in 4-5 weeks
Game Assets
161M+ files · 17+ TB
Textures (49M+), sound effects (6.8M+), 3D models, Unity projects, Minecraft worlds.
Delivered in 4-5 weeks
Books
1.3M+ files · 22+ TB
EPUB, MOBI, CBR/CBZ (comics), FB2 - multilingual long-form content.
Delivered in 4-5 weeks
Corporate Data
Messenger, task tracker, meeting transcripts, and emails - how work actually happens inside companies. Sourced through direct enterprise partnerships with consenting organizations.
Delivered on demand
Robotics
Video + IMU sensors · 3 tiers from GoPro to full sensor rig.
Delivered on demand
Browse more datasets or design one with us
We offer additional proprietary datasets not listed here. Contact us to request a sample, explore more options, or collaborate on a new dataset.
Access
How to access our datasets
1. Request samples: We will set up a quick call to understand your use case and then send you relevant data samples.
2. Purchase access: Enter a data license agreement for the dataset and use cases your team needs.
3. Receive data: For off-the-shelf datasets, we will grant your team access within one to two days.
✦ Experiment with us: We frequently partner with research teams to design new shapes of data for any use case. Contact us for more information.
Team
Founding team
Artem Brustovetskii
Co-founder & CTO