Real humans make
real data. Real data
makes real AI.

How people communicate, work, and move - in the wild.

Explore datasets
Mission

Our mission is to bring
real human behavior into AI -
the conversations, decisions,
and movements.

Conversations
148B messages · 10.8T tokens · the largest natively multimodal dataset
Text, images, voice, and video - all interleaved
89+ languages · reactions, forwards, threads, replies
Video
2.7B+ clips · 141M+ hours · 70+ PB
Short-form + long-form. Category subsets available.
Delivered in 4-12 weeks
Speech
390M voice messages · 3.2M hours of audio
65+ topic categories, diverse accents. Category subsets available.
Delivered in 2 weeks
Images
8.2B+ images · 99 formats
JPEG, PSD, RAW/DNG, HEIC, WebP & more
Delivered in 4-5 weeks
Documents
35M+ files · 334+ TB
PDF, DOCX, XLSX, PPTX & more. Category subsets available.
Delivered in 4-5 weeks
Code
25M+ files · 2B+ lines of code
C/C++ (61%), Python (15%), JavaScript, Lua, Shell - game engine source, SDKs.
Delivered in 4-5 weeks
Game Assets
161M+ files · 17+ TB
Textures (49M+), sound effects (6.8M+), 3D models, Unity projects, Minecraft worlds.
Delivered in 4-5 weeks
Books
1.3M+ files · 22+ TB
EPUB, MOBI, CBR/CBZ (comics), FB2 - multilingual long-form content.
Delivered in 4-5 weeks
Corporate Data
Messenger chats, task trackers, meeting transcripts, and emails - how work actually happens inside companies. Sourced through direct enterprise partnerships with consenting organizations.
Delivered on demand
Robotics
Video + IMU sensors · 3 tiers from GoPro to full sensor rig.
Delivered on demand

Browse more datasets or design one with us

We offer additional proprietary datasets not listed here. Contact us to request a sample, explore more options, or collaborate on a new dataset.

Contact us →
Access

How to access
our datasets

  1. Request samples
    We will set up a quick call to understand your use case and then send you relevant data samples.
  2. Purchase access
    Enter into a data license agreement for the dataset and use cases your team needs.
  3. Receive data
    For off-the-shelf datasets, we will grant your team access within one to two days.
  4. Experiment with us
    We frequently partner with research teams to design new kinds of datasets for specific use cases. Contact us for more information.
Team

Founding team

Vadims Casecnikovs
Co-founder & CEO
LinkedIn ↗
Artem Brustovetskii
Co-founder & CTO
Lev Chizhov
Co-founder & CBO
LinkedIn ↗