How to Shoot Data For AI Filmmaking
Copyright © 2025 Asa Bailey. International Academy of Television Arts & Sciences All rights reserved.
From Framing to Capturing
I remember when all I needed was my viewfinder and a gut feeling for the perfect frame. Now I'm hunting photons, depth maps, and neural coordinates like some kind of digital prospector. Welcome to my new world where cinematography meets data science. I'm Asa Bailey, a Director of Virtual Production, your techno DP.
The Great Shift: When Frames Became Datasets
Let me tell you something about shooting for AI that nobody warned me about: your perfectly composed shot doesn't matter anymore. Not in the way it used to.
In the old days, I'd obsess over the rule of thirds, depth of field, lens choice, the visual language we've refined for over a century. Now? I'm capturing structured data that an algorithm will reassemble, reshape, and reimagine. My "frame" isn't a rectangle anymore; it's a volumetric cloud of information that exists in dimensions I can't even see.
This isn't just a change in technology; it's a philosophical earthquake for anyone who grew up worshiping at the altar of composition.
Data Doesn't Lie, But It Speaks a Different Language
When I shoot data instead of frames, I'm not capturing what things look like, I'm capturing what things are. Dynamic NeRFs aren't static 3D models frozen in time; they're evolving datasets that breathe and move. Think of them as 3D holograms playing inside the computer's mind, where every ray of light, every shadow, every subtle shift in perspective becomes part of an interconnected dance.
The AI doesn't watch footage like we do. It doesn't squint at a monitor and say, "I think he moved left." It processes coordinates, depth relationships, and light interactions simultaneously. The difference between shooting footage and shooting data is like the difference between looking at a painting and stepping inside it.
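To make that concrete, here is a minimal sketch of how a radiance-field model "sees": a toy field (a sphere of red fog standing in for a trained NeRF) is queried point by point along a camera ray, and colour and density are composited the way standard volume rendering does it. The function names and the toy field are illustrative placeholders, not any particular library's API.

import numpy as np

def toy_radiance_field(xyz, view_dir):
    """Return (rgb, density) for a point in space seen from view_dir."""
    inside = np.linalg.norm(xyz) < 1.0          # a unit sphere of fog stands in for a trained model
    density = 5.0 if inside else 0.0
    rgb = np.array([1.0, 0.2, 0.2])             # constant red; a real field varies with view_dir
    return rgb, density

def render_ray(origin, direction, n_samples=64, near=0.0, far=4.0):
    """Classic volume rendering: march along the ray, compositing colour and density."""
    ts = np.linspace(near, far, n_samples)
    dt = (far - near) / n_samples
    colour, transmittance = np.zeros(3), 1.0
    for t in ts:
        rgb, sigma = toy_radiance_field(origin + t * direction, direction)
        alpha = 1.0 - np.exp(-sigma * dt)       # how much this sample occludes what is behind it
        colour += transmittance * alpha * rgb
        transmittance *= (1.0 - alpha)
    return colour

print(render_ray(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0])))

Every "pixel" the model produces comes from queries like this, which is why coordinates, depth relationships, and light interactions are inseparable for it.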
Two Ways to Bottle Lightning: The Lab vs. The Wild
The Performance Lab: Sterile but Seductive
Now I'm setting up multi-camera arrays with depth sensors and LiDAR scanners like some kind of technical director on a NASA mission. Here, we're not just filming actors; we're scanning their molecular existence into the digital realm. Every micro-expression, every muscle twitch becomes data points in a volumetric capture. The lighting is flat and boring by traditional standards, but it's perfect for clean data acquisition. No shadows to confuse the algorithms, no distractions to corrupt the dataset.
The results? Pristine NeRF models that AI can manipulate with surgical precision. Digital performances so clean you can rotate around them like they're museum exhibits.
The Urban Chase: Beautiful Chaos
Then there's shooting in the wild. Multiple cameras mounted everywhere: inside cars, on street corners, strapped to drones, even body cams on stunt performers. The lighting changes by the second, reflections bounce off skyscrapers, and people walk through your shot without signing release forms.
This is messy, glorious chaos. The depth data is inconsistent, the lighting unpredictable. But the texture of reality is intact—the physics of how a car rocks when it takes a corner too fast, how fabric ripples in genuine wind, how light scatters through real rain.
The AI has to work harder here, filling gaps, making educated guesses about occluded spaces, but what it learns is invaluable: the authentic motion grammar of the physical world.
Marrying Science and Art: The Hybrid Workflow
Here's what I've learned: don't choose; combine. Start in the lab, end in the wild.
I now capture actors in controlled environments first: clean volumetric data that preserves every nuance of human performance. Then I shoot environments separately, letting those Gaussian Splats soak up all the messy, beautiful complexity of real-world motion.
Later, the AI merges these worlds, placing pristine character performances into reconstructed environments while maintaining the authenticity of both. It's like having your actors perform in a perfect bubble, then dropping that bubble into the chaotic real world without breaking it.
The End of "That's a Wrap"
The strangest part? I no longer know when shooting ends and post-production begins. After we've captured our datasets, the director can still "shoot" from any angle, change the lighting, and adjust camera moves weeks after we've struck the set.
I haven't framed my last shot; I've captured my first dataset. From here, the "filmmaking" continues as algorithms reconstruct our captures into infinite possible perspectives.
This isn't just new technology; it's a new relationship with reality itself. We're no longer preserving a single perspective; we're capturing the underlying physics of a moment, the data DNA of a scene that can be expressed in countless ways.
So next time you see me on set with more computing power than camera gear, remember: I'm not just shooting footage. I'm bottling reality itself, one photon at a time.
Shooting Data for AI Filmmaking
Shifting the Perspective: From Cinematic Framing to Data Capture
Traditional filmmaking revolves around framing the shot to tell a story visually. But in AI-driven filmmaking, we are not just capturing frames; we are capturing datasets that can be reconstructed, reinterpreted, and repurposed by AI. This shift requires a completely different approach to shooting, focusing on data integrity, motion accuracy, and depth fidelity rather than traditional cinematography alone.
To explore this, let’s examine two fundamentally different ways to capture action data, how they affect AI-driven filmmaking, and how a hybrid approach could be the key to unlocking new creative workflows.
Two Methods of Capturing AI Performance Data
Controlled Stage Capture – The Performance Lab
What It Is: A carefully designed space where actors perform in front of multiple cameras, optimized for clean motion capture and depth data acquisition.
Setup:
Multi-camera array: Fixed cameras recording from multiple angles.
Depth sensors and LiDAR scanners: Provide precise 3D spatial data.
Uniform lighting: Eliminates shadows and ensures clean, consistent image data.
Minimal background distractions: Focuses entirely on capturing the actor’s movements.
Data Captured:
High-fidelity body and facial motion data for AI character animation.
Volumetric 3D representation using techniques like NeRF (Neural Radiance Fields) and Gaussian Splats.
Clean datasets with minimal artifacts, allowing AI to accurately reconstruct performances.
Best Use Cases:
Training AI for hyper-realistic digital doubles.
Extracting motion for AI-driven character animation.
Creating structured NeRF-based performance datasets.
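As a hedged sketch of what a controlled-stage session should actually leave behind, the Python below models the structured metadata that volumetric reconstruction tools generally expect: per-camera calibration plus per-frame, per-camera records on a shared timecode. The field names and file layout are illustrative assumptions, not any specific tool's schema.

import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class CameraCalibration:
    camera_id: str
    focal_px: float            # focal length in pixels
    cx: float                  # principal point x
    cy: float                  # principal point y
    pose_world_from_cam: list  # 4x4 extrinsic matrix, row-major

@dataclass
class CaptureFrame:
    camera_id: str
    timestamp_s: float         # from a shared genlock / timecode source
    image_path: str
    depth_path: Optional[str]  # LiDAR or depth-sensor frame, if available

rig = [
    CameraCalibration("cam_00", 1600.0, 960.0, 540.0,
                      [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 2.5], [0, 0, 0, 1]]),
]
frames = [
    CaptureFrame("cam_00", 0.0000, "cam_00/000000.png", "cam_00/000000_depth.exr"),
    CaptureFrame("cam_00", 0.0417, "cam_00/000001.png", "cam_00/000001_depth.exr"),
]

# One manifest per take: the reconstruction stage reads this instead of "footage".
manifest = {"cameras": [asdict(c) for c in rig],
            "frames": [asdict(f) for f in frames]}
print(json.dumps(manifest, indent=2)[:300])

The point of the exercise is that calibration and timing travel with the images; without them, the cleanest photography in the world is just pictures, not a dataset.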
On-Location Action Scene – The Dynamic Chase
What It Is: Filming real-world action scenes with multiple mobile cameras, such as a car chase in a city.
Setup:
Multiple moving cameras (inside the car, on the streets, drones, body cams).
Unstructured depth capture (natural environment, no controlled lighting).
Real-world motion and occlusions, requiring AI processing to extract meaningful data.
Data Captured:
Dynamic environment motion (how objects and actors move in real-world settings).
Multiple depth perspectives, though often noisy and requiring AI-based interpolation.
Scene complexity, including lighting variations, occlusions, and reflections.
Best Use Cases:
AI training for understanding complex real-world motion.
Reconstructing entire environments using Gaussian Splat-based scene models.
Providing AI with realistic datasets for learning physics-based motion interactions.
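One practical consequence of shooting in the wild is that there is usually no genlock, so frames from different cameras must be aligned in software before any reconstruction can run. The sketch below assumes a per-camera clock offset has already been estimated (from a slate flash or an audio clap, say) and simply snaps each camera's frames onto one master timeline by nearest timestamp; it is an illustration, not a production sync tool.

import numpy as np

def align_to_master(master_times, cam_times, cam_offset_s):
    """For each master tick, return the index of this camera's nearest frame."""
    corrected = np.asarray(cam_times) + cam_offset_s        # move camera timestamps onto the master clock
    idx = np.searchsorted(corrected, master_times)
    idx = np.clip(idx, 1, len(corrected) - 1)
    left, right = corrected[idx - 1], corrected[idx]
    pick_left = (master_times - left) < (right - master_times)
    return np.where(pick_left, idx - 1, idx)

# A 24 fps master timeline vs. a drone shooting 30 fps that started 0.21 s after the master clock.
master = np.arange(0, 2, 1 / 24)
drone = np.arange(0, 2, 1 / 30)
print(align_to_master(master, drone, cam_offset_s=0.21)[:10])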
Comparing the Two Approaches
Feature | Controlled Stage Capture | On-Location Action Scene
Camera Movement | Fixed and calibrated | Unstructured, dynamic
Lighting Conditions | Controlled | Variable, unpredictable
Depth Data Quality | High (LiDAR, depth sensors) | Low (AI needs to infer)
Motion Data | Clean, precise | Messy, mixed with environment
AI Use Case | Character performance capture | Scene reconstruction and replay
The Hybrid Approach: Best of Both Worlds
Instead of choosing one method, we can combine them into a unified AI filmmaking workflow:
Capture Controlled Performances First
Use NeRF-based volumetric capture to extract actor performances in a clean, controlled setting.
Train AI models with precise motion and expression data before introducing environmental complexities.
Capture Real-World Motion Separately
Film on-location to record environmental motion dynamics using Gaussian Splats.
Reconstruct entire action scenes in 3D space, enabling AI-driven replays from any angle.
Merge the Two Datasets
Use the clean NeRF character motion data and place it inside Gaussian Splat-based environments.
Train AI systems to understand movement in both structured (studio) and unstructured (real-world) settings.
Allow AI to generate final AI-powered cinematic shots, keeping human performance realism intact.
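At the heart of the merge step is a simple geometric operation: the character capture lives in stage coordinates, the Gaussian Splat environment in its own world frame, and a single rigid or similarity transform places one inside the other. The sketch below writes that transform out by hand for illustration; a real pipeline would estimate it from shared markers or manual registration, and the function names here are assumptions rather than any library's API.

import numpy as np

def similarity_transform(scale, yaw_deg, translation):
    """Build a 4x4 matrix: uniform scale, rotation about the up axis, then translation."""
    yaw = np.radians(yaw_deg)
    R = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                  [0, 1, 0],
                  [-np.sin(yaw), 0, np.cos(yaw)]])
    T = np.eye(4)
    T[:3, :3] = scale * R
    T[:3, 3] = translation
    return T

def stage_to_environment(points_stage, T):
    """Map Nx3 character sample points from stage space into environment space."""
    homogeneous = np.hstack([points_stage, np.ones((len(points_stage), 1))])
    return (homogeneous @ T.T)[:, :3]

# Place the performer 4 m down the street, turned 90 degrees, standing 1.5 m in from the kerb.
T = similarity_transform(scale=1.0, yaw_deg=90, translation=[4.0, 0.0, 1.5])
character_points = np.array([[0.0, 1.7, 0.0], [0.2, 1.0, 0.1]])   # head and hip samples, stage frame
print(stage_to_environment(character_points, T))

Once both captures share one coordinate frame, rendering, relighting, and AI-driven shot generation can treat them as a single scene.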
Why This Matters for AI Filmmaking
This workflow challenges the way we think about shooting for AI-driven film production. Instead of trying to capture final shots, we are capturing datasets that allow AI to:
Reconstruct any shot from any angle, even after filming is complete.
Ensure real-world physics and motion accuracy while maintaining AI flexibility.
Use controlled actor performances without limiting them to pre-set camera angles.
Allow filmmakers to reshoot or reposition AI-generated cinematography after capturing the raw data.
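In practice, "reshooting after the fact" means generating new camera poses and rendering the captured volume through them. The hedged sketch below builds a simple orbit of look-at camera matrices around a subject; a NeRF or splat renderer would consume each pose to produce a frame that was never photographed. The pose convention and helper names are illustrative assumptions.

import numpy as np

def look_at(eye, target, up=np.array([0.0, 1.0, 0.0])):
    """Camera-to-world matrix that points the camera at `target` from `eye` (camera looks down -Z)."""
    forward = (target - eye) / np.linalg.norm(target - eye)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2], pose[:3, 3] = right, true_up, -forward, eye
    return pose

def orbit_path(center, radius, height, n_frames):
    """Evenly spaced camera poses circling the subject."""
    angles = np.linspace(0, 2 * np.pi, n_frames, endpoint=False)
    return [look_at(center + np.array([radius * np.cos(a), height, radius * np.sin(a)]), center)
            for a in angles]

poses = orbit_path(center=np.array([0.0, 1.5, 0.0]), radius=3.0, height=1.8, n_frames=96)
print(len(poses), poses[0].round(2))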
By shifting our focus from framing the shot to shooting for AI datasets, we unlock a whole new level of control and flexibility in filmmaking—blurring the lines between real-world performance and AI-generated cinematography.
Final Thought: The Future of AI Film Data Capture
As AI filmmaking continues to evolve, production teams will need to rethink how they approach production and its potential new aim: data collection. The key is to move beyond traditional cinematography and instead focus on building structured motion datasets that AI can later use to generate the final film.
Whether capturing a highly detailed NeRF performance on a controlled stage or reconstructing a high-speed car chase using Gaussian Splats, the goal remains the same:
Capture the world in a way AI can understand, reconstruct, and reimagine.
Asa Bailey
Director of Virtual Production
https://bai-ley.com
Rabbit Hole Links and Terms:
https://dl.acm.org/doi/10.1145/3606038.3616158
Convert Multi-Cam Video into AI Training Data
NeRF Reconstruction (Neural Radiance Fields) creates a volumetric model of the performance.
AI estimates depth, occlusions, lighting conditions, and fine motion details.
Facial performance NeRF is merged with body motion NeRF to create a full-performance dataset.
Output: A time-evolving NeRF animation that preserves performance, movement, and spatial positioning but strips away visual textures (clothing, lighting, etc.).
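Before any of this training happens, the multi-cam footage is reduced to rays: one origin and one direction per pixel, per synchronized frame, paired with that pixel's colour as the supervision signal. The numpy sketch below shows that step under the assumption that camera poses have already been recovered upstream (for example via structure from motion); the helper name and simple pinhole model are illustrative.

import numpy as np

def pixel_rays(width, height, focal_px, cam_to_world):
    """Return (origins, directions) in world space for every pixel of one camera frame."""
    i, j = np.meshgrid(np.arange(width), np.arange(height), indexing="xy")
    dirs_cam = np.stack([(i - width / 2) / focal_px,
                         -(j - height / 2) / focal_px,
                         -np.ones_like(i, dtype=float)], axis=-1)   # pinhole camera looking down -Z
    dirs_world = dirs_cam @ cam_to_world[:3, :3].T
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    origins = np.broadcast_to(cam_to_world[:3, 3], dirs_world.shape)
    return origins, dirs_world

pose = np.eye(4)                      # a camera at the origin, looking down -Z
origins, directions = pixel_rays(width=64, height=36, focal_px=50.0, cam_to_world=pose)
print(origins.shape, directions.shape)   # one ray per pixel: (36, 64, 3) each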
1. Dynamic NeRFs: These models extend NeRFs to capture scenes that change over time. By incorporating temporal information, Dynamic NeRFs can render novel views of time-evolving scenes. However, they often require significant computational resources and extensive data capture.
2. CL-NeRF (Continual Learning NeRF): Proposed in 2023, CL-NeRF addresses the challenge of adapting NeRFs to scenes that evolve over time. It efficiently updates the NeRF model with a few new images, retaining memory of unchanged areas. This approach reduces the need for complete retraining when parts of the scene change.
3. Articulated Neural Point Clouds: This method utilizes a point-based representation combined with Linear Blend Skinning (LBS) to learn dynamic NeRFs and associated skeletal models from sparse multi-view videos. It enables high-quality novel view synthesis and reposing of captured objects without requiring object-specific skeletal templates.
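The common thread across these methods is that the model is queried with a time coordinate as well as a position. As a hedged, untrained sketch of that interface, the PyTorch module below maps (x, y, z, t) plus a view direction to colour and density; real dynamic-NeRF variants layer positional encodings, deformation fields, or point-based representations on top of this basic shape, and everything named here is an illustrative assumption.

import torch
import torch.nn as nn

class TinyDynamicField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.density_head = nn.Linear(hidden, 1)                 # sigma at (x, y, z, t)
        self.colour_head = nn.Sequential(nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
                                         nn.Linear(hidden // 2, 3), nn.Sigmoid())

    def forward(self, xyz, t, view_dir):
        features = self.trunk(torch.cat([xyz, t], dim=-1))       # position and time conditioning
        sigma = torch.relu(self.density_head(features))          # non-negative density
        rgb = self.colour_head(torch.cat([features, view_dir], dim=-1))
        return rgb, sigma

field = TinyDynamicField()
xyz = torch.rand(1024, 3)             # sample points along camera rays
t = torch.full((1024, 1), 0.5)        # all queried at the same moment in the clip
view_dir = torch.randn(1024, 3)
view_dir = view_dir / view_dir.norm(dim=-1, keepdim=True)
rgb, sigma = field(xyz, t, view_dir)
print(rgb.shape, sigma.shape)         # torch.Size([1024, 3]) torch.Size([1024, 1])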