By Aayush Prakash

I’ve spent the last decade researching and developing technologies that allow businesses and organizations to innovate more quickly. Most recently, I spent five years at NVIDIA researching new approaches to machine learning, namely Structured Domain Randomization, Meta-Sim and Sim2SG (Sim-to-Real Scene Graph) generation. You can find my research here.

Why have I invested so much time in a space that might seem esoteric? I believe these approaches are important steps toward resolving a critical problem with wide implications: the “domain gap.” That’s the difference in performance between models trained on synthetic data and models trained on real data. The gap has been shrinking, but it persists, and some real data is generally still required for optimal performance.

If we can produce synthetic data that, on its own, can train algorithms as well as or better than real data, then we resolve the data bottleneck in AI. Most prominently, we will unleash computer vision and transform capabilities everywhere you see a camera: CCTVs, robots, the phone in your hand.


Why I’m interested in this problem

Let’s take a step back. In our lifetimes, we have begun to perceive, and on some level experience, the vast potential of AI. However, data limits have restrained most AI applications. We can’t collect enough photographs, or construct enough real-world scenarios, to teach computers about the full extent of the world. Humans must hand-label photographs for a computer to understand them, and set up training grounds one by one. AI as we know it does not scale.

The obvious answer, if we can do it well, is to synthesize training materials and environments. What if we could computer-generate images of all possible objects and scenarios in all possible conditions: droughts and hurricanes; satellite perspectives and undersea views; a lone driver and an 18-vehicle pile-up? If we do this, we are limited only by the GPUs needed to generate hundreds of images and scenarios per minute.


Why AI.Reverie

I’m joining AI.Reverie because among companies developing synthetic data, it has achieved the most tangible success, and its leaders have the clearest vision for the future.

The team does the research needed to produce new technology. AI.Reverie collaborated with In-Q-Tel to publish a paper showing that synthetic data can be used on its own to train an object detection and classification model. They released the largest open dataset of real and synthetic overhead imagery, and you can see their research at work in the RarePlanes case study.

AI.Reverie is also applying its synthetic data to a wide range of real problems with clients like 7-Eleven, the World Bank and the U.S. Air Force. They’re creating data that can be used to train models that support contact-free shopping, detect defects in shelters vulnerable to climate change, and identify threats to communities and soldiers.

Most importantly, AI.Reverie has a concrete plan to close the domain gap in the next two to three years. My research and AI.Reverie’s proprietary technology are part of the equation. And if we resolve the domain gap, AI will finally scale.

Join us

Bring your questions to my conversation with AI.Reverie cofounder Daeil Kim in a webinar on May 26. Register here.

Join our team on our quest to resolve the domain gap and unleash the true potential of AI. We’re hiring.

Partner with us on a business challenge via hello@aireverie.com.