Stacey Higginbotham speaks to Dan Jeavons, Chief Data Scientist at Shell, and Paul Walborsky, COO of AI.Reverie, about troubleshooting AI models:

When I asked [Jeavons] about Shell’s machine learning projects, he was especially excited about using simulated data, because it helps build models that can detect problems that occur only rarely.

“I think it’s a really interesting way to get info on the edge cases that we’re trying to solve,” he said. “Even though we have a lot of data, the big problem that we have is that, actually, we often only had a very few examples of what we’re looking for.”

In the oil business, corrosion in factories and pipelines is a big challenge, and one that can lead to catastrophic failures. That’s why companies are careful not to let anything corrode to the point where it poses a risk. But that also means there are few real-world examples of corrosion on which to train machine learning models. So Shell uses synthetic data to help.

Read more: Fake Data Is Great Data When It Comes to Machine Learning