Imagine a flight simulator training new astronauts.

NASA used every bit of data they gathered from previous launches and created various scenarios to train the team for the journey.

Then comes the reality.

These astronauts will encounter situations in space like what the simulator exposed them to, but with additional variables.

They have to make split-second decisions based on what the simulator taught them (labeled training data from the source domains), plus the data in front of them (a new dataset from a target domain, with all the domain shift that comes with it).

This is an excellent example of how domain adaptation in machine learning works when a new target domain presents itself.

When the target domain suddenly differs from the labeled source data, can Artificial Intelligence still make the right choices?

It’s a hot topic in all AI-related fields.

The New York Times quoted Elon Musk as saying that “we’re headed toward a situation where AI is vastly smarter than humans.”

Humans have learned how to teach machines, but can machines truly think and act like humans? What if they do outsmart us?

According to one of AI’s pioneers — Yoshua Bengio — there’s no need for concern as “we are very far from super-intelligent AI systems, and there may even be fundamental obstacles to get much beyond human intelligence.”

It’s for this exact reason here at AI.Reverie that we take data creation, enhancement, and labeling extremely seriously.

We offer astute synthetic data creation to produce superb target labels and enhanced datasets.

Abbreviations You’ll Regularly Encounter in Domain Adaptation Research

  • CNNs: Convolutional Neural Networks
  • ResNet and VGG: Network architectures
  • GANs: Generative Adversarial Networks
  • ML: Machine Learning
  • DA: Domain Adaptation
  • SDA: Supervised Domain Adaptation
  • UDA: Unsupervised Domain Adaptation
  • MMD: Maximum Mean Discrepancy
  • CORAL: Correlation Alignment
  • CDD: Contrastive Domain Discrepancy
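
Of the abbreviations above, MMD is one of the easiest to make concrete: it measures how far apart two samples are by comparing their average kernel similarities. Here is a minimal sketch, assuming a Gaussian (RBF) kernel and the simple biased estimator. The function names `rbf_kernel` and `mmd2`, the bandwidth `gamma`, and the toy data are assumptions made for illustration, not code from any source cited in this article.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    k_ss = rbf_kernel(source, source, gamma)
    k_tt = rbf_kernel(target, target, gamma)
    k_st = rbf_kernel(source, target, gamma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))   # "simulator" features
same = rng.normal(0.0, 1.0, size=(200, 2))     # same distribution as source
shifted = rng.normal(2.0, 1.0, size=(200, 2))  # shifted target domain

mmd_same = mmd2(source, same)
mmd_shifted = mmd2(source, shifted)
print(f"MMD^2 same domain: {mmd_same:.4f}, shifted domain: {mmd_shifted:.4f}")
```

The shifted sample should score clearly higher than the sample drawn from the same distribution, which is what makes MMD usable as a measure of domain shift.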

Terminology in Transfer Learning and Domain Adaptation

Source examples: The existing training data, drawn from the source domain.

Multi source: Several source domains are used simultaneously to train a domain adaptation algorithm, with the goal of a hypothesis that makes the fewest errors on a new target dataset.

Source distribution & data distributions: “A key challenge is to design an approach that overcomes the covariate and target shift both among the sources, and between the source and target domains.” (arXiv:2006.12938v1)

According to the same arXiv paper, an approach to the source and data distributions between X and Y should “exploit the diversity of source distributions by tuning their weights to the target task at hand.”

Feature space: In ML, the input variables are called features. The feature space is the n-dimensional space in which all source variables live.

Target labels & target dataset: “Target: final output you are trying to predict, also known as Y. It can be categorical (sick vs. non-sick) or continuous (price of a house).

Label: true outcome of the target. In supervised learning, the target labels are known for the training dataset but not for the test.” (StackExchange)
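
To make these terms concrete, here is a small sketch with three hypothetical source domains and one unlabeled target domain, all living in a 2-dimensional feature space. It weights each source by how close its feature mean is to the target's. This is a deliberately simple heuristic chosen for illustration, not the method of the arXiv paper quoted above; the domain names, the `source_weights` function, and the `temperature` parameter are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical source domains: feature matrices of shape (n_samples, 2).
sources = {
    "sim_a": rng.normal(0.0, 1.0, size=(300, 2)),
    "sim_b": rng.normal(1.5, 1.0, size=(300, 2)),
    "sim_c": rng.normal(4.0, 1.0, size=(300, 2)),
}
# Unlabeled target features: we see X for the target, but not Y.
target = rng.normal(1.4, 1.0, size=(300, 2))

def source_weights(sources, target, temperature=1.0):
    """Softmax weights that favor sources whose feature mean is near the target's."""
    names = list(sources)
    target_mean = target.mean(axis=0)
    sq_dists = np.array(
        [np.sum((sources[name].mean(axis=0) - target_mean) ** 2) for name in names]
    )
    logits = -sq_dists / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    return dict(zip(names, weights))

weights = source_weights(sources, target)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

In this toy setup the source whose distribution sits closest to the target receives most of the weight, which is the intuition behind tuning source weights to the target task.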

What Is Transfer Learning?

Transfer learning gives us the ability to re-use existing datasets, and the models trained on them, to perform new tasks. We use transfer learning when the new task has a similar structure but different possible outcomes, which makes it easy to re-apply what was already learned.

Transfer learning came to life to address the genuine issue of not having enough labeled data at our disposal.

“Transfer learning, used in machine learning, is the re-use of a pre-trained model on an additional problem. In transfer learning, a machine exploits the knowledge gained from a previous task to improve generalization about another.“  – Builtin

Transfer learning is still in its infancy, and incorporating Deep Domain Adaptation (DDA) holds great promise for more accurate features.

Going back to the scenario with the astronauts, you can look at the flight simulator as the transfer learning step: it teaches the pilots with data from previous experiences and flight models.
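
Here is a minimal sketch of that idea, assuming a toy linear regression setting rather than a deep network. A model is first fit on plentiful source data, then adapted to a related target task with only ten labeled samples by penalizing distance from the pretrained weights. The helper `ridge_toward`, the data sizes, and the noise levels are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_toward(X, y, w_prior, lam):
    """Minimize ||X w - y||^2 + lam * ||w - w_prior||^2 in closed form."""
    dim = X.shape[1]
    A = X.T @ X + lam * np.eye(dim)
    b = X.T @ y + lam * w_prior
    return np.linalg.solve(A, b)

dim = 20
w_source_true = rng.normal(size=dim)
# The target task is related to the source task, but not identical.
w_target_true = w_source_true + 0.1 * rng.normal(size=dim)

# Plentiful labeled source data (the "simulator").
X_src = rng.normal(size=(2000, dim))
y_src = X_src @ w_source_true + 0.5 * rng.normal(size=2000)

# Scarce labeled target data (the real mission).
X_tgt = rng.normal(size=(10, dim))
y_tgt = X_tgt @ w_target_true + 0.5 * rng.normal(size=10)

# Pretrain on the source task, then adapt on the target task.
w_pretrained = ridge_toward(X_src, y_src, np.zeros(dim), lam=1.0)
w_scratch = ridge_toward(X_tgt, y_tgt, np.zeros(dim), lam=1.0)
w_transfer = ridge_toward(X_tgt, y_tgt, w_pretrained, lam=1.0)

# Evaluate both models on fresh, noise-free target data.
X_test = rng.normal(size=(1000, dim))
y_test = X_test @ w_target_true

def mse(w):
    return float(np.mean((X_test @ w - y_test) ** 2))

err_scratch = mse(w_scratch)
err_transfer = mse(w_transfer)
print(f"from scratch: {err_scratch:.3f}, with transfer: {err_transfer:.3f}")
```

With only ten target samples in a 20-dimensional feature space, the model trained from scratch cannot pin down all of its weights, while the transferred model starts from a good guess and only has to correct it.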

What Is Domain Adaptation?

Deep Domain Adaptation is the step that follows transfer learning.

It’s what the astronauts do with what they were taught when presented with a different scenario in real time. This is domain adaptation to a new target domain.

“Deep domain adaptation allows us to transfer the knowledge learned by a particular DNN on a source task to a new related target task. We have successfully applied it in tasks such as image translation or style transfer. In some sense, deep domain adaptation enables us to get closer to human-level performance in terms of the amount of training data required for a particular new computer vision task.”  – Towards Data Science

What Is Unsupervised Domain Adaptation?

Let’s go back to our astronauts again.

They’re floating in outer space, and an unidentified mass hurtles towards them. During their training, they encountered a similar problem, but now the variables all differ from the data used in the simulator.

What they do from here is like unsupervised domain adaptation. They take what they learned from the source data distribution and apply it to a new, unlabeled target domain.

As they are doing this, they’re also creating new datasets for future target distributions. Their responses contribute to future algorithm learning and domain adaptation for other networks.

“Unsupervised domain adaptation (UDA) is training a statistical model on labeled data from a source domain to achieve better performance on data from a target domain, with access to only unlabeled data in the target domain.”  – Timothy Miller for ACL Anthology
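
One classic way to use unlabeled target data is correlation alignment (CORAL): whiten the source features, then re-color them with the target covariance, so a classifier trained on the labeled source sees features that look like the target. Here is a minimal sketch, assuming plain NumPy features. The function names, the `eps` regularizer, and the toy data are assumptions, and the mean shift at the end is an extra step added here for illustration.

```python
import numpy as np

def sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals**power) @ vecs.T

def coral_align(source, target, eps=1e-6):
    """Re-color source features so their mean and covariance match the target's."""
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    # Whiten the centered source features, then apply the target covariance.
    whitened = (source - source.mean(axis=0)) @ sym_matrix_power(cov_s, -0.5)
    return whitened @ sym_matrix_power(cov_t, 0.5) + target.mean(axis=0)

rng = np.random.default_rng(0)
# Labeled source features ("simulator") and unlabeled target features ("space").
source = rng.normal(size=(500, 3))
mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
target = rng.normal(size=(500, 3)) @ mixing + np.array([1.0, -2.0, 0.5])

aligned = coral_align(source, target)
print(np.round(np.cov(aligned, rowvar=False), 2))
print(np.round(np.cov(target, rowvar=False), 2))
```

No target labels are used anywhere. The source labels stay attached to the aligned rows, so a classifier would then be trained on the aligned features with the original source labels and applied to the target data.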

How to Achieve Domain Adaptation

According to the well-known arXiv paper on source distribution in domain adaptation, many variables must be considered to achieve high-quality domain adaptation from a source domain.

AI.Reverie uses state-of-the-art synthetic data to create various source examples to enhance intelligence gathering.

How to Achieve DDA

The more accurate your source (training data) and target datasets are, the better your domain adaptation will perform.

“The performance of a classifier trained on data coming from a specific domain typically degrades when applied to a related but different one. While annotating many samples from the new domain would address this issue, it is often too expensive or impractical.

Domain Adaptation has emerged as a solution to this problem; It leverages annotated data from a source domain, in which it is abundant, to train a classifier to operate in a target domain, in which it is sparse or even lacking altogether.

In this context, the recent trend comprises learning deep architectures whose weights we share for both domains, which essentially amounts to learning domain invariant features.”  – arXiv 
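
One common way to write down "learning domain invariant features" with shared weights is as a two-part objective: a classification loss on the labeled source data plus a penalty on the gap between source and target features. The form below is a generic sketch, not the exact formulation of the paper quoted above. All symbols are assumptions: f_θ is the shared feature extractor, h_φ the classifier, ℓ the classification loss, d a discrepancy measure such as MMD or CORAL, and λ the trade-off weight.

```latex
\min_{\theta,\,\phi}\;
\frac{1}{n_s}\sum_{i=1}^{n_s} \ell\big(h_\phi(f_\theta(x_i^s)),\, y_i^s\big)
\;+\; \lambda\, d\big(\{f_\theta(x_i^s)\}_{i=1}^{n_s},\; \{f_\theta(x_j^t)\}_{j=1}^{n_t}\big)
```

The first term needs source labels only. The second term uses source and target inputs but no labels, which is why this setup also works when target annotations are missing altogether.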

Training Data in Domain Adaptation

In domain adaptation, the source data used to train for a new or related target domain needs to come from a domain where the model already performs well.

Synthetic training data is on the rise because it promises massive, automatically labeled training sets for a fraction of the cost of manually sourced annotated data.
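
The reason synthetic data comes with labels for free is that the generator already knows what it drew. Here is a toy sketch of that principle, assuming tiny grayscale images with one bright rectangle each. It is a stand-in chosen for illustration, not a description of any production pipeline, and the function `make_synthetic_sample` and its parameters are hypothetical.

```python
import numpy as np

def make_synthetic_sample(rng, size=32):
    """Render one image with a bright rectangle and return (image, bounding box)."""
    # Noisy dark background.
    image = rng.normal(0.1, 0.02, size=(size, size))
    width, height = rng.integers(4, 12, size=2)
    x0 = rng.integers(0, size - width)
    y0 = rng.integers(0, size - height)
    # Draw the object; its position is known exactly, so the label is exact too.
    image[y0:y0 + height, x0:x0 + width] = 1.0
    box = (int(x0), int(y0), int(x0 + width), int(y0 + height))
    return image, box

rng = np.random.default_rng(0)
dataset = [make_synthetic_sample(rng) for _ in range(100)]

first_image, first_box = dataset[0]
print(first_image.shape, first_box)
```

Each bounding box is stored as (x_min, y_min, x_max, y_max) with exclusive upper bounds. No human annotator is involved at any point, which is where the cost savings come from.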

Pros & Cons of Deep Domain Adaptation in Machine Learning

“In the framework of domain adaptation, the problem is more complex as we draw test patterns from a target domain distribution different from the source-domain distribution of training samples. Getting a good adaptation requires an adequate modeling of the relationship between source and target domains.” – Lorenzo Bruzzone

According to a study published in IEEE Xplore, these are the primary benefits and disadvantages of DDA:

Pros of Deep Domain Adaptation

  • “Domain adaptation (DA) techniques aim to make possible the transfer of knowledge among different networks.”
  • “Results show that, when the number of samples from the target domain is limited to a few dozen, DA approaches consistently outperform standard supervised ML techniques.”

Cons of Deep Domain Adaptation

  • “Supervised ML models rely on large representative training sets, which are often unavailable due to the lack of the necessary telemetry equipment or of historical data.”
  • “Unfortunately, the resulting models may be ineffective when applied to the current network, if the training data (the source domain) is not well representative of the network under study (the target domain).”

Our Contribution to Deep Domain Adaptation

AI.Reverie Synthetic Data

AI.Reverie helps you launch your computer vision with the exact synthetic source data you need in a fraction of the time.

We thrive on delivering a superb feature space so your extractions can be flawless.

Contact us for your organization’s: