https://ai.googleblog.com/2018/06/improving-deep-learning-performance.html
The success of deep learning in computer vision can be partially
attributed to the availability of large amounts of labeled training data
— a model’s performance typically improves as you increase the quality,
diversity and the amount of training data. However, collecting enough
quality data to train a model to perform well is often prohibitively
difficult. One way around this is to hardcode image symmetries into
neural network architectures so they perform better, or to have experts
manually design data augmentation methods, such as rotation and
flipping, that are commonly used to train well-performing vision
models. However,
until recently, less attention has been paid to finding ways to
automatically augment existing data using machine learning. Inspired by
the results of our AutoML efforts to design neural network architectures
and optimizers to replace components of systems that were previously
human-designed,
we asked ourselves: can we also automate the procedure of data
augmentation?
In “AutoAugment: Learning Augmentation Policies from Data”, we explore
a reinforcement learning algorithm which increases both the amount and
diversity of data in an existing training dataset. Intuitively, data
augmentation is used to teach a model about invariances in the data
domain, making a neural network robust to these important symmetries
and thus improving its performance. Unlike previous state-of-the-art deep
learning models that used hand-designed data augmentation policies, we
used reinforcement learning to find the optimal image transformation
policies from the data itself. The result improved performance of
computer vision models without relying on the production of new and ever
expanding datasets.
Augmenting Training Data
The idea behind data augmentation is simple: images have many symmetries
that don’t change the information present in the image. For example,
the mirror reflection of a dog is still a dog. While some of these
“invariances” are obvious to humans, many are not. For example, the
mixup
method augments data by placing images on top of each other during
training, resulting in data which improves neural network performance.
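To make this concrete, here is a minimal sketch of mixup in NumPy. It
assumes the images are float arrays and the labels are one-hot vectors;
the alpha value is a common choice for illustration, not one taken from
this post.

    import numpy as np

    def mixup(image_a, image_b, label_a, label_b, alpha=0.2):
        # Draw a mixing weight from a Beta distribution, then form a
        # convex combination of both the images and their labels.
        lam = np.random.beta(alpha, alpha)
        image = lam * image_a + (1.0 - lam) * image_b
        label = lam * label_a + (1.0 - lam) * label_b
        return image, label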
Left: An original image from the ImageNet dataset. Right: The same
image transformed by a commonly used data augmentation transformation,
a horizontal flip about the center.
AutoAugment is an automatic way to design custom data
augmentation policies for computer vision datasets, e.g., guiding the
selection of basic image transformation operations, such as flipping an
image horizontally/vertically, rotating an image, changing the color of
an image, etc. AutoAugment not only predicts what image transformations
to combine, but also the per-image probability and magnitude of the
transformation used, so that the image is not always manipulated in the
same way. AutoAugment is able to select an optimal policy from a search
space of 2.9 x 10^32 image transformation possibilities.
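Concretely, a policy in the paper is a set of sub-policies, each of
which applies two operations in sequence, where every operation carries
its own probability of being applied and a magnitude; one sub-policy is
chosen at random for each image during training. A minimal sketch of
that structure with PIL follows; the operation set and the numeric
triples are illustrative, not policies found by the search.

    import random
    from PIL import ImageEnhance, ImageOps

    # Candidate operations, keyed by name. Magnitude handling here is
    # simplified for illustration.
    OPS = {
        "FlipLR": lambda img, m: ImageOps.mirror(img),
        "Rotate": lambda img, m: img.rotate(m),
        "Color": lambda img, m: ImageEnhance.Color(img).enhance(m),
    }

    # Each sub-policy is two (operation, probability, magnitude)
    # triples; these particular values are made up, not learned ones.
    EXAMPLE_POLICY = [
        [("Rotate", 0.7, 15), ("Color", 0.4, 1.8)],
        [("FlipLR", 0.5, None), ("Rotate", 0.2, -10)],
    ]

    def augment(img, policy):
        # Pick one sub-policy at random per image, then apply each of
        # its operations with that operation's own probability.
        for name, prob, magnitude in random.choice(policy):
            if random.random() < prob:
                img = OPS[name](img, magnitude)
        return img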
AutoAugment learns different transformations depending on what dataset
it is run on. For example, for the Street View House Numbers (SVHN)
dataset, which includes natural scene images of digits, AutoAugment
focuses on geometric transforms like shearing and translation, which
represent distortions commonly observed in this dataset. In addition,
AutoAugment has learned to completely invert colors, a transformation
that occurs naturally in SVHN given the diversity of building materials
and house number colors in the world.
Left: An original image from the SVHN dataset. Right: The same image
transformed by AutoAugment. In this case, the optimal transformation
was a result of shearing the image and inverting the colors of the
pixels.
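Both of the operations shown here take only a few lines with PIL. A
rough sketch, where the file name and the shear factor are placeholders
rather than values from the learned policy:

    from PIL import Image, ImageOps

    img = Image.open("svhn_digit.png").convert("RGB")  # placeholder input

    # Horizontal shear, expressed as an affine transform mapping each
    # output pixel (x, y) to input pixel (x + shear * y, y).
    shear = 0.3  # illustrative shear factor
    sheared = img.transform(img.size, Image.AFFINE,
                            (1, shear, 0, 0, 1, 0))

    # Invert every pixel's color, as the learned SVHN policy does.
    inverted = ImageOps.invert(sheared)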
On CIFAR-10 and ImageNet,
AutoAugment does not use shearing because these datasets generally do
not include images of sheared objects, nor does it invert colors
completely as these transformations would lead to unrealistic images.
Instead, AutoAugment focuses on slightly adjusting the color and hue
distribution, while preserving the general color properties. This
suggests that the actual colors of objects in CIFAR-10 and ImageNet are
important, whereas on SVHN only the relative colors are important.
Left: An original image from the ImageNet dataset. Right: The same
image transformed by the AutoAugment policy. First, the image contrast
is maximized, after which the image is rotated.
Results
Our AutoAugment algorithm found augmentation policies for some of the
most well-known computer vision datasets that, when incorporated into
the training of the neural network, led to state-of-the-art accuracies.
By augmenting ImageNet data we obtain a new state-of-the-art top-1
accuracy of 83.54%, and on CIFAR-10 we achieve an error rate of 1.48%,
a 0.83% improvement over the default data augmentation designed by
scientists. On SVHN, we improved the state-of-the-art error from 1.30%
to 1.02%. Importantly, AutoAugment policies are found to be
transferable: the policy found for the ImageNet dataset can also be
applied to other vision datasets (Stanford Cars, FGVC-Aircraft, etc.),
which in turn improves neural network performance.
We are pleased to see that our AutoAugment algorithm achieved this level
of performance on many different competitive computer vision datasets
and look forward to seeing future applications of this technology across
more computer vision tasks and even in other domains such as audio
processing or language models. The policies with the best performance
are included in the
appendix of the paper, so that researchers can use them to improve their models on relevant vision tasks.
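As one convenient way to try them (a later packaging, not part of the
original release), the published ImageNet, CIFAR-10, and SVHN policies
also ship as a built-in transform in torchvision:

    from torchvision import transforms

    # Standard preprocessing pipeline that applies the published
    # ImageNet AutoAugment policy to each training image.
    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.AutoAugment(
            policy=transforms.AutoAugmentPolicy.IMAGENET),
        transforms.ToTensor(),
    ])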
Acknowledgements
Special thanks to the co-authors of the paper Dandelion Mane, Vijay
Vasudevan, and Quoc V. Le. We’d also like to thank Alok Aggarwal,
Gabriel Bender, Yanping Huang, Pieter-Jan Kindermans, Simon Kornblith,
Augustus Odena, Avital Oliver, and Colin Raffel for their help with this
project.