Understand the principles and advantages of transfer learning at the application level

Two years ago, in his NIPS 2016 tutorial, Andrew Ng said, "After supervised learning, transfer learning will lead the next wave of commercialization of machine learning technology." In practice, new scenarios appear constantly, and transfer learning can help us deal with them better. What advantages allow transfer learning to become the new focus of machine learning? This article compares it with deep learning to explain, at the application level, the principles and advantages of transfer learning.

Preface

Deep learning has made great progress on many problems that other methods struggle to solve. Its success is attributed to several key differences from traditional machine learning that let it shine when dealing with unstructured data. Today, deep learning models can play games, detect cancer, talk to humans, and drive cars.

However, the same differences that make deep learning powerful also make it costly. You may have heard that success with deep learning requires huge amounts of data, expensive hardware, and even more expensive elite engineering talent. That is why some companies are particularly excited about innovative ideas and techniques that can reduce these costs. One such technique is multi-task learning, a method that allows a machine learning model to learn from multiple tasks at once; one of its benefits is that it reduces the amount of training data required.

In this article, we introduce transfer learning, a machine learning method that allows knowledge to be transferred from one task to another. Instead of developing a fully customized solution for your problem from scratch, transfer learning lets you carry knowledge over from related problems and solve your own problem more easily. By transferring knowledge, you benefit from the expensive resources that were needed to acquire it, including training data, hardware, and researchers, without bearing those costs yourself. Let's take a look at when and how transfer learning works.

The difference between deep learning and traditional machine learning

Transfer learning is not a new technique, and it is not specific to deep learning, but given the recent progress in deep learning it is especially exciting. So first, we need to clarify how deep learning differs from traditional machine learning.

▌Deep learning works at a lower level of abstraction

Machine learning is about machines automatically learning a function that assigns predicted values or labels to numerical inputs (i.e., data). The difficulty lies in pinning down exactly what this function should be so that it produces the right output for a given input. If we place no restrictions on the function, the possibilities (and the complexity) are endless. To simplify the task, we usually impose some kind of structure on the function, based on the type of problem we are solving, domain expertise, or simple trial and error. A structure defines a type of machine learning model.

In theory, there are infinitely many possible structures to choose from, but in practice most machine learning use cases can be solved with one of a handful of them: linear models, tree ensembles, and support vector machines chief among them. The job of a data scientist is to select the right structure from this small set. These models are available as black-box objects from many mature machine learning libraries and can be trained with just a few lines of code. For example, you can use Python's scikit-learn library to train a random forest model as follows:
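A minimal sketch, using scikit-learn's bundled iris data set purely for illustration:

```python
# Train a random forest classifier with scikit-learn on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)          # illustrative data; use your own features/labels
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)                            # the library handles all internal parameters
predictions = model.predict(X)
```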

Or use R to train a linear regression model:
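A minimal sketch, using R's built-in lm function and the mtcars data set purely for illustration:

```r
# Fit a linear regression predicting fuel efficiency from weight and horsepower.
model <- lm(mpg ~ wt + hp, data = mtcars)
summary(model)                 # inspect the fitted coefficients
predictions <- predict(model)  # predictions on the training data
```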

Deep learning, by contrast, operates at a lower level. Rather than choosing from a small set of model structures, deep learning lets developers compose arbitrary structures. The building blocks are modules or layers, which can be thought of as elementary data transformations. This means that when we apply deep learning, we need to open the black box and understand the data transformations, rather than treating the model as a bag of parameters fixed by an algorithm.

This approach lets us build far more powerful models, but it also adds a whole new set of challenges to the model-building process. Although the deep learning community has published a great deal of research, and practical guides and experience reports are everywhere, combining these data transformations effectively remains a very difficult process.

Below we consider an extremely simple convolutional neural network image classifier, defined with PyTorch, a popular deep learning library.
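A minimal sketch, assuming 3-channel 32×32 inputs and 10 output classes (hypothetical sizes chosen only for illustration):

```python
# An extremely simple convolutional image classifier built from low-level blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.fc = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 32x32 -> 16x16
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 16x16 -> 8x8
        x = x.view(x.size(0), -1)                   # flatten for the linear layer
        return self.fc(x)

model = SimpleCNN()
```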

Because we are working with low-level building blocks, we can easily change a single part of the model (for example, swap F.relu for F.sigmoid). Doing so yields a completely new model architecture that may produce completely different results, and the possibilities are, without exaggeration, endless.

▌Deep learning is not yet fully understood

Even given a fixed neural network architecture, training it is notoriously difficult. First, the loss functions in deep learning are usually non-convex, which means training does not necessarily find the best possible solution. Second, deep learning is still a very new technology, and many of its components are not yet fully understood. For example, Batch Normalization has received much attention recently because including it in some models seems to be the key to good results, yet experts cannot agree on why. Researcher Ali Rahimi even compared deep learning to alchemy at a recent machine learning conference, sparking a controversy.

▌Automatic feature engineering

The complexity of deep learning has driven the development of a technique called representation learning, which is why neural networks are often said to do "automatic feature engineering." Simply put, instead of having humans manually extract effective features from a data set, we build the model so that it learns for itself which features are needed and useful for the task at hand. Delegating feature engineering to the model is very effective, but the price is that the model needs enormous amounts of data and, consequently, enormous computing power.

▌What can you do?

Compared with other machine learning methods, deep learning can seem so complex that integrating it into your business looks impossible. For organizations with limited resources, that feeling is even stronger.

Organizations that need to be at the cutting edge may indeed have to hire experts and buy specialized hardware. But in many cases that is unnecessary: there are ways to apply deep learning effectively without making a large investment. This is where transfer learning can make a big difference.

Transfer learning lets knowledge be transferred from one machine learning model to another. That other model may be the product of years of research into its structure, training on sizable data sets, and years of compute time spent optimizing it. With transfer learning, you get most of the benefit of that work without bearing any of those costs!

What is transfer learning?

Most machine learning models start with zero knowledge, meaning their parameters begin as random guesses. This is what we mean when we say a model is learned from scratch.

A cat detection model starts training from random guesses. By seeing many different cats and integrating the patterns they share, the model gradually learns what a cat is.

In this case, everything the model learns comes from the data you show it. But is this the only way to solve the problem? In some cases, it seems so.

The cat detection model is likely to be useless in unrelated applications, such as fraud detection. It only knows how to deal with pictures of cats, not credit card transactions.

But in some cases, it seems that we can share information between different tasks.

The cat detection model is very useful for related tasks, such as locating a cat's face. The detector already knows how to detect whiskers, noses, and eyes, all of which are very useful for locating the face.

This is the essence of transfer learning: use a model to learn how to complete a task well, and transfer some or all of its knowledge to a related task.

If you think about our own learning experience, you will find that this is actually very reasonable: we often transfer the skills we have learned in the past so that we can learn new skills more quickly. For example, a person who has learned to throw a baseball can learn how to throw a football without relearning the mechanics of throwing things. These tasks are interlinked in nature, and if you can handle one of them, you can naturally transfer the learned ability to another.

In machine learning, perhaps the best example from the past five years is computer vision. Few people now train an image model from scratch. Instead, we start with a pre-trained model that already knows how to distinguish simple objects such as cats, dogs, and umbrellas. A model learning to distinguish images first learns to detect common image features such as edges, shapes, text, and faces. The pre-trained model has these basic skills (plus more specific ones, such as the ability to tell dogs from cats).

At this point, by adding layers or retraining on a new data set, the pre-trained classification model can inherit the basic skills that were acquired at great cost and extend them to a new task. This is transfer learning.
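As a rough sketch of what this can look like in PyTorch (the library used above), one can load a pre-trained ResNet-50 from torchvision, freeze the transferred layers, and attach a new head for a hypothetical two-class task:

```python
# Transfer learning sketch: reuse a pre-trained ResNet-50 for a new task.
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)        # weights learned on ImageNet

for param in model.parameters():                # freeze the transferred knowledge
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 2)   # new layer for a 2-class task

# Only the new head is trainable, so far less data and compute are needed
# than training the whole network from scratch.
```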

The benefits of this approach are obvious.

▌Transfer learning requires less training data

When you reuse your favorite cat detection model for a new cat-related task, the model already carries the "wisdom of a million cats," which means you no longer need anywhere near as many cat pictures to train the new task. Needing less training data lets you train even when data is scarce, or when obtaining more data is too expensive or simply impossible, and it also lets you train models faster on cheaper hardware, even locally.

▌Models trained with transfer learning generalize better

Transfer learning improves a model's generalization ability, that is, its ability to classify well on data it was not trained on. This is because the pre-trained model is deliberately trained to learn general features that are useful for related tasks. When the model is transferred to a new task, it is harder for it to overfit the new training data, because it only keeps learning on top of a very general knowledge base. Building a model that generalizes well is one of the hardest and most important parts of machine learning.

▌The transfer learning training process is more robust

Starting from a pre-trained model also spares you from training a complex model with millions of parameters from scratch, a process that can be frustrating, unstable, and confusing. Transfer learning can reduce the number of trainable parameters by up to 100% (by freezing the transferred layers), making training more stable and easier to debug.

▌Transfer learning lowers the barrier to entry for deep learning

Finally, transfer learning lowers the barrier to entry for deep learning, because you don't need to be an expert to get expert-level results. Consider the popular image classification model ResNet-50: how was this particular structure chosen? It is the result of many years of research and experimentation by many deep learning experts. The structure contains some 25 million weights, and without a deep understanding of each of its components, optimizing those weights from scratch is an almost impossible task. Fortunately, with transfer learning you can reuse both the complex structure and the optimized weights, dramatically lowering the barrier to entry for deep learning.

What is multi-task learning?

Transfer learning is one of several knowledge-sharing techniques for training machine learning models, and it has proven very effective. At present, the two most interesting knowledge-sharing techniques are transfer learning and multi-task learning. In transfer learning, a model is first trained on a single task and then used as the starting point for a related task; while learning the related task, the transferred model is free to specialize in the new task without worrying about how that affects its performance on the original one. In multi-task learning, a single model learns to handle several tasks at once, and the model is judged by how well it performs all of them after learning. In a follow-up article, we will analyze the benefits of multi-task learning and when it works.
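For contrast, here is a minimal, hypothetical sketch of multi-task learning in PyTorch: one shared trunk feeds two task-specific heads, and both tasks are learned at the same time:

```python
# Multi-task learning sketch: a shared trunk with two task-specific heads.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(64, 32), nn.ReLU())  # shared representation
        self.head_a = nn.Linear(32, 10)  # e.g., a 10-class classification task
        self.head_b = nn.Linear(32, 1)   # e.g., a regression task

    def forward(self, x):
        features = self.shared(x)
        return self.head_a(features), self.head_b(features)

model = MultiTaskModel()
out_a, out_b = model(torch.randn(8, 64))
# Training would sum one loss per head, so both tasks shape the shared layer.
```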

Conclusion

Transfer learning is a knowledge sharing technology, which can reduce the dependence on training data, computing power, and engineering talents when constructing deep learning models. Since deep learning can provide significant improvements over traditional machine learning, transfer learning has become an indispensable tool.

Want to know more about how algorithms such as machine learning, deep learning, reinforcement learning, and transfer learning play out in concrete applications and businesses? We will share more at the 2018 AI Developer Conference, so stay tuned!
