Where we stand today and what the future holds for machine learning on devices in 2020

On-device machine learning is developing rapidly. Apple mentioned it about a hundred times during WWDC 2019, so it's no wonder developers want to add machine learning to their applications.
However, many of these models are used only for inference, that is, to make predictions from a fixed set of knowledge. Despite the term “machine learning”, no learning takes place on the device; the knowledge is baked into the model and does not improve over time.


The reason is that training a model requires a lot of computing power, which mobile phones do not yet have. It’s much easier to train models offline on a server farm and ship any model improvements with an application update.
Still, training on the device makes sense for some applications, and we believe that over time it will become as commonplace as using models for prediction. With that in mind, we want to explore what this technology makes possible.

Machine Learning Today

The most common applications of deep learning and machine learning are:
•Search engines such as Google, Bing, and Yandex
•Virtual personal assistants such as Amazon Alexa, Apple's Siri, Google Now, and Microsoft's Cortana
•Applications that cannot be programmed by hand
•Self-driving cars
•Database mining to drive automation
•Dynamic pricing
•Spam detection
•Google Translate
•Photo tagging applications
•Online video streaming
•Fraud detection

A modern phone has many different sensors and a fast Internet connection, so a lot of data is available to models. iOS already runs several deep learning models on the device: face recognition in photos, detection of the phrase “Hey Siri”, and recognition of handwritten Chinese characters. But none of these models learn anything from the user. Almost all on-device machine learning APIs (MPSCNN, TensorFlow Lite, Caffe2) can make predictions based on user data, but you cannot make these models learn anything new from that data.

Today, training takes place on a server with a large number of GPUs. It is a slow process that requires a lot of data. A convolutional neural network, for example, is trained on thousands or millions of images. Training such a network from scratch takes several days on a powerful server, several weeks on a desktop computer, and practically forever on a mobile device.

Training on the server is a good strategy if the model is updated irregularly and each user uses the same model. The application receives a model update each time the application is updated in the App Store or when new parameters are periodically downloaded from the cloud.
Training large models on the device is impossible today, but it will not always be so. Besides, models do not have to be large. And most importantly, one model for everyone may not be the best solution.

Why do I need training on the device?

There are several benefits of learning on the device:

•An application can learn from data or user behavior.
•Data will remain on the device.
•Moving computation from the server to the device saves money.
•The model will be trained and updated continuously.

This solution is not suitable for every situation, but there are some applications for it. I think that its main advantage is the ability to fit the model to a specific user.

On iOS devices, some applications already do this:

•The keyboard learns from the text you type and predicts the next word in the sentence. This model is trained specifically for you, and not for other users. Since training takes place on the device, your messages are not sent to a cloud server.

•The Photos app automatically organizes images into the People album. I'm not quite sure how this works, but the app uses a face detection API to find faces in your photos and groups similar faces together. Perhaps this is just unsupervised clustering, but some learning must still take place, since the app lets you correct its mistakes and improves based on your feedback. Regardless of the type of algorithm, this app is a good example of customizing the user experience based on the user's own data.

•Touch ID and Face ID learn based on your fingerprint or face. Face ID continues to learn over time, so if you grow a beard or start wearing glasses, it will still recognize your face.

•Motion detection. Apple Watch learns your habits, such as how your heart rate changes during various activities. Again, I do not know how this works, but obviously some training must take place.

•Clarifai Mobile SDK lets users create their own image classification models from photographs of objects and their labels. Typically, a classification model requires thousands of images for training, but this SDK can learn from just a few examples. The ability to create image classifiers from your own photos without being an expert in machine learning has many practical uses.

Some of these tasks are easier than others. Often “learning” is simply remembering the last action of the user. For many applications, this is sufficient, and it does not require fancy machine learning algorithms.

The keyboard model is quite simple, and training can take place in real time. The Photos app learns more slowly and consumes a lot of energy, so its training runs while the device is charging. Many practical applications of learning on the device fall between these two extremes.
Other existing examples include spam detection (your email client learns from the emails you mark as spam), text correction (it learns your most common typing errors and corrects them), and smart calendars like Google Now that learn to recognize your regular activities.

How Far Can We Go?

If the purpose of learning on the device is to adapt the machine learning model to the needs or behavior of specific users, then what can we do about it?
Here's a fun example: a neural network that turns drawings into emojis. It asks you to draw several different shapes and trains a model to recognize them. The application is implemented as a Swift Playground, which is not the fastest platform. But even under such conditions, training does not take long: only a few seconds on the device.

If your model is also not very complex, like this two-layer neural network, you can already conduct training on the device.
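
To give a feel for how small such a model can be, here is a minimal sketch of a two-layer network trained with plain stochastic gradient descent. It is not the code of the emoji playground: the 3x3 toy "drawings", the function name `train_tiny_network`, and the hyperparameters (4 hidden units, 500 passes, learning rate 0.3) are all assumptions chosen for illustration.

```python
import math
import random


def train_tiny_network(samples, labels, hidden=4, epochs=500, lr=0.3, seed=0):
    """Train a two-layer network (tanh hidden layer, sigmoid output) with per-sample SGD."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden activations, then the probability of class 1.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        z = sum(w * hi for w, hi in zip(w2, h)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        for x, y in zip(samples, labels):
            h, p = forward(x)
            d_out = p - y  # gradient of the cross-entropy loss at the output
            for j in range(hidden):
                # Backpropagate through tanh before w2[j] is changed.
                d_hidden = d_out * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hidden
                for i in range(n_in):
                    w1[j][i] -= lr * d_hidden * x[i]
            b2 -= lr * d_out

    return lambda x: forward(x)[1]


# Toy 3x3 "drawings", flattened row by row (illustrative data, not from the article).
vertical_left = [1, 0, 0, 1, 0, 0, 1, 0, 0]
vertical_mid = [0, 1, 0, 0, 1, 0, 0, 1, 0]
horizontal_top = [1, 1, 1, 0, 0, 0, 0, 0, 0]
horizontal_mid = [0, 0, 0, 1, 1, 1, 0, 0, 0]

classify = train_tiny_network(
    [vertical_left, vertical_mid, horizontal_top, horizontal_mid],
    [0, 0, 1, 1],  # 0 = vertical stroke, 1 = horizontal stroke
)
```

Calling `classify(drawing)` returns the estimated probability that the drawing is a horizontal stroke. With only four examples and a few dozen weights, the whole training loop is a few thousand tiny updates, which is why a model of this size is practical on a phone.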

Note: on iPhone X, developers have access to a low-resolution 3D model of the user's face. You can use this data to train a model that selects emojis or triggers other actions in an application based on the user's facial expressions.

Here are a few other possible future features:

•Smart Reply is a model from Google that analyzes an incoming message or email and suggests a suitable reply. It is not yet trained on the device and recommends the same replies to all users, but (in theory) it could be trained on the user's own texts, which would significantly improve the model.

•Handwriting recognition that learns from your own handwriting. This is especially useful on the iPad Pro with Pencil. Handwriting recognition is not a new feature, but if your handwriting is as bad as mine, the standard model will make too many mistakes.

•Speech recognition, which will become more accurate and tailored to your voice.

•Sleep tracking and fitness apps. Before these apps can give you tips on improving your health, they need to get to know you. For privacy reasons, this data is best left on the device.

•Personalized models for dialogue. We have yet to see the future of chatbots, but their advantage is that a bot can adapt to you. When you talk with a chatbot, your device can learn your way of speaking and your preferences and adjust the chatbot's answers to your personality and manner of communication (for example, Siri could learn to make fewer comments).

•Improved advertising. No one likes advertising, but machine learning can make it less annoying for users and more profitable for advertisers. For example, an ad SDK can learn how often you look at and click on ads, and find the most appropriate ads for you. An application can train a local model that requests only the ads that work for a specific user.

•Recommendations are a common use of machine learning. A podcast player can learn from the shows you have listened to in order to suggest new ones. Today applications do this in the cloud, but it could be done on the device.

•Applications for people with disabilities can help them navigate their surroundings and understand them better. I am no expert in this area, but I can imagine applications that help, for example, to tell different medications apart using the camera.

These are just a few ideas. Since all people are different, machine learning models could adapt to our specific needs and desires. Training on the device allows you to create a unique model for a unique user.

Different scenarios of model training

Before you can use a model, you need to train it. To keep improving the model, training has to continue afterwards.

There are several training options:

•No training on user data. You collect your own data or use publicly available data to create a single model. When you improve the model, you release an application update or simply download new parameters into it. Most existing machine learning applications do this.

•Centralized training. If your application or service already requires user data that is stored on your servers, and you have access to it, then you can train on this data on your server. User data can be used to train a model for a specific user or for all users. This is what platforms like Facebook do. This option raises questions of privacy, security, scalability, and many others. The privacy issue can be addressed with Apple’s “differential privacy” method, but that has trade-offs of its own.

•Collaborative learning. This method shifts the cost of training to the users themselves. Training takes place on the device, and each user trains a small part of the model. Model updates are shared with other users, so that they can learn from your data and you from theirs. But this is still a single model, and everyone ends up with the same parameters. The main advantage of such training is its decentralization. In theory, this is better for privacy, but according to some research, this option may actually be worse.

•Each user trains their own model. This is the option I personally find most interesting. The model can learn from scratch (as in the example with drawings and emoji), or it can be a pre-trained model that is customized to your data. In either case, the model can improve over time. For example, a keyboard starts with a model already trained on a particular language, but over time learns to predict which sentence you want to write. The downside of this approach is that other users cannot benefit from it, so this option only works for applications whose data is unique to each user.
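
The collaborative option above can be sketched in a few lines. One well-known way to merge the updates, federated averaging, weights each device's parameters by the number of examples it trained on. The article does not say which merging scheme is meant, so the function name `federated_average` and the flat list-of-floats weight format are assumptions made for illustration.

```python
def federated_average(client_updates):
    """Merge per-device weights into one shared model.

    client_updates is a list of (weights, num_examples) pairs, where
    weights is a list of floats of the same length on every device.
    """
    total_examples = sum(n for _, n in client_updates)
    size = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_examples
        for i in range(size)
    ]


# Two hypothetical devices: the second saw three times as much data,
# so its weights count three times as much in the shared model.
updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
shared = federated_average(updates)  # → [2.5, 3.5]
```

Only the weights and example counts leave the device in this sketch, not the raw data. Every user still ends up with the same `shared` parameters, which is exactly the limitation that per-user models avoid.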

How to carry out training on the device?

It is worth remembering that training on user data is different from training on a large data set. The initial keyboard model can be trained on a standard text corpus (for example, all of Wikipedia), but a text message or email is written in language that differs from a typical Wikipedia article. And this style differs from user to user. The model has to account for these kinds of variations.
Another problem is that our best deep learning methods are quite inefficient and crude. As I said, training an image classifier can take days or weeks. The learning process, stochastic gradient descent, proceeds in many small steps. A data set can contain a million images, and the neural network will pass over each of them about a hundred times.

Obviously, this method is not suitable for mobile devices. But often you do not need to train the model from scratch. Many people take an already trained model and then use transfer learning to adapt it to their own data. But even these smaller data sets still consist of thousands of images, and training on them is still too slow.

With our current training methods, fine-tuning deep models on the device is still some way off. But not all is lost. Simple models can already be trained on the device. Classical machine learning models such as logistic regression, decision trees, or naive Bayes classifiers can be trained quickly, especially with fast-converging optimizers such as L-BFGS (a quasi-Newton method) or conjugate gradient. Even a basic recurrent neural network should be feasible.
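
As a rough illustration of how little is needed for a classical model, here is a minimal sketch of logistic regression in pure Python. To keep it short it uses plain full-batch gradient descent rather than L-BFGS or conjugate gradient, and the two-feature data set, the function names, and the hyperparameters are made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, epochs=1000, lr=1.0):
    """Fit weights and a bias with full-batch gradient descent on the log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        # Step against the average gradient over the whole batch.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy data with two features scaled to [0, 1] (illustrative values only).
xs = [[0.0, 0.2], [0.3, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.7], [0.8, 0.9]]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(xs, ys)
```

After training, `predict_proba(w, b, [0.1, 0.1])` falls below 0.5 and `predict_proba(w, b, [0.9, 0.9])` above it. The model has three parameters in total, so a full training run on a handful of examples is a negligible amount of work for a phone.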

For the keyboard, online learning may work. You can run a training step after the user has typed a certain number of words. The same applies to models that use accelerometer and motion information, where the data arrives as a constant stream of numbers. Since these models are trained on a small part of the data at a time, each update should be quick. So if your model is small and you do not have much data, training will take seconds. But if your model is larger or you have a lot of data, you need to be creative. If a model studies the faces of people in your photo gallery, it has too much data to process, and you need to find a balance between the speed and accuracy of the algorithm.
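
The online approach for a keyboard can be sketched with simple word-pair counts that are updated after every message. This is only an illustration: real keyboards certainly use more sophisticated models, and the class name `NextWordModel` and its methods are hypothetical.

```python
from collections import Counter, defaultdict


class NextWordModel:
    """Counts which word follows which, updated incrementally as the user types."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, text):
        # One cheap training step: count each adjacent word pair in the new text.
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, prev_word, k=3):
        # Suggest up to k words most often seen after prev_word.
        followers = self.counts.get(prev_word.lower(), Counter())
        return [word for word, _ in followers.most_common(k)]


model = NextWordModel()
model.update("See you at the gym")
model.update("Meet me at the gym tonight")
model.update("I am at the office")
suggestions = model.predict("the")  # → ['gym', 'office']
```

Each call to `update` touches only the words in the new message, so the cost of learning stays proportional to what the user just typed, and the text itself never has to leave the device.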

Here are some more problems that you will encounter when training on the device:

•Large models. For deep learning networks, current learning methods are too slow and require too much data. Many studies are now devoted to training models on a small amount of data (for example, on one photo) and in a small number of steps. I’m sure that any progress will lead to the spread of training on the device.

•Multiple devices. You are probably using more than one device. The issue of transferring data and models between user devices remains to be resolved. For example, the “Photos” application in iOS 10 does not transmit information about people's faces between devices, therefore it is trained on all devices separately.

•Application updates. If your application includes a trained model that adapts to user behavior and data, what happens to everything it has learned when an application update ships a new version of the model?

Learning on the device is still at the beginning of its development, but it seems to me that this technology will inevitably become important in creating applications.


Amram David

Senior Contributor at DFI Club
Amram is a technical analyst and partner at DFI Club Research, a high-tech research and advisory firm. He has over 10 years of technical and business experience with leading high-tech companies including Huawei, Nokia, and Ericsson, covering ICT, semiconductors, microelectronic systems, and embedded systems. Amram focuses on the business-critical points where new technologies drive innovation.