Meta cuts AI video computing cost by 95%; AI can guess the original picture even when half of it is covered

2022-07-05

According to IEEE Spectrum, researchers at Meta have published a series of new papers on the MAE (masked autoencoder). Using self-supervised learning (SSL), an MAE system can predict the missing parts of its input data and so restore incomplete text, images, video and audio.

The general principle is the same for every data type: the system predicts the missing content from the information that is still present, then fills the gaps with new data. With this technique, AI may be able to annotate the ground truth automatically, without manual labeling. That would greatly improve the learning efficiency of AI models and may open up new directions for their future development.

1. The essence of intelligence is predictive ability; SSL can make AI more intelligent

The MAE system uses self-supervised learning (SSL), in which the labels used for machine learning come from the data itself rather than from manual annotation. An MAE system can predict the missing parts of very sparse, incomplete data and so restore images, video and audio. This is how the MAE system builds "world models".

Yann LeCun, Meta's chief AI scientist, said: "SSL technology is a prerequisite for AI systems to build 'world models'. Only with SSL can AI have rationality and common sense, acquire the ability to transfer knowledge, and adapt to different environments like human beings."

LeCun said that if an MAE system can predict the missing part of the data, it means the AI can understand that the world is three-dimensional and has a certain degree of resolution, which makes it possible to predict people's complex behavior. He told IEEE Spectrum: "We want to create AI models that can learn autonomously like animals and humans." LeCun believes that the essence of intelligence is a kind of predictive ability.

This view is shared by Yoshua Bengio, a 2018 Turing Award winner, who also believes that the ability to reason about and predict the world is the key to intelligence.

▲ Left: the image given to the MAE model for training; middle: the prediction; right: the original image

2. A new way to play crossword puzzles? AI completes the picture for you

Ross Girshick, a researcher in Meta's AI department, co-authored a paper on the principles of the MAE system. According to the paper, Meta's MAE system is built on a type of neural network called the transformer, which is based on the attention mechanism. The transformer can reduce an AI model's dependence on external information, capture the internal relationships within the data or features, and improve training results.

▲ The paper on the principles of MAE

When working with text, the MAE system is given a text database in which some of the data is missing. Once it has identified the missing text, it fills in the missing content with new blocks of text.

The same technique carries over to still images. Researchers break the image into multiple patches and then let the MAE system fill in the missing parts. Girshick said this was inspired by Google's Vision Transformer (ViT), whose basic idea is to apply the transformer architecture to computer vision. Specifically, a ViT model divides a picture into patches of equal size, encodes each patch, and arranges the results into a sequence that the model can process. Following the same idea, when the MAE system predicts the missing parts of an image, it breaks the image into many small patches and then fills in the missing content with new patches.
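
The patch-and-mask idea described above can be sketched in a few lines of plain Python. This is only an illustration, not Meta's implementation: the function names `patchify` and `random_mask`, the toy 8x8 image, the 2x2 patch size and the fixed random seed are all assumptions made for the example, and the masking ratio of 0.75 is simply the figure this article reports as working best for still images.

```python
import random


def patchify(image, patch_size):
    """Split a 2-D image (a list of rows) into equal-size square patches.

    Each patch is returned as a flat list of pixel values in row-major
    order, so the whole image becomes a sequence of patches.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[top + i][left + j]
                     for i in range(patch_size)
                     for j in range(patch_size)]
            patches.append(patch)
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Randomly choose which patch indices are hidden and which stay visible."""
    num_masked = round(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# Hypothetical toy input: an 8x8 grayscale image whose pixel value encodes
# its position, so it is easy to see which pixels land in which patch.
image = [[row * 8 + col for col in range(8)] for row in range(8)]
patches = patchify(image, patch_size=2)      # 16 patches of 4 pixels each

rng = random.Random(0)                       # fixed seed, for repeatability
visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Only the visible patches would be shown to the model; the patches listed
# in `masked` are the ones it would have to predict.
model_input = [patches[i] for i in visible]
```

In this toy setup the model would see 4 of the 16 patches and be asked to reconstruct the other 12. A real system would replace the toy image with actual pixel data and feed `model_input` to a transformer, which this sketch does not attempt.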

3. Text and images differ in information density; masking 75% of an image works best

The team found that because text and images differ in information density, the proportion of data that should be masked to get the best restoration also differs. When the MAE system restores a still image, masking 75% of the data gives the best result; for text, the figure is 15%.

▲ The researchers found that masking 75% of the image gave the best experimental results

Language is a highly semantic, information-dense system of symbols created by humans, and each word carries a lot of meaning. If too many words are missing from a sentence, many different completions become plausible and the model's predictions are not very accurate. Images, by contrast, are natural signals with a large amount of spatial redundancy. Within one picture, neighboring regions tend to have similar pixel characteristics, so the model can recover missing image information from adjacent patches.

Girshick explained that the MAE system works in two steps. First, the encoder learns the relationships between pixels across the data set. Then the decoder reconstructs the original image from the masked version. Once this training is complete, the decoder is discarded and the encoder is used for vision tasks such as classification and object detection.

Girshick said that being able to use the trained model for tasks such as object recognition "is a great gain for us." It means that with the MAE system, a machine can label the ground truth for the data automatically, with no manual labeling.

4. MAE can save 95% of video computing cost

When the MAE system processes video, researchers mask 95% of the data in each frame.
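
To make the three masking ratios concrete, the short calculation below shows how many patches stay visible at each one. It is a back-of-the-envelope sketch only: the function name `split_patches` and the total of 200 patches per input are hypothetical, and counting visible patches is not a measurement of actual computing cost.

```python
def split_patches(total, mask_percent):
    """Return (masked, visible) patch counts for a whole-number mask percentage."""
    if not 0 <= mask_percent <= 100:
        raise ValueError("mask_percent must be between 0 and 100")
    masked = total * mask_percent // 100
    return masked, total - masked


# Masking percentages reported in this article for each kind of data.
REPORTED_PERCENT = {"text": 15, "still image": 75, "video frame": 95}

TOTAL_PATCHES = 200  # hypothetical number of patches (or tokens) per input

counts = {kind: split_patches(TOTAL_PATCHES, percent)
          for kind, percent in REPORTED_PERCENT.items()}

for kind, (masked, visible) in counts.items():
    print(f"{kind}: {masked} masked, {visible} visible")
```

With these assumed numbers, a text input keeps 170 of its 200 pieces visible, a still image keeps 50, and a video frame keeps only 10. The more redundant the data, the less of it the model needs to see.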

Consecutive video frames are highly similar, which means video contains even more redundant information than still images. Meta researcher Christoph Feichtenhofer said that this approach lets the MAE system reduce computing cost by 95%, a major advantage in video processing. He also said the technology may be used for content moderation and classification tasks on Facebook and Instagram.

For audio, the Meta AI team found a clever approach. They convert audio files into spectrograms, in other words they turn sound into images, and then mask patches of the spectrogram for training just as they do with images. Although the model can currently handle only audio clips a few seconds long, it has achieved good results. Bernie Huang, who worked on the audio system, said potential applications include audio classification, improving voice calls, and finding better ways to compress audio files.

▲ The MAE framework

Conclusion: MAE may have many more applications, but accuracy needs careful consideration

The MAE system can predict the missing parts of incomplete data and so restore text, pictures, video and audio. The technology has great potential, for example restoring photos of archaeological relics or filling in data lost from historical documents. MAE may bring breakthroughs not only in AI but in other fields as well.

However, the MAE model also has shortcomings. In current experiments its accuracy does not reach 100%, and the model may generate content that does not exist. Anyone using MAE to restore data needs to weigh these problems carefully.

(Xinhua News Agency)

Editor: Li Jialang    Responsible editor: Mu Mu

Source: ithome.com
