

Teach AI to see the world from the perspective of "I"

2021-11-01   

In order to make artificial intelligence systems interact with the world the way humans do, the field of artificial intelligence needs to develop a new first-person perception paradigm. This means that AI should understand daily activities from a first-person perspective while moving and interacting in real time.

The world is multidimensional, and the same scene in life takes on different forms when viewed from different perspectives. If we want artificial intelligence to be more human-like, we must bring its perspective closer to a human one. Viewing the environment from a human perspective, artificial intelligence may see a whole new world.

Recently, an academic consortium formed by Facebook together with 13 universities and laboratories in 9 countries announced that in November it would open-source Ego4D (Egocentric 4D Perception), a project that enables artificial intelligence to interact with the world from a first-person perspective. The project contains more than 3,025 hours of first-person video covering the daily lives of more than 700 participants in 73 cities. These videos will help make the way AI perceives the world more human.

So, what perspective does AI mainly use to recognize the world today, and how do different perspectives affect the environment AI comes to know? Through what main technologies does artificial intelligence perceive its environment and understand the world? And if AI is to know the world more the way humans do, what bottlenecks does it need to break through?

Artificial intelligence usually adopts the third-person perspective

"To make artificial intelligence systems interact with the world like human beings, the field of artificial intelligence needs to develop a new first-person perception paradigm. This means that artificial intelligence should understand daily activities from a first-person perspective while moving and interacting in real time," Kristen Grauman, lead research scientist at Facebook, has said. Today's computer vision systems mostly learn from millions of photos and videos taken from a third-person perspective.

"To build a new perception paradigm, we need to teach artificial intelligence to observe, understand and interact with the world from the first-person perspective, the 'I' perspective, just as humans do. This way of knowing can also be called egocentric cognition," Tan Mingzhou, director of the artificial intelligence division at the Yuanwang Think Tank and chief strategy officer of Turing Robot, pointed out in an October 26 interview with Science and Technology Daily.

How should we understand AI's first-person and third-person perspectives? Tan Mingzhou explained: "The first-person perspective gives a strong sense of immersion. In a game, for example, the first-person view is the view you would see in the real world. The third-person perspective, also called the God's-eye view, is as if you were floating beside the character at all times: you can see the character itself as well as its surroundings. From the third-person perspective you can see both the cover behind a bunker and the situation in front of it; from the first-person perspective behind the bunker, limited by the field of view, you can only see the bunker itself."

"Take autonomous driving as another example. If its vision system only collects data from a bystander's perspective (outside the car), then even after training on hundreds of thousands of images or videos of moving vehicles seen from that bystander's perspective, the AI may still not know how to drive, and would struggle to reach today's level of autonomous driving. The bystander's perspective is very different from the perspective of sitting behind the steering wheel, and a real driver's first-person responses, including braking, cannot be collected from a bystander's perspective," Tan Mingzhou added.
"In the past, the AI community rarely collected datasets from the first-person perspective. This project makes up for that shortcoming in the AI perspective system and is very important for the future development of AR and VR. If AI can observe and understand the world from the 'I', the first-person perspective, it will open a new era of immersive experience for humans and AI," Tan Mingzhou pointed out.

Kristen Grauman has also said: "The next generation of artificial intelligence systems will need to learn from an entirely different kind of data: videos that show the world from the center of the action, rather than from the sidelines."

Building real-world datasets

At present, what do artificial intelligence systems hold on to as they perceive the environment, understand the world and build a human-like cognitive system? Industry experts point out that history has shown benchmarks and datasets to be the key catalysts for innovation in the artificial intelligence industry. Today's computer vision systems, which can recognize almost any object in an image, are built on datasets and benchmarks that provide researchers with an experimental platform for studying real-world images.

"The project Facebook has just released is, in essence, a dataset for training artificial intelligence models to be more human-like. It has developed five benchmark challenges around the first-person visual experience; that is, it decomposes the first-person perspective into five goals and runs corresponding training-set competitions," Tan Mingzhou pointed out. The five Ego4D benchmarks are:

- Episodic memory: what happened, and when?
- Forecasting: what am I likely to do next?
- Hand-object interaction: what am I doing?
- Audio-visual diarization: who said what, and when?
- Social interaction: who is interacting with whom?

Tan Mingzhou stressed that these benchmarks will drive research on the building blocks needed to develop artificial intelligence assistants, assistants that can understand and act on instructions not only in the real world but also in the metaverse.
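To make the five benchmarks concrete, here is a minimal, hypothetical sketch in Python of how annotations on first-person clips could be organized around them. The Ego4DBenchmark enum, the ClipAnnotation record and the queries_for helper are illustrative assumptions for this article, not the actual Ego4D API or annotation schema.

```python
from dataclasses import dataclass
from enum import Enum


class Ego4DBenchmark(Enum):
    """The five benchmark tasks described in the article."""
    EPISODIC_MEMORY = "episodic_memory"          # what happened, and when?
    FORECASTING = "forecasting"                  # what am I likely to do next?
    HAND_OBJECT_INTERACTION = "hand_object"      # what am I doing?
    AUDIO_VISUAL_DIARIZATION = "av_diarization"  # who said what, and when?
    SOCIAL_INTERACTION = "social"                # who is interacting with whom?


@dataclass
class ClipAnnotation:
    """A hypothetical annotation on one span of a first-person video clip."""
    clip_id: str
    start_sec: float
    end_sec: float
    benchmark: Ego4DBenchmark
    label: str  # e.g. an action, an utterance, or an interaction partner


def queries_for(annotations, benchmark):
    """Filter a clip's annotations down to a single benchmark task."""
    return [a for a in annotations if a.benchmark is benchmark]


# Toy usage: two annotations on the same kitchen clip.
clips = [
    ClipAnnotation("kitchen_001", 12.0, 15.5,
                   Ego4DBenchmark.HAND_OBJECT_INTERACTION, "chopping an onion"),
    ClipAnnotation("kitchen_001", 15.5, 18.0,
                   Ego4DBenchmark.FORECASTING, "next: turn on the stove"),
]
print(queries_for(clips, Ego4DBenchmark.FORECASTING))
```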
"Facebook's research can accelerate the progress of self-centered cognition research in the field of artificial intelligence, which will have a positive impact on our future ways of life, work and entertainment," said Tan Mingzhou. Make AI cognitive ability more like human beings The ultimate goal of the development of artificial intelligence is to benefit mankind and enable us to cope with the increasingly complex challenges in the real world. Imagine that AR equipment can accurately display how to play the piano, play chess, hold a pen and outline in piano, chess, book and painting classes; vividly guide housewives to bake barbecue and cook dishes according to recipes; forgetful old people use the hologram in front of them Help me remember the past Facebook emphasizes that it hopes to open up a new path for academic and industry experts through the ego4d project to help build a more intelligent, flexible and interactive computer vision system. With the deeper understanding of human daily life style by artificial intelligence, it is believed that this project can contextualize and personalize the experience of artificial intelligence in an unprecedented way. However However, the current research only touches the surface of self-centered cognition. How can we make the cognitive ability of artificial intelligence more like human beings? "The first is attention. The attention mechanism of artificial intelligence is closer to intuition, while human attention is selective. At present, most of the attention mechanisms of artificial intelligence repeatedly tell AI what to pay attention to and what is related during training. In the future, people participating in the experiment may wear special ones to catch eye attention Point device to further collect relevant data, "Tan Mingzhou pointed out. "Second, we also need to define the behavior of artificial intelligence based on the relationship between events and behaviors. The occurrence of an event includes multiple behaviors. We should train the artificial intelligence system in the way of human feedback to make the behavior of artificial intelligence consistent with our intention." Tan Mingzhou further said. Tan Mingzhou stressed: "in addition, cooperation, response and linkage are also needed between hearing and vision, language and behavior, which requires the construction of a multi-modal interaction model, in-depth research on why the perspective focuses on investment and is combined with intention recognition to form a linkage mechanism with behavior." (Xinhua News Agency)
"Second, we also need to define the behavior of artificial intelligence in terms of the relationship between events and behaviors. One event involves multiple behaviors, and we should train artificial intelligence systems with human feedback so that their behavior stays consistent with our intentions," Tan Mingzhou continued.

Tan Mingzhou stressed: "In addition, hearing and vision, language and behavior need to cooperate, respond and link up with one another. That requires building a multimodal interaction model, researching in depth where and why the first-person viewpoint invests its attention, and combining this with intention recognition to form a linkage mechanism with behavior."

(Xinhua News Agency)