Outlook | The "Data Bottleneck" of Artificial Intelligence
2024-04-10
In Asimov's classic science fiction short story "The Last Question," two drunken programmers ask an artificial intelligence a question: "How can the total entropy of the universe be significantly reduced?" The reply: "There is insufficient data to answer." At first, the artificial intelligence cannot answer. This ChatGPT-like machine does finally hand in its answer at the end of time, but throughout the entire life of the universe it has been doing one thing: collecting data.

Data is the core resource on which artificial intelligence depends for its development. The plot of the story is certainly dramatized, but it mirrors the reality of developing generative artificial intelligence. Today the "Hundred Model Battle" is in full swing, with leading companies racing into the field. However, the lack of usable data, especially the shortage of high-quality Chinese-language corpora and the closed data ecosystems in some fields, has hindered the development of artificial intelligence. How to break the "data bottleneck" is a challenge that we already face or will soon face.

The "enclosure movement" in the ocean of data

Coastal ports, urban neon lights, puppies at play... Recently, several videos generated by Sora, the American text-to-video model, quickly attracted worldwide attention. Unlike text-to-image generation, Sora produces videos up to 60 seconds long with rich motion, in which the interactions between objects and the depiction of physical laws are almost indistinguishable from reality. From object interaction to dappled light and shadow, the shifting pixels on the screen are breathtaking. Generative artificial intelligence like Sora does not come out of nothing.
Unlike the more familiar discriminative artificial intelligence, generative artificial intelligence is essentially a "simulator" built from massive data through large models and pre-training. Chen Dingding, Dean of Haiguo Tuzhi Research Institute and Professor at Jinan University, believes that the rapid emergence of artificial intelligence achievements relies heavily on large amounts of diverse data. Yin Ye, CEO of Huada Group, said that the development of artificial intelligence is driven not only by algorithm updates in the "ivory tower" but also by the massive accumulation of data in open markets.

The "brute-force aesthetics" of massive data plus ultra-high computing power is currently the core strategy of generative artificial intelligence, and it is also the key to the rise of a group of companies represented by OpenAI. Simply put, all else being equal, the more data a model is fed, the stronger it becomes. According to published figures, from GPT to GPT-2 and then to GPT-3, OpenAI increased the number of model parameters from 117 million to 1.5 billion and then to 175 billion, giving GPT-3 more than ten times as many parameters as previous language models of the same type.

Massive, high-quality data is a basic element of the digital ocean, and competition for it has become a silent battlefield between countries and companies. The terms of use for OpenAI's products explicitly state that the company retains the right to use interaction data. Universal based on digital technology