Zhou Bowen, Director of Shanghai Artificial Intelligence Laboratory: Exploring the AI-45° Balance Law, Balancing Safety and Performance
2024-07-04
On July 4, 2024, the World Artificial Intelligence Conference and High-Level Meeting on Global AI Governance (WAIC 2024) opened in Shanghai. Zhou Bowen, Director and Chief Scientist of the Shanghai Artificial Intelligence Laboratory, Huiyan Chair Professor at Tsinghua University, and founder of Zhuoyuan Technology, delivered a speech at the WAIC 2024 plenary session. Can AI safety and performance be achieved at the same time? In his speech he proposed a technical proposition: exploring the 45° balance law of artificial intelligence, "Towards AI-45° Law," a technology system that prioritizes AI safety while ensuring the long-term development of AI performance. The following is the full text of the speech:

Dear leaders and guests, good morning. It is an honor to share the frontier topic of AI safety with you at WAIC, here in Shanghai. I would like to propose a technical proposition: exploring the 45° balance law of artificial intelligence, "Towards AI-45° Law."

Generative AI, represented by large models, is developing rapidly, but as capabilities improve, the models themselves and their applications raise a series of potential risks. Looking at the risks the public is most concerned about: first come content risks involving data leakage, misuse, privacy, and copyright; second, risks of forgery and disinformation arising from malicious use; third, ethical problems such as bias and discrimination; and beyond these, concerns about systemic social impacts such as shifts in the employment structure. Science-fiction films about AI go further, imagining scenarios in which AI runs out of control and humans lose autonomy. The risks brought by AI have begun to emerge, though most remain potential, and preventing them requires joint effort across society, with the scientific community contributing more.

In May of last year, hundreds of AI scientists and public figures from around the world signed the open letter "Statement on AI Risk," expressing concern about AI risks and calling for mitigating the risk from AI to be treated as a global priority, alongside other societal-scale risks such as pandemics and nuclear war.

The root cause of these concerns is that current AI development is unbalanced. Consider the prevailing trend: under model architectures represented by the Transformer, and driven by the scaling laws of big data, large parameter counts, and massive compute, AI performance is currently growing roughly exponentially. By contrast, the typical techniques on the safety side, such as red-teaming, safety labeling, guardrails, and evaluation benchmarks, remain scattered, piecemeal, and reactive rather than built in from the start.

Recent alignment techniques try to balance performance and safety, for example supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), RLAIF, and superalignment. These methods help transmit human preferences to AI systems, and they have enabled exciting systems such as ChatGPT and GPT-4, as well as the InternLM ("Scholar") models from our Shanghai AI Laboratory.
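To make concrete how human preferences are transmitted in such pipelines, here is a minimal, illustrative sketch of the pairwise reward-model objective commonly used in RLHF. The class, dimensions, and toy data below are assumptions made for illustration, not the laboratory's actual implementation.

```python
# Minimal sketch (PyTorch) of the Bradley-Terry pairwise preference loss
# used when training a reward model for RLHF. All names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward head: maps a pooled response embedding to a scalar score."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(pooled_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Push the reward of the human-preferred response above the rejected
    one; this score gap is what carries human preference into the model."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random tensors stand in for encoded response embeddings.
model = RewardModel()
chosen = torch.randn(4, 768)    # embeddings of human-preferred responses
rejected = torch.randn(4, 768)  # embeddings of rejected responses
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
print(f"preference loss: {loss.item():.4f}")
```

In a full pipeline, the reward gap this model learns is what a subsequent policy-optimization step would maximize.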
Although these methods aim to improve both safety and performance, in practice they tend to prioritize performance. Overall, the improvement in the safety of AI models lags far behind the improvement in their capabilities, and this imbalance leaves AI development lopsided, a condition we call "Crippled AI." Behind the imbalance lies a huge gap in investment: whether measured by how systematic the research is, or by the talent, commercial incentives, and computing power devoted to it, investment in safety falls far behind investment in capability.

Intelligence for good requires keeping AI controllable and coordinating development with safety. To avoid Crippled AI, what we should pursue instead is TrustWorthy AGI, trustworthy artificial general intelligence. Trustworthy AGI must balance safety and performance, so we need a technology system that prioritizes AI safety while ensuring the long-term development of AI performance. We call this technological vision the "AI-45° Balance Law."

The AI-45° Balance Law says that, over the long run, safety and performance should develop in step along a 45° line: plotting capability on one axis and safety on the other, the trajectory may fluctuate in the short term, but in the long term it should fall neither below 45° (capability outrunning safety, as it does today) nor above 45° (constraints so heavy that they hinder development and industrial application). Realizing it calls for strong technological drive, whole-process optimization, multi-stakeholder participation, and agile governance.

There may be multiple technical paths to the AI-45° Balance Law. Our Shanghai AI Laboratory is exploring a path centered on causality, which we have named the "Causal Ladder of Trustworthy AGI," in tribute to the pioneer of causal reasoning, Turing Award winner Judea Pearl. The Causal Ladder divides the development of trustworthy AGI into three progressive stages: universal alignment, intervenability, and reflective ability.

"Universal alignment" comprises today's most advanced human-preference alignment techniques. Note, however, that these alignment techniques rely only on statistical correlation and ignore true causal relationships, which can lead to erroneous reasoning and potential danger. A classic illustration is Pavlov's dog: a dog that forms a conditioned reflex purely from the statistical correlation between a bell and food will salivate whenever it hears a bell, whatever the situation, and such correlation-driven behavior is clearly unsafe once it extends to consequential actions.

"Intervenability" comprises safety techniques that intervene in an AI system to probe its causal mechanisms, such as human-in-the-loop oversight, mechanistic interpretability, and the adversarial drills we propose. It aims to improve safety through better interpretability and generalization, while also enhancing capability.

"Reflective ability" requires an AI system not only to execute tasks efficiently but also to examine the consequences and potential risks of its own behavior, ensuring that safety and ethical boundaries are respected in the pursuit of performance. Techniques at this stage include value-based training, causal interpretability, and counterfactual reasoning.

At present, work on AI safety and performance is mainly at the first stage, with some attempts at the second.
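To illustrate the gap between the first two rungs of this ladder, here is a minimal simulation in the spirit of the Pavlov example. The toy structural causal model and every variable in it are invented for illustration: a purely associational learner (stage one) sees a perfect bell-saliva correlation, while an explicit intervention, do(bell = 1), reveals that the bell has no causal effect at all.

```python
# Minimal sketch contrasting rung 1 (association) with rung 2 (intervention)
# on Pearl's ladder of causation. The SCM here is a toy assumption.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Structural causal model: food causes both the bell and salivation.
food = rng.binomial(1, 0.5, n)   # confounder: food appears half the time
bell = food.copy()               # the bell rings exactly when food appears
salivate = food                  # salivation is caused by food alone

# Rung 1 (association): conditioning on the bell looks like a perfect signal.
p_obs = salivate[bell == 1].mean()
print(f"P(salivate | bell rings)  = {p_obs:.2f}")   # ~1.00

# Rung 2 (intervention): we ring the bell ourselves, do(bell=1), for everyone.
# Forcing the bell does not change food, the true cause of salivation.
bell_do = np.ones(n, dtype=int)
salivate_do = food
p_do = salivate_do[bell_do == 1].mean()
print(f"P(salivate | do(bell=1))  = {p_do:.2f}")    # ~0.50

# The gap between the two probabilities is invisible to a purely
# correlational (stage-one) system; intervention-based techniques exist
# precisely to expose it.
```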
However, to truly balance AI safety and performance, we must master the second stage and be brave enough to climb to the third. Following the Causal Ladder of Trustworthy AGI, we believe we can build genuinely trustworthy AGI that reconciles safety with excellent performance. Ultimately, just as safe and controllable nuclear fusion would bring clean and abundant energy to all humanity, we hope that by deeply understanding AI's underlying mechanisms and causal processes we can develop and use this revolutionary technology safely and effectively. And just as controllable nuclear fusion serves the common interest of all mankind, we firmly believe that AI safety is a global public good. The recently released Shanghai Declaration on Global Governance of Artificial Intelligence notes that "we need to promote enhanced communication and dialogue among countries." We are willing to work with everyone to advance the AI-45° Balance Law: sharing AI safety technology, strengthening the global exchange of and cooperation among AI safety talent, balancing investment between AI safety and AI capability, and jointly building an open and safe ecosystem for general AI innovation and talent development. (Lai Xin She)
Editor: Xiong Dafei  Responsible editor: Li Xiang
Source: WHB