When Ancient Book Restoration Meets Artificial Intelligence

2022-04-11

A series of seminars on "Intelligent Information Processing of Ancient Books," jointly sponsored by the Peking University Digital Humanities Research Center, the Peking University Digital Humanities Open Laboratory and the Peking University Institute for Artificial Intelligence, was recently held online. At the seminar, Wang Jun, director of the Peking University Digital Humanities Research Center, estimated that China holds about 200,000 titles of ancient books. From 1949 to 2019, nearly 38,000 of them were repaired, collated and published. At that pace, repairing and collating all surviving ancient books could take some 300 years. With the help of artificial intelligence, however, the work could be finished in about 20 to 30 years.

What Wang Jun describes, using artificial intelligence to restore ancient books, is not a distant vision. It is already being put into practice. Shortly after the first lecture in the series, ByteDance announced a donation to the Peking University Education Foundation to support the Peking University ByteDance Digital Humanities Open Laboratory in developing a "digital platform for ancient books." The platform will use intelligent technology to accelerate the digitization of China's ancient book resources, and is expected to complete the intelligent restoration and collation of 10,000 selected ancient books within three years.

The text transformation of ancient books is becoming intelligent

For a long time, ancient books were protected mainly through "original protection," that is, by treating them as cultural relics. Later came "regenerative protection," in which ancient books are reproduced and preserved so that they survive as paper copies or on microfilm.
Many existing digitized ancient books were converted from microfilm and are mostly low-resolution, black-and-white images. Even if every ancient book were photocopied and published digitally, the books would remain "dead" and hard for people to use. Yang Haizheng, a professor in the Department of Chinese Language and Literature at Peking University, gave a simple example: photocopied ancient books have no punctuation, which makes them very difficult to read. They are also hard to search. To look up a particular passage, a reader has to go through the original text page by page, so finding the desired information quickly is difficult. To make traditional ancient books more usable, therefore, their contents must be converted into digital text. In the past, this conversion relied mainly on manual input by experts, at a very high cost in time.

"The development of information technology, especially the emergence of artificial intelligence and big data, has brought revolutionary changes to the restoration and collation of ancient books," Wang Jun said. In recent years, he noted, many universities and research institutions, including Peking University, have done a great deal of pioneering work on digitizing ancient books and have accumulated relatively mature technology and experience in OCR (optical character recognition), AI-based sentence segmentation and punctuation, named entity recognition and other areas. Take OCR as an example: once paper-based ancient books are scanned, their contents are transcribed into the computer and corresponding digital documents are generated, with an efficiency more than ten million times that of manual input.
Using artificial intelligence and big data technology, the Peking University Digital Humanities Research Center has reportedly achieved automatic sentence segmentation of classical Chinese texts on a large-scale corpus spanning the Pre-Qin period through the Ming and Qing dynasties, with an average accuracy of 94%. It has also achieved automatic recognition of personal names, place names, dates, official titles and book titles, with an accuracy of nearly 98% on medieval historical materials.

Internet companies such as ByteDance also have considerable experience and accumulated technology in these areas. For example, OCR is widely used on platforms such as Toutiao and Flare for image text recognition and subtitle translation, and in Douyin (TikTok) and in industry document recognition for commercial business. "These technologies can gradually be transferred to the intelligent digitization of ancient books. In developing the ancient book digitization platform, we can complement Peking University technically and communicate and integrate effectively," said Li Hang, director of ByteDance's Artificial Intelligence Laboratory.

Wang Jun explained that the "digital platform for ancient books" will further improve the accuracy, intelligence and openness of ancient book collation. On the one hand, key texts can be refined to meet experts' and scholars' requirements for data accuracy. On the other hand, with the platform's character recognition and proofreading tools, scholars and enthusiasts can collate ancient books online in one place, instead of editing in Word documents and then passing the files around. This both improves efficiency and makes public participation easier.
The utilization of ancient books is expected to become intelligent

Wang Zhaopeng, a professor at the Big Data Center for the Global Dissemination of Chinese Culture at Sichuan University, believes that technological progress is making the restoration and collation of ancient books more intelligent in two respects: the conversion of ancient texts into digital form, and the use of ancient books. Converting the contents of paper-based ancient books into digital text is only the first step. The next problem is how to organize and classify a large body of obscure ancient books into interactive, tangible and visual digital humanities works that people can easily access and use. Otherwise, the ancient books entered into computers will simply go on "sleeping."

Many automated and visualization platforms for collating ancient books have already been built in China on the basis of artificial intelligence. For example, the knowledge graph visualization system for the Song Yuan Xue'an, a record of Song and Yuan scholarship, designed and developed by Wang Jun, processed and analyzed the work's 2.4 million characters and extracted more than 2,000 Song and Yuan Neo-Confucian scholars, together with the people, times, places and works associated with nearly 100 academic schools, to form a knowledge graph. However, many platforms still have a low level of intelligence: when a user enters a keyword, the results are isolated, messy and disordered. Wang Zhaopeng believes that smarter platforms for collating and using ancient books should evolve from "version 1.0" to "version 2.0." For example, retrieval should be category-based, and the retrieved content should be interlinked and organically classified by artificial intelligence.
The "digital platform for ancient books," jointly developed by Peking University and ByteDance, is an attempt to raise the level of intelligence in collating and using ancient books. "The technical core of our cooperation is to apply artificial intelligence and big data to large numbers of ancient books and documents, automatically generating knowledge graphs of ancient books and intelligently organizing their contents, so that ancient texts can be searched, read in association with one another and mined in depth," Li Hang said. In the future, he added, the platform will be not only an intelligent collation platform for ancient books but also a digital reading tool for readers, offering free and open access.

Wang Jun predicted that, with the application of artificial intelligence, the historical and cultural knowledge contained in ancient books and documents will continue to be extracted and organized into various knowledge bases, supporting front-end Internet applications in the form of knowledge graphs. Given their strengths in Internet product research, development and design, the participation of Internet companies and other social forces will further ensure the service quality of the ancient book digitization platform. "We have excellent product managers, designers and software engineers, who can continuously optimize and innovate the platform's features and provide a better user experience," said Tang Kaixin, general manager of the corporate social responsibility department of Beijing ByteDance, adding that the development teams of Toutiao and Douyin (TikTok) have joined the development of the "digital platform for ancient books."
Interdisciplinary cooperation is required

As artificial intelligence becomes widely used in restoring and collating ancient books, Yang Haizheng, who teaches classical literature, is often asked by students: "Should we learn artificial intelligence while studying classical literature?" Yang Haizheng is not sure of the answer, but it is clear that combining artificial intelligence with the restoration and collation of ancient books will open up a new interdisciplinary field, and that using AI for this work will require more people with cross-disciplinary skills. Wang Jun believes that, in this context, two problems are urgent: how universities can train classical philology students who combine technical and scholarly abilities, in classical philology and related majors, and how to build a multidisciplinary curriculum.

AI is not yet all-powerful, however. According to Jin Lianwen, a professor at the School of Electronic and Information Engineering of South China University of Technology, problems such as image enhancement and restoration of ancient books, and layout analysis of ancient books with complex page layouts, still need to be solved. In analyzing and organizing the contents of ancient books, the biggest technical difficulty at present is relation extraction: after AI has recognized proper nouns such as personal and place names, it must go on to determine the relationships between them, which is a precondition for automatically generating knowledge graphs of ancient history and culture. Yang Haizheng therefore believes that scholars in the humanities and social sciences should take an active part in collating ancient books and work closely with technical staff, so that they make good use of the machine rather than being led by it, and so ensure the accuracy of the results.
The development of artificial intelligence has fundamentally changed the methods and approaches of research on collating ancient books. There is a consensus in the field that using AI to advance the restoration and collation of ancient books requires interdisciplinary, cross-domain, cross-cultural and cross-regional cooperation. As Wang Jun put it, "The protection of ancient books requires the joint efforts of all sectors of society. More institutions that hold ancient books, research institutions and individuals devoted to the cause of ancient books are welcome to join us in building an open 'ancient book digitization platform.'" (Xinhua News Agency)

Editor: Li Ling    Responsible editor: Chen Jie

Source: Guangming Daily

