New model generates high-definition videos from text

2024-04-28

On April 27, at the Future Artificial Intelligence Pioneer Forum of the Zhongguancun Forum, Tsinghua University and Beijing Shengshu Technology Co., Ltd. (hereinafter "Shengshu Technology") officially released Vidu, China's first original, fully self-developed video model. At the forum, Zhu Jun, a professor at Tsinghua University and chief scientist of Shengshu Technology, showed attendees videos generated by Vidu, including cars driving on rugged roads, cats wearing pearl earrings, and pandas playing guitar. Like Sora, which earlier shook the industry, Vidu generates high-quality video directly from text descriptions.
Long duration, high consistency, and high dynamism are Vidu's prominent features. Zhu Jun said that the team's core technology is the U-ViT architecture, which fuses Diffusion and Transformer models and supports one-click generation of high-definition video up to 16 seconds long.
Beyond its advantage in duration, Vidu has also significantly improved visual quality. According to Zhu Jun, Vidu can simulate the real physical world: the generated videos contain complex scene details and conform to physical laws, with plausible light and shadow and delicate facial expressions. Vidu is also highly imaginative, capable of generating fictional imagery that does not exist in the real world and creating surreal content with depth and complexity. In addition, Vidu understands multi-shot language. Its videos are no longer limited to simple fixed camera moves such as push-in, pull-out, and tracking. Instead, they can switch among long, medium, close, and close-up shots of the same subject, and can even directly generate effects such as long takes, focus pulls, and transitions, giving the video rich cinematic expression.
Vidu also has distinctive cultural characteristics: it readily understands Chinese elements and can generate videos featuring them, such as pandas and dragons. Notably, the videos shown at the forum were generated continuously from beginning to end, with no obvious signs of frame interpolation. Zhu Jun said that, like Sora, Vidu converts text to video directly and continuously, and that at the algorithm level each video is generated entirely end to end by a single model, without frame interpolation or other multi-step processing. (Lai Xin She)

Editor: Wangchen    Responsible editor: Jia Jia

Source: http://digitalpaper.stdaily.com

