TEXT-DRIVEN METHOD FOR VIRTUAL HUMAN VOICE AND VIDEO GENERATION

Abstract: To address the problem that the visual quality of current text-driven virtual human avatars does not meet the needs of practical applications, this paper first uses speech synthesis to convert text into a speech waveform; then uses a speech-driven face generation method to produce a virtual human avatar with synchronized audio and video; and finally uses the thin-plate spline transformation method to drive the virtual human image, synthesizing a virtual human whose audio and video are synchronized. The experimental results show that the method effectively resolves the lip-shape mismatch and the audio-video desynchronization of text-driven virtual human avatars, and that it has good prospects for practical application.
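
The abstract names the thin-plate spline (TPS) transformation as the step that drives the virtual human image but does not state how it is formulated. As a rough illustration only, the sketch below fits the classical 2D TPS with kernel U(r) = r^2 log r to a set of control-point correspondences and applies it to new points. The function names (`fit_tps`, `apply_tps`), the use of NumPy, and the idea of using facial keypoints as control points are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def _tps_kernel(r):
    """Classical TPS radial basis U(r) = r^2 * log(r), with U(0) = 0."""
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out


def fit_tps(src, dst):
    """Fit a 2D TPS that maps control points src (n, 2) onto dst (n, 2).

    Returns (weights, affine): weights has shape (n, 2) and affine has
    shape (3, 2), with affine rows ordered as [offset, x-coeff, y-coeff].
    Assumes the control points are distinct and not all collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Pairwise distances between control points, passed through the kernel.
    dists = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2)
    K = _tps_kernel(dists)
    P = np.hstack([np.ones((n, 1)), src])  # homogeneous coordinates [1, x, y]

    # Block system [[K, P], [P^T, 0]] @ [weights; affine] = [dst; 0].
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst

    params = np.linalg.solve(A, b)
    return params[:n], params[n:]


def apply_tps(points, src, weights, affine):
    """Warp arbitrary points (m, 2) with a TPS fitted on control points src."""
    points = np.asarray(points, dtype=float)
    src = np.asarray(src, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - src[None, :, :], axis=2)
    U = _tps_kernel(dists)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return U @ weights + P @ affine
```

In a pipeline like the one outlined in the abstract, `src` might hold keypoints detected on the source portrait and `dst` the corresponding keypoints for a target frame; the fitted warp would then be evaluated on a pixel grid to deform the image. That usage is likewise an assumption for illustration, and a full system would also need the speech synthesis and speech-driven face generation stages, which are not sketched here.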