-
In this blog, we will give you a brief introduction of what are multimodal models and what can multimodal generative models accomplish. OpenAI just released their latest text-to-video multimodal generative model "SORA" in Feb, 2024 which becomes extremely popular. SORA can generate short videos of up to 1 minute's length. Before SORA, there are also many generative multi-modal models released by various companies, such as BLIP, BLIP2, FLAMINGO, FlaVA, etc. We will summarize a complete list of these time tested multi-modal generative models, introduce the model architures (text and image encoder), the training process, tasks, latex equation of loss functions, the Vision Language capabilities (such as text-to-image, text-to-video, text-to-audio, visual question answering), etc. Tag: Multimodal, AIGC, Large Language Model
10Follow