DeepNLP AI & Robotic App Store, Reviews, Communities. Machine learning, NLP, CV and related technical fields.


rockingdingo 2024-08-25 23:05 #Multimodal Generative Models #AIGC #Large Language Model

In this blog, we give a brief introduction to what multimodal models are and what multimodal generative models can accomplish. In February 2024, OpenAI unveiled "SORA", its latest text-to-video multimodal generative model, which quickly became extremely popular. SORA can generate videos up to one minute long. Before SORA, various companies had already released many multimodal generative models, such as BLIP, BLIP-2, Flamingo and FLAVA. We summarize these time-tested multimodal generative models and introduce their model architectures (text and image encoders), training processes, tasks, loss functions (written as LaTeX equations) and vision-language capabilities (such as text-to-image, text-to-video, text-to-audio and visual question answering).

Tag: Multimodal, AIGC, Large Language Model

