In this blog, we give you a brief introduction to multimodal models and what multimodal generative models can accomplish. OpenAI released its latest text-to-video multimodal generative model, Sora, in February 2024, and it quickly became extremely popular. Sora can generate short videos up to one minute long. Before Sora, many generative multimodal models were released by various companies, such as BLIP, BLIP-2, Flamingo, and FLAVA. We summarize a complete list of these time-tested multimodal generative models and introduce their model architectures (text and image encoders), training processes, tasks, the LaTeX equations of their loss functions, their vision-language capabilities (such as text-to-image, text-to-video, text-to-audio, and visual question answering), and more. Tag: Multimodal, AIGC, Large Language Model
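As a taste of the loss functions covered, one widely used objective in this model family is the image-text contrastive (InfoNCE-style) loss popularized by CLIP and reused in BLIP-style models. The notation below is our own illustrative sketch of the standard form, not a transcription from any single model's paper:

```latex
% Image-to-text direction of the contrastive loss; the text-to-image
% direction is symmetric, and the two terms are averaged in practice.
\mathcal{L}_{\mathrm{ITC}} = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{\exp\bigl(\operatorname{sim}(v_i, t_i)/\tau\bigr)}
            {\sum_{j=1}^{N} \exp\bigl(\operatorname{sim}(v_i, t_j)/\tau\bigr)}
```

Here $v_i$ and $t_i$ are the image and text embeddings of the $i$-th matched pair in a batch of size $N$, $\operatorname{sim}$ is cosine similarity, and $\tau$ is a (typically learnable) temperature.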
Machine Learning
In this blog, we give you a brief introduction to the Hugging Face text-to-video pipeline and our wrapper API. Since installing these pipelines requires many Python package dependencies, such as transformers, torch, and diffusers, we provide an API wrapper over common text-to-video interfaces for users who are not AI or machine learning experts, published as the PyPI package text2video (https://pypi.org/project/text2video/). This package is still in the development stage, and we will keep updating this blog. The API wrapper is also open to contributions.
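To illustrate the wrapper idea, here is a minimal sketch of what such an API might look like. The class and method names below are hypothetical and for illustration only; they are not the actual interface of the text2video package. The point is that a non-expert only deals with a prompt and an output path, while the heavy pipeline (e.g. a diffusers text-to-video model) is hidden behind a pluggable backend:

```python
class Text2VideoWrapper:
    """Hypothetical wrapper that hides an underlying text-to-video
    pipeline behind a single generate() call."""

    def __init__(self, backend=None):
        # backend: any callable mapping a prompt string to a list of
        # frames. In a real setup this would lazily load something like
        # a Hugging Face diffusers pipeline, so importing this wrapper
        # stays cheap for users without a GPU.
        self._backend = backend or (lambda prompt: [])

    def generate(self, prompt, output_path="out.mp4"):
        # Run the backend to get raw frames for the prompt.
        frames = self._backend(prompt)
        # A real implementation would encode the frames to a video file
        # here (e.g. with diffusers.utils.export_to_video); this sketch
        # just reports what would be written.
        return {"prompt": prompt, "num_frames": len(frames), "path": output_path}
```

A caller would then need nothing beyond `Text2VideoWrapper(...).generate("a cat playing piano")`, which is the kind of surface the package aims to expose.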
AIGC
Please leave your thoughts in the comments
Please leave your thoughts in the comments and share any videos or images you generate.