In this blog, we give you a brief introduction to multimodal models and what multimodal generative models can accomplish. OpenAI released its latest text-to-video multimodal generative model, Sora, in February 2024, and it quickly became extremely popular. Sora can generate short videos up to one minute long. Before Sora, many generative multimodal models were released by various companies, such as BLIP, BLIP-2, Flamingo, and FLAVA. We summarize a complete list of these time-tested multimodal generative models and introduce their model architectures (text and image encoders), training processes, tasks, the LaTeX equations of their loss functions, their vision-language capabilities (such as text-to-image, text-to-video, text-to-audio, and visual question answering), and more. Tag: Multimodal, AIGC, Large Language Model
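As a taste of the loss functions covered, one widely used objective in this model family is the image-text contrastive (InfoNCE-style) loss popularized by CLIP and reused in BLIP-style models. The notation below is our own illustrative sketch of the standard form, not a transcription from any single model's paper:

```latex
% Image-to-text direction of the contrastive loss; the text-to-image
% direction is symmetric, and the two terms are averaged in practice.
\mathcal{L}_{\mathrm{ITC}} = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{\exp\bigl(\operatorname{sim}(v_i, t_i)/\tau\bigr)}
            {\sum_{j=1}^{N} \exp\bigl(\operatorname{sim}(v_i, t_j)/\tau\bigr)}
```

Here $v_i$ and $t_i$ are the image and text embeddings of the $i$-th matched pair in a batch of size $N$, $\operatorname{sim}$ is cosine similarity, and $\tau$ is a (typically learnable) temperature.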
Machine Learning
In this blog, we give you a brief introduction to the Hugging Face text-to-video pipeline and our wrapper API. Since installing these pipelines requires many Python package dependencies, such as transformers, torch, and diffusers, we provide an API wrapper over common text-to-video interfaces for users who are not AI or machine learning experts, published as the PyPI package text2video (https://pypi.org/project/text2video/). This package is still in the development stage, and we will keep updating this blog. The API wrapper is also open to contributions.
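To illustrate the wrapper idea, here is a minimal sketch of what such an API might look like. The class and method names below are hypothetical and for illustration only; they are not the actual interface of the text2video package. The point is that a non-expert only deals with a prompt and an output path, while the heavy pipeline (e.g. a diffusers text-to-video model) is hidden behind a pluggable backend:

```python
class Text2VideoWrapper:
    """Hypothetical wrapper that hides an underlying text-to-video
    pipeline behind a single generate() call."""

    def __init__(self, backend=None):
        # backend: any callable mapping a prompt string to a list of
        # frames. In a real setup this would lazily load something like
        # a Hugging Face diffusers pipeline, so importing this wrapper
        # stays cheap for users without a GPU.
        self._backend = backend or (lambda prompt: [])

    def generate(self, prompt, output_path="out.mp4"):
        # Run the backend to get raw frames for the prompt.
        frames = self._backend(prompt)
        # A real implementation would encode the frames to a video file
        # here (e.g. with diffusers.utils.export_to_video); this sketch
        # just reports what would be written.
        return {"prompt": prompt, "num_frames": len(frames), "path": output_path}
```

A caller would then need nothing beyond `Text2VideoWrapper(...).generate("a cat playing piano")`, which is the kind of surface the package aims to expose.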
AIGC
Please leave your thoughts in the comments
Please leave your thoughts in the comments and share any videos or images you generate.