
Introduction to the Hugging Face Text-to-Video Generation Pipeline and the PyPI Package text2video

In this blog post, we give a brief introduction to the Hugging Face text-to-video pipeline and to a wrapper API around it. Installing the pipeline requires several Python packages, such as transformers, torch, and diffusers, so we provide an API wrapper around common text-to-video interfaces for users who are not AI or machine learning experts. The wrapper is published as the PyPI package text2video (https://pypi.org/project/text2video/). The package is still under development, and we will keep updating this post. Contributions to the wrapper API are welcome.

1. Illustration of a Hugging Face Pipeline for Text-to-Video Generation Models

1.1 Hugging Face Text-to-Video Pipeline

This Python script generates a short video with the default length of 16 frames (2 s at 8 fps). It uses the "damo-vilab/text-to-video-ms-1.7b" model.


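	## code for huggingface diffusion pipeline

	import torch
	from diffusers import DiffusionPipeline
	from diffusers.utils import export_to_video

	# Load the fp16 weights and move the pipeline to the GPU (requires CUDA).
	pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
	pipe = pipe.to("cuda")

	prompt = "Spiderman is surfing"
	# .frames holds one sequence of frames per prompt; take the first one.
	video_frames = pipe(prompt).frames[0]
	# export_to_video writes the frames to a video file and returns its path.
	video_path = export_to_video(video_frames)
	print(video_path)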
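
The "16 frames, 2 s at 8 fps" figure is just frame count divided by frame rate. The helper below is a minimal sketch of that arithmetic. The function names `clip_duration_seconds` and `frames_for_duration` are illustrative and not part of diffusers. Whether your installed diffusers version actually writes the file at 8 fps is an assumption, so check the default `fps` of `export_to_video` or pass it explicitly.

```python
import math


def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Return the playback length in seconds of a clip with num_frames frames."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def frames_for_duration(seconds: float, fps: int) -> int:
    """Return the smallest frame count that covers the requested duration."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return math.ceil(seconds * fps)


print(clip_duration_seconds(16, 8))  # 2.0
print(frames_for_duration(3, 8))     # 24
```

The result of `frames_for_duration` could then be passed as the `num_frames` argument of the pipeline call. Longer clips generally need more GPU memory.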

	

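1.2 text2video Wrapper API for the Pipeline

The wrapper takes an input dictionary, the pipeline object, and an API name, and returns a dictionary. The "video" entry of the result holds the generated video.


	## code for text2video text 2 video wrapper

	import text2video as t2v

	# `pipe` is the DiffusionPipeline object created in section 1.1.
	input_dict = {"text": "Text to Video"}
	res_dict = t2v.api(input_dict, model=pipe, api_name="hf_diffusion_pipeline")
	video_path = res_dict["video"]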
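
To make the calling convention easier to picture, here is a minimal sketch of a dict-in, dict-out dispatcher keyed by `api_name`. It is not the text2video implementation: the registry `_HANDLERS`, the `register` decorator, and the `echo_video` handler are all hypothetical names made up for illustration, and the handler only builds a placeholder path without writing any file.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping an api_name string to a handler function.
_HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def register(api_name: str):
    """Decorator that stores a handler under the given api_name."""
    def decorator(func):
        _HANDLERS[api_name] = func
        return func
    return decorator


@register("echo_video")
def _echo_video(input_dict, model=None, **kwargs):
    # Stand-in handler: a real one would run the model on input_dict["text"]
    # and export the frames. Here we only build a placeholder path string.
    name = input_dict["text"].replace(" ", "_")
    return {"video": f"/tmp/{name}.mp4"}


def api(input_dict, model=None, api_name="", **kwargs):
    """Look up the handler for api_name and call it with the input dict."""
    try:
        handler = _HANDLERS[api_name]
    except KeyError:
        raise ValueError(f"unknown api_name: {api_name!r}") from None
    return handler(input_dict, model=model, **kwargs)


res_dict = api({"text": "Text to Video"}, api_name="echo_video")
print(res_dict["video"])  # /tmp/Text_to_Video.mp4
```

One benefit of this pattern is that callers only deal with plain dictionaries and a string name, so new backends can be added without changing the call site.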

	

2. API to Download the Latest Text-to-Video Papers from arxiv.org

Let's start with an example that fetches the 3 latest papers for the keywords "Text to Video" and prints them.


	## code for text2video latest research papers download

	import text2video as t2v
	import json

	input_dict = {"text": "Text to Video"}

	# No model is needed for the arXiv API; the result's "text" field is a JSON string.
	res = t2v.api(input_dict, model=None, api_name="ArxivPaperAPI", start=0, max_results=3)
	paper_list = json.loads(res["text"])
	print("###### Text to Video Recent Paper List:")
	for paper_json in paper_list:
	    print("|" + paper_json["id"] + "|" + paper_json["title"].replace("\n", "") + "|" + paper_json["updated"])

	
Output of the query for the latest "Text to Video" papers:
		###### Text to Video Recent Paper List:
		|http://arxiv.org/abs/2410.08211v1|LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts|2024-10-10T17:59:59Z
		|http://arxiv.org/abs/2410.08210v1|PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point  Supervised Oriented Object Detection|2024-10-10T17:59:56Z
		|http://arxiv.org/abs/2410.08209v1|Emerging Pixel Grounding in Large Multimodal Models Without Grounding  Supervision|2024-10-10T17:59:55Z
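
Some titles in the output contain a double space (for example "Single Point  Supervised"), because removing the newline leaves the indentation of the wrapped line behind. The sketch below collapses all runs of whitespace instead. It assumes the same "id", "title", and "updated" fields used above. `format_paper_rows` is an illustrative helper and not part of text2video, and the sample entry is copied from the output above with the line break placed where the double space suggests it was.

```python
import json


def format_paper_rows(papers_json: str) -> list:
    """Turn a JSON list of papers into '|id|title|updated' rows."""
    rows = []
    for paper in json.loads(papers_json):
        # split() with no arguments splits on any whitespace run, so joining
        # with single spaces removes newlines and doubled spaces together.
        title = " ".join(paper["title"].split())
        rows.append(f"|{paper['id']}|{title}|{paper['updated']}")
    return rows


# Sample entry taken from the output above; the "\n  " is an assumed line break.
sample = json.dumps([
    {
        "id": "http://arxiv.org/abs/2410.08210v1",
        "title": "PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point\n  Supervised Oriented Object Detection",
        "updated": "2024-10-10T17:59:56Z",
    }
])

for row in format_paper_rows(sample):
    print(row)
```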
	

