Model Gallery

Discover and install AI models from our curated collection

1 model available
1 repository

ltx-2
**LTX-2** is a DiT-based audio-video foundation model that generates synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.

**Key Features:**

- **Joint Audio-Video Generation**: Generates synchronized video and audio in a single model
- **Image-to-Video**: Converts static images into dynamic videos with matching audio
- **High Quality**: Produces realistic video with natural motion and synchronized audio
- **Open Weights**: Available under the LTX-2 Community License Agreement

**Model Details:**

- **Model Type**: Diffusion-based audio-video foundation model
- **Architecture**: DiT (Diffusion Transformer) based
- **Developed by**: Lightricks
- **Paper**: [LTX-2: Efficient Joint Audio-Visual Foundation Model](https://arxiv.org/abs/2601.03233)

**Usage Tips:**

- Width and height must each be divisible by 32
- Frame count must be a multiple of 8 plus 1, i.e. of the form 8n + 1 (e.g., 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121)
- Recommended settings: width=768, height=512, num_frames=121, frame_rate=24.0
- For best results, use detailed prompts describing motion and scene dynamics

**Limitations:**

- This model is not intended or able to provide factual information
- Prompt following is heavily influenced by the prompting style
- When generating audio without speech, the audio may be of lower quality

**Citation:**

```bibtex
@article{hacohen2025ltx2,
  title={LTX-2: Efficient Joint Audio-Visual Foundation Model},
  author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and others},
  journal={arXiv preprint arXiv:2601.03233},
  year={2025}
}
```
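The size and frame-count rules under Usage Tips are easy to get wrong by one, so it can help to check settings before submitting a generation request. The following is a minimal sketch in plain Python that is not part of LTX-2 or of any gallery API. The function names `validate_ltx2_settings` and `nearest_valid_frames` are hypothetical, and treating 9 as the smallest accepted frame count is an assumption based on the examples listed above.

```python
def validate_ltx2_settings(width: int, height: int, num_frames: int) -> list[str]:
    """Check settings against the rules in Usage Tips.

    Returns a list of human-readable problems; an empty list means
    the settings satisfy the stated constraints.
    """
    problems = []
    if width <= 0 or width % 32 != 0:
        problems.append(f"width={width} is not a positive multiple of 32")
    if height <= 0 or height % 32 != 0:
        problems.append(f"height={height} is not a positive multiple of 32")
    # Assumption: the smallest accepted value is 9, the first listed example.
    if num_frames < 9 or (num_frames - 1) % 8 != 0:
        problems.append(f"num_frames={num_frames} is not of the form 8n + 1")
    return problems


def nearest_valid_frames(num_frames: int) -> int:
    """Round a requested frame count to the nearest value of the form 8n + 1."""
    # Integer arithmetic rounds halves up and avoids float rounding surprises.
    n = max(1, (num_frames - 1 + 4) // 8)
    return 8 * n + 1


if __name__ == "__main__":
    # The recommended settings from the model card pass the check.
    print(validate_ltx2_settings(768, 512, 121))
    # An invalid request reports every violated rule.
    print(validate_ltx2_settings(770, 512, 120))
    print(nearest_valid_frames(120))
```

At the recommended frame_rate of 24.0, 121 frames corresponds to roughly five seconds of video (121 / 24 ≈ 5.04 s), which is a convenient way to pick a target frame count before rounding it with the helper.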

Repository: localai
License: ltx-2-community-license-agreement