Product Information
What is Cactus?
A cross-platform framework for deploying LLM, VLM, and TTS models locally in apps. It works with Flutter and React Native, which makes it a good fit for cross-platform developers. It supports GGUF models from Hugging Face, such as Qwen, Gemma, Llama, and DeepSeek, and can run LLMs, VLMs, embedding models, TTS models, and more. Models can run at precisions from FP32 down to 2-bit quantization, which improves efficiency and reduces strain on the device. MCP tool calling lets the model take actions and assist the user, for example setting reminders, searching the gallery, or replying to messages. Complex tasks, or cases where on-device inference fails, can fall back to large cloud models. Prompts are formatted with Jinja2-compatible chat templates, and output supports token streaming.
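To make the FP32-to-2-bit range concrete, the sketch below estimates how much storage a model's weights need at different bit widths (parameters × bits ÷ 8 bytes). It is a rough TypeScript illustration, not part of Cactus: the 1B-parameter model is hypothetical, and the estimate counts weights only, ignoring the KV cache, activations, and the per-block scale factors that real GGUF quantization formats add, so actual files are somewhat larger.

```typescript
// Nominal weight storage in bytes: each parameter takes `bitsPerWeight` bits.
// Ignores KV cache, activations, and quantization block/scale overhead.
function weightBytes(paramCount: number, bitsPerWeight: number): number {
  return (paramCount * bitsPerWeight) / 8;
}

// Convert bytes to gibibytes (1 GiB = 1024^3 bytes).
function toGiB(bytes: number): number {
  return bytes / 1024 ** 3;
}

const params = 1_000_000_000; // hypothetical 1B-parameter model
for (const bits of [32, 16, 8, 4, 2]) {
  console.log(`${bits}-bit: ${toGiB(weightBytes(params, bits)).toFixed(2)} GiB`);
}
// 32-bit: 3.73 GiB
// 16-bit: 1.86 GiB
// 8-bit: 0.93 GiB
// 4-bit: 0.47 GiB
// 2-bit: 0.23 GiB
```

Under these assumptions a 4-bit model needs about one eighth of the FP32 weight memory, which is why low-bit quantization matters on phones with limited RAM.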
How to use Cactus?
Cactus is a cross-platform framework that helps developers deploy large language models (LLMs), vision-language models (VLMs), and text-to-speech (TTS) models locally in smartphone apps, enabling low latency, stronger privacy, and lower server costs.
Core Functions of Cactus
- Text-to-Speech
- React Native
Usage Scenarios of Cactus
- Deploy AI models natively in smartphone apps.
- Run AI features on devices with unreliable or no internet connectivity.
- Ensure user data privacy with on-device inference.
- Enhance workflows with built-in tool calling for tasks such as setting reminders, searching the gallery, and replying to messages.
- Deploy multimodal models, including language, vision, and speech models.
- Fall back to cloud models for complex tasks or device failures.
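The cloud fallback listed above can be pictured as a simple routing policy. The TypeScript sketch below is not Cactus's API: `CompletionBackend`, `completeWithFallback`, the mock backends, and the `isComplex` heuristic are all hypothetical names. It shows one way to try on-device inference first and switch to a cloud model when a task is flagged as complex or when local inference throws.

```typescript
// Hypothetical backend interface (not the Cactus API).
interface CompletionBackend {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Try the local model unless the task is flagged as complex;
// if local inference throws, fall back to the cloud model.
async function completeWithFallback(
  prompt: string,
  local: CompletionBackend,
  cloud: CompletionBackend,
  isComplex: (prompt: string) => boolean,
): Promise<{ backend: string; text: string }> {
  if (!isComplex(prompt)) {
    try {
      return { backend: local.name, text: await local.complete(prompt) };
    } catch {
      // Local failure (e.g. out of memory, context too long): use the cloud.
    }
  }
  return { backend: cloud.name, text: await cloud.complete(prompt) };
}

// Mock backends for illustration. The local one simulates a device failure
// by throwing on prompts longer than a hypothetical 100-character limit.
const localBackend: CompletionBackend = {
  name: "local",
  complete: async (p) => {
    if (p.length > 100) throw new Error("context too long");
    return `local: ${p}`;
  },
};
const cloudBackend: CompletionBackend = {
  name: "cloud",
  complete: async (p) => `cloud: ${p}`,
};

// Toy complexity heuristic (hypothetical): route "analyze" requests to the cloud.
const isComplex = (p: string) => p.toLowerCase().includes("analyze");

completeWithFallback("Set a reminder for 9am", localBackend, cloudBackend, isComplex)
  .then((r) => console.log(r.backend, r.text)); // local local: Set a reminder for 9am
```

Keeping the routing decision in one function makes the privacy trade-off explicit: prompts leave the device only when the heuristic flags them or local inference fails.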