Stream-Omni: Supporting Multi-Modal Interactions with a Unified Text-Visual-Speech Large Model
Stream-Omni is a versatile large multi-modal model that supports interactions across combinations of text, visual, and speech inputs, enabling flexible and efficient multi-modal communication.