Moshi: A New AI Chatbot with Impressive Voice Features
July 16, 2024
Kyutai, a French AI company, has unveiled “Moshi,” a new AI chatbot designed to interpret tone and respond faster than GPT-4o, the model behind ChatGPT’s delayed ‘Advanced Voice Mode’. Moshi operates offline and is based on a 7B parameter large language model (LLM) called Helium. It supports various accents and 70 different emotional and speaking styles, and it can handle two audio streams simultaneously, allowing it to listen and talk at the same time. Moshi boasts a response time of just 200 milliseconds, which is 32 to 120 milliseconds quicker than GPT-4o’s 232 to 320 milliseconds.
Developing Moshi: Speed and Simplicity
Moshi, named after the Japanese greeting used when answering the phone, was developed in just six months by a team of eight researchers. It was trained on 100,000 synthetic dialogues produced with text-to-speech (TTS) technology, with the aim of capturing the nuances and tones of human conversation. Moshi is a relatively small model compared with GPT-4o, and Kyutai worked with a professional voice artist to improve its voice quality. Kyutai aims to release Moshi as an open-source project, with an emphasis on user privacy and safe usage.
Future Prospects: Beyond Chatbot Capabilities
Kyutai is also working on integrating an AI-powered audio identification, watermarking, and signature tracking system with Moshi. While Moshi may not be a direct competitor to ChatGPT, it represents a significant step forward in developing open-source models that can function offline. The company’s focus is on showcasing Moshi’s quick response time and its ability to replicate not only sentences but also tones and voices, paving the way for future advancements in AI-powered communication tools.
(Visit Indian Express for the full story)
*An AI tool was used to add an extra layer to the editing process for this story.