For decades, scientists have been trying to decode the mysterious “clicks,” “whistles,” and “pulses” of dolphins. Now, with Google AI’s DolphinGemma model, developed in collaboration with a long-term research initiative, we’re a step closer to understanding—and perhaps even interacting with—these intelligent marine creatures.
Have you ever gazed out at the ocean, wondering what those smart and playful dolphins might be talking about? Their clicks, sharp whistles, and short pulse sounds almost resemble an alien language. For decades, scientists have acted like cryptographers of the sea, working tirelessly to make sense of it all.
But what if we could not only hear dolphin sounds, but actually understand the intricacies of how they communicate? What if we could even simulate realistic dolphin vocalizations to respond back? Sounds a little sci-fi, right?
On this year’s National Dolphin Day, Google brought exciting news! Together with researchers from Georgia Tech and the long-running Wild Dolphin Project (WDP), they announced progress on DolphinGemma—a foundational AI model designed to learn the structure of dolphin vocalizations and even generate new sequences that sound like they could come from real dolphins. This exploration of interspecies communication not only pushes the boundaries of AI but also opens up new possibilities for how we connect with our blue planet.
Decades of Work to Hear What They’re Really Saying: The Wild Dolphin Project (WDP)
To truly understand any species, deep contextual knowledge is essential—and that’s exactly where WDP excels. Since 1985, WDP has led the world’s longest-running underwater dolphin research project, based in the Bahamas. Their focus is on a specific group of wild Atlantic spotted dolphins (Stenella frontalis), studied across multiple generations.
What makes their work unique is their non-invasive approach, guided by the philosophy: “In their world, on their terms.” This deep respect for life has resulted in a one-of-a-kind database of underwater videos and audio recordings, each meticulously labeled with the specific dolphin, its life history, and the observed behavior at the time.
[Image: A pod of Atlantic spotted dolphins (Stenella frontalis)]
WDP’s core goal is to observe and analyze dolphins’ natural communication and social interactions. The benefit of underwater work is the ability to directly connect sounds with behavior—something not possible from the surface. Over the years, they’ve linked different types of vocalizations with specific contexts, for example:
- Signature whistles: Like a dolphin’s “name,” these unique calls help mothers and calves find each other.
- Burst-pulse “squawks”: Often heard during fights or disputes.
- Click “buzzes”: Typically used during courtship or when deterring sharks.
Identifying which dolphin made a sound is key to making accurate interpretations. The ultimate aim of this research is to understand the structure and possible meaning of these natural vocal sequences—to identify potential patterns and rules that suggest a kind of “language.” This long-term study of natural communication forms the foundation for any AI analysis.
Enter DolphinGemma: AI Joins the Dolphin Research Team
Analyzing the natural, complex communication of dolphins is no easy task. Fortunately, WDP’s large, well-labeled dataset provides the perfect opportunity for advanced AI technology to shine.
That’s where DolphinGemma comes in. Developed by Google, the model builds on Google’s audio research: the SoundStream tokenizer efficiently converts dolphin vocalizations into sequences of discrete tokens, which a model architecture designed for complex sequences then processes. At roughly 400 million parameters, DolphinGemma is compact enough to run directly on the Pixel phones WDP uses in the field.
This model was inspired by the design of Gemma, Google’s lightweight, high-performance open models, built on the same research foundation that powers the Gemini models. DolphinGemma was trained extensively on WDP’s dataset of Atlantic spotted dolphin vocalizations. It works using a “sound in, sound out” approach—processing natural dolphin sequences to find patterns and predict what sound might come next. This is quite similar to how large language models predict the next word or symbol in a sentence.
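To make the “sound in, sound out” idea concrete, here is a deliberately tiny sketch in Python. In the real pipeline, SoundStream turns raw audio into discrete tokens and a roughly 400-million-parameter Gemma-style decoder does the predicting; this stand-in just counts token bigrams, but the principle of predicting the next sound token from what came before is the same. All token values below are invented.

```python
from collections import Counter, defaultdict

# Invented stand-in data: in the real pipeline, the SoundStream tokenizer
# would turn raw dolphin audio into sequences of discrete tokens like these.
tokenized_recordings = [
    [3, 7, 7, 2, 9, 3, 7, 2],
    [3, 7, 2, 9, 9, 3, 7, 2],
]

# For each token, count how often each successor follows it.
successors = defaultdict(Counter)
for seq in tokenized_recordings:
    for cur, nxt in zip(seq, seq[1:]):
        successors[cur][nxt] += 1

def predict_next(token: int) -> int:
    """Return the most frequently observed successor of a token."""
    return successors[token].most_common(1)[0][0]

# "Sound in, sound out": after token 3, token 7 is the most likely next sound.
print(predict_next(3))  # -> 7
```

A real model conditions on the whole preceding sequence rather than a single token, but the training objective is the same next-token prediction used by large language models.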
WDP is preparing to deploy DolphinGemma this research season, and the benefits are already clear. By identifying recurring sound patterns, clusters, and reliable sequences, the model can help researchers uncover hidden structure and potential meaning in dolphin communication, work that previously required immense manual effort. Looking ahead, researchers hope to combine AI-identified patterns with synthetic sounds referring to objects the dolphins enjoy, such as sargassum or the scarves researchers carry, to create a shared vocabulary for interactive communication.
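As a rough illustration of the kind of pattern mining involved (not WDP’s actual tooling), a researcher could scan tokenized recordings for short token runs that recur far more often than chance; runs that keep reappearing are candidates for structurally meaningful units. The data here is again invented.

```python
from collections import Counter

def frequent_ngrams(sequences, n=2, min_count=3):
    """Count every length-n token run and keep those that recur."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy tokenized recordings (in practice, SoundStream-style token sequences).
recordings = [
    [3, 7, 7, 2, 9, 3, 7, 2],
    [3, 7, 2, 9, 9, 3, 7, 2],
    [5, 3, 7, 2, 1, 3, 7, 2],
]
print(frequent_ngrams(recordings, n=2, min_count=3))
# {(3, 7): 6, (7, 2): 6}: these recurring runs are candidate structural units.
```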
Beyond Understanding: The CHAT System and Pixel Experiments for Two-Way Interaction
While analyzing natural communication is one path, WDP is also exploring a very different route: using technology to enable two-way interaction in the ocean. This led to a collaboration with Georgia Tech to develop the CHAT (Cetacean Hearing Augmented Telemetry) system.
CHAT is an underwater computer system, and its goal is not to directly decode the complexities of dolphin language, but to establish a simpler, shared vocabulary.
The idea starts with associating novel, synthetic whistles (created by CHAT and distinct from natural dolphin sounds) with specific objects the dolphins like, such as sargassum, sea grass, or the scarves researchers use. By having the researchers demonstrate these whistle-and-object associations with one another, the team hopes the naturally curious dolphins will learn to mimic the whistles to “ask for” the items. Eventually, as understanding of dolphin sounds improves, natural vocalizations can be incorporated into the system as well.
To enable two-way interaction, the CHAT system must do the following (a rough sketch of the matching step follows the list):
- Accurately detect dolphin mimicry in noisy underwater environments.
- Identify in real time which whistle the dolphin is imitating.
- Notify the researchers through underwater bone conduction headphones about what the dolphin “wants.”
- Allow researchers to respond quickly with the correct item, reinforcing the connection.
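Google hasn’t published CHAT’s matching code, so the snippet below is only a schematic illustration of the classic technique the description suggests: sliding each known synthetic-whistle template across incoming audio and scoring every alignment with normalized cross-correlation. All names, and the toy sine-wave “whistles,” are invented for the example.

```python
import numpy as np

def ncc_peak(snippet: np.ndarray, template: np.ndarray) -> float:
    """Peak normalized cross-correlation of a template against a snippet."""
    n = len(template)
    t = (template - template.mean()) / (template.std() + 1e-9)
    best = -1.0
    for i in range(len(snippet) - n + 1):
        w = snippet[i:i + n]
        w = (w - w.mean()) / (w.std() + 1e-9)
        best = max(best, float(np.dot(w, t)) / n)
    return best

def identify_mimic(snippet, templates, threshold=0.7):
    """Return (best_label_or_None, per-template scores)."""
    scores = {label: ncc_peak(snippet, t) for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

# Toy whistle contours; a real system would match pitch tracks or spectrogram
# features, not raw sine waves.
templates = {
    "sargassum": np.sin(np.linspace(0, 4 * np.pi, 80)),  # slow two-cycle sweep
    "scarf": np.sin(np.linspace(0, 8 * np.pi, 80)),      # faster four-cycle sweep
}
rng = np.random.default_rng(0)
snippet = np.concatenate([
    rng.normal(0.0, 0.2, 60),                       # background noise
    templates["scarf"] + rng.normal(0.0, 0.2, 80),  # noisy mimic of "scarf"
    rng.normal(0.0, 0.2, 60),
])
label, scores = identify_mimic(snippet, templates)
print(label)  # expect "scarf"
```

Template matching like this is cheap enough to run continuously on a phone, which is one reason a Pixel can do the job without custom signal-processing hardware.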
In earlier field seasons, a Google Pixel 6 already handled high-fidelity, real-time analysis of dolphin sounds. The next-gen system (expected for the 2025 summer research season) will be built around the Google Pixel 9, integrating speaker/microphone functions and leveraging the phone’s enhanced processing power to run deep learning models and template-matching algorithms simultaneously.
Using Pixel smartphones drastically reduces the need for custom hardware, improves system maintainability, lowers power consumption, and shrinks the device size and cost—all critical advantages for open-ocean fieldwork. Meanwhile, DolphinGemma’s predictive abilities can help CHAT identify potential mimicry even earlier, improving the speed and quality of human-dolphin interaction.
Good Things Are Meant to Be Shared: DolphinGemma’s Open Source Future
Recognizing the value of collaboration in scientific discovery, Google plans to release DolphinGemma as an open model this summer. While it was primarily trained on Atlantic spotted dolphin vocalizations, researchers working with other cetaceans—like bottlenose or spinner dolphins—may also find it useful. Some fine-tuning may be necessary to adapt to different species’ vocal styles, but that’s exactly what the openness of the model is designed to support.
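The open release’s loading API hasn’t been published yet, so everything in the sketch below, from the stand-in TinyVocalDecoder to the random “bottlenose” data, is a placeholder. The point it illustrates is that adapting the model to a new species would reuse the same next-token objective, just trained on that species’ tokenized recordings.

```python
import torch
import torch.nn as nn

# Placeholder network: swap in the real DolphinGemma checkpoint once the open
# release and its loading API are available. Only the training loop matters here.
class TinyVocalDecoder(nn.Module):
    def __init__(self, vocab_size=1024, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)

model = TinyVocalDecoder()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for tokenized bottlenose recordings: batches of (batch, time) token ids.
bottlenose_batches = [torch.randint(0, 1024, (4, 32)) for _ in range(10)]

for tokens in bottlenose_batches:
    logits = model(tokens[:, :-1])  # predict token t+1 from tokens up to t
    loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                   tokens[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```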
By providing tools like DolphinGemma, the goal is to empower researchers around the world to explore their own audio datasets, accelerate pattern recognition, and deepen our collective understanding of these intelligent marine mammals.
Conclusion: One Small Step Toward Interspecies Understanding
The journey to truly understand dolphin communication is still a long one, but the combined efforts of WDP’s decades of focused fieldwork, Georgia Tech’s engineering expertise, and Google’s technological power are opening up exciting new possibilities. We are no longer just passive listeners—we’re beginning to understand the patterns behind the sounds. This could be the first step toward a future where the communication gap between humans and dolphins is a little bit smaller.
If you’d like to learn more, visit the Wild Dolphin Project.
This article was originally published on the Google Blog.
Side Note
Reading this article reminded me of the April 1st “Text to Bark” experiment—couldn’t resist sharing it with you all.