Voice cloning company Resemble AI has released the next generation of its deepfake detection model, which has an accuracy of around 94%. 

Detect-2B uses a series of pre-trained sub-models and fine-tuning to examine an audio clip and determine whether it was generated with AI. 

“Building upon the strong foundation of our original Detect model, DETECT-2B represents a major leap forward in terms of model architecture, training data, and overall performance. The result is an extremely robust and accurate deepfake detection model that achieves a remarkable level of performance when evaluated against a massive dataset of real and fake audio clips,” the company said in a blog post.

According to Resemble, Detect-2B’s sub-models “consist of a frozen audio representation model with an adaptation module inserted into its key layers.” The adaptation module shifts the sub-models’ focus toward artifacts, the accidental sounds left in a recording, which often distinguish real audio from fake audio. Most AI-generated audio clips can sound “too clean.” Detect-2B can predict how much of a clip was made by AI without being retrained every time it listens to a new clip. The sub-models are also trained on large datasets.


Detect-2B aggregates its prediction scores and compares them to “a carefully tuned threshold” before determining whether a recording is real or fake. Resemble said the way its researchers structured Detect-2B makes it fast to train and means it does not need much computing power to deploy.
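The aggregate-then-threshold step can be illustrated with a short sketch. Resemble has not published how Detect-2B combines its scores or what threshold it uses, so the plain average, the 0.5 default and the function name `classify_clip` below are all assumptions made for illustration.

```python
from statistics import mean


def classify_clip(sub_model_scores, threshold=0.5):
    """Label one clip from per-sub-model scores.

    sub_model_scores: probabilities in [0, 1] that the clip is fake,
    one per sub-model (hypothetical inputs, not Resemble's real outputs).
    threshold: assumed cut-off; the real value is tuned by Resemble.
    """
    if not sub_model_scores:
        raise ValueError("need at least one sub-model score")
    # Assumption: a simple average stands in for the real aggregation.
    aggregate = mean(sub_model_scores)
    label = "fake" if aggregate >= threshold else "real"
    return label, aggregate


if __name__ == "__main__":
    # Made-up scores for a single clip.
    print(classify_clip([0.9, 0.8, 0.7]))
```

Raising the threshold in a setup like this would flag fewer real clips as fake at the cost of missing more deepfakes, which is presumably why the company describes its own value as carefully tuned.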

Stochastic architectures make it easier to work with audio signals

The model’s architecture is based on Mamba-SSM, a type of state space model. State space models don’t depend on static data or recurring patterns. Instead, they use a stochastic, or random probabilistic, approach that responds better to different variables. Resemble said this kind of architecture works well for audio detection because it captures different dynamics in an audio clip, adapts to changes in the state of an audio signal and continues to perform even if the recording is of poor quality.

To evaluate the model, Resemble said it put Detect-2B through a test set that included unseen speakers, deepfake-generated audio and different languages. The company said the model detected deepfake audio correctly for six different languages with an accuracy of at least 93%. 
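A per-language accuracy figure of this kind is the share of test clips in each language that the detector labeled correctly. The sketch below shows the arithmetic on invented records; the function name `accuracy_by_language` and the sample data are hypothetical and do not reflect Resemble's test set or results.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Compute the fraction of correct predictions per language.

    records: iterable of (language, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, truth, predicted in records:
        total[language] += 1
        if truth == predicted:
            correct[language] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


if __name__ == "__main__":
    # Invented example records, for illustration only.
    sample = [
        ("en", "fake", "fake"),
        ("en", "real", "fake"),
        ("es", "real", "real"),
    ]
    print(accuracy_by_language(sample))
```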

Figure: Detection performance of Detect-2B across languages. Detect-2B scored high in predicting deepfaked audio in six languages. Source: Resemble AI

Resemble launched its AI voice platform Rapid Voice Cloning in April. Detect-2B will be available through an API and can be integrated into different applications. 

Identifying deepfakes has become more important

Identifying AI-generated voices or videos is taking on new importance in the run-up to the 2024 U.S. presidential election. AI voices could make it easier to mislead voters and spread misinformation. Concerns over AI deepfakes, whether they involve faking a politician’s voice, pretending to be a celebrity in a song or just using AI to illustrate something, have eroded trust in brands.

Tools like Detect-2B could go a long way in helping identify and prove deepfakes before they reach the public. Of course, Resemble is not the only company working to detect AI clones. McAfee launched Project Mockingbird in January to detect AI audio. Meta, on the other hand, is developing a way to add watermarks to AI-generated audio.

“But our work is far from over. As generative AI capabilities continue to advance, so must our detection capabilities. We have several exciting research directions planned to further improve DETECT-2B, focusing on areas such as representation learning, advanced model architectures, and data expansion,” Resemble said. 

