AssemblyAI has launched its newest speech recognition model, Universal-1, setting a new benchmark for automatic speech recognition (ASR) accuracy. The model is designed to achieve near-human transcription accuracy, even on challenging audio with accents, background noise, and complex terminology. According to AssemblyAI, Universal-1 is now accessible through the same web API as earlier ASR models.
New Pricing Tiers for Universal-1
Alongside the launch of Universal-1, AssemblyAI has unveiled two new pricing tiers: Best and Nano. The Best tier is optimized for maximum accuracy, while the Nano tier offers a cost-effective option that supports transcription in 99 languages. Developers can choose the balance of accuracy and cost that fits their specific needs.
Getting Started with the AssemblyAI Python SDK
To simplify transcription, AssemblyAI provides an official Python SDK. Developers can install it with the command:
pip install --upgrade assemblyai
After installing, users need to sign up for an AssemblyAI account to obtain an API key, which is required to authorize API calls in Python scripts.
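One common way to keep the API key out of source code is to read it from an environment variable. The sketch below is a minimal illustration using only the standard library; the helper `load_api_key` and the variable name `ASSEMBLYAI_API_KEY` are assumptions made for this example, not something the article specifies.

```python
import os


def load_api_key(env=None, var_name="ASSEMBLYAI_API_KEY"):
    """Return the API key stored in an environment variable.

    `var_name` is an assumed name chosen for this example. `env` can be any
    mapping, which makes the helper easy to exercise without touching the
    real environment.
    """
    source = os.environ if env is None else env
    key = (source.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage with the SDK described in this article (requires the assemblyai package):
#   import assemblyai as aai
#   aai.settings.api_key = load_api_key()
```

Failing early with a clear message tends to be easier to debug than an authorization error returned later by the API.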
Transcribing Audio Files with Universal-1
Once set up, developers can transcribe audio files by creating a Python script. By default, the SDK uses the Best tier for transcriptions, ensuring the highest accuracy. The process involves importing the SDK, configuring the API client with the API key, and specifying the audio file URL or local path.
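import assemblyai as aai

# Replace with the API key from your AssemblyAI account
aai.settings.api_key = "YOUR_API_KEY"

# With no config, the SDK uses the Best tier
transcriber = aai.Transcriber()

# A public URL or a local file path both work here
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"
transcript = transcriber.transcribe(audio_file)

# Print the error if transcription failed, otherwise the transcript text
if transcript.error:
    print(transcript.error)
else:
    print(transcript.text)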
Running the script prints the transcript to the terminal, or the error message if the transcription failed.
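Because the audio source can be either a URL or a local path, it can help to check the input before sending it for transcription. The following is a minimal sketch under stated assumptions: `classify_audio_source` and `transcribe_source` are hypothetical helpers written for this example, and `transcribe_source` assumes `aai.settings.api_key` has already been set as shown earlier.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_audio_source(source: str) -> str:
    """Return "url" for an http(s) link or "file" for an existing local file."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if Path(source).is_file():
        return "file"
    raise FileNotFoundError(f"Not an http(s) URL or an existing file: {source}")


def transcribe_source(source: str) -> str:
    """Validate the source, then transcribe it with the default tier."""
    # Imported lazily so the validation helper works without the SDK installed.
    import assemblyai as aai

    classify_audio_source(source)  # raises early on a mistyped path
    transcript = aai.Transcriber().transcribe(source)
    if transcript.error:
        raise RuntimeError(transcript.error)
    return transcript.text
```

Raising on failure instead of printing lets a calling script decide how to handle the error, which is usually more convenient once transcription is one step in a larger pipeline.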
Exploring the Nano Tier
For those seeking a more economical option, switching to the Nano tier is straightforward. Developers can adjust the TranscriptionConfig object to use the Nano model by setting the speech_model parameter to "nano".
config = aai.TranscriptionConfig(speech_model="nano")
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(audio_file)
This flexibility allows for efficient use of resources while still benefiting from AssemblyAI's robust transcription capabilities.
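Tier selection can also be made a runtime choice, for example to send bulk jobs to Nano and keep the default for everything else. This is a rough sketch, not a prescribed pattern: `config_kwargs` and `transcribe_with_tier` are hypothetical names, and the code relies only on the `speech_model="nano"` setting shown above, leaving the config empty to keep the SDK default.

```python
def config_kwargs(low_cost: bool) -> dict:
    """Build TranscriptionConfig keyword arguments for the chosen tier.

    An empty dict keeps the SDK default, which the article describes as the
    Best tier; the low-cost branch selects Nano.
    """
    return {"speech_model": "nano"} if low_cost else {}


def transcribe_with_tier(audio_source: str, low_cost: bool = False) -> str:
    """Transcribe audio with either the default tier or Nano."""
    # Imported lazily so config_kwargs can be used without the SDK installed.
    # Assumes aai.settings.api_key has already been set.
    import assemblyai as aai

    config = aai.TranscriptionConfig(**config_kwargs(low_cost))
    transcript = aai.Transcriber(config=config).transcribe(audio_source)
    if transcript.error:
        raise RuntimeError(transcript.error)
    return transcript.text
```

Calling `transcribe_with_tier(audio_file, low_cost=True)` would then route that one request to the Nano tier while other calls keep the default.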
Beyond Transcription: Additional Features
AssemblyAI's offerings extend beyond basic transcription. The platform provides advanced features such as entity detection, content moderation, PII redaction, and the application of large language models (LLMs) to audio data. These capabilities make the transcription service suitable for a wide range of applications.
Developers interested in these features can explore AssemblyAI's documentation and research resources for further guidance on building advanced speech AI solutions.
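As one illustration of these features, entity detection can be requested through the same configuration object used for tier selection. The sketch below assumes the SDK's `entity_detection=True` option and that each returned entity exposes `entity_type` and `text` attributes; treat those details as assumptions to verify against the current documentation. `group_entities` and `detect_entities` are hypothetical helpers written for this example.

```python
from collections import defaultdict


def group_entities(entities) -> dict:
    """Group entity texts by entity type, preserving their order."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity.entity_type].append(entity.text)
    return dict(grouped)


def detect_entities(audio_source: str) -> dict:
    """Transcribe audio with entity detection enabled and group the results."""
    # Imported lazily so group_entities can be used without the SDK installed.
    # Assumes aai.settings.api_key has already been set.
    import assemblyai as aai

    config = aai.TranscriptionConfig(entity_detection=True)
    transcript = aai.Transcriber().transcribe(audio_source, config=config)
    if transcript.error:
        raise RuntimeError(transcript.error)
    return group_entities(transcript.entities or [])
```

Grouping by type makes it easy to pull out, say, every person or location mentioned in a recording without scanning the full transcript.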