Category archives: Development


We're thrilled to introduce our Automatic Shownotes and Chapters feature. This AI-powered tool effortlessly generates concise summaries, intuitive chapter timestamps and relevant keywords for your podcasts, audio and video files.
See our Examples and the How To section below for details.

Why do I need Shownotes and Chapters?

In addition to links and other information, shownotes contain short summaries of your episode's main topics, while inserted chapter marks let you timestamp the sections of a podcast or video that cover different topics. This makes your content more accessible and user-friendly, enabling listeners to quickly navigate to specific sections ...
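Chapter marks themselves are typically just timestamped titles with optional links. As a rough illustration (the timestamps and titles here are made up; see the Auphonic documentation for the exact accepted formats), a plain-text chapter file can look like this:

```text
00:00:00.000 Intro
00:02:30.000 Main topic <https://example.com>
00:48:10.000 Listener questions
01:02:00.000 Outro
```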

In addition to our Leveler, Denoiser, and Adaptive 'Hi-Pass' Filter, we are now releasing the missing equalization feature: the new Auphonic AutoEQ.
The AutoEQ automatically analyzes and optimizes the frequency spectrum of a voice recording, removes sibilance (De-esser), and creates a clear, warm, and pleasant sound. Listen to the audio examples below to get an idea of what it does.
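Auphonic hasn't published the AutoEQ internals, but the de-essing part rests on a classic idea: detect frames in which high-frequency (sibilant) energy dominates, then attenuate that band. A minimal, hypothetical detector along those lines (function names, the 6 kHz band, and the threshold are all illustrative, not Auphonic's actual algorithm):

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Power of a single frequency bin, via the Goertzel algorithm."""
    k = int(0.5 + len(samples) * freq / sample_rate)
    w = 2.0 * math.pi * k / len(samples)
    coeff = 2.0 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def is_sibilant(frame, sample_rate, band_freq=6000.0, threshold=0.25):
    """Flag a frame whose energy near band_freq dominates its total energy."""
    total = sum(x * x for x in frame) + 1e-12
    band = goertzel_power(frame, sample_rate, band_freq) / len(frame)
    return band / total > threshold
```

A real de-esser would run this per short frame and apply gain reduction in the sibilance band only where the detector fires, instead of a static EQ cut.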

Screenshot of manually ...

Today we release our first self-hosted Auphonic Speech Recognition Engine using the open-source Whisper model by OpenAI!
With Whisper, you can now integrate automatic speech recognition in 99 languages into your Auphonic audio post-production workflow, without creating an external account and without extra costs!

Whisper Speech Recognition in Auphonic

Until now, Auphonic users had to choose one of our integrated external service providers (Wit.ai, Google Cloud Speech, Amazon Transcribe, Speechmatics) for speech recognition, which meant audio files were transferred to an external server and processed using external computing power that users had to pay for ...
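For readers who want to try the underlying model directly, here is a minimal sketch using the open-source `whisper` Python package (this is illustrative, not Auphonic's server-side integration; the model call is shown commented out so the formatting helper stays self-contained, and the file name is made up):

```python
# Illustrative only: not Auphonic's internal code.
# Requires: pip install openai-whisper

def format_transcript(segments):
    """Render Whisper-style segments ({'start', 'end', 'text'})
    as simple timestamped transcript lines."""
    def ts(seconds):
        m, s = divmod(int(seconds), 60)
        h, m = divmod(m, 60)
        return f"{h:02d}:{m:02d}:{s:02d}"
    return "\n".join(
        f"[{ts(seg['start'])} - {ts(seg['end'])}] {seg['text'].strip()}"
        for seg in segments
    )

# import whisper
# model = whisper.load_model("base")        # tiny/base/small/medium/large
# result = model.transcribe("episode.mp3")  # language is auto-detected
# print(result["language"])
# print(format_transcript(result["segments"]))
```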

Speechmatics released a new API with an enhanced transcription engine (2h free per month!), which we have now integrated into the Auphonic Web Service.
In this blog post, we also compare the accuracy of all our integrated speech recognition services and present our results.


Automatic speech recognition is most useful for making audio searchable: even if automatically generated transcripts are not perfect and can be difficult to read (spoken language is very different from written text), they are very valuable when you try to find a specific topic within a one-hour audio file or need the exact ...

Today we are thrilled to introduce revised parameters for the Adaptive Leveler, which move our advanced algorithms out of beta.
The leveler can now run in three modes, which allow detailed Leveler Strength control and also the use of Broadcast Parameters (Max. Loudness Range, Max. Short-term Loudness, Max. Momentary Loudness) to limit the amount of leveling.

Photo by Gemma Evans.

When we first introduced our advanced parameters, we used the Maximum Loudness Range (MaxLRA) value to control the strength of our leveler. This gave good results, but it turned out that only pure speech programs give reliable and ...

Last weekend, at the Subscribe10 conference, we released Advanced Audio Algorithm Parameters for Multitrack Productions:

We launched our advanced audio algorithm parameters for Singletrack Productions last year. Now these settings (and more) are available for Multitrack Algorithms as well, which gives you detailed control for each track of your production.

The following new parameters are available:

Until recently, Amazon Transcribe supported speech recognition in English and Spanish only.
Now French, Italian, and Portuguese have been added as well - and a few other languages (including German) are in private beta.

Update March 2019:
Now Amazon Transcribe supports German and Korean as well.

[Screenshot: https://auphonic.com/static/screenshots/inspector-mt-closed.png] The Auphonic Audio Inspector on the status page of a finished Multitrack Production, including speech recognition.


Amazon Transcribe is integrated as a speech recognition engine within Auphonic and offers accurate transcriptions (compared to other services) at low cost, including keywords / custom ...

In late August, we launched the private beta program of our advanced audio algorithm parameters. After feedback from our users and many new experiments, we are proud to release a complete rework of the Adaptive Leveler parameters:

In the previous version, we based our Adaptive Leveler parameters on the Loudness Range descriptor (LRA), which is included in the EBU R128 specification.
Although this worked, it turned out to be very difficult to set a loudness range target for diverse audio content that includes speech, background sounds, music parts, etc. The results were not predictable and ...
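For reference, the Loudness Range descriptor mentioned above is specified in EBU Tech 3342: gate a series of short-term loudness values (3-second windows), then take the difference between the 95th and 10th percentiles. A rough illustration of that computation (not Auphonic's implementation):

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    f = int(k)
    c = min(f + 1, len(sorted_vals) - 1)
    return sorted_vals[f] + (sorted_vals[c] - sorted_vals[f]) * (k - f)

def loudness_range(short_term_lufs):
    """LRA per EBU Tech 3342, from short-term loudness values in LUFS."""
    # Absolute gate: discard everything at or below -70 LUFS.
    gated = [v for v in short_term_lufs if v > -70.0]
    if not gated:
        return 0.0
    # Relative gate: discard values more than 20 LU below the gated mean.
    mean = sum(gated) / len(gated)
    gated = sorted(v for v in gated if v > mean - 20.0)
    if len(gated) < 2:
        return 0.0
    return percentile(gated, 95) - percentile(gated, 10)
```

The gating is exactly why LRA is hard to use as a leveling target for mixed content: quiet music beds and pauses shift the gated distribution in ways that are hard to predict from listening.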

Large file uploads in a web browser are problematic, even in 2018. On a poor network connection, uploads can fail and must be retried from the start.

At Auphonic, our users have to upload large audio and video files, or multiple media files when creating a multitrack production. To minimize any potential issues, we integrated various external services which are specialized for large file transfers, like FTP, SFTP, Dropbox, Google Drive, S3, etc.

To further minimize issues, as of today we have also released resumable and chunked direct file uploads in the web browser to ...
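The idea behind resumable, chunked uploads can be sketched as follows: the file is split into fixed-size chunks sent with their byte offset, and after a failure the client asks the server where to resume. This is a conceptual sketch only; `FakeServer` and the offset handshake are made up for illustration and are not the Auphonic upload API:

```python
# Conceptual sketch of a resumable, chunked upload client and endpoint.

CHUNK_SIZE = 256 * 1024  # illustrative chunk size: 256 KiB

class FakeServer:
    """Stand-in for a resumable-upload endpoint."""
    def __init__(self):
        self.buf = bytearray()

    def receive(self, offset, data):
        # Writing at an explicit offset makes re-sent chunks idempotent.
        self.buf[offset:offset + len(data)] = data

    def confirmed_offset(self):
        # After a failure, the client asks where to resume.
        return len(self.buf)

def upload(fileobj, server, chunk_size=CHUNK_SIZE, stop_after=None):
    """Send chunks starting at the server's confirmed offset.
    stop_after simulates a dropped connection after N chunks."""
    sent = 0
    fileobj.seek(server.confirmed_offset())
    while stop_after is None or sent < stop_after:
        offset = fileobj.tell()
        data = fileobj.read(chunk_size)
        if not data:
            break
        server.receive(offset, data)
        sent += 1
```

Because each chunk carries its offset, a retried or resumed upload never re-sends the whole file, only the chunks after the last one the server confirmed.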

In a previous blog post we talked about the Opus codec, which offers very low bitrates. Another codec seeking to achieve even lower bitrates is Codec 2.

Codec 2 is designed for speech only, and although its bitrates are impressively low, the results aren't as clear as with Opus, as you can hear in the audio examples below. However, there is some interesting work combining Codec 2 with neural networks (WaveNet) that is yielding great results.

Layers of a WaveNet neural network.

Background

Codec 2 is an open source codec designed for ...