Feature Overview
The emotion classification model used in our APIs is optimized for North American English conversational data. The API includes a baseline model of 4 basic emotions: angry, happy, neutral, and sad. Our other model offerings include different subsets of the following emotions: happy, sad, angry, neutral, surprised, disgusted, nervous, irritated, excited, and sleepy.

Coming soon – The API will include a model choice parameter, allowing users to choose between models of 4, 5, and 7 emotions.
Choosing an API
While our APIs share the same model offerings on the backend, they are best suited for different purposes.

| | DiscreteAPI | AsynchAPI |
|---|---|---|
| Inputs | A short audio file, 4-10 s in length. | A long audio file, at least 5 s in length. Inputs can be up to 1 GB in size. |
| Outputs | A JSON object containing the primary emotion detected in the file, along with its confidence. The confidence scores of all other emotions in the model are also returned. | A time-stamped JSON object containing the classified emotion and its confidence, at a rate of one classification per 5 seconds of audio. |
| Response Time | 100-500 ms | Depends on file size |
| Accessible outside SDK | ✅ Yes | ❌ No |
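Files between 5 and 10 s in length are accepted by both APIs; one simple way to route a recording is by its duration, as in the sketch below (the helper function is ours for illustration, not part of the SDK):

```python
import wave

def choose_api(path: str) -> str:
    """Route a mono .wav file to the better-suited API using the duration
    ranges from the table above (illustrative helper, not part of the SDK)."""
    with wave.open(path, "rb") as wav:
        duration_s = wav.getnframes() / wav.getframerate()
    if 4 <= duration_s <= 10:
        return "DiscreteAPI"  # short clip: fast, synchronous response
    if duration_s >= 5:
        return "AsynchAPI"    # long file (up to 1 GB): asynchronous processing
    raise ValueError("Clips under 4 s are not supported by either API.")
```

The overlap between 5 and 10 s is resolved in favor of the DiscreteAPI here, since it responds in 100-500 ms.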
Coming soon – StreamingAPI via WebSockets for real-time analysis of an audio stream.
Ideal Inputs
The APIs expect mono audio in .wav format. An ideal audio file is recorded at 44,100 Hz (44.1 kHz), though sampling rates as low as 8 kHz can still be used with high accuracy. For custom use cases, microphone specifications can be tailored to the audio environment, including optimizations for mono/stereo audio, single-microphone applications, noisy environments, and so on. For the DiscreteAPI, there are two input data formatting options (see the sketch after this list):

- raw audio file [multipart/form-data]
- processed audio file [application/json]
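A minimal sketch of both options, assuming a hypothetical endpoint URL, form field, and JSON payload schema (none of which are confirmed by this page):

```python
import base64
import requests

API_URL = "https://api.example.com/v1/discrete"  # hypothetical endpoint

# Option 1: raw audio file sent as multipart/form-data.
with open("clip.wav", "rb") as f:
    raw_resp = requests.post(API_URL, files={"file": ("clip.wav", f, "audio/wav")})

# Option 2: processed audio sent as application/json. Here the samples are
# base64-encoded; the exact payload schema is an assumption for illustration.
with open("clip.wav", "rb") as f:
    payload = {
        "audio": base64.b64encode(f.read()).decode("ascii"),
        "sample_rate": 44100,
        "channels": 1,
    }
json_resp = requests.post(API_URL, json=payload)
```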
Outputs
Outputs are returned as JSON in the following formats.

DiscreteAPI: `main_emotion` is the highest-confidence emotion returned from the model. Within `all_predictions`, each emotion is paired with its confidence score. Some users combine the top two highest-confidence emotions to generate more nuanced states. We recommend dropping a `main_emotion` with confidence under 0.38, but that is at the user's discretion.
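An illustrative response shape and threshold check; the field names follow the description above, and the values are invented for the example:

```python
# Illustrative DiscreteAPI response; the values are invented for this example.
response = {
    "main_emotion": "happy",
    "all_predictions": {"happy": 0.74, "neutral": 0.15, "sad": 0.07, "angry": 0.04},
}

# Apply the recommended 0.38 cutoff before trusting the top label.
confidence = response["all_predictions"][response["main_emotion"]]
main_emotion = response["main_emotion"] if confidence >= 0.38 else None
```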
AsynchAPI: `emotions` is an array of the highest-confidence emotions returned from the model, each paired with its timestamp and confidence. The number of values in `emotions` scales directly with the length of the input file, at one classification per 5 seconds of audio. We recommend dropping entries in `emotions` with confidence under 0.38, but that is at the user's discretion.
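A hedged sketch of filtering an AsynchAPI response by that cutoff; the field names and values are invented to match the description above:

```python
# Illustrative AsynchAPI response for a ~15 s file (one entry per 5 s of audio).
response = {
    "emotions": [
        {"timestamp": 0.0,  "emotion": "neutral", "confidence": 0.66},
        {"timestamp": 5.0,  "emotion": "happy",   "confidence": 0.31},
        {"timestamp": 10.0, "emotion": "happy",   "confidence": 0.52},
    ]
}

# Keep only classifications at or above the recommended 0.38 cutoff.
reliable = [e for e in response["emotions"] if e["confidence"] >= 0.38]
```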
Looking for a different timestamp interval? Customizing the length of the classified audio segments is in beta and will be released soon.