Confidence Scores and Timestamps


The output of the ASR and/or the NLU is sometimes wrong, which can lead the assistant to take the wrong action. To help detect and prevent such errors, we developed two new features: ASR confidence scores (on sentences, tokens, and slot values) and ASR timestamps. To use all the features presented here, you must use a specialized Snips ASR component, and the assistant must have been trained after the end-of-October 2018 release.

Enriched ASR output

The output of the ASR will be enriched with the following fields:

  • likelihood: the confidence of the ASR in the predicted text, a value between 0 and 1. The way this value is computed has changed, and it now better reflects the model's confidence in its prediction.

  • tokens: a list of dictionaries, one per token. For instance, if the predicted sentence is "next track", there are two tokens, "next" and "track". Each dictionary contains the following fields:

    • value: the string representation of the token, here "next" and "track".

    • confidence: the ASR's confidence in this particular token, a float between 0 and 1. The higher the value, the more confident the ASR.

    • range_start and range_end: integer character offsets locating the token within the sentence. In the example below, "next" spans 0 to 4 and "track" spans 5 to 10, so range_end is exclusive.

    • time: a dictionary with start and end fields giving, according to the ASR, the time range of the token in the audio. Times are in seconds.
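The token fields above map naturally onto a small record type. The following Python sketch is a minimal illustration, assuming the payload has already been decoded from JSON into a dict; the `Token` class and the `parse_tokens` helper are hypothetical names, not part of any Snips library. It also checks that slicing the sentence with `range_start` and `range_end` gives back the token's `value`, and derives how long each token lasts in the audio from its `time` field.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Token:
    """One entry of the "tokens" list of an enriched ASR output (hypothetical helper)."""
    value: str
    confidence: float
    range_start: int
    range_end: int
    start_s: float  # time["start"], in seconds
    end_s: float    # time["end"], in seconds

    @property
    def duration_s(self) -> float:
        # Length of the audio span the ASR attributes to this token, in seconds.
        return self.end_s - self.start_s


def parse_tokens(asr_output: dict) -> list[Token]:
    """Build Token objects from a decoded ASR payload (empty list if no tokens)."""
    return [
        Token(
            value=t["value"],
            confidence=t["confidence"],
            range_start=t["range_start"],
            range_end=t["range_end"],
            start_s=t["time"]["start"],
            end_s=t["time"]["end"],
        )
        for t in asr_output.get("tokens", [])
    ]


if __name__ == "__main__":
    payload = json.loads("""
    {"text": "next track",
     "tokens": [
       {"value": "next", "confidence": 0.78405327, "range_start": 0, "range_end": 4,
        "time": {"start": 0.011338441, "end": 1.0799999}},
       {"value": "track", "confidence": 0.66063744, "range_start": 5, "range_end": 10,
        "time": {"start": 1.0799999, "end": 1.89}}
     ]}
    """)
    text = payload["text"]
    for tok in parse_tokens(payload):
        # The character range should recover the token's value from the sentence.
        assert text[tok.range_start:tok.range_end] == tok.value
        print(f"{tok.value!r}: confidence={tok.confidence:.2f}, "
              f"{tok.start_s:.2f}s to {tok.end_s:.2f}s ({tok.duration_s:.2f}s)")
```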

This enriched output can be retrieved by listening to the hermes/asr/textCaptured topic. Here is an example for a short query:

{
  "text": "next track",
  "likelihood": 0.71970475,
  "tokens": [
    {
      "value": "next",
      "confidence": 0.78405327,
      "range_start": 0,
      "range_end": 4,
      "time": {
        "start": 0.011338441,
        "end": 1.0799999
      }
    },
    {
      "value": "track",
      "confidence": 0.66063744,
      "range_start": 5,
      "range_end": 10,
      "time": {
        "start": 1.0799999,
        "end": 1.89
      }
    }
  ],
  "seconds": 2.0,
  "siteId": "default"
}
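One way to use these scores is to ask the user to repeat or confirm when a transcription looks unreliable. The Python sketch below decodes a hermes/asr/textCaptured payload (for instance the raw message body received by an MQTT client subscribed to that topic) and reports whether the sentence likelihood clears a threshold and which tokens fall below a second one. The function name `review_transcript` and the default thresholds of 0.7 are hypothetical choices for illustration, not values recommended by Snips; they should be tuned on your own data.

```python
from __future__ import annotations

import json


def review_transcript(payload: str | bytes,
                      min_likelihood: float = 0.7,
                      min_token_confidence: float = 0.7) -> dict:
    """Summarize how trustworthy a hermes/asr/textCaptured payload looks.

    Both thresholds are illustrative assumptions, not Snips defaults.
    """
    msg = json.loads(payload)  # json.loads accepts both str and bytes
    weak = [
        tok["value"]
        for tok in msg.get("tokens", [])
        if tok["confidence"] < min_token_confidence
    ]
    return {
        "text": msg["text"],
        # Sentence-level check on the ASR likelihood.
        "sentence_ok": msg["likelihood"] >= min_likelihood,
        # Token-level check: words the ASR is least sure about.
        "weak_tokens": weak,
    }
```

With the textCaptured example above and the default thresholds, the sentence likelihood (about 0.72) passes, while "track" (about 0.66) is reported as a weak token, so an application could, for example, ask the user to confirm only that word.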

Enriched NLU output

To improve the user experience, we now use the ASR token confidences to compute a confidence for each slot. This value, available in the confidence field of each slot, should be understood as the ASR's confidence that the slot value is correct. The field is included in the messages published on the hermes/nlu/intentParsed topic.

{
  "input": "set the volume to sixty percent",
  "intent": {
    "intentName": "volumeSet",
    "probability": 0.60804254
  },
  "slots": [
    {
      "confidence": 0.99879981,
      "rawValue": "sixty",
      "value": {
        "kind": "Custom",
        "value": "60"
      },
      "entity": "volumePercent",
      "slotName": "volume"
    }
  ]
}
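Slot confidences make it possible to single out the part of a query that was probably misheard. The Python sketch below decodes a hermes/nlu/intentParsed payload and returns the slots whose confidence is below a threshold, as (slotName, rawValue, confidence) tuples. The function name `low_confidence_slots` and the 0.8 default are hypothetical, and the sketch assumes every slot carries the confidence field described above. It reports rawValue rather than the resolved value because the structure of value depends on the slot kind.

```python
import json


def low_confidence_slots(payload, min_confidence=0.8):
    """List the slots of a hermes/nlu/intentParsed payload with low confidence.

    The 0.8 default is an illustrative assumption, not a Snips recommendation.
    Returns (slotName, rawValue, confidence) tuples, in payload order.
    """
    msg = json.loads(payload)
    return [
        (slot["slotName"], slot["rawValue"], slot["confidence"])
        for slot in msg.get("slots", [])
        if slot["confidence"] < min_confidence
    ]
```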

Note that the probability field at the intent level is the probability of the intent classification alone: it does not take the ASR confidence scores into account.
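Because the intent probability ignores ASR confidence, an application may want to check it and the slot confidences separately before acting. The following Python sketch shows one possible policy, which is purely hypothetical: reject the query if the intent probability is too low, ask for confirmation if any slot confidence is too low, and execute otherwise. The function name `decide`, the three outcome strings, and both threshold values are assumptions for illustration.

```python
import json


def decide(payload, min_intent_probability=0.5, min_slot_confidence=0.8):
    """Return "reject", "confirm_slots" or "execute" for an intentParsed payload.

    Hypothetical policy: the two scores are checked separately because the
    intent probability does not take the ASR confidence into account.
    """
    msg = json.loads(payload)
    if msg["intent"]["probability"] < min_intent_probability:
        return "reject"  # the intent classifier itself is unsure
    if any(slot["confidence"] < min_slot_confidence
           for slot in msg.get("slots", [])):
        return "confirm_slots"  # plausible intent, but a slot may be misheard
    return "execute"
```

For the volumeSet example above, the intent probability (about 0.61) and the slot confidence (about 0.999) both clear the default thresholds, so the sketch returns "execute".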