likelihood: the confidence of the ASR in the predicted text. The value is between 0 and 1. The way this value is computed has changed, and it now better reflects the model's confidence in its prediction.
tokens: a list of dictionaries, one for each token. For instance, if the predicted sentence is "next track", then there are two tokens, "next" and "track". For each token, the dictionary contains the following fields:
value: the string representation of the token. Here it would be "next" or "track".
confidence: a float representing the confidence of the ASR for this particular token. The value is between 0 and 1. The higher, the more confident.
range_start, range_end: integers specifying the range locating the token within the sentence
time: a dictionary containing a start and an end field which, according to the ASR, specify the time range of the utterance in the audio sequence. The times are in seconds.
hermes/asr/textCaptured topic. Here is an example on a small query:
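As a rough illustration of how such a message could be consumed, the sketch below parses a payload shaped like the fields described above. The sample values are invented for the "next track" query, and several names are assumptions rather than confirmed parts of the message: the text field, the range_start field, the start and end keys inside time, and the reading of range_end as an exclusive character index. The helper functions are hypothetical.

```python
import json

# Illustrative payload only: every value is made up. The "text" and
# "range_start" fields, the "start"/"end" keys inside "time", and the
# exclusive "range_end" are assumptions, not a captured message.
SAMPLE_PAYLOAD = """
{
  "text": "next track",
  "likelihood": 0.95,
  "tokens": [
    {"value": "next", "confidence": 0.97,
     "range_start": 0, "range_end": 4,
     "time": {"start": 0.0, "end": 0.6}},
    {"value": "track", "confidence": 0.93,
     "range_start": 5, "range_end": 10,
     "time": {"start": 0.6, "end": 1.2}}
  ]
}
"""


def low_confidence_tokens(payload, threshold):
    """Return the values of the tokens whose confidence is below threshold."""
    message = json.loads(payload)
    return [
        token["value"]
        for token in message.get("tokens", [])
        if token["confidence"] < threshold
    ]


def token_texts(payload):
    """Slice each token out of the sentence using its character range."""
    message = json.loads(payload)
    text = message["text"]
    return [
        text[token["range_start"]:token["range_end"]]
        for token in message.get("tokens", [])
    ]
```

A per-token threshold like this is one way to decide whether to ask the user to repeat a word; the threshold value itself is an application choice and is not prescribed by the fields above.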
confidence field of each slot. This value should be understood as the confidence of the ASR in the correctness of the slot. This new field is available when listening on the