NLU

Natural Language Understanding (NLU) allows the assistant to “understand” what the written words produced by the ASR step actually mean, i.e. what the user wanted to request from or tell the assistant when they spoke. The NLU has three primary responsibilities: deriving the user intent, capturing other associated parameters in the intent, and informing the action.

Deriving the user intent

To express your intention to the assistant, you can use natural language. In this context, “natural” means that you don’t have to say specific pre-programmed commands, words or expressions to trigger a specific action; you can speak naturally, as you would to a fellow human being. For instance, if you wanted to know the weather for Paris, you wouldn’t need to say “Weather Paris today 20:00h”. Instead, you could say “I’m going out tonight, do you think I’ll need an umbrella?”. The NLU is the mechanism in charge of deriving the meaning or intention of the command, independently of the way it is expressed.

Capturing other associated parameters

Slots are parameters that add information to the intent. For instance, suppose we have an intent called playMusic that expresses the intention to play some music. Possible slots could be “song name” or “artist name”. A concrete command for the intent playMusic could be “Play me Beat it from Michael Jackson”. This command would be classified as a playMusic intent, with “Beat it” associated with the “song name” slot and “Michael Jackson” with the “artist name” slot.
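
To make the relationship between an intent and its slots concrete, the parsed command can be pictured as a small data structure. The sketch below is hand-written for illustration only: the ParsedCommand and Slot classes and the slot_value helper are hypothetical names, not part of any Snips API.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One parameter captured from the command (hypothetical structure)."""
    name: str
    value: str


@dataclass
class ParsedCommand:
    """An intent plus the slots that qualify it (hypothetical structure)."""
    intent: str
    slots: list = field(default_factory=list)

    def slot_value(self, name):
        """Return the value of the first slot with this name, or None if absent."""
        for slot in self.slots:
            if slot.name == name:
                return slot.value
        return None


# Hand-written result for the command "Play me Beat it from Michael Jackson".
command = ParsedCommand(
    intent="playMusic",
    slots=[Slot("song name", "Beat it"), Slot("artist name", "Michael Jackson")],
)

print(command.intent)
print(command.slot_value("song name"))
print(command.slot_value("artist name"))
```

A slot that was not mentioned in the command is simply absent, which is why slot_value returns None rather than raising an error.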

Informing the action

The mechanism that is triggered to fulfill a command is called an action. The action could be the assistant replying with information on the weather, turning on the lights in the kitchen or playing a given song, among many others (the sky's the limit!). The NLU plays a crucial part in triggering an action, because the action to trigger depends on the intention of the command. It is the responsibility of the developer to trigger a specific action using the NLU output.
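
One common way for a developer to connect intents to actions is a lookup table from intent name to handler function. The following is a minimal sketch under that assumption; the handler functions, the ACTIONS table and trigger_action are all hypothetical names, and the handlers only return strings instead of doing real work.

```python
def play_music(slots):
    """Hypothetical action: pretend to start playback."""
    song = slots.get("song name", "something")
    return f"Playing {song}"


def weather_forecast(slots):
    """Hypothetical action: pretend to look up a forecast."""
    city = slots.get("location", "your area")
    return f"Fetching the forecast for {city}"


# Map each intent name to the action that fulfills it.
ACTIONS = {
    "playMusic": play_music,
    "weatherForecast": weather_forecast,
}


def trigger_action(intent_name, slots):
    """Run the action registered for the intent, or fall back when none matches."""
    action = ACTIONS.get(intent_name)
    if action is None:
        return "Sorry, I did not understand that."
    return action(slots)


print(trigger_action("playMusic", {"song name": "Beat it"}))
print(trigger_action("weatherForecast", {"location": "Paris"}))
```

Keeping the mapping in one table means that supporting a new intent only requires writing its handler and registering it.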

Natural Language Understanding (NLU) flow

The NLU is one of the most complex systems to get right in a voice assistant.

Snips NLU

The Snips NLU engine is aimed at parsing a query in text format. The parsing process is twofold:

  1. Intent classification: the engine has to guess which intent the input query corresponds to.

  2. Slot-filling: when an intent is detected, the engine must then extract the attributes (slots) of the query.

A concrete example

Let’s consider an assistant that covers the intent weatherForecast, and that is asked the following query:

Give me the weather in Paris today please

In that case:

  1. the intent classifier will identify that this query is related to a weather forecast request.

  2. the slot filler will look for different attributes of a weather forecast request: location, date…
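
The two steps can be sketched with a deliberately naive, keyword-based parser. This is not how the Snips NLU engine works internally; the keyword sets, the known slot values and the function names classify_intent and fill_slots are assumptions made up for this illustration, which only shows the order of the two steps.

```python
import re

# Toy knowledge (assumed for this sketch): keywords per intent, known values per slot.
INTENT_KEYWORDS = {
    "weatherForecast": {"weather", "forecast", "umbrella", "rain"},
    "playMusic": {"play", "song", "music"},
}
SLOT_VALUES = {
    "weatherForecast": {
        "location": ["paris", "london", "new york"],
        "date": ["today", "tomorrow", "tonight"],
    },
    "playMusic": {},
}


def classify_intent(query):
    """Step 1: pick the intent whose keywords overlap most with the query."""
    words = set(re.findall(r"[a-z']+", query.lower()))
    scores = {name: len(words & kws) for name, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def fill_slots(query, intent):
    """Step 2: look for the slot values known for the detected intent."""
    found = {}
    lowered = query.lower()
    for slot_name, candidates in SLOT_VALUES.get(intent, {}).items():
        for candidate in candidates:
            match = re.search(r"\b" + re.escape(candidate) + r"\b", lowered)
            if match:
                # Keep the original casing by slicing the unmodified query.
                found[slot_name] = query[match.start():match.end()]
                break
    return found


query = "Give me the weather in Paris today please"
intent = classify_intent(query)
slots = fill_slots(query, intent) if intent else {}
print(intent, slots)
```

Note that slot filling only runs once an intent has been detected, because the set of slots to look for depends on the intent.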

The NLU output

Given a text input, the NLU engine outputs an object with the following three properties:

  • Input: the actual text input.

  • Intent: the result of the intent classification process. It contains the detected intent, plus a probability.

  • Slots: a list of detected slots, each slot being specified with its name, value, and span within the original query. For dates, and other built-in slot types, the value of the slot can be resolved into a directly usable object. For example, the word “Christmas” will be resolved into 2017-12-25 00:00:00 +00:00 in the NLU output object.

Let’s look at an example of an NLU output object using the weather forecast example of the previous section. An assistant including the weather forecast intent (which appears in this output under the name SearchWeatherForecast) should return the following parsed output for the query "Give me the weather in Paris today please":

{
  "input": "Give me the weather in Paris today please",
  "intent": {
    "intentName": "SearchWeatherForecast",
    "probability": 0.8302662399999999
  },
  "slots": [
    {
      "entity": "locality",
      "slotName": "weatherForecastLocality",
      "rawValue": "Paris",
      "value": {
        "kind": "Custom",
        "value": "Paris"
      },
      "range": {
        "start": 23,
        "end": 28
      }
    },
    {
      "entity": "snips/datetime",
      "slotName": "weatherForecastStartDatetime",
      "rawValue": "today",
      "value": {
        "kind": "InstantTime",
        "value": {
          "grain": "Day",
          "precision": "Exact",
          "value": "2017-06-13 00:00:00 +00:00"
        }
      },
      "range": {
        "start": 29,
        "end": 34
      }
    }
  ]
}
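
As a quick illustration of reading this object, the sketch below loads the output with Python's standard json module and walks through its three properties. Treating range as zero-based character offsets with an exclusive end is an inference from the values in this example, not a documented guarantee; the variable names are hypothetical.

```python
import json

# The NLU output from the example above, condensed into a JSON string.
nlu_output_json = """
{
  "input": "Give me the weather in Paris today please",
  "intent": {"intentName": "SearchWeatherForecast", "probability": 0.8302662399999999},
  "slots": [
    {"entity": "locality", "slotName": "weatherForecastLocality",
     "rawValue": "Paris", "value": {"kind": "Custom", "value": "Paris"},
     "range": {"start": 23, "end": 28}},
    {"entity": "snips/datetime", "slotName": "weatherForecastStartDatetime",
     "rawValue": "today",
     "value": {"kind": "InstantTime",
               "value": {"grain": "Day", "precision": "Exact",
                         "value": "2017-06-13 00:00:00 +00:00"}},
     "range": {"start": 29, "end": 34}}
  ]
}
"""

result = json.loads(nlu_output_json)
query = result["input"]

print(result["intent"]["intentName"], result["intent"]["probability"])

spans = {}
for slot in result["slots"]:
    start, end = slot["range"]["start"], slot["range"]["end"]
    # Slicing the input with the range gives back the raw value of the slot.
    spans[slot["slotName"]] = query[start:end]
    print(slot["slotName"], "->", slot["rawValue"], f"[{start}:{end}]")
```

In this example, slicing the input with each range reproduces the slot's rawValue, which is a convenient sanity check when debugging slot extraction.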

The developer parses such objects and uses the information they contain to trigger actions. In our weather example, the developer could call a weather API, passing Paris as the location and the resolved date value, to retrieve weather information.
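
A minimal sketch of that step, assuming the output object has already been loaded into a Python dictionary, might look as follows. The function build_weather_request and the parameter names location and date are hypothetical, and no real weather API is called; the function only prepares the parameters such a call would need.

```python
from datetime import datetime


def build_weather_request(nlu_result):
    """Turn an NLU output object into parameters for a hypothetical weather API."""
    if nlu_result["intent"]["intentName"] != "SearchWeatherForecast":
        return None  # Not a weather request: nothing to build.
    params = {}
    for slot in nlu_result["slots"]:
        if slot["slotName"] == "weatherForecastLocality":
            params["location"] = slot["value"]["value"]
        elif slot["slotName"] == "weatherForecastStartDatetime":
            # The resolved value looks like "2017-06-13 00:00:00 +00:00".
            resolved = slot["value"]["value"]["value"]
            parsed = datetime.strptime(resolved, "%Y-%m-%d %H:%M:%S %z")
            params["date"] = parsed.date().isoformat()
    return params


# Trimmed copy of the example output, keeping only the fields used here.
nlu_result = {
    "input": "Give me the weather in Paris today please",
    "intent": {"intentName": "SearchWeatherForecast", "probability": 0.8302662399999999},
    "slots": [
        {"slotName": "weatherForecastLocality",
         "value": {"kind": "Custom", "value": "Paris"}},
        {"slotName": "weatherForecastStartDatetime",
         "value": {"kind": "InstantTime",
                   "value": {"grain": "Day", "precision": "Exact",
                             "value": "2017-06-13 00:00:00 +00:00"}}},
    ],
}

print(build_weather_request(nlu_result))
```

Using the resolved value of the date slot, rather than the raw word “today”, means the action does not need to interpret natural-language dates itself.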

Next Steps

Great, you made it to the end of the article! If you feel confident with all of the above, we have more advanced NLU topics to help you create improved experiences with your voice assistants. Check out our articles Confidence Scores and Timestamps and Dynamic Vocabulary using Entities Injection.