The Snips Platform also supports multiple audio inputs and outputs. One device can run all the components of the platform while several smaller devices run only the audio server for audio I/O. These smaller devices can be distributed across the rooms of a house, for example (see next figure). The main device can handle several concurrent interactions coming from different audio servers without losing the context. The way this works at the protocol level is described in the next figure. Here we have two audio servers, one in the bedroom and one in the kitchen, and a main device (the "Hub") in the living room running the hotword detector, the ASR, the Dialogue Manager and the Handler code. To distinguish them, each audio server has an identifier, a siteId; in our case these are "bedroom" and "kitchen". Each audio server streams audio frames to the hotword detector and the ASR on the main device.
The woman says the hotword in the bedroom.
The hotword detector in the Hub detects the hotword in the audio stream coming from the "bedroom" audio server, so it posts a detected message with the "bedroom" siteId in the payload.
The Dialogue Manager handles the conversation. For this session, the ASR also uses the audio coming from the audio server with the same siteId ("bedroom").
At the end, the Dialogue Manager will post the intent/<intentName> message for the Handler code with the original siteId in the payload of the message.
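The steps above can be sketched as a small in-memory simulation of how the siteId travels from the detected message to the intent/<intentName> message. This is only an illustrative sketch, not the platform's implementation: the `Hub` class, its method names, and the `input` payload field are hypothetical, and no real message bus is involved. Only the siteId field and the intent/<intentName> message name come from the description above.

```python
import json


class Hub:
    """Toy stand-in for the main device (hypothetical, for illustration only)."""

    def __init__(self):
        self.sessions = {}  # siteId -> list of recognized text for the open session
        self.outbox = []    # (message name, JSON payload) pairs posted for the Handler code

    def on_hotword_detected(self, payload_json):
        # The detected message carries the siteId of the audio server
        # whose stream contained the hotword.
        site_id = json.loads(payload_json)["siteId"]
        self.sessions[site_id] = []

    def on_asr_text(self, site_id, text):
        # Only audio from a site with an open session is attached to it,
        # so concurrent sessions from different rooms do not mix.
        if site_id in self.sessions:
            self.sessions[site_id].append(text)

    def end_session(self, site_id, intent_name):
        # The intent message keeps the original siteId in its payload.
        text = " ".join(self.sessions.pop(site_id))
        payload = {"siteId": site_id, "input": text}  # "input" is an assumed field
        self.outbox.append((f"intent/{intent_name}", json.dumps(payload)))


hub = Hub()
hub.on_hotword_detected(json.dumps({"siteId": "bedroom"}))
hub.on_hotword_detected(json.dumps({"siteId": "kitchen"}))
hub.on_asr_text("bedroom", "turn off the light")
hub.on_asr_text("kitchen", "set a timer")
hub.end_session("bedroom", "lightsOff")  # "lightsOff" is a made-up intent name
```

After the bedroom session ends, `hub.outbox` holds one intent message whose payload still names "bedroom", while the kitchen session remains open with its own context. Handler code can then use the siteId to decide, for example, in which room to act.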
Note that the hotword detector also has an ID that identifies it. In the previous example, it is simply "default" because there is only one detector. If you prefer to embed a hotword detector in each audio source, for example to minimize energy consumption, the configuration will look like the one represented below.