Listening

Besides speaking, a fundamental part of a speech interaction is listening to and recognizing what the user says. This page describes how listening works in the Furhat system. For the role listening plays in flows, see the flow docs. For how user speech is interpreted into actionable meaning, see the natural language understanding (NLU) docs.

Listening to and asking the user

Listening

To have Furhat listen, you use the listen command (assuming you have a microphone configured):

furhat.listen() // Listen with a default timeout of 8 seconds

furhat.listen(timeout = 4000) // Listen with a timeout of 4 seconds

Behind the scenes, this calls a listen state in which Furhat starts listening through the selected microphone and returns once it receives some speech or a timeout occurs. You catch the speech and no-response events with the onResponse and onNoResponse handlers respectively. These are the same handlers used for the ask action, which is described below.
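The two possible outcomes of a listen can be pictured with a small toy model. This is only a sketch in plain Kotlin: the names ListenOutcome and listenOutcome are hypothetical and not part of the Furhat SDK, and treating speech that starts exactly at the timeout as "no response" is an assumption made here for illustration.

```kotlin
// Toy model only: these names are hypothetical and not part of the Furhat SDK.
enum class ListenOutcome { RESPONSE, NO_RESPONSE }

// speechStartMs: when the user started speaking, measured in milliseconds from
// the start of listening, or null if the user stayed silent.
// timeoutMs defaults to 8000, mirroring the default timeout of furhat.listen().
fun listenOutcome(speechStartMs: Long?, timeoutMs: Long = 8000): ListenOutcome =
    if (speechStartMs != null && speechStartMs < timeoutMs) {
        ListenOutcome.RESPONSE      // would be handled by onResponse
    } else {
        ListenOutcome.NO_RESPONSE   // would be handled by onNoResponse
    }
```

In this model, a user who starts speaking after 5 seconds produces a response with the default timeout, but a no-response when the timeout is set to 4000.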

Asking

Asking (a.k.a. prompting) is shorthand for combining a say and a listen. It triggers the same events as the listen method.

furhat.ask("What is your name?") // Ask with a default timeout of 8 seconds

furhat.ask("How old are you? Now you have to answer fast", timeout = 4000) // Ask with a timeout of 4 seconds

Response handlers

onResponse - handling verbal responses from users

The onResponse handler is executed whenever a furhat.listen() or furhat.ask() is called and speech is picked up from a user.

It is also possible to add an Intent or Entity to the onResponse handler. If speech is caught, the Furhat system uses natural language processing to parse the meaning of the utterance (for more info, see the NLU documentation). Each Intent or Entity found in onResponse handlers throughout the state hierarchy is used to classify the user's intent. Similar to events, a response that is not caught propagates first to the following handlers in the same state, then to parent states, and finally to calling states.
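The propagation order can be pictured with a small toy model. This is only a sketch in plain Kotlin, not the Furhat implementation: ToyState and findCatcher are hypothetical names, handlers are reduced to simple predicates on the spoken text, and walking the parent chain of each calling state is an assumption of this sketch.

```kotlin
// Toy model only: none of these types come from the Furhat SDK.
class ToyState(
    val name: String,
    val parent: ToyState? = null,
    // Handlers in declaration order; each returns true if it catches the response.
    val handlers: List<(String) -> Boolean> = emptyList()
)

// Returns the name of the state whose handler catches the response, or null.
// `callers` is ordered from the most recent calling state outwards.
fun findCatcher(current: ToyState, callers: List<ToyState>, text: String): String? {
    for (start in listOf(current) + callers) {
        var s: ToyState? = start
        while (s != null) {
            // 1. Try the handlers of this state, in order.
            if (s.handlers.any { it(text) }) return s.name
            // 2. Otherwise continue with the parent state.
            s = s.parent
        }
        // 3. Nothing caught in this chain: move on to the next calling state.
    }
    return null
}

// Example hierarchy: "Order" inherits from "Fallback", which catches everything.
val toyFallback = ToyState("Fallback", handlers = listOf({ _: String -> true }))
val toyOrder = ToyState(
    "Order",
    parent = toyFallback,
    handlers = listOf({ t: String -> "burger" in t })
)
```

With this hierarchy, findCatcher(toyOrder, emptyList(), "a burger please") is caught in "Order", while an utterance that does not mention a burger falls through to the parent "Fallback".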

A response object has several parameters. The most important ones are:

| Name | Type | Description |
|------|------|-------------|
| intent | Intent object | The (optional) intent that the utterance is classified to be. |
| text | String | The text spoken |
| userId | String | The id of the user who spoke |
| contains | method | A method to search an utterance for an entity |
| findFirst | method | A method to find the first of an entity |

val MyState = state {
    onEntry {
        furhat.ask("What happens?")
    }

    onResponse<MyIntent> {
        furhat.say("I understood that you said ${it.text} and meant ${it.intent.toText()}")
    }

    onResponse { // Catches everything else
        furhat.say("I didn't understand that")
    }
}

onPartialResponse - handling multi-intent responses

Sometimes you want to catch multiple intents in one response. The response handlers make this easy when you want to support two intents, such as "Hi there, I would like to order a burger" (which could contain both a Greeting and an Order intent):

val MyState = state {
    onEntry {
        furhat.ask("How can I help you?")
    }

    onResponse<Greeting> { // Catches an isolated Greeting
        // Greet the user and reenter the state
        furhat.say("Hi there")
        reentry()
    }

    onPartialResponse<Greeting> { // Catches a Greeting together with another intent, such as Order
        // Greet the user and proceed with the order in the same turn
        furhat.say("Hi there")
        raise(it, it.secondaryIntent)
    }

    onResponse<Order> {
        /*
            Handle the order.
            This will be caught either if the user makes a direct Order
            or if it is triggered by the onPartialResponse above
        */
    }
}

By default, the partial response has to precede the other intent. If this should not be the case, you can pass a prefix = false parameter to onPartialResponse, like onPartialResponse<Greeting>(prefix = false) { ... }. In this case, the intents can come in any order, i.e., "I want to order a burger, hello" would also match.
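The effect of the prefix parameter can be illustrated with a toy model. This is a sketch in plain Kotlin and not the Furhat matcher: matchesPartial is a hypothetical function, and intents are reduced to plain labels listed in the order they were spoken.

```kotlin
// Toy model only: a sketch of the ordering rule, not the Furhat matcher.
// `found` lists the intent labels recognized in one utterance, in spoken order.
fun matchesPartial(found: List<String>, partial: String, prefix: Boolean = true): Boolean {
    // A partial match needs the partial intent plus at least one other intent.
    if (found.size < 2 || partial !in found) return false
    // With prefix = true the partial intent must come first;
    // with prefix = false any position is accepted.
    return !prefix || found.first() == partial
}
```

In this model, "Hi there, I would like to order a burger" corresponds to listOf("Greeting", "Order") and matches with the default setting, while "I want to order a burger, hello" corresponds to listOf("Order", "Greeting") and matches only with prefix = false.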

onNoResponse handler

The onNoResponse handler is triggered when no audio was picked up from the user.

val MyState = state {
    onEntry {
        furhat.ask("What happens?")
    }

    onResponse<MyIntent> {
        furhat.say("I understood that you said ${it.text} and meant ${it.intent}")
    }

    onResponse { // Catches everything else
        furhat.say("I didn't understand that")
    }

    onNoResponse { // Catches silence
        furhat.say("I didn't hear anything")
    }
}

onResponseFailed handler

The onResponseFailed handler is triggered when an error occurs in the speech recognition. By default, this event is caught by the implicit default dialog state, but you can add your own onResponseFailed handler if you want to override that behavior.

Inline response handlers

Instead of adding the response handlers to the state, you can add them directly to ask:

val MyState = state {
    onEntry {
        var happy =
            furhat.ask("Are you happy?") {
                onResponse<Yes> {
                    terminate(true)
                }
                onResponse<No> {
                    terminate(false)
                }
            }
        if (happy) {
            furhat.say("You are happy")
        } else {
            furhat.say("You are not happy")
        }
    }
}

This will call a (hidden) state which has the furhat.ask() in the onEntry handler. Thus, you have to call terminate() to return from this state.

askFor

If you simply want to ask for a specific intent or entity, there is an efficient way of implementing it using askFor.

val MyState = state {
    onEntry {
        var date = furhat.askFor<Date>("Which date were you born?")
        furhat.say("You were born on $date")
    }
}

You can also add inline response handlers to askFor:

val MyState = state {
    onEntry {
        var date = furhat.askFor<Date>("Which date were you born?") {
            onResponse<DontKnow> {
                furhat.say("You should really know that!")
                reentry()
            }
        }
        furhat.say("You were born on $date")
    }
}

askYN

If you simply want to ask a yes/no question, there is an efficient way of implementing it using askYN, which returns a boolean.

val MyState = state {
    onEntry {
        var happy = furhat.askYN("Are you happy?")
        if (happy) {
            furhat.say("You are happy")
        } else {
            furhat.say("You are not happy")
        }
    }
}

You can also add inline response handlers to askYN:

val MyState = state {
    onEntry {
        var happy = furhat.askYN("Are you happy?") {
            onResponse<Maybe> {
                furhat.say("Make up your mind!")
                reentry()
            }
        }
        if (happy) {
            furhat.say("You are happy")
        } else {
            furhat.say("You are not happy")
        }
    }
}

Changing default responses

All Furhat skills come with default response handlers that handle uncaught speech, silences and errors. The reason is that a response should always be caught somewhere in the system, so that you do not get unexpected behavior. There may be cases where you don't want the default handling, in which case you are free to override the handlers. Other reasons for doing so are to change the default utterances (as listed below) or to add support for other languages.

The recommendation is to do so in a superstate that all your interactive states inherit (if you used our default template when creating your skill, this would be the Interaction state defined in general.kt). It is as simple as implementing the same handlers and not propagating the response events further.

For your information, the implicit default state looks like this:

val dialogState = state(parent = null) {
   var noinput = 0
   var nomatch = 0

   onResponse {
       nomatch++
       if (nomatch > 1)
           furhat.say("Sorry, I still didn't understand that")
       else
           furhat.say("Sorry, I didn't understand that")
       reentry()
   }

   onNoResponse {
       noinput++
       if (noinput > 1)
           furhat.say("Sorry, I still didn't hear you")
       else
           furhat.say("Sorry, I didn't hear you")
       reentry()
   }

   onResponseFailed {
       furhat.say("Sorry, my speech recognizer is not working")
       terminate()
   }
}

Improving speech recognition with phrases

While today's speech recognition services usually perform very well in most languages, you will sometimes notice that they struggle to pick up certain phrases. These may be brand names, odd spellings or unusual words that sound similar to other words. In particular, recognition services struggle with one-word utterances, for example "fold" in a poker game. You'll notice that "I choose to fold" is picked up much more easily.

To get around this, you can send a list of phrases (strings) to the recognizer with the method furhat.setSpeechRecPhrases(List<*>).
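As a minimal sketch of preparing such a list, the helper below merges several groups of hard-to-recognize words into one clean list. The name buildSpeechRecPhrases and the poker vocabulary are hypothetical and not part of the Furhat SDK; only the final call shown in the comment, furhat.setSpeechRecPhrases, is the method described above.

```kotlin
// Hypothetical helper (not part of the Furhat SDK): merges groups of
// hard-to-recognize words into one list without blanks or duplicates.
fun buildSpeechRecPhrases(vararg groups: List<String>): List<String> =
    groups.flatMap { it }            // merge all groups, keeping their order
        .map { it.trim() }           // drop surrounding whitespace
        .filter { it.isNotEmpty() }  // skip empty entries
        .distinct()                  // keep only the first copy of each phrase

// Example vocabulary for a poker skill (illustrative only).
val pokerPhrases: List<String> = buildSpeechRecPhrases(
    listOf("fold", "call", "raise"),
    listOf("check", "fold ")
)

// Inside a skill, the list would then be handed to the recognizer:
// furhat.setSpeechRecPhrases(pokerPhrases)
```

Keeping the list in one place makes it easy to reuse the same phrases across the states that need them.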

It is also possible to add these phrases directly to the NLU. EnumEntity allows you to automatically add certain phrases, by setting a flag in the constructor. In this example, all fruit words would be primed whenever this entity is being used in an active intent:

class Fruit : EnumEntity(speechRecPhrases=true) {

    override fun getEnum(lang: Language): List<String> {
        return listOf("banana", "orange", "apple", "pineapple", "pear")
    }

}

You can also let an Intent return specific words which could be hard to recognize:

class Greeting : Intent() {

    override fun getExamples(lang: Language): List<String> {
        return listOf("how are you", "how do you do", "howdy")
    }

    override fun getSpeechRecPhrases(lang: Language): List<String> {
        return listOf("howdy")
    }

}