A form-filling Pizza ordering skill

Introduction

In this tutorial, you will learn how to use form-filling to create a flexible order-handling skill. As our example, we will play the role of a pizza seller. For a full code example, please see PizzaOrder on GitHub.

This tutorial assumes that you:

Form-filling and how to use it for order-taking

In this tutorial, we want to walk you through the design pattern called form-filling, an efficient tool for capturing a wide variety of user input. This pattern can be used in a multitude of interactions where you want to allow the user to answer broadly and be able to capture additional information by filling in open "slots". To visualize this, we use our pizza seller as an example. A few example utterances that we want to be able to capture are:

"I want a pizza with tomato and ham to my home today at 6pm"
"I would like a pizza to my office at 3 pm"
"I want a pizza"
"I want to order a pizza with bacon and ham"

With these examples as a basis, we identify the slots, or entities, that we need to fill in order to complete an order. This is an important principle of form-filling worth repeating: we need a data point from the user for each of the slots in our form. To start, we mark the entities in our examples:

"I want a pizza with tomato (Topping) and ham (Topping) to my home (Place) today (Date) at 6pm (Time)"
"I would like a pizza to my office (Place) at 3 pm (Time)"
"I want a pizza"
"I want to order a pizza with bacon (Topping) and ham (Topping)"

Our first example shows a "perfect" order since it fills all the slots we are interested in. It is unusual for a user to do this, however. More likely they leave out a few slots, or in some cases all of them, as our other three examples show.

Form-filling works like this: we first identify our intent, in this case ordering a pizza, and fill all the data slots we can. We then move through the missing slots one at a time until we have a complete order. A form-filling dialogue for our pizza order skill might look like this:

U: I want to order a pizza with bacon and ham
F: Ok, a pizza with bacon and ham
F: Where do you want it delivered?
U: To my home
F: Ok, to home
F: When do you want it delivered?
U: At 6pm
F: Ok, at 6pm
F: You want to order a pizza with bacon and ham to your home at 6pm.
F: Is this correct?
U: Yes
...

As explained above, Furhat will here ask follow-up questions until he knows an answer for each of the slots (or more technically, for all entities in the intent).
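
The loop described above can be pictured in plain Kotlin, without any Furhat classes. This is only a sketch: OrderForm and nextQuestion are hypothetical names invented for illustration, and the slot order and prompts are assumptions modelled on the example dialogue.

```kotlin
// Plain-Kotlin stand-in for the order form (hypothetical, not an SDK class).
// A null value means the slot has not been filled yet.
data class OrderForm(
    val topping: List<String>? = null,
    val deliverTo: String? = null,
    val deliveryTime: String? = null
)

// Returns the follow-up question for the first empty slot,
// or null when every slot is filled and the order can be confirmed.
fun nextQuestion(form: OrderForm): String? = when {
    form.deliverTo == null -> "Where do you want it delivered?"
    form.deliveryTime == null -> "When do you want it delivered?"
    form.topping == null -> "Any extra topping?"
    else -> null
}
```

Calling nextQuestion repeatedly, filling one slot after each answer, reproduces the question-by-question flow of the dialogue until it returns null.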

Implementing the order-taking intent

We'll start by defining our OrderPizza intent as follows:

class OrderPizza : Intent(), TextGenerator {

    var count : Number = Number(1)

    var topping : ListOfTopping? = null

    var deliverTo : Place? = null

    var deliveryTime : Time? = null

    var deliveryDate : Date? = null

    override fun getExamples(lang: Language): List<String> {
        return listOf(
                "I would like a pizza to my office at 3 pm",
                "I want a pizza tomorrow",
                "I want to order a pizza with bacon and ham")
    }

    override fun toText(lang : Language) : String {
        return generate(lang, "${if (count.value?:1>1) "${count.value} pizzas" else "a pizza"} [with $topping] [delivered $deliverTo] [$deliveryDate] [$deliveryTime]")
    }

    override fun toString(): String {
        return toText()
    }
}

Note how we are also letting our intent implement the TextGenerator interface and implement a toText() method. Together with the Kotlin built-in toString() method, this allows us to get a pretty text representation of our intent based on filled in slots. We use this to repeat back the order to the user.
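
To make the effect of the bracketed template concrete, here is a minimal sketch in plain Kotlin. It is not how the SDK's generate() is implemented; it only assumes that a bracketed segment is dropped when the slot it refers to is empty. describeOrder and its parameters are hypothetical names.

```kotlin
// Hypothetical helper, not the Furhat implementation: an optional segment
// is included only when the slot value it depends on is present.
fun describeOrder(
    count: Int = 1,
    topping: String? = null,
    deliverTo: String? = null,
    deliveryDate: String? = null,
    deliveryTime: String? = null
): String {
    val head = if (count > 1) "$count pizzas" else "a pizza"
    // listOfNotNull silently skips every segment whose slot is null.
    val optional = listOfNotNull(
        topping?.let { "with $it" },
        deliverTo?.let { "delivered $it" },
        deliveryDate,
        deliveryTime
    )
    return (listOf(head) + optional).joinToString(" ")
}
```

With only the topping filled, this yields "a pizza with bacon and ham", which is the kind of text Furhat repeats back to the user.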

Catching the Pizza order intent and saving it to the user

To catch this intent, repeat the order back to the user and finally save it to the user object, we do the following:

onResponse<OrderPizza> {
  furhat.say("Okay, you want ${it.intent}")
  users.current.order.adjoin(it.intent)
  goto(CheckOrder)
}

Now, you might wonder where we define the order of the current user (users.current.order) as well as what the method adjoin() does. We associate the user with an order through an extension property on User, using a delegate (an advanced Kotlin concept; interested developers can read more in the Kotlin docs):

val User.order by NullSafeUserDataDelegate { OrderPizza() }

Adjoin is a method on the Record() class, a JSON-like data class that most classes in the Furhat SDK inherit from. The method joins two data objects, overwriting any existing field if a new value exists. For example, if you have the existing order:

OrderPizza {
  topping: ["ham", "bacon"]
  deliverTo: "home"
  deliveryDate: "today"
  deliveryTime: "5 pm"
}

and you adjoin it with

OrderPizza {
  topping: ["tomato", "bacon"]
  deliverTo: "office"
  deliveryDate: "tomorrow"
}

the result will be

OrderPizza {
  topping: ["tomato", "bacon"]
  deliverTo: "office"
  deliveryDate: "tomorrow"
  deliveryTime: "5 pm"
}

In other words, all fields will be overwritten except deliveryTime which will remain.
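
The merge rule can be sketched in a few lines of plain Kotlin. OrderRecord and adjoined are hypothetical stand-ins, not SDK classes, and unlike the call in the handler above this version returns a new object instead of updating the existing order; it only illustrates the "new value wins, missing value keeps the old one" rule.

```kotlin
// Hypothetical stand-in for the order record; null means "field not set".
data class OrderRecord(
    val topping: List<String>? = null,
    val deliverTo: String? = null,
    val deliveryDate: String? = null,
    val deliveryTime: String? = null
)

// Fields set in `update` overwrite the existing value;
// fields left null in `update` keep whatever was there before.
fun OrderRecord.adjoined(update: OrderRecord) = OrderRecord(
    topping = update.topping ?: topping,
    deliverTo = update.deliverTo ?: deliverTo,
    deliveryDate = update.deliveryDate ?: deliveryDate,
    deliveryTime = update.deliveryTime ?: deliveryTime
)
```

Applied to the two orders above, the result keeps the "5 pm" delivery time and takes every other field from the second order.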

Defining the states for form-filling

State checking the current slots of the order

The next crucial part of a form-filling design is a state that checks all slots. We define it like this (using when {} instead of cascading if/else, as the Kotlin style guide suggests):

val CheckOrder = state {
  onEntry {
    val order = users.current.order
    when {
      order.deliverTo == null -> goto(RequestDelivery)
      order.deliveryTime == null -> goto(RequestTime)
      order.topping == null -> goto(RequestTopping)
      else -> goto(ConfirmOrder)
    }
  }
}

This should be fairly straightforward at this point: we simply check whether we have any unfilled slots and, if so, go to the slot-filling state that requests the missing data point.

Our first slot-filling state

Let's define our first slot-filling state, RequestDelivery as follows:

val RequestDelivery : State = state(parent = General) {
    onEntry() {
        furhat.ask("Where do you want it delivered?")
    }

    onResponse<RequestOptions> {
        raise(TellDeliveryOptions())
    }

    onResponse<TellPlace> {
        furhat.say("Okay, ${it.intent.deliverTo}")
        users.current.order.deliverTo = it.intent.deliverTo
        goto(CheckOrder)
    }
}

We note right away that we're inheriting a General state that we haven't defined yet. The reason is that we want to abstract some handlers that we want to use in more than one place, such as our OrderPizza intent handler.

In addition, we note that we catch the (built-in) RequestOptions intent, matching questions like "what are the options?", and then raise a TellDeliveryOptions event. This is our first example of handling a context-dependent intent. In this context, after asking the user where she wants the pizza delivered, the utterance "what are the options?" implicitly refers to delivery options (as opposed to, for example, pizza topping options). In a broader context, this utterance likely has a different meaning. This will become clearer when we look at the parent state General below.

Thirdly, we catch the intent TellPlace, which uses the entity Place:

class TellPlace : Intent() {

    var deliverTo : Place? = null

    override fun getExamples(lang: Language): List<String> {
        return listOf("home", "to my home")
    }
}

class Place : EnumEntity() {

    override fun getEnum(lang: Language): List<String> {
        return listOf("home", "office")
    }

    override fun toText(lang: Language): String {
        return generate(lang, "to your $value");
    }
}

The place is finally saved on the user's order object after which the skill transitions back to the CheckOrder form-filling state.
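
The way a slot-filling state answers context-dependent questions itself while leaving everything else to its parent can be pictured with a small plain-Kotlin model. This is a sketch under the assumption that a state's own handlers are consulted before its parent's; SketchState, the two example states and the topping question are hypothetical and do not come from the SDK.

```kotlin
// Hypothetical miniature of state inheritance: a state answers an utterance
// with its own handler if it has one, otherwise it defers to its parent.
class SketchState(
    private val handlers: Map<String, String>,
    private val parent: SketchState? = null
) {
    fun respond(utterance: String): String? =
        handlers[utterance] ?: parent?.respond(utterance)
}

// Parent with answers that make sense in any context (illustrative strings).
val generalSketch = SketchState(
    mapOf(
        "where can you deliver" to "We can deliver to your home and to your office",
        "what toppings do you have" to "We have ham, bacon and tomato"
    )
)

// Child that gives the vague question a delivery-specific meaning.
val requestDeliverySketch = SketchState(
    mapOf("what are the options" to "We can deliver to your home and to your office"),
    parent = generalSketch
)
```

In this model, "what are the options" only has an answer inside the delivery state, while the more specific questions are answered from anywhere below the parent.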

Parent state for common answers

The above-mentioned parent state, used to abstract common answers, is defined as:

val General: State = state(Interaction) {
    onResponse<RequestDeliveryOptions> {
      raise(TellDeliveryOptions())
    }

    onEvent<TellDeliveryOptions> {
        furhat.say("We can deliver to your home and to your office")
        reentry()
    }

    onResponse<RequestOpeningHours> {
        raise(TellOpeningHours())
    }

    onEvent<TellOpeningHours> {
        furhat.say("We are open between 7 am and 8 pm")
        reentry()
    }

    onResponse<RequestToppingOptions> {
        raise(TellToppingOptions())
    }

    onEvent<TellToppingOptions> {
        furhat.say("We have " + Topping().optionsToText())
        reentry()
    }

    onResponse<OrderPizza> {
        furhat.say("Okay, you want ${it.intent}")
        users.current.order.adjoin(it.intent)
        goto(CheckOrder)
    }
}

Here we see that we have a pair of onResponse and onEvent handlers for each type of query. Starting from the top, we have:

  • Another onResponse handler raising the same TellDeliveryOptions event we did in our RequestDelivery state, this time with a more specific intent RequestDeliveryOptions defined like:
class RequestDeliveryOptions : Intent()  {
  override fun getExamples(lang: Language): List<String> {
    return listOf("where can you deliver")
  }
}
  • Next, we see that we have an onEvent<TellDeliveryOptions> handler that catches the above-mentioned event. To be extra clear, this is the handler that catches the event raised both here in General and in the RequestDelivery state, answers the query and then reenters.

  • Next, we find a similar pattern with two handlers for opening hours.

  • After that, we have a similar pattern with two handlers for topping options. This time, we took the chance to introduce a new method available for EnumEntity, namely optionsToText(), which returns the options comma-separated with an "and" before the last entry, for example "ham, bacon and tomato" for a Topping entity.

  • Finally, we note that we have our previously defined onResponse<OrderPizza> handler. We want to put this here since users might answer a specific question, for example about delivery options, with additional slots. An example is "I want the pizza delivered to my home at 3pm", in which case we want to catch the time information in addition to the delivery address.
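
The list formatting described for optionsToText() can be reproduced in plain Kotlin. The function below is a hypothetical re-implementation of the described behaviour, written only to show the joining rule; it is not the SDK's code.

```kotlin
// Hypothetical sketch of the joining rule: comma-separated entries
// with "and" before the last one.
fun optionsToTextSketch(options: List<String>): String = when (options.size) {
    0 -> ""
    1 -> options[0]
    else -> options.dropLast(1).joinToString(", ") + " and " + options.last()
}
```

For the three toppings used in this tutorial, this gives "ham, bacon and tomato".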

Additional slot-filling states

// Request delivery time
val RequestTime : State = state(parent = General) {
    onEntry() {
        furhat.ask("At what time do you want it delivered?")
    }

    onResponse<RequestOptions> {
        raise(TellOpeningHours())
    }

    onResponse<TellTime> {
        furhat.say("Okay, ${it.intent.time}")
        users.current.order.deliveryTime = it.intent.time
        goto(CheckOrder)
    }
}

// Request toppings
val RequestTopping : State = state(parent = General) {
    onEntry() {
        furhat.ask("Any extra topping?")
    }

    onResponse<RequestOptions> {
        raise(TellToppingOptions())
    }

    onResponse<Yes> {
        furhat.ask("What kind of topping do you want?")
    }

    onResponse<No> {
        furhat.say("Okay, no extra topping")
        users.current.order.topping = ListOfTopping()
        goto(CheckOrder)
    }

    onResponse<AddTopping> {
        furhat.say("Okay, ${it.intent.topping}")
        users.current.order.topping = it.intent.topping
        goto(CheckOrder)
    }
}

Thus, we can focus on the active part of the interaction, starting with the Start state.

Fetching question from user

To start off our skill, once the interaction has started we simply want to greet the user and ask if he/she has a question for us. Then, we want to catch the response we get, with special handlers for Yes and No intents.

In this tutorial we will let Wolfram Alpha handle any speech response we get, i.e. we will not use any natural language processing to preprocess the input to try to determine if it is a question or not. Thus, we catch a "naked" onResponse where we will do the API call.

We start off like this:

val Start : State = state(Interaction) {

    onEntry {
        furhat.ask("Hi there! Do you have any question?")
    }

    onResponse<Yes>{
        furhat.ask("What is it?")
    }

    onResponse<No>{
        furhat.say("Okay, no worries")
        goto(Idle)
    }

    onResponse {
        // Handle question here
    }
}

For now, nothing special is happening here. The last onResponse will catch any response that has not already been caught (i.e. anything other than the Yes and No intents). If you want to test this, you can simply have Furhat repeat the input with furhat.say(it.text), where it refers to the response object, or print the same using println(it.text).

Getting access to the Wolfram Alpha API

Now, we head over to Wolfram Alpha and check out their API explorer for Spoken results.

We note that API requests have the following format: https://api.wolframalpha.com/v1/spoken?i=How+old+is+Michael+Jordan%3F&appid=DEMO

We see that we need to pass the question and an application id in a query string: an i parameter holding the question, with words delimited by "+", and an appid parameter holding the application id. We create a free account, acquire our app id, and are then ready to roll!

Setting up an HTTP library - khttp

Next up, we realize we need an HTTP library to call the API. You can use whatever you prefer here, but for this tutorial we will use khttp, a simple and neat Kotlin HTTP library (similar to Python's requests module according to the author).

We see that khttp is available on JCenter (which already exists in the default skill template's repositories list), so we only need to add it as a dependency of our skill. This is done with the following line in the bottom dependencies block (i.e. not the one inside buildscript { ... }) of our build.gradle file:

dependencies {
    // ...
    compile 'khttp:khttp:0.1.0'
}

Gradle will now fetch the library for you.

Querying the API

Querying the API with khttp is easy: we just have to make sure we build the right URL and then use khttp.get(url).text to do the GET request and get the answer as a text String.

To build the URL we define a few constants:

val BASE_URL = "https://api.wolframalpha.com/v1/spoken"
val APP_ID = "YOUR_APP_ID_HERE"

We then build the question string and the query URL. We have to do some string manipulation since the API requires words to be "+"-separated. Assuming that the API parses "+" and "plus" as the same, we replace any "+" signs with "plus" and then in turn replace all spaces with "+". We then patch together our query URL:

val question = it.text.replace("+", " plus ").replace(" ", "+")
val query = "$BASE_URL?i=$question&appid=$APP_ID"
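
The replacement above only deals with spaces and plus signs, so a question containing characters such as "&" or "?" could still produce a malformed query string. As an alternative sketch, the JDK's java.net.URLEncoder can encode the whole question. buildQueryUrl and SPOKEN_URL are hypothetical names; note also that this variant sends a literal "+" percent-encoded as %2B rather than as the word "plus", and how the API interprets that has not been verified here.

```kotlin
import java.net.URLEncoder

// Same endpoint as BASE_URL above, repeated so the sketch is self-contained.
val SPOKEN_URL = "https://api.wolframalpha.com/v1/spoken"

// Hypothetical helper: form-encodes both parameters so that spaces become "+"
// and reserved characters such as "?" or "&" are percent-encoded.
fun buildQueryUrl(question: String, appId: String): String {
    val encodedQuestion = URLEncoder.encode(question, "UTF-8")
    val encodedAppId = URLEncoder.encode(appId, "UTF-8")
    return "$SPOKEN_URL?i=$encodedQuestion&appid=$encodedAppId"
}
```

For the question "How old is Michael Jordan?" and the app id DEMO, this produces the same URL as the example request shown earlier.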

The API call, along with replying to the user with the given response from the API, is then done with:

// Call API
val response = khttp.get(query).text

// Reply to the user with the given response and allow them to ask a followup question
furhat.say(response)
furhat.ask("Anything else?")

Since API calls might take time, we want to do things to make the interaction better. First, we want to add a filler speech and gesture to signal to the user that an answer is coming shortly. So, before we do the API call, we do this:

// Filler speech and gesture
furhat.say({
    +"Let's see"
    +Gestures.GazeAway
}, async = true)

// API call
val response = khttp.get(query).text

// ...

This syntax uses an (inline definition of an) utterance, which combines a say and a gesture in one command and, in this case, executes them asynchronously (through the async = true parameter) since we want the API call to run immediately after.

Secondly, we want to handle potential timeouts of the API, or slow response times. This can be done in several ways but we recommend using a specific state for the API-call together with an anonymous sub-state for the actual API-call and a timer that cancels the call if the API-call hasn't returned within a certain time.

Our response handler that takes the user input now looks like this:

onResponse {
  // Filler speech and gesture
  furhat.say({
      +"Let's see"
      +Gestures.GazeAway
  }, async = true)

  // Query done in query state below, with its result saved here since we're doing a call
  val response = call(Query(it.text)) as String

  // Reply to user
  furhat.say(response)
  furhat.ask("Anything else?")
}

Instead of doing the API call, we are calling a state Query with the user input and then saving the returned value of this state call in a response variable. Note that the returned value needs to be cast to a String (with as String) since Kotlin's type inference doesn't work here.

Our Query state is now defined as follows:

// Constant for timeout of API call
val TIMEOUT = 4000

fun Query(question: String) = state {
  onEntry {
    // Query building
    val question = question.replace("+", " plus ").replace(" ", "+")
    val query = "$BASE_URL?i=$question&appid=$APP_ID"

    // Calling API (in an anonymous sub-state)
    val response = call {
      khttp.get(query).text
    } as String

    // Return the response
    terminate(response)
  }

  onTime(TIMEOUT) {
    // If timeout is reached, we return an error
    terminate("I'm having issues connecting to my brain. Try again later!")
  }
}

The call to the API is here made in an anonymous sub-state (https://docs.furhat.io/flow/#calling-anonymous-states) through the call { ... } to allow our timeout (onTime(TIMEOUT)) to stop the call if it takes too long. This is needed since the API call would otherwise block any other event from happening, which would leave the system unresponsive until a response is received. Note that here as well, you explicitly have to cast the result of the state to a String.

Once we get a result, we return it to the caller state through the terminate call.

The onTime, that currently uses the TIMEOUT constant set to 4 seconds, makes sure we don't get stuck here if the API stops responding. Once it executes, it will terminate the state (including the called anonymous sub-state) providing a hard-coded error message.
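
The same "give up after a deadline and return a fallback" idea can be shown outside the flow DSL with the JDK's concurrency utilities. This is only an illustration of the principle, not a replacement for the state-based approach above; callWithTimeout is a hypothetical helper name.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// Hypothetical helper: runs `task` on a worker thread and returns `fallback`
// if no result has arrived within `timeoutMs` milliseconds.
fun <T> callWithTimeout(timeoutMs: Long, fallback: T, task: () -> T): T {
    val executor = Executors.newSingleThreadExecutor()
    val future = executor.submit(Callable { task() })
    return try {
        future.get(timeoutMs, TimeUnit.MILLISECONDS)
    } catch (e: TimeoutException) {
        // Interrupt the slow task so it does not keep running in the background.
        future.cancel(true)
        fallback
    } finally {
        executor.shutdownNow()
    }
}
```

A blocking HTTP request could be passed as the task, with the hard-coded error message as the fallback, mirroring what onTime(TIMEOUT) does for the Query state.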

Error handling

Finally, we want to catch some of the error messages that Wolfram Alpha returns if they can't parse the question or can't answer it.

We add two known error strings to a list and check whether the returned response matches one of them, in which case we reply with a custom error message.

val FAILED_RESPONSES = listOf("No spoken result available", "Wolfram Alpha did not understand your input")

// ...

// Api Call
val response = call {
  khttp.get(query).text
} as String

// Error handling
val reply = when {
  FAILED_RESPONSES.contains(response) -> {
    println("No answer to question: $question")
    "Sorry bro, can't answer that"
  }
  else -> response
}

// Return the response
terminate(reply)

This concludes this tutorial. For a working code-example, please see Github.