Introduction to Furhat skills
Before you start developing your first Furhat skill, let's go over the basics. We'll cover the vocabulary used in Furhat skills, the differences between Furhat skills and other types of applications, the competences you need to develop successfully on Furhat, and a suggested process for starting your skill development.
Social robot applications
Applications for social robots and specifically Furhat differ from applications on other platforms (mobile, smart speakers, desktop, web) in several ways. The two most important differences are described below.
Voice application specifics
For a social robot like Furhat, voice is a key control mechanism. While voice is extremely intuitive for humans compared to most other types of interfaces (graphical, touch), it also comes with a few challenges:
Output needs to be limited: Humans are very good at speaking, but when listening we can only take in a limited amount of information because of memory limitations. For example, a voice system that lists the available pizza toppings quickly becomes overwhelming once the options grow beyond a handful. When designing a voice system, you therefore have to make sure each piece of output is short enough for your users to grasp.
Note: To overcome this, graphical interfaces (see Adding a GUI) can be used in combination with the voice output to give the user a chance to view all the options on a screen, making it much easier for the user to take it all in.
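As a rough illustration of keeping spoken output graspable, the sketch below splits a long list of options into small groups and phrases one group per turn. It is plain Python and does not use any Furhat API; the function names, the group size of three and the prompt wording are all assumptions made for this example.

```python
def chunk_options(options, max_per_turn=3):
    """Split a list of options into groups small enough to say in one turn."""
    if max_per_turn < 1:
        raise ValueError("max_per_turn must be at least 1")
    return [options[i:i + max_per_turn]
            for i in range(0, len(options), max_per_turn)]


def phrase_options(chunk, more_remaining):
    """Build a short spoken prompt for one group of options (hypothetical wording)."""
    if len(chunk) == 1:
        listing = chunk[0]
    else:
        listing = ", ".join(chunk[:-1]) + " or " + chunk[-1]
    prompt = f"You can have {listing}."
    if more_remaining:
        # Offer the rest instead of reading everything out at once.
        prompt += " Would you like to hear more options?"
    return prompt


toppings = ["ham", "mushrooms", "olives", "pineapple", "onions"]
chunks = chunk_options(toppings)
for index, chunk in enumerate(chunks):
    print(phrase_options(chunk, more_remaining=index < len(chunks) - 1))
```

The robot only continues to the next group if the user asks for it, which keeps each utterance within what a listener can comfortably remember.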
Input is much more complex: Human language is immensely expressive, and most things can be said in many different ways. This is one of the greater challenges of a voice interface, and typically something you will have to improve continuously as you test your application with real users. The Furhat system comes with a sophisticated NLU (natural language understanding) system that makes it easier to handle the richness of user utterances.
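To make the input challenge concrete, here is a deliberately naive sketch that maps utterances to intents by matching example phrases. It is not how the Furhat NLU works and uses no Furhat API; the intent names and example phrases are invented for illustration.

```python
import re

# Hypothetical intents, each with a few example phrases a user might say.
EXAMPLES = {
    "order_food": ["i want to order", "i would like", "can i get", "i'll have"],
    "greet": ["hello", "hi", "hey"],
}


def classify(utterance):
    """Return the first intent with an example phrase found in the utterance, else None."""
    text = utterance.lower()
    for intent, phrases in EXAMPLES.items():
        for phrase in phrases:
            # Word boundaries stop "hi" from matching inside words such as "this".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                return intent
    return None


print(classify("Can I get a hamburger with fries?"))
print(classify("I'd like a burger"))
```

The second utterance is not recognized because "I'd like" is missing from the examples, even though a human would understand it immediately. Gaps like this tend to surface when testing with real users, which is why the phrase coverage has to be improved continuously.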
Situated application specifics
In contrast to most virtual systems, Furhat is an embodied agent and has a physical presence as well as awareness of his surroundings using visual sensors. Under the hood, the Furhat system fuses the different visual and audio inputs into a situation model - basically Furhat's map of the world. This enables Furhat to identify users and objects and attend to them individually or as a group.
Different types of applications
A system like Furhat is versatile and can be used for almost any type of interaction. To make it easier to talk about human-robot interactions, we will go through a few key aspects of social robot applications:
Is input required?
This distinguishes presentation-only applications from interactive ones.
Single or multi party
Should the robot be able to cater to one user at a time or several? If several, the complexity grows considerably. It usually makes sense to start developing a skill as single-party and to add multi-party functionality at a later stage. This is usually possible unless the multi-party aspect is central to the application.
Robot initiative, human initiative or mixed?
In other words, who drives the interaction: the robot, the user, or is it passed between the two parties? A robot-initiative interaction is often less complex than a human-initiative interaction, since you know what answers to expect when the robot asks the questions. A mixed-initiative interaction is the most complex, since it requires you to handle turn-taking, i.e. deciding when the robot should take the turn and thereby the initiative. This can, for example, be after a few seconds of silence, or to clarify something the user said.
Single or multi language?
Should the interaction be in one specific language or should it allow users to engage in many different languages? Language switching may also be a key concept of the application, for example in a language learning application.
Should the skill be autonomous?
Users probably expect most robot applications to be autonomous, i.e. to work without a human controlling them, but this is often not the case. Many applications are wizarded, i.e. controlled by an operator, called a wizard, who is hidden from the user. This is a smart way to test a skill at an early stage, before you have invested enough in the language model and automatic state transitions needed to automate the interaction.
Should the users be kept anonymous or do we want them to identify themselves? If they do identify themselves, how do we authenticate them if they are returning users? How do we manage personal information?
Competences needed to build Furhat applications - called skills
To successfully build skills for Furhat, two types of roles are generally needed: content creators and developers. These roles can be filled by the same person for smaller projects and prototypes, and are likely to be split across several people when building larger real-life skills.
Content creators:
- Knowledge of WHAT we want to communicate
  - Domain knowledge of user needs - what the users' requirements would be.
  - Knowledge of how these needs are catered to today.
  - Ideas of how a social robot interface could add additional value.
- Knowledge of HOW we want to communicate to our users
  - Voice and tonality
  - Face, gestures
Developers:
- Knowledge of domain-specific APIs and integrations needed for the Furhat skill
- Experience with the Kotlin programming language and other languages on the JVM (Java, Groovy, Clojure etc.)
- Experience with real-time systems with multiple inputs and outputs
- Experience building user-facing interfaces
- Experience building voice-controlled interfaces
- Experience building conversational interfaces, for example chatbots
- Experience with NLU (natural language understanding) tools and processes
Suggested process for skill building on the Furhat platform
Below is a suggested process for building skills on the Furhat platform. It is by no means mandatory; see it as a suggestion of steps to go through. The steps are divided into two phases, an Explore phase and a Design phase.
Identify your users and their needs
This phase is similar to building applications in other domains, but important nonetheless, and revolves around asking questions like:
- Who are your users? Some interesting aspects may be: nationality, language proficiency, age, gender split, technical sophistication, experience using voice interfaces, privacy concerns.
- What problem are they facing today? Are the users looking for help? Are they facing too long waiting times?
- What are the current solutions to their problems? For an information retrieval type system, do they go to a homepage for information today? Do they stand in line for human customer service? For an elder companionship type system, do they get occasional social stimuli from friends and family, perhaps social welfare workers? Do they occupy themselves with other, less personal technology?
- What are the limitations of the current solutions? Is it too costly? Too time-consuming? Too low-tech? Too high-tech? Too impersonal?
By answering questions like these, you will have a good foundation that will guide you through the design phase.
The goal of this phase is to generate ideas for how we can solve the problems and meet the needs of the users identified in the previous phase. This can be done in many different ways; one possible process is:
- Brainstorm ideas: Go crazy and try to come up with innovative, out-of-the-box type solutions to the problems.
- Feasibility: Get an understanding of what the Furhat technology is and what it can and cannot do. Social robotics is an early technology and at this point it's important to align expectations. For example:
  - General domain social artificial intelligence is quite a few years down the line.
  - Hardware limitations on sensors make some setups tricky. Microphones and video cameras are getting better, but limitations still exist. For example, running a multiparty interaction in a noisy environment without near-field microphones is very challenging.
- Evaluate the ideas: Decide which ideas would create the most value for the users and are also technically feasible, and go ahead with those.
Once you think you know your users and you have one or several ideas to test, it's time to enter the design phase. This is usually an iterative phase where you build a prototype, test it as soon as you can with internal or external users, and repeat until you have reached a state that is good enough to release.
Some key steps that we recommend in the design phase are:
Exemplify the interaction
The verbal conversation is likely going to play an important role in your skill. As a first step, think about the service you’re designing for and map out the desired conversation (or likely, conversations) that you want your users to be able to have with the robot. It can be helpful to write the conversation as a manuscript for a theatre play like below.
User: Hey
Robot: Hello there, can I take your order?
User: Ok, yes I want to order a hamburger with fries
Robot: A hamburger with fries. Do you want a drink also?
User: Yes, Coca cola.
Robot: Anything else?
User: No, that's all.
Robot: Ok, that will be 10$ please. Do you pay with card?
User: Yes
Robot: Ok, use the terminal here
...
Robot: Hi there, who are you here to meet?
User: eh, Mark
Robot: I have two Marks working here, Mark Peterson and Mark Smith. Which one are you here to see?
User: Peterson
Robot: Okay, I will let him know. Please have a seat over there while you wait
...
Add situated aspects of the interaction
Since the robot is situated (i.e. has a physical presence and is aware of his physical surroundings), also think about which non-verbal behaviors the robot performs, and when, and about when he shifts his attention. Being situated also adds another dimension to what can trigger an interaction, compared to non-situated systems where voice commands are the only way to start an interaction.
Interaction triggers and exits
There are several ways to trigger an interaction:
- Users Entering: Furhat comes with a visual sensor making it possible to identify when users enter into its interaction space. This is a common way to start an interaction.
- Users saying something: Depending on the use-case, you might want to wait until the user addresses the robot before starting an interaction.
- Skill start: If you have an operator manually starting the skill, you can simply start the interaction on skill start. This is the easiest way to start off.
- Wizard action: If you have an operator wizarding the skill, the wizard will be starting and controlling the interaction from the wizard interface.
- Timer: You can have a timer that, for example, triggers some action in the interaction.
- Other: You can call the skill from other systems or trigger it based on other system events than the ones above.
Adding interaction triggers to an interaction script might look like below:
<User Entering>
Robot: Hi there, who are you here to meet?
User: eh, Mark
Robot: I have two Marks working here, Mark Peterson and Mark Smith. Which one are you here to see?
User: Peterson
Robot: Okay, I will let him know. Please have a seat over there while you wait
<User Leaving>
Adding non-verbal behavior
Non-verbal behavior is a key aspect of making your interaction feel human and authentic. This is where building a skill becomes more of an art than a science.
Furhat supports non-verbal behavior in several ways:
- Gestures: Momentary gestures such as smiles, winks and gasps as well as lasting emotions such as happy, sad, excited. The gestures provided with Furhat can be easily extended with your own. Gestures can be blocking or non-blocking, i.e. Furhat can smile before saying something or while saying something.
- Vocal gestures: Furhat comes with several TTS voice providers, each of which provides a set of voices in different dialects and genders. Each voice comes with different non-verbal voice gestures, such as "ehh", "ahh", "uhm", coughs, laughs, yawns etc.
Adding non-verbal behavior to an interaction script might end up like below:
<User Entering>
Robot: <ahh> Hi there /Smile/, who are you here to meet?
User: eh, Mark
Robot: /BrowRaise/ <ehh> I have two Marks working here, Mark Peterson and Mark Smith. Which one are you here to see?
User: Peterson
Robot: Okay, I will let him know. Please have a seat over there while you wait /Smile/
<User Leaving>
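If you keep your scripts in this annotated form, a small helper can pull the annotations out of each robot line, which may be handy when you later turn the script into an implementation. The sketch below assumes the convention used in the script above (vocal gestures in angle brackets, facial gestures between slashes); the function name and the returned structure are hypothetical, and no Furhat API is involved.

```python
import re

GESTURE = re.compile(r"/(\w+)/")   # facial gestures, e.g. /Smile/
VOCAL = re.compile(r"<(\w+)>")     # vocal gestures, e.g. <ahh>


def parse_robot_line(line):
    """Separate the spoken text of a robot line from its gesture annotations."""
    gestures = GESTURE.findall(line)
    vocal_gestures = VOCAL.findall(line)
    # Remove the annotations, then tidy up the whitespace they leave behind.
    text = VOCAL.sub("", GESTURE.sub("", line))
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\s+([,.?!])", r"\1", text)
    return {"text": text, "gestures": gestures, "vocal_gestures": vocal_gestures}


parsed = parse_robot_line("<ahh> Hi there /Smile/, who are you here to meet?")
print(parsed["text"])
print(parsed["gestures"], parsed["vocal_gestures"])
```

Note that the slash pattern would also match ordinary text such as "and/or this/that", so a stricter notation may be needed if your lines can contain slashes.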
Draw interaction charts
Once you have an interaction script (or likely several for a larger skill), it is good to draw an interaction flow chart composed of States and the transitions between them, triggered by Events.
Start by building the simplest possible path through your skills.
Draw states: Start to think about different sections of the interaction as States. For example, Greeting might be a good state, QueryForName might be another. States will be the key building block once you start building your skills.
Draw transitions: In your chart, draw arrows representing transitions between the states, based on certain Events happening, such as the user saying a specific thing or a user entering.
Once the basic path is there, add different paths through the skill similar to the above.
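Before writing any skill code, it can help to capture the chart as a plain transition table and walk through it with a sequence of events. The sketch below does this in plain Python for the receptionist example; apart from Greeting and QueryForName, the state and event names are invented for illustration, and this is not how states are declared in FurhatOS.

```python
# (current state, event) -> next state. Each entry is one arrow in the chart.
TRANSITIONS = {
    ("Idle", "user_entered"): "Greeting",
    ("Greeting", "greeting_done"): "QueryForName",
    ("QueryForName", "name_unique"): "NotifyHost",
    ("QueryForName", "name_ambiguous"): "Disambiguate",
    ("Disambiguate", "name_unique"): "NotifyHost",
    ("NotifyHost", "user_left"): "Idle",
}


def next_state(state, event):
    """Follow the arrow for this event, or stay put if the chart has none."""
    return TRANSITIONS.get((state, event), state)


def run(events, start="Idle"):
    """Return the list of states visited while processing the events in order."""
    state = start
    path = [state]
    for event in events:
        state = next_state(state, event)
        path.append(state)
    return path


print(run(["user_entered", "greeting_done", "name_ambiguous",
           "name_unique", "user_left"]))
```

Walking a few event sequences through the table is a cheap way to spot missing arrows, for example what should happen if the user leaves in the middle of QueryForName.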
Build your skill
For a tutorial on how to actually build a skill using FurhatOS, please see the tutorials.
Test your MVP
Eventually the interaction is going to reach an MVP (minimum viable product) stage. What counts as an MVP is something you have to decide yourself, but typically it should include the minimum value proposition your users require. In other words, it should solve at least the bare minimum needs the users might have. Once you have an MVP, start testing it with users, learn from it and keep improving the interaction!