Tutorial: Integrating with OpenAI

Introduction

In this tutorial, you will learn how to integrate your skill with OpenAI's chat completion service.

As you explore integrating large language models with the Furhat social robot, ponder the profound questions of trust, human emotion, and empathy. AI-driven interactions can significantly shape how people perceive and feel. While this tutorial focuses on the technical integration, it's vital to remember that interactions with a social robot can provoke unexpected reactions in people, with real consequences.

We encourage you to approach this subject responsibly, but also with curious playfulness, and to share your insights with the community of researchers and explorers so we can enhance our collective understanding of human-AI-social robot dynamics.

This tutorial assumes that you:

  • have completed the first tutorial
  • have a working and running SDK
  • have an OpenAI account with a service key

Creating the skill

Follow the instructions on how to create a skill with the basic template. Name it, for example, OpenAISkill.

Adding dependencies

To integrate with the OpenAI service, we will use the Simple-OpenAI library. Open build.gradle and add to the dependencies section:

    implementation 'io.github.sashirestela:simple-openai:3.8.2'

Make sure to press the Gradle reload button after making changes to build.gradle.
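If your skill project uses the Gradle Kotlin DSL instead (a build.gradle.kts file), the equivalent dependency declaration would be (a sketch, assuming the same library coordinates):

    implementation("io.github.sashirestela:simple-openai:3.8.2")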

Writing a utility function

We start by defining a utility function, getDialogCompletion, that we can call from the flow. To use the OpenAI chat completion service, we simply need to feed it a prompt. The prompt should contain a description of the interaction (instructions for the robot), as well as the dialog history.

Create a new file called openai.kt with the following contents:

import furhatos.flow.kotlin.DialogHistory
import furhatos.flow.kotlin.Furhat
import io.github.sashirestela.openai.SimpleOpenAI
import io.github.sashirestela.openai.domain.chat.ChatMessage
import io.github.sashirestela.openai.domain.chat.ChatRequest

/** OpenAI API key */
val serviceKey = "YOUR_API_KEY"

val systemPrompt = "You are a chatty robot. You should speak in a conversational style. Never say more than two sentences."

val openAI = SimpleOpenAI.builder()
    .apiKey(serviceKey)
    .build()

fun getDialogCompletion(): String? {
    val chatRequestBuilder = ChatRequest.builder()
        .model("gpt-4o-mini")
        .message(ChatMessage.SystemMessage.of(systemPrompt))

    // Add the last 10 turns: user responses become user messages, robot utterances become assistant messages
    Furhat.dialogHistory.all.takeLast(10).forEach {
        when (it) {
            is DialogHistory.ResponseItem -> {
                chatRequestBuilder.message(ChatMessage.UserMessage.of(it.response.text))
            }
            is DialogHistory.UtteranceItem -> {
                chatRequestBuilder.message(ChatMessage.AssistantMessage.of(it.toText()))
            }
        }
    }

    // Send the request and wait for the result (create() returns a future)
    val futureChat = openAI.chatCompletions().create(chatRequestBuilder.build())
    val chatResponse = futureChat.join()
    return chatResponse.firstContent().toString()
}

As can be seen, we use the Furhat.dialogHistory object to build up the dialog history in the prompt; the last 10 turns of the dialog are included. You can also try other models that OpenAI has available. Here we use gpt-4o-mini.

Don't forget to replace YOUR_API_KEY with your OpenAI service key.
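For example, trying a different model is just a matter of changing the string passed to the request builder (a sketch; which models are available depends on your OpenAI account):

    val chatRequestBuilder = ChatRequest.builder()
        .model("gpt-4o")   // any chat model available to your account
        .message(ChatMessage.SystemMessage.of(systemPrompt))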

Calling the utility function from the flow

Open up greeting.kt, which is the main interaction state in the blank template. Replace the contents of the Greeting state with the following (you will also have to add relevant imports):

val Greeting: State = state(Parent) {
    onEntry {
        furhat.ask("Hi there")
    }

    onResponse<Goodbye> {
        furhat.say("Goodbye")
        goto(Idle)
    }

    onResponse {
        val robotResponse = call {
            getDialogCompletion()
        } as String?
        furhat.ask(robotResponse ?: "Could you please repeat that")
    }

    onNoResponse {
        furhat.ask("Sorry, I didn't hear anything")
    }
}

We now have a loop where Furhat listens for speech from the user and then uses OpenAI's chat completion to generate a response, conditioned on the dialog as it unfolds.

Note that the getDialogCompletion() call is wrapped in a call {...} construct. The reason for this is that we want to avoid blocking calls in the flow, and the call to OpenAI might take a few seconds. The call {...} construct makes sure the flow stays responsive. You can read more about this in the Furhat SDK documentation.

The onResponse<Goodbye> handler will only trigger if the Goodbye intent is detected. Otherwise, the generic onResponse handler is triggered, calling the OpenAI service. Note that you can use this to create hybrid interactions, where parts of the dialog are scripted through states and parts are generated by OpenAI. You can also add other intent-specific response handlers (but make sure to place them before the generic one) if you want certain intents to trigger specific behaviors, as shown in the sketch below.
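For example, you could script one part of the interaction by adding an intent-specific handler above the generic onResponse handler in the Greeting state. In this sketch, RequestJoke is a hypothetical intent that you would define yourself; it is not part of the SDK:

    // RequestJoke is a hypothetical, user-defined intent (e.g. matching "tell me a joke")
    onResponse<RequestJoke> {
        furhat.ask("Why did the robot go back to school? Because its skills were getting rusty. Anything else?")
    }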

Dealing with response delays

The response from OpenAI can take a few seconds, and this delay might confuse the user. One way to deal with this is to have Furhat produce a turn filler while the answer is being generated.

    onResponse {
        furhat.say(async = true) {
            +Gestures.GazeAway
            random {
                +"Let's see"
                +"Let me think"
                +"Wait a second"
            }
        }
        val robotResponse = call {
            getDialogCompletion()
        } as String?
        furhat.ask(robotResponse ?: "Could you please repeat that")
    }
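Since the filler is spoken with async = true, the flow does not wait for the speech to finish, so the call to getDialogCompletion() can start while Furhat is still talking. This hides some of the latency and makes the pause before the generated answer feel more natural.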