Realtime API Events
Speak
request.speak.text
Make the robot say something using text-to-speech.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | ✅ | — | The text to speak |
| abort | boolean | ❌ | false | Abort any earlier planned or ongoing speech. If false, new speech is queued |
| monitor_words | boolean | ❌ | false | Whether to send back response.speak.word while speaking |
request.speak.audio
Make the robot play/say some audio from a URL. The audio must be in WAV format (not MP3).
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | ✅ | — | The url from which to play audio |
| text | string | ❌ | "AUDIO" | The corresponding text (for display purposes) |
| lipsync | boolean | ❌ | true | Whether to perform lipsync |
| abort | boolean | ❌ | false | Whether to abort any earlier planned or ongoing speech |
request.speak.stop
Make the robot stop speaking and abort any speech that is planned or currently being generated.
response.speak.start
The robot has started speaking.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | ✅ | — | The synthesized text |
| gen_time | number | ✅ | — | Time to synthesize (ms) |
response.speak.end
The robot has stopped speaking.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | ✅ | — | The synthesized text (what was actually said if aborted) |
| aborted | boolean | ❌ | false | Whether the speech was stopped prematurely |
| failed | boolean | ❌ | false | Whether the speech synthesis failed |
response.speak.word
Sent while speaking, if monitor_words is true.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| word | string | ✅ | — | The word spoken |
| index | number | ✅ | — | The index of the word spoken |
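As an illustration, the sketch below sends a request.speak.text event and prints the resulting response.speak.word and response.speak.end events. Only the event names and parameters come from the tables above; the WebSocket URL and the "event" envelope field are assumptions made for the example.

```python
# Sketch: send request.speak.text and monitor word-level progress.
# Assumed (not from this reference): the ws:// endpoint and the "event"
# field carrying the event name in each JSON message.
import asyncio
import json
import websockets

async def main():
    async with websockets.connect("ws://localhost:9000/api") as ws:  # hypothetical endpoint
        await ws.send(json.dumps({
            "event": "request.speak.text",
            "text": "Hello! Nice to meet you.",
            "abort": True,           # cancel any queued or ongoing speech first
            "monitor_words": True,   # ask for response.speak.word events
        }))
        while True:
            msg = json.loads(await ws.recv())
            if msg.get("event") == "response.speak.word":
                print(f"word {msg['index']}: {msg['word']}")
            elif msg.get("event") == "response.speak.end":
                print("done, aborted:", msg.get("aborted", False))
                break

asyncio.run(main())
```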
Speak (Streaming)
request.speak.audio.start
Start speaking from streaming audio.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sample_rate | int | ❌ | 16000 | The sample rate of the audio data |
| lipsync | boolean | ❌ | true | Whether to add automatic lipsync |
request.speak.audio.data
Send audio data for streaming speech.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| audio | string | ✅ | — | Base64-encoded audio data, 16-bit, mono, little-endian |
request.speak.audio.end
Marks the end of audio data transmission. This should be sent after all audio chunks have been sent.
request.speak.stop
Stop speaking immediately. This command can be used to halt audio playback at any time.
response.speak.end
This event is sent back when all audio has been played.
response.speak.audio.buffer
This event is sent back with the current audio buffer status while audio is being played.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| played | int | ❌ | — | The number of bytes of audio played |
| received | int | ❌ | — | The number of bytes of audio received |
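A minimal sketch of the streaming sequence, assuming a local 16-bit mono WAV file and the same hypothetical WebSocket endpoint and "event" envelope field as above; the raw PCM chunking and base64 encoding follow the parameter tables in this subsection.

```python
# Sketch: stream a local 16-bit mono WAV file as speech audio.
# Assumed: the ws:// endpoint, the "event" field, and that speech.wav
# is 16-bit mono PCM (little-endian, as WAV stores it).
import asyncio
import base64
import json
import wave
import websockets

async def main():
    async with websockets.connect("ws://localhost:9000/api") as ws:  # hypothetical endpoint
        with wave.open("speech.wav", "rb") as wav:
            await ws.send(json.dumps({
                "event": "request.speak.audio.start",
                "sample_rate": wav.getframerate(),
                "lipsync": True,
            }))
            while True:
                frames = wav.readframes(4096)   # raw 16-bit little-endian PCM
                if not frames:
                    break
                await ws.send(json.dumps({
                    "event": "request.speak.audio.data",
                    "audio": base64.b64encode(frames).decode("ascii"),
                }))
        await ws.send(json.dumps({"event": "request.speak.audio.end"}))
        # Wait for playback to finish, printing buffer status along the way.
        while True:
            msg = json.loads(await ws.recv())
            if msg.get("event") == "response.speak.audio.buffer":
                print("played", msg.get("played"), "of", msg.get("received"), "bytes")
            elif msg.get("event") == "response.speak.end":
                break

asyncio.run(main())
```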
Listen
request.listen.config
Configure speech recognition.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| languages | list | ❌ | ["en-US"] | List of languages to listen for |
| phrases | list | ❌ | [] | List of words or phrases to listen extra carefully for |
request.listen.start
Make the robot listen for speech. If it is already listening, it will reset the current result.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| partial | boolean | ❌ | false | Whether to produce results while the user is speaking |
| concat | boolean | ❌ | true | Whether to concatenate results during the same listening session |
| stop_no_speech | boolean | ❌ | true | Stop listening if the user is silent for longer than no_speech_timeout |
| stop_robot_start | boolean | ❌ | true | Stop listening when the robot starts speaking |
| stop_user_end | boolean | ❌ | true | Stop listening when end-of-speech is detected |
| resume_robot_end | boolean | ❌ | false | Resume listening when the robot stops speaking |
| no_speech_timeout | number | ❌ | 8.0 | Timeout to use for stop_no_speech (seconds) |
| end_speech_timeout | number | ❌ | 1.0 | Amount of silence needed to detect end-of-speech (seconds) |
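The sketch below builds the two request payloads a client might send to configure and start recognition. Parameter names come from the tables above; the "event" envelope field and the example language codes and phrases are assumptions.

```python
# Sketch: configure recognition for two languages, then start listening
# with partial results and a shorter silence timeout.
import json

configure = {
    "event": "request.listen.config",      # assumed envelope field
    "languages": ["en-US", "sv-SE"],
    "phrases": ["checkout", "receipt"],    # phrases to listen extra carefully for
}
start = {
    "event": "request.listen.start",
    "partial": True,            # emit response.hear.partial while the user speaks
    "no_speech_timeout": 5.0,   # stop after 5 s of silence
}
print(json.dumps(configure, indent=2))
print(json.dumps(start, indent=2))
```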
request.listen.stop
Force the robot to stop listening.
response.listen.start
The robot has started listening for speech.
response.listen.end
The robot has stopped listening for speech.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| cause | string | ✅ | — | The reason for stopping: 'stopped', 'robot_speak', 'speech_end', 'silence_timeout' |
response.hear.start
The user has started speaking, or has resumed speaking if listening continued.
response.hear.end
The user has stopped speaking and the speech has been recognized.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | ✅ | — | Recognized speech |
response.hear.partial
The robot has recognized partial speech from the user. These events are only sent if partial results are requested.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | ✅ | — | Recognized speech |
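A sketch of how a client might consume these listening events from an already open connection; `ws` is assumed to be a connected WebSocket client, and the "event" field name is an assumption.

```python
# Sketch: consume listening-related events until listening ends.
# `ws` is assumed to be an already connected WebSocket whose recv()
# returns one JSON-encoded event per call.
import json

async def handle_listening(ws):
    while True:
        msg = json.loads(await ws.recv())
        event = msg.get("event")
        if event == "response.hear.partial":
            print("partial:", msg["text"])
        elif event == "response.hear.end":
            print("heard:", msg["text"])
        elif event == "response.listen.end":
            print("stopped listening, cause:", msg["cause"])
            return
```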
Voice
request.voice.config
Set the current voice, using the voice ID, name, language, gender, or provider (or a combination of these).
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| voice_id | string | ❌ | — | Voice ID |
| name | string | ❌ | — | Voice name (partial match, case-insensitive) |
| provider | string | ❌ | — | Voice provider (partial match, case-insensitive) |
| language | string | ❌ | — | Voice language (e.g. en-US) |
| gender | string | ❌ | — | Voice gender (male/female) |
| input_language | boolean | ❌ | true | Set the input language to match the voice |
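For example, a client might select a voice by language and gender only; the sketch below builds such a payload (the "event" envelope field is an assumption, the selection fields come from the table above).

```python
# Sketch: pick a female en-GB voice and switch the input language accordingly.
import json

set_voice = {
    "event": "request.voice.config",   # assumed envelope field
    "language": "en-GB",
    "gender": "female",
    "input_language": True,            # set the input language to match the voice
}
print(json.dumps(set_voice, indent=2))
```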
request.voice.status
Gets the current and available voices.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| voice_id | boolean | ❌ | true | Whether to include the current voice ID in the response |
| voice_list | boolean | ❌ | true | Whether to include the list of available voices in the response |
response.voice.status
Returns current and available voices.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| voice_id | string | ❌ | null | Current voice id |
| voice_list | list | ❌ | null | List of available voices |
Attention
request.attend.user
Make the robot attend to a user. If the user is lost, the robot will attend to nobody; if the user comes back later, the robot will attend to that user again. If 'closest' is specified, the robot will always attend to the closest user.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| user_id | string | ❌ | closest | The ID of the user to attend, or 'closest' |
| slack_pitch | number | ❌ | 15 | Max difference (deg) between head pitch and gaze pitch |
| slack_yaw | number | ❌ | 5 | Max difference (deg) between head yaw and gaze yaw |
| slack_timeout | number | ❌ | 3000 | Max time (ms) head direction can diverge from gaze (-1 is infinite) |
| speed | string | ❌ | medium | Speed of head movement (xslow, slow, medium, fast, xfast) |
request.attend.location
Make the robot attend to a specific location, as specified in meters, relative to the robot.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| x | number | ✅ | — | Horizontal position |
| y | number | ✅ | — | Vertical position |
| z | number | ✅ | — | Distance from robot |
| slack_pitch | number | ❌ | 15 | Max difference (deg) between head pitch and gaze pitch |
| slack_yaw | number | ❌ | 5 | Max difference (deg) between head yaw and gaze yaw |
| slack_timeout | number | ❌ | 3000 | Max time (ms) head direction can diverge from gaze (-1 is infinite) |
| speed | string | ❌ | medium | Speed of head movement (xslow, slow, medium, fast, xfast) |
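Two example attention payloads are sketched below: one tracking the closest user and one targeting a fixed location one meter in front of the robot (coordinates in meters, relative to the robot, per the table above). The "event" envelope field is an assumption.

```python
# Sketch: attend to the closest user, or to a fixed point in space.
import json

attend_closest = {
    "event": "request.attend.user",   # assumed envelope field
    "user_id": "closest",
    "speed": "fast",
}
attend_point = {
    "event": "request.attend.location",
    "x": 0.0,    # horizontal position (m)
    "y": 0.0,    # vertical position (m)
    "z": 1.0,    # one meter in front of the robot
}
print(json.dumps(attend_closest, indent=2))
print(json.dumps(attend_point, indent=2))
```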
request.attend.nobody
Make the robot attend to nobody.
response.attend.status
The robot's attention status has changed.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| target | string | ✅ | — | The target attention (nobody, location, closest, user-id) |
| current | string | ✅ | — | The current attention (nobody, location, user-id) |
Gestures
request.gesture.start
Make the robot perform a gesture.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | ✅ | — | Name of the gesture (see API for values) |
| intensity | number | ❌ | 1.0 | Intensity of the gesture |
| duration | number | ❌ | 1.0 | Duration of gesture |
| monitor | boolean | ❌ | false | Whether to receive events when gesture starts and ends |
response.gesture.start
Received when the gesture starts (sent only if monitor is true).
response.gesture.end
Received when the gesture ends (sent only if monitor is true).
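A minimal sketch of triggering a gesture and waiting for its end event: the gesture name used here is a placeholder (see the API for valid names), `ws` is assumed to be a connected WebSocket, and the "event" field is an assumption.

```python
# Sketch: perform a gesture and wait until it has finished.
import json

async def perform_nod(ws):
    await ws.send(json.dumps({
        "event": "request.gesture.start",   # assumed envelope field
        "name": "Nod",                      # placeholder gesture name
        "intensity": 0.8,
        "monitor": True,                    # ask for response.gesture.start/.end
    }))
    while True:
        msg = json.loads(await ws.recv())
        if msg.get("event") == "response.gesture.end":
            return
```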
Face
request.face.params
Set facial animation parameters directly.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| params | object | ✅ | — | Face parameters |
request.face.headpose
Control the head pose of the robot, as specified in degrees along yaw, pitch and roll.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| yaw | number | ❌ | 0 | Yaw (sideways) angle in degrees |
| pitch | number | ❌ | 0 | Pitch (up/down) angle in degrees |
| roll | number | ❌ | 0 | Roll angle in degrees |
| relative | boolean | ❌ | false | Whether to use relative or absolute control |
| speed | string | ❌ | medium | Speed of head movement (xslow, slow, medium, fast, xfast) |
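As an example, the payload below turns the head 20 degrees sideways and tilts it 10 degrees, using absolute control at slow speed. The "event" envelope field is an assumption; the sign conventions for pitch and roll are not specified in this reference.

```python
# Sketch: set an absolute head pose in degrees.
import json

head_pose = {
    "event": "request.face.headpose",   # assumed envelope field
    "yaw": 20,          # sideways angle in degrees
    "pitch": -10,       # up/down angle in degrees (sign convention not specified here)
    "roll": 0,
    "relative": False,  # absolute control
    "speed": "slow",
}
print(json.dumps(head_pose, indent=2))
```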
request.face.config
Set the current mask and character (face_id), and/or face visibility.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| face_id | string | ❌ | KEEP | face ID |
| visibility | boolean | ❌ | true | Turn on or off face visibility |
| microexpressions | boolean | ❌ | true | Turn on or off facial microexpressions |
| blinking | boolean | ❌ | true | Turn on or off blinking |
| head_sway | boolean | ❌ | false | Turn on or off automatic minor head sways |
request.face.status
Gets the current and available masks and characters (face_id).
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| face_id | boolean | ❌ | true | Whether to include the current face ID in the response |
| face_list | boolean | ❌ | true | Whether to include the list of available faces in the response |
request.face.reset
Resets all facial parameters to default.
response.face.status
Returns current and available masks and characters (face_id).
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| face_id | string | ❌ | null | Current face id |
| face_list | list | ❌ | null | List of available faces |
LED
request.led.set
Set the color of the LED.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| color | string | ✅ | — | Color in hex format |
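A one-line example payload is sketched below. The exact hex notation (with or without a leading "#") is not specified above, so treat the value as a guess; the "event" envelope field is an assumption.

```python
# Sketch: set the LED to red.
import json

set_led = {
    "event": "request.led.set",   # assumed envelope field
    "color": "#FF0000",           # hex color; exact format is a guess
}
print(json.dumps(set_led, indent=2))
```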
Users
request.users.once
Detect users once.
request.users.start
Start detecting users.
request.users.stop
Stop detecting users.
response.users.data
Sent when new users are detected, users are lost, or users move.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| users | list | ✅ | — | List of user objects, sorted by proximity |
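The sketch below starts user detection and prints the closest user from each response.users.data event. The fields of a user object are not documented above, so only the list itself is used; `ws` and the "event" field are assumptions.

```python
# Sketch: start user detection and report the closest user.
import json

async def watch_users(ws):
    await ws.send(json.dumps({"event": "request.users.start"}))  # assumed envelope field
    while True:
        msg = json.loads(await ws.recv())
        if msg.get("event") == "response.users.data":
            users = msg["users"]            # sorted by proximity
            print(f"{len(users)} user(s) visible")
            if users:
                print("closest user:", users[0])
```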
Audio
request.audio.start
Start capturing audio from the microphone and/or speaker. By default, only the microphone is captured.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sample_rate | int | ❌ | 16000 | The sample rate of the audio data to receive |
| microphone | boolean | ❌ | true | Whether to capture the microphone |
| speaker | boolean | ❌ | false | Whether to capture the speaker |
request.audio.stop
Stop capturing audio from the microphone and/or speaker.
response.audio.data
Audio data captured from the microphone and/or speaker.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| microphone | string | ❌ | — | Base64-encoded audio data, 16-bit, mono, little-endian |
| speaker | string | ❌ | — | Base64-encoded audio data, 16-bit, mono, little-endian |
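The sketch below captures a few seconds of microphone audio and collects the raw PCM bytes. The base64 payload decodes to 16-bit mono little-endian samples, per the table above; `ws` and the "event" field are assumptions.

```python
# Sketch: record microphone audio into a raw PCM buffer.
import base64
import json

async def record_microphone(ws, seconds=5, sample_rate=16000):
    await ws.send(json.dumps({
        "event": "request.audio.start",   # assumed envelope field
        "sample_rate": sample_rate,
        "microphone": True,
    }))
    pcm = bytearray()
    target = seconds * sample_rate * 2    # 2 bytes per 16-bit sample
    while len(pcm) < target:
        msg = json.loads(await ws.recv())
        if msg.get("event") == "response.audio.data" and "microphone" in msg:
            pcm.extend(base64.b64decode(msg["microphone"]))
    await ws.send(json.dumps({"event": "request.audio.stop"}))
    return bytes(pcm)
```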
Camera
request.camera.once
Capture one frame from the camera.
request.camera.start
Start capturing video from the camera.
request.camera.stop
Stop capturing video from the camera.
response.camera.data
A captured camera frame.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| image | string | ❌ | — | Base64-encoded jpeg image |
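Finally, a sketch of grabbing a single frame and saving it to disk: the base64-encoded JPEG payload is documented in the table above, while `ws` and the "event" field are assumptions.

```python
# Sketch: capture one camera frame and write it to a JPEG file.
import base64
import json

async def save_snapshot(ws, path="frame.jpg"):
    await ws.send(json.dumps({"event": "request.camera.once"}))  # assumed envelope field
    while True:
        msg = json.loads(await ws.recv())
        if msg.get("event") == "response.camera.data":
            with open(path, "wb") as f:
                f.write(base64.b64decode(msg["image"]))
            return path
```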