The recently introduced Amazon Echo Show provides developers with new opportunities to build skills that integrate voice control, visual feedback, and tactile input. David Isbitski, Amazon's chief evangelist for Alexa, summarized the key points of developing Alexa skills for the Echo Show.
Custom skills for Echo Show may use four types of interactions:
- Voice, which remains the primary means of interaction.
- Alexa app, used to display additional information through cards in a mobile or web app.
- Screen display, which allows skills to display custom content. Cards sent to the Alexa app are shown on the display by default.
- Screen touch, which makes it possible to react to touch actions.
The first step to support multimodal interfaces is enabling the Render Templates option for your skill, which can be done on the Skill Information page. Two kinds of templates are available:
- A body template, which displays images and text.
- A list template, which displays a scrollable list of items.
To properly support all available Alexa devices in your skill implementation, you should check each device's supported interfaces. This can be done by inspecting event.context.System.device.supportedInterfaces in the incoming Alexa request. For example, this is what an Alexa request looks like when the Display, AudioPlayer, and VideoApp interfaces are available:
{
  "context": {
    "System": {
      "device": {
        "supportedInterfaces": {
          "Display": {},
          "AudioPlayer": {},
          "VideoApp": {}
        }
      }
    }
  }
}
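Based on the request shape above, a skill's handler can branch on the available interfaces before sending any display-specific directives. The following TypeScript sketch is illustrative; the RequestEnvelope type and supportsDisplay helper are assumptions made for this example, not part of an official SDK:

// Minimal shape of the parts of the Alexa request envelope used here.
// (Illustrative type, not an official SDK definition.)
interface RequestEnvelope {
  context: {
    System: {
      device?: {
        supportedInterfaces?: {
          Display?: object;
          AudioPlayer?: object;
          VideoApp?: object;
        };
      };
    };
  };
}

// Returns true when the requesting device reports the Display interface,
// i.e. when it is safe to include Display.RenderTemplate directives.
function supportsDisplay(event: RequestEnvelope): boolean {
  return Boolean(event.context.System.device?.supportedInterfaces?.Display);
}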
Once you know the device your skill is running on supports a display, you can show content by including a Display.RenderTemplate directive in your response. For example, you can display text and an image using a body template named BodyTemplate1 by including the following:
{
  "directives": [
    {
      "type": "Display.RenderTemplate",
      "template": {
        "type": "BodyTemplate1",
        "token": "CheeseFactView",
        "backButton": "HIDDEN",
        "backgroundImage": ImageURL,
        "title": "Did You Know?",
        "textContent": {
          "primaryText": {
            "type": "RichText",
            "text": "The world’s stinkiest cheese is from Northern France"
          }
        }
      }
    }
  ]
}
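A scrollable list is built the same way. The sketch below assumes a ListTemplate1 type that mirrors the body template naming; the exact field set should be verified against the Display interface reference:

// Hypothetical list directive, mirroring the body template example above.
// Each list item carries a token that identifies it in later interactions.
const listDirective = {
  type: "Display.RenderTemplate",
  template: {
    type: "ListTemplate1",
    token: "CheeseListView",
    backButton: "HIDDEN",
    title: "Famous Cheeses",
    listItems: [
      {
        token: "cheese-1",
        textContent: {
          primaryText: { type: "PlainText", text: "Camembert" }
        }
      }
    ]
  }
};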
Another new feature provided by the Echo Show is video playback, which can be enabled through the corresponding option on the Skill Information page. To start video playback, you include a VideoApp.Launch directive in your response, as shown below:
"response": {
"outputSpeech": null,
"card": null,
"directives": [
{
"type": "VideoApp.Launch",
"videoItem":
{
"source": "https://www.example.com/video/sample-video-1.mp4",
"metadata": {
"title": "Title for Sample Video",
"subtitle": "Secondary Title for Sample Video"
}
}
}
],
"reprompt": null
}
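Combining this with the interface check sketched earlier, a handler can fall back to plain speech on devices without video support. The buildVideoLaunchResponse helper below is hypothetical and simply mirrors the JSON response above:

// Hypothetical helper mirroring the VideoApp.Launch response above.
// Call it only when supportedInterfaces reports VideoApp availability.
function buildVideoLaunchResponse(source: string, title: string, subtitle: string) {
  return {
    outputSpeech: null,
    card: null,
    directives: [
      {
        type: "VideoApp.Launch",
        videoItem: {
          source,
          metadata: { title, subtitle }
        }
      }
    ],
    reprompt: null
  };
}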
Finally, touch input can be handled by means of a number of predefined intents, such as AMAZON.ScrollUpIntent and AMAZON.ScrollLeftIntent, which trigger the execution of the custom code associated with them.
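As a rough sketch, such intents can be routed to the corresponding handlers like this; the handleScrollUp and handleScrollLeft functions are illustrative placeholders:

// Illustrative stubs; replace with your skill's actual scroll logic.
function handleScrollUp(): void { /* e.g. re-render the template scrolled up */ }
function handleScrollLeft(): void { /* e.g. show the previous item */ }

// Dispatch the predefined touch intents to custom code.
function routeIntent(intentName: string): void {
  switch (intentName) {
    case "AMAZON.ScrollUpIntent":
      handleScrollUp();
      break;
    case "AMAZON.ScrollLeftIntent":
      handleScrollLeft();
      break;
    default:
      console.log(`Unhandled intent: ${intentName}`);
  }
}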
For a full list of the possibilities the Echo Show offers developers, make sure to check Isbitski’s post.