3D Gesture Interaction for TV: A Wizard of Oz Behavioral Prototype

Design Challenge:

The challenge for this assignment was to build and test a behavioral prototype for a gestural user interface for a TV system. They system had to allow for basic video function controls (play, pause, stop, fast forward, rewind, etc.) Our team chose to pursue a 3D gesture system. The goal of our prototype and evaluation was to explore the following design research and usability questions:

  • How can a user effectively control video playback using hand gestures?
  • What are the most intuitive gestures for this application?
  • What level of accuracy is required in this gesture recognition technology?


We began by discussing how to set up our prototype to allow for quick iteration and modification. Two key elements to our prototype was 1. figuring out how to manipulate the video content to simulate the controls carried out by a user’s 3D gestures outside of the user’s sight and 2. how to provide feedback to the user while carrying out a specific gestures.

Initially we decided to use Google Chrome Cast to wirelessly control the television through a laptop. However, we soon realized that running Chrome Cast using a Mac had some glitches and delays that would negatively impact our test. We opted instead to use a mini display port to VGA adapter to connect the laptop directly into the television set. This allowed us to control the video in real time without any perceived delay from the user.

To simulate the feedback, we used a laser pointer aimed at the television that mimicked the user’s gesture. We also used this to indicate to the user when the gestures were outside of “the system’s” visible range.

Next, we created three sets of gesture styles to test, which included an open palm style, a fist style, and a thumb style of interaction. 

Palm Instructions-01.png
Fist Instructions-01.png
Thumb Instructions-01.png

With these three varied sets of gestures we wanted to understand how users would perceive the required actions, how easy they would be to conduct, how intuitive each set was, how easy it would be to remember each set of gestures, and whether or not a user would prefer one set to the others.

Next, we decided on roles. We needed a Wizard/operator, a moderator, and a laser feedback controller. We also needed to recruit participants and find a location to conduct the test in the context of a home living room, where a television would likely be set up “in the wild”. 

Setting up our prototype

Setting up our prototype

In our final set up used for testing users in context, we had the moderator and user sitting on a sofa facing the television. The operator sat facing the user able to clearly see each of the user’s gestures, but out of the user’s direct attention. The laser feedback controller also stood out of the user’s attention facing the television in order to be able to recreate the user’s gestures while pointing the laser into the television. A camera set on a tripod was set up behind the user to simultaneously capture the user’s gestures and television response.

The user was given three sets of printed instructions describing each of the three gesture sets, which could be referenced at anytime during the test. A Kinect device was added to the television set up as a prop to represent the way in which the system would capture the user’s gesture input. 

Behavioral Prototype Set Up.png

For the evaluation itself, we put together a script describing what we were testing. We asked the user to review each set of instructions before performing the following predefined tasks while watching an episode of Sherlock:

  • Start the video
  • Fast forward to wedding scene and play the video from there
  • Pause the video
  • Rewind to the scene where Sherlock & Aunt are sharing tea
  • If you only wanted to see the last scene in the show, how would you get there?
  • Play again

We also encouraged the user to use a think aloud protocol, so that we could capture his thoughts during the test.

After testing all three gesture sets, we surveyed the user to gather feedback on how easy it was to remember the gestures, gesture preferences, whether or not there were gestures that were particularly awkward or difficult to conduct, our feedback mechanism, and any abnormal or unexpected behaviors.  We ended with an open feedback session to allow the user to share any additional thoughts. 


What worked well:

Testing in a real living room was helpful for understanding the context in which a user would be using the product. This helped us realize quickly that gestures would most likely be conducted while users were sitting; so recommended gestures needed to take this into account.

Regarding gesture types, the user found the palm style and fist style gesture sets equality intuitive, but found the fist style gestures less easy to remember. The user had a preference for “the palm style…for sure because it’s the most simple...and the instructions were super simple.”

The user also noticed and appreciated the feedback of the green laser light because it was a “nice visual to know that action is registering on the screen.” He also stated that it was “helpful for tracking his motion.”

What needed improvement:

Although, we felt that the thumb gestures could easily be recognized by they system because of the distinct directional cue provided by the thumb pointers created by the shape of the hand, it was not as well received by the user as the other gesture sets. The user found the rewind gesture for the thumb style set to be particularly awkward when using just the right hand, and introduced the idea of using the left hand for the same gesture.

The user felt that when performing fast forward and rewind using the fist style gestures, it was difficult to know how far to stretch his arm or how sensitive the system would be. 

Although, the palm style set of gestures were preferred by the user, we noticed while analyzing the video that when the user repeated the fast forward gesture quickly, the motion could easy be misinterpreted for fast forward, rewind, fast forward, rewind, fast forward. Although the wizard knew the user’s intention and could operate the video correctly, it could be difficult for “the system” to distinguish the direction intent of the user.

We also realized through the testing that one of the main challenges, for all parties – user, Wizard, and laser operator alike, was understanding when the user’s actions were out of range. Our team discussed the possibility of providing some sort of calibration to find a users midpoint and mapping the video content timeline directly to the distance between the users hands when performing the most extreme fast forward and rewind gestures to make this experience more accurate.

Since our Wizard of Oz prototype did not account for other possible moving objects in the space, we felt it would be important to consider how the system would deal with “noise” from other moving objects during viewing.


Overall, we felt that our behavior prototype was successful in providing a “quick and dirty” way to test assumptions about the 3D gesture interactions we developed. We learned a great deal not only about the user’s gesture preferences, but also about the level of accuracy that the gesture recognition technology would need to account for.