New Project Plan for Gesture Control

10.12.2015

I like to call this point in my thesis my “mid-thesis life crisis.” I believe it is fitting. Up until now, I have been working on testing touch and gesture control on three screen sizes: small (iPhone), medium (laptop), and large (TV/projector). I started with gesture, using the Myo armband, because it was the less explored of the two interaction methods. My goal was to see how device size impacted the interaction method. For gesture, I realized I was only testing how device size impacted the Myo’s gesture scheme. Right now, there isn’t a standard for gesture control. Do we use our fingers? Do we use both arms? Is our head, or our whole body, involved too? With so many things uncertain, what type of gesture should I test?

There are only 9 months left to finish my master’s degree. If I want to test how screen size affects gesture control through task analysis, I need a tested gesture scheme to use. But this standard does not exist. Already I have been asked, why not devote your thesis to creating a standard then, even if it is for gesture control using one arm (or something more specific like that)? My answer is that 9 months is not enough time for that either.

All my research and testing so far has taught me something though. I have learned where gesture excels and where it struggles. My experiments have shown me that gesture control is great for dragging objects and moving them across the screen. Gesture struggles with selecting/clicking objects. And through my readings, I have also learned about the biggest obstacles faced when designing a gesture control device.

##Gesture Control Obstacles

#1 Live Mic

Gesture control devices suffer from “live mic.” It means that the actions of a user using gesture control are always being recorded. Any action the user does can be misinterpreted as a command. If the user sneezes, the gesture device may think they are trying to give a command and execute it. This is called a false positive error; the user did not intend to give a command but the device reads it as one anyway.

Some gesture control devices, like the Myo, have a virtual clutch that toggles whether the device reads commands. The user performs the clutch command, like double tapping the thumb and middle finger for the Myo, to tell the device he or she is about to enter a command. The user then performs the clutch command again to signal the device to stop reading his or her actions. This can reduce the number of false positive errors, but now you have a command that can never mean anything else. Also, unless your clutch command is definitely something the user will never do on their own, there is still a chance of false positive errors for that command.
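To make the clutch idea concrete, here is a minimal sketch of the toggle logic. It is not the Myo SDK; the names `ClutchGate`, `GestureEvent`, and the gesture strings are hypothetical and only illustrate how a clutch filters out gestures made while the device is not listening.

```typescript
// Hypothetical types and names for illustration only; this is not the Myo SDK.
type GestureEvent = { name: string };

class ClutchGate {
  private listening = false;
  private clutchGesture: string;

  constructor(clutchGesture: string) {
    this.clutchGesture = clutchGesture;
  }

  // Returns the gesture if it should run as a command, otherwise null.
  handle(event: GestureEvent): GestureEvent | null {
    if (event.name === this.clutchGesture) {
      // The clutch only flips the listening state; it can never be a command itself.
      this.listening = !this.listening;
      return null;
    }
    // Anything performed while the clutch is disengaged is ignored.
    return this.listening ? event : null;
  }

  get isListening(): boolean {
    return this.listening;
  }
}
```

With a gate like this, a fist made before the clutch gesture is dropped, while the same fist made after it is passed along as a command. It also shows the cost mentioned above: the clutch gesture is reserved and can never be used for anything else.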

#2 Selecting doesn’t always happen on the first attempt

When a user does give a command, but the gesture device does not read it, this is called a false negative error. In my experiment I have seen this a lot with the Myo armband. It happened most often with the selecting command (making a fist) and the clutch command. My experiment used a mouse-like controller powered by the Myo. When people tried to click on things, they sometimes had to execute the command multiple times to get the device to read it. When clicking on a touch screen, users do not have this problem. As long as the button is big enough, and the screen is clean, users do not need to “touch” multiple times to get the device to read it.

#3 Executing commands makes the user move

This obstacle is obvious. When using touch or gesture, the user has to move to execute a command. With gesture, the user has to move even more. The Myo armband has the user keep their arm in the air too, so there is nothing for the user to lean on. When using the Myo armband in my experiment, users struggled with selecting specific objects because every command was read as a directional movement as well. So “clicking” with the Myo (making a fist) was read as “move the cursor AND click,” because that’s what the user was doing. They were moving and clicking. This meant that the user often could not select the object they wanted, since the cursor moved off of it.

##New Proposal

Because of these obstacles, I am changing my thesis to focus on designing a solution to gesture’s pitfalls using touch interaction. Touch does not have any of the obstacles listed above because it is binary. Is the user touching the screen, yes or no? That is it. Gesture suffers from “live mic.” This is the root of all its problems. Combining touch and gesture can solve this. My plan is to explore using a smartphone, like the iPhone, as a remote for gesture control. The smartphone can be moved around like a wand and be a receptor for gesture commands, while the binary screen input of the iPhone helps solve gesture’s pitfalls. Users can tap the screen, like they would press a button on a remote, to signal they are going to start executing a command. This would be the gesture scheme’s clutch. Also, tapping the screen replaces any selecting or “clicking” commands, lowering the chance of false negative errors.
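One way the touch clutch could work is sketched below. This is only an illustration under stated assumptions, not a finished design: it assumes that holding a finger on the screen engages the clutch, that a touch shorter than an arbitrary `TAP_MS` threshold counts as a select, and that motion is ignored during that first window so a quick tap never drags the cursor. The names `TouchClutchRemote`, `RemoteInput`, and `RemoteOutput` are hypothetical.

```typescript
// Assumed threshold: touches shorter than this count as a tap (select).
const TAP_MS = 200;

// Hypothetical input and output shapes; timestamps (t) are in milliseconds.
type RemoteInput =
  | { kind: "touchstart"; t: number }
  | { kind: "touchend"; t: number }
  | { kind: "motion"; t: number; dx: number; dy: number };

type RemoteOutput =
  | { kind: "move"; dx: number; dy: number }
  | { kind: "select" };

class TouchClutchRemote {
  // null means the finger is off the screen, so the clutch is disengaged.
  private touchStartedAt: number | null = null;

  handle(input: RemoteInput): RemoteOutput | null {
    if (input.kind === "touchstart") {
      this.touchStartedAt = input.t;
      return null;
    }
    if (input.kind === "touchend") {
      const startedAt = this.touchStartedAt;
      this.touchStartedAt = null;
      // A short touch is a tap, which stands in for "click".
      if (startedAt !== null && input.t - startedAt < TAP_MS) {
        return { kind: "select" };
      }
      return null;
    }
    // Motion sample: ignored unless the finger is down (no more "live mic").
    if (this.touchStartedAt === null) return null;
    // Also ignored during the tap window, so selecting does not move the cursor.
    if (input.t - this.touchStartedAt < TAP_MS) return null;
    return { kind: "move", dx: input.dx, dy: input.dy };
  }
}
```

In this sketch, motion with the finger off the screen produces nothing, a quick tap produces a single `select` with no cursor movement, and a held touch turns phone motion into `move` outputs. The threshold value and the hold-to-move behavior would both need to be tested with real users.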

To do this, I am going to access the iPhone’s accelerometer and gyroscopic data through the iOS browser. I can then use WebSocket to connect this data to a desktop display. For the next few weeks I will be focusing on learning how to use WebSocket, and how to get the data on the iPhone to control something on the desktop.
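As a rough sketch of that pipeline, the browser’s `devicemotion` event exposes `accelerationIncludingGravity` and `rotationRate`, and a `WebSocket` can send them as text. The message format, the function names `toMessage` and `forwardMotion`, the `MotionLike` and `SocketLike` interfaces, and the URL in the comment are all assumptions for illustration; the interfaces only mirror the parts of the browser objects the sketch uses.

```typescript
// Minimal shapes mirroring the browser objects used here (assumed, for illustration).
interface MotionLike {
  accelerationIncludingGravity: { x: number | null; y: number | null; z: number | null } | null;
  rotationRate: { alpha: number | null; beta: number | null; gamma: number | null } | null;
}

interface SocketLike {
  readyState: number;
  send(data: string): void;
}

const SOCKET_OPEN = 1; // same value as WebSocket.OPEN in the browser

// Turn one motion event into a JSON text message (hypothetical message format).
function toMessage(e: MotionLike): string {
  const a = e.accelerationIncludingGravity;
  const r = e.rotationRate;
  return JSON.stringify({
    type: "motion",
    // Sensors may report null, so missing values fall back to 0.
    accel: { x: a?.x ?? 0, y: a?.y ?? 0, z: a?.z ?? 0 },
    gyro: { alpha: r?.alpha ?? 0, beta: r?.beta ?? 0, gamma: r?.gamma ?? 0 },
  });
}

// Send the sample only if the connection is open; returns whether it was sent.
function forwardMotion(socket: SocketLike, e: MotionLike): boolean {
  if (socket.readyState !== SOCKET_OPEN) return false;
  socket.send(toMessage(e));
  return true;
}

// On the phone, the wiring might look like this (the URL is a placeholder):
//   const socket = new WebSocket("ws://desktop.local:8080");
//   window.addEventListener("devicemotion", (e) => forwardMotion(socket, e));
```

The desktop side would parse each message with `JSON.parse` and map the values onto whatever it is controlling. Sending every motion event may be more traffic than needed, so throttling is something to try during the experiments.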

##Term Project Plan

I will be reading WebSocket Essentials – Building Apps with HTML5 WebSockets since it focuses on building mobile applications.

Updated Plan:

Week 05 - Read 1/2 of WebSocket Essentials and do exercises w/ book
Week 06 - WebSocket Experiment #1 Complete (Click iPhone screen to change color of screen on desktop)
Week 07 - Read all of WebSocket Essentials and do exercises w/ book
Week 08 - WebSocket Experiment #2 Complete (iPhone accelerometer and gyroscopic data being sent to desktop)
Week 10 - WebSocket Experiment #3 Complete (Multiple people interacting with desktop through their own iPhone)

This week I purchased and began to read WebSocket Essentials. I also began to revise my proposal, so that’s 20 pages that need to be rewritten. I also researched existing demos/projects that use WebSocket and the iPhone’s directional data. I need to prove to people that this can be done, so finding demos like this is important. The best one is HelloRacer. It uses WebGL, three.js, WebSocket, and HTML5, and combines all of these to allow users to tilt their phone to control the race driver.

Time Breakdown

Revising Proposal (idea & written paper) - 5 hrs
WebSocket Demo Research - 1 hr 15 mins
Reading WebSocket Essentials - 1 hr
Meeting w/ Troy - 45 mins
Project Planning - 30 mins
Reviewing W3C WebSocket API - 1 hr 20 mins
Total: 9 hrs 50 mins