Text entry can be extremely difficult, or even impossible, for people with physical impairments and disabilities and in scenarios of situationally-induced impairments. As a remedy, many such users rely on gaze typing, which commonly uses dwell time as the selection method. However, dwell-based gaze typing can be limited by usability issues such as reduced typing speed, high error rates, a steep learning curve, and visual fatigue with prolonged use. We present a dwell-free, multimodal approach to gaze typing in which gaze input is supplemented with a foot input modality. In this multimodal setup, the user points her gaze at the desired character and selects it with the foot input. We further investigated two approaches to foot-based selection, foot gesture-based selection and foot press-based selection, and compared both against dwell-based selection.
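To illustrate the interaction flow, the following is a minimal sketch of how a gaze pointer and a foot-switch event might be combined for dwell-free selection; it is not the study's actual implementation, and the GazeTracker and FootSwitch interfaces and the keyboard layout are hypothetical.

```python
# Minimal sketch of dwell-free gaze-plus-foot selection (illustrative only).
# Assumptions: gaze_tracker.latest_point() returns the current gaze (x, y),
# and foot_switch.selection_event() returns True when a foot gesture or
# press has occurred since the last poll.
import time

def key_under_gaze(keyboard_layout, gaze_x, gaze_y):
    """Return the key whose on-screen bounding box contains the gaze point."""
    for key, (left, top, right, bottom) in keyboard_layout.items():
        if left <= gaze_x <= right and top <= gaze_y <= bottom:
            return key
    return None

def typing_loop(gaze_tracker, foot_switch, keyboard_layout, emit_char):
    """Point with gaze; commit the focused key only when the foot input fires."""
    while True:
        gaze_x, gaze_y = gaze_tracker.latest_point()
        focused_key = key_under_gaze(keyboard_layout, gaze_x, gaze_y)
        # No dwell timer: selection happens only on an explicit foot event.
        if focused_key is not None and foot_switch.selection_event():
            emit_char(focused_key)
        time.sleep(0.01)  # poll at roughly 100 Hz
```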
We evaluated our system through three experiments involving 51 participants, where each experiment used one of the three target selection methods: dwell-based, foot gesture-based, or foot press-based selection. We found that foot-based selection at least matches, and likely improves on, the gaze typing performance of dwell-based selection. Among the four foot gestures used in the study (toe tapping, heel tapping, right flick, and left flick), toe tapping was the most preferred gesture for gaze typing. Furthermore, when using foot-based activation, users quickly develop a rhythm of focusing on a character with the gaze and selecting it with the foot, and this familiarity reduces errors significantly. Overall, based on both typing performance and qualitative feedback, the results suggest that gaze- and foot-based typing is convenient, easy to learn, and addresses the usability issues associated with dwell-based typing. We believe our findings will encourage further research on leveraging a supplemental foot input in gaze typing and, more generally, assist in the development of rich foot-based interactions.
Gaze input has been a promising substitute for mouse input in point-and-select interactions. Individuals with severe motor and speech disabilities primarily rely on gaze input for communication, and gaze also serves as a hands-free input modality in scenarios of situationally-induced impairments and disabilities (SIIDs). Hence, the performance of gaze input has often been compared to mouse input through standardized performance evaluation procedures such as Fitts' law. With the proliferation of touch-enabled devices such as smartphones, tablet PCs, and other computing devices with a touch surface, it is also important to compare the performance of gaze input to that of touch input.
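For reference, ISO 9241-9 evaluations typically use the Shannon formulation of Fitts' law, in which movement time MT grows with the index of difficulty ID set by target distance D and width W, and throughput TP is computed from the effective index of difficulty ID_e (where the effective width W_e = 4.133 sigma is derived from the observed endpoint scatter):

\[ MT = a + b \cdot ID, \qquad ID = \log_2\!\left(\frac{D}{W} + 1\right), \qquad TP = \frac{ID_e}{MT}. \]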
In this study, we conducted an ISO 9241-9 Fitts' law evaluation to compare the performance of multimodal gaze and foot-based input to touch input in a standard desktop environment, with mouse input as the baseline. From a study involving 12 participants, we found that gaze input had the lowest throughput (2.55 bits/s) and the highest movement time (1.04 s) of the three inputs. In addition, although touch input involves the most physical movement, it achieved the highest throughput (6.67 bits/s) and the lowest movement time (0.5 s), and was the most preferred input. While the pointer can be moved from the source to the target location comparably quickly with gaze and touch, target selection consumed the most time with gaze input. Hence, with a throughput over 160% higher than that of gaze, touch proved to be the superior input modality.
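As a quick arithmetic check on the comparison above, the sketch below recomputes the relative throughput difference from the reported values; the helper function is only an illustration of the ISO 9241-9 style throughput defined earlier, not the study's analysis code.

```python
import math

def effective_throughput(effective_distance, effective_width, movement_time):
    """ISO 9241-9 style throughput: effective index of difficulty over movement time."""
    index_of_difficulty = math.log2(effective_distance / effective_width + 1)
    return index_of_difficulty / movement_time  # bits per second

# Relative difference between the throughput values reported above (bits/s).
gaze_tp, touch_tp = 2.55, 6.67
print(f"Touch vs. gaze throughput: {(touch_tp - gaze_tp) / gaze_tp:.1%} higher")
# -> Touch vs. gaze throughput: 161.6% higher
```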