aiD – H2020-MSCA-CYENS-2019 – Artificial Intelligence for the Deaf
Duration: 48 months (1/12/2019–30/11/2023)
CUT Budget: €600,000
aiD aims to address the challenge of communication and social integration for deaf people by leveraging the latest advances in machine learning (ML), human-computer interaction (HCI), and augmented reality (AR). Speech-to-text and text-to-speech algorithms have now reached high performance, as a product of recent breakthrough advances in the field of deep learning (DL). However, the commercially available systems cannot be readily integrated into a solution targeting communication between deaf and hearing people. On the other hand, existing research efforts to transcribe sign language (SL) video, or to generate synthetic SL footage (an SL avatar) from text, have yet to produce a satisfactory outcome.
aiD addresses both of these problems. We develop speech-to-text and text-to-speech modules tailored to the requirements of a system supporting communication with the deaf. Most importantly, we systematically address the core technological challenge of SL transcription and generation in an AR environment. Our vision is to exploit and advance the state of the art in DL to solve these problems with groundbreaking accuracy, in a fashion amenable to commodity mobile hardware. This stands in stark contrast to existing systems, which either depend on sophisticated and costly equipment (multiple vision sensors, gloves, and wristbands) or are lab-only systems limited to fingerspelling rather than the official SL that deaf people actually use. Indeed, the current state of the art requires expensive devices and operates on a word-by-word basis, thus missing the syntactic context; moreover, these solutions are not amenable to commodity mobile devices. Our vision is to resolve these inadequacies and offer a concrete solution for real-time interaction between deaf and hearing people. Our core innovation lies in the development of new algorithms and techniques that enable real-time translation of SL video to text or speech, and vice versa (SL avatar generation from speech/text in an AR environment), with satisfactory accuracy, on commodity mobile devices such as smartphones and tablets.