Norwich UK: Researchers at the University of East Anglia (UEA) are about to embark on an innovative new project to develop computer lip-reading systems that could be used for fighting crime. The three-year project, which starts next month, will collect data for lip-reading and use it to create machines that automatically convert videos of lip motions into text. It builds on work already carried out at UEA to develop state-of-the-art speech-reading systems.
The university is teaming up with the Centre for Vision, Speech and Signal Processing at Surrey University, who have built accurate and reliable face and lip trackers, and the Home Office Scientific Development Branch, who want to investigate the feasibility of using the technology for crime fighting.
The team also hope to carry out computerised lip-reading of other languages.
While it is known that humans can and do lip-read, not much is known about exactly what visual information is needed for effective lip-reading. Human lip-reading can be unreliable, even using trained lip-readers.
Dr Richard Harvey, senior lecturer at UEA's School of Computing Sciences, is leading the project, which has been awarded £391,814 by the Engineering and Physical Sciences Research Council.
“We all lip-read, for example in noisy situations like a bar or party, but even the performance of expert lip-readers can be very poor,” he said. “It appears that the best lip-readers are the ones who learned to speak a language before they lost their hearing and who have been taught lip-reading intensively. It is a very desirable skill.”
Dr Harvey added: “The Home Office Scientific Development Branch is interested in anything that helps the police gather information about criminals or gather evidence.”
As well as crime fighting, there could be other uses for the technology, such as installing a camera in a mobile phone, or on a car dashboard for in-car speech recognition systems.
Another reason for developing computerised lip-reading is that the number of trained lip-readers is falling, mainly because people tend to be taught to sign instead.
Dr Harvey said: “To be effective the systems must accurately track the head over a variety of poses, extract numbers, or features, that describe the lips and then learn what features correspond to what text.
“To tackle the problem we will need to use information collected from audio speech. So this project will also investigate how to use the extensive information known about audio speech to recognise visual speech.
“The work will be highly experimental. We hope to have produced a system that will demonstrate the ability to lip-read in more general situations than we have done so far.”