Software to digitise Arabic text online

Computer scientists are developing software to scan Arabic documents, including handwritten ones, for words and phrases, filling a void that became apparent after the September 11 attacks in the US.

Besides helping with intelligence gathering, the software should expand access to modern and ancient Arabic manuscripts. It will allow Arabic writings to be digitised and posted on the internet.

“The whole internet is skewed towards people who speak English,” said Venu Govindaraju, director of the Centre for Unified Biometrics and Sensors at the University at Buffalo, where the software is being developed.

Govindaraju fears that if optical character recognition software is not developed for a particular language, “then all the classic texts in that language will disappear into oblivion”.

Caution

Bill Young, an Arabic language specialist at the University of Maryland, said the software could help scan through masses of typed pages for specific names or words, although he cautioned that handwritten Arabic presented serious challenges for computers.

For instance, the word mas’uul, meaning “responsible”, can be written in more than one way, he said, so the software would have to be told which variations to expect.
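
One way such instructions could look, once a page has been turned into text, is a table that folds variant spellings onto a single search key. The Python sketch below is only an illustration and not the Buffalo team's method. It assumes the variation in question is the letter that carries the hamza (written over a waw in one spelling and over a yeh in another); the article does not say which variants Young had in mind, and the names HAMZA_FOLD and search_key are made up for this example.

```python
# Hypothetical folding table (an assumption for illustration): a hamza
# written over different carrier letters is mapped to the bare hamza,
# so that variant spellings of a word share one search key.
HAMZA_FOLD = str.maketrans({
    "\u0624": "\u0621",  # ARABIC LETTER WAW WITH HAMZA ABOVE -> HAMZA
    "\u0626": "\u0621",  # ARABIC LETTER YEH WITH HAMZA ABOVE -> HAMZA
})


def search_key(word: str) -> str:
    """Return a spelling-insensitive key for matching a word."""
    return word.translate(HAMZA_FOLD)


# Two assumed spellings of mas'uul: hamza over waw, and hamza over yeh.
variant_waw = "\u0645\u0633\u0624\u0648\u0644"
variant_yeh = "\u0645\u0633\u0626\u0648\u0644"

print(variant_waw == variant_yeh)                          # False
print(search_key(variant_waw) == search_key(variant_yeh))  # True
```

A search for either spelling would then match both, at the cost of occasionally matching words that differ only in how the hamza is written.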

Govindaraju, who helped develop software to recognise handwritten addresses in English, said the Arabic software would take into account the fact that characters may take different forms depending on where within a word they appear, and that Arabic vowels are pronounced but often not written.
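
Both points have a counterpart in how Arabic is stored as digital text, which a short Python sketch can make concrete. It is not the recognition software itself, which has to work from page images. It only shows, using the standard unicodedata module, that the four positional shapes of one letter can be mapped back to a single letter and that written vowel marks can be stripped so that vowelled and unvowelled spellings match. The function name normalise is an assumption made for this example.

```python
import unicodedata


def normalise(text: str) -> str:
    """Fold positional letter shapes and drop written vowel marks."""
    # NFKC maps Arabic presentation forms (isolated, final, initial,
    # medial shapes) back to the ordinary base letter.
    folded = unicodedata.normalize("NFKC", text)
    # Short-vowel signs are combining marks; removing them leaves the
    # consonant skeleton that is usually all that gets written.
    return "".join(ch for ch in folded if not unicodedata.combining(ch))


# The letter beh in its isolated, final, initial and medial shapes.
beh_shapes = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]
print({normalise(shape) for shape in beh_shapes} == {"\u0628"})  # True

# "kataba" written with its three short-vowel marks, and without them.
vowelled = "\u0643\u064e\u062a\u064e\u0628\u064e"
plain = "\u0643\u062a\u0628"
print(normalise(vowelled) == plain)  # True
```

The harder part of the problem, deciding from a scanned or handwritten shape which letter it is, happens before any step like this one.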