A Language-Independent, Open-Vocabulary System Based on HMMs for Recognition of Ultra Low Resolution Words

dc.creator: Einsele, Farshideh
dc.creator: Ingold, Rolf
dc.creator: Hennebert, Jean
dc.date: 2008
dc.date.accessioned: 2024-02-06T12:57:11Z
dc.date.available: 2024-02-06T12:57:11Z
dc.description: In this paper, we introduce and evaluate a system capable of recognizing words extracted from ultra low resolution images such as those frequently embedded in web pages. The design of the system has been driven by the following constraints. First, the system has to recognize small font sizes of 6 to 12 points, where anti-aliasing and resampling filters are applied. Such procedures add noise between adjacent characters in the words and complicate any a priori segmentation of the characters. Second, the system has to be able to recognize any word in an open-vocabulary setting, potentially mixing different languages written in the Latin alphabet. Finally, the training procedure must be automatic, i.e. it must not require manually extracting, segmenting and labeling a large set of data. These constraints led us to an architecture based on ergodic HMMs in which states are associated with the characters. We also introduce several improvements to the system's performance: increasing the order of the emission probability estimators, imposing minimum and maximum width constraints on the character models, and using a training set that covers all possible adjacency cases of Latin characters. The proposed system is evaluated on different font sizes and families and shows good robustness for sizes down to 6 points.
dc.format: text/html
dc.identifier: https://doi.org/10.3217/jucs-014-18-2982
dc.identifier: https://lib.jucs.org/article/29209/
dc.identifier.uri: https://openrepository.mephi.ru/handle/123456789/9933
dc.language: en
dc.publisher: Journal of Universal Computer Science
dc.relation: info:eu-repo/semantics/altIdentifier/eissn/0948-6968
dc.relation: info:eu-repo/semantics/altIdentifier/pissn/0948-695X
dc.rights: info:eu-repo/semantics/openAccess
dc.rights: J.UCS License
dc.source: JUCS - Journal of Universal Computer Science 14(18): 2982-2997
dc.subject: ultra low resolution text recognition
dc.subject: web document analysis
dc.subject: HMMs
dc.subject: web image indexation and retrieval
dc.title: A Language-Independent, Open-Vocabulary System Based on HMMs for Recognition of Ultra Low Resolution Words
dc.type: Research Article