Ensemble LUT classification for degraded document enhancement
The rapid evolution of scanning and computing technologies has led to the creation of large collections of scanned paper documents. Examples include historical collections, legal depositories, medical archives, and business archives. Moreover, in many situations, such as legal litigation and security investigations, scanned collections are used to facilitate systematic exploration of the data. Scanned documents almost always suffer from some form of degradation. Large degradations make documents hard to read and substantially degrade the performance of automated document processing systems. Enhancement of degraded document images is normally performed assuming global degradation models. When the degradation is large, global degradation models do not perform well. In contrast, we propose to estimate local degradation models and use them to enhance degraded document images. Using a semi-automated enhancement system, we labeled a subset of the Frieder diaries collection. This labeled subset was then used to train an ensemble classifier. The component classifiers are based on lookup tables (LUTs) in conjunction with the approximate nearest neighbor algorithm. The resulting algorithm is highly efficient. Experimental evaluation results are provided using the Frieder diaries collection. © 2008 SPIE-IS&T.
Document degradation models, Document image analysis, Ensemble classification, Historical documents, Image enhancement
Obafemi-Ajayi, Tayo, Gady Agam, and Ophir Frieder. "Ensemble LUT classification for degraded document enhancement." In Document Recognition and Retrieval XV, vol. 6815, p. 681509. International Society for Optics and Photonics, 2008.
Proceedings of SPIE - The International Society for Optical Engineering