Ensemble LUT classification for degraded document enhancement

Abstract

The rapid evolution of scanning and computing technologies has led to the creation of large collections of scanned paper documents. Examples include historical collections, legal depositories, medical archives, and business archives. Moreover, in many situations, such as legal litigation and security investigations, scanned collections are used to facilitate systematic exploration of the data. Scanned documents almost always suffer from some form of degradation. Severe degradation makes documents hard to read and substantially reduces the performance of automated document processing systems. Enhancement of degraded document images is normally performed assuming global degradation models, which do not perform well when the degradation is severe. In contrast, we propose to estimate local degradation models and use them to enhance degraded document images. Using a semi-automated enhancement system, we labeled a subset of the Frieder diaries collection and used it to train an ensemble classifier. The component classifiers are based on lookup tables (LUTs) in conjunction with an approximate nearest neighbor algorithm. The resulting algorithm is highly efficient. Experimental evaluation results are provided using the Frieder diaries collection. © 2008 SPIE-IS&T.

Document Type

Conference Proceeding

DOI

https://doi.org/10.1117/12.767120

Keywords

Document degradation models, Document image analysis, Ensemble classification, Historical documents, Image enhancement

Publication Date

3-31-2008

Journal Title

Proceedings of SPIE - The International Society for Optical Engineering
