Noise Characterization in Ancient Document Images Based on DCT Coefficient Distribution

Publication Name : 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR)

DOI :

Date : 2015


Images of ancient documents dating back several hundred years commonly suffer from noise and degradation, such as ink seeping through from the back page, 'fox' (local brown discolorations of the paper), text fading, background spots, uneven background, and so on. Noise reduction (or denoising) is an important step in document image processing because it can improve optical character recognition (OCR) performance. Before a noise reduction algorithm is applied, it is important to characterize the types of noise that exist in the document. This paper proposes a method to characterize the noise types in ancient documents based on the DCT coefficient distribution of the image. The characterization is accomplished by analyzing the standard deviation of the distribution of the higher frequency-band DCT coefficients of a cropped (localized) noise image. In simulations, three noise types that exist in Acehnese ancient documents, namely 'fox', spots, and uneven background, are characterized using the proposed method. The results suggest that DCT coefficient distributions can be used to characterize the noise in ancient documents. In addition, the proposed method is shown to be usable for document image classification.
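
The central measurement described in the abstract, the standard deviation of higher frequency-band DCT coefficients of a cropped patch, can be illustrated with a minimal sketch. The abstract does not give the patch size, the band boundary, or the DCT normalization, so everything here is an assumption made for illustration: the function names (`dct_matrix`, `dct2`, `high_band_std`) are hypothetical, the transform is an orthonormal 2-D DCT-II, and the "higher band" is taken to be the coefficients with `u + v >= n` for an n x n patch. This is not the authors' implementation.

```python
import math
import statistics


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II basis matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def dct2(patch):
    """2-D DCT-II of a square patch, computed separably as C * X * C^T."""
    c = dct_matrix(len(patch))
    c_t = [list(col) for col in zip(*c)]
    return matmul(matmul(c, patch), c_t)


def high_band_std(patch, cutoff=None):
    """Population standard deviation of the high-frequency DCT coefficients.

    ASSUMPTION: the high band is every coefficient (u, v) with
    u + v >= cutoff, where cutoff defaults to the patch size n.
    The paper may define the band differently.
    """
    n = len(patch)
    if cutoff is None:
        cutoff = n
    coeffs = dct2(patch)
    band = [coeffs[u][v] for u in range(n) for v in range(n)
            if u + v >= cutoff]
    return statistics.pstdev(band)


# Synthetic 8 x 8 grey-level patches (illustrative only, not real document data).
flat = [[128.0] * 8 for _ in range(8)]                        # clean background
gradient = [[100.0 + 5.0 * r] * 8 for r in range(8)]          # smooth, uneven background
speckle = [[128.0 + (40.0 if (r + c) % 2 == 0 else -40.0)     # fine-grained texture
            for c in range(8)] for r in range(8)]

for name, patch in [("flat", flat), ("gradient", gradient), ("speckle", speckle)]:
    print(name, high_band_std(patch))
```

Under these assumptions, a smoothly varying patch keeps its energy in the low-frequency coefficients and yields a high-band standard deviation near zero, while a fine-grained pattern yields a clearly larger value. How this statistic separates 'fox', spots, and uneven background in real documents is reported in the paper itself and is not reproduced by the synthetic patches above.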

Type : Book in series

ISSN : 1520-5363

EISSN :

Page : 971 - 975