Pixel-Wise Method for Enhanced Tesseract OCR Accuracy Using Colour and Spatial Distances

DOI:

https://doi.org/10.70594/brain/16.2/29

Keywords:

OCR, noise reduction, Tesseract, image preprocessing, image enhancement, OCR accuracy improvement, contrast correction

Abstract

Digital images often contain noise introduced during acquisition, storage, or transmission, which can hinder the performance of Optical Character Recognition (OCR) systems. Effective noise reduction is essential for improving the accuracy of these systems, as noise can obscure text and reduce recognition rates. Image denoising is widely studied in computer vision but remains challenging due to the variety of noise types and the risk of introducing artifacts or blurring. In this work, we propose a new preprocessing algorithm, used in conjunction with the Tesseract engine, to improve its overall accuracy. We evaluate the method on the SmartDoc dataset, which contains images captured with mobile devices, and obtain an improvement of 6.5% over the original accuracy. The method is also compared with several classical algorithms, such as the Mean Filter, Median Filter, Bilateral Filter, and Adaptive Smoothing, among others, and shows improved results over each of them.
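For reference, the following sketch implements three of the classical baselines named above (mean, median, and bilateral filtering) on a grayscale image stored as a list of rows. It is not the authors' proposed algorithm, which the abstract does not specify. The function names, the window radius `r`, and the bilateral parameters `sigma_s` and `sigma_c` are illustrative assumptions chosen here. The bilateral filter is included because it is the textbook example of weighting neighbouring pixels by both spatial distance and colour (intensity) distance.

```python
import math
from statistics import median

# Illustrative representation: grayscale image as rows of intensities (0-255).
Image = list[list[float]]


def _window(img: Image, y: int, x: int, r: int):
    """Yield (dy, dx, value) over a (2r+1) x (2r+1) window, clamping at the borders."""
    h, w = len(img), len(img[0])
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            yield dy, dx, img[yy][xx]


def mean_filter(img: Image, r: int = 1) -> Image:
    """Replace each pixel with the average of its window."""
    n = (2 * r + 1) ** 2
    return [[sum(v for _, _, v in _window(img, y, x, r)) / n
             for x in range(len(img[0]))] for y in range(len(img))]


def median_filter(img: Image, r: int = 1) -> Image:
    """Replace each pixel with the median of its window (suppresses salt-and-pepper noise)."""
    return [[median(v for _, _, v in _window(img, y, x, r))
             for x in range(len(img[0]))] for y in range(len(img))]


def bilateral_filter(img: Image, r: int = 2,
                     sigma_s: float = 2.0, sigma_c: float = 25.0) -> Image:
    """Weighted average where each neighbour's weight combines spatial and colour distance."""
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            centre = img[y][x]
            num = den = 0.0
            for dy, dx, v in _window(img, y, x, r):
                # Gaussian on spatial distance: nearer pixels count more.
                w_s = math.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                # Gaussian on intensity difference: similar pixels count more,
                # so pixels across a stroke edge contribute almost nothing.
                w_c = math.exp(-((v - centre) ** 2) / (2 * sigma_c ** 2))
                num += w_s * w_c * v
                den += w_s * w_c
            row.append(num / den)  # den >= 1 because the centre pixel has weight 1
        out.append(row)
    return out


if __name__ == "__main__":
    # Dark text region (0) meeting a light background (255), plus one "salt" pixel.
    noisy = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
    noisy[2][1] = 255
    print(median_filter(noisy)[2][1])            # salt pixel is removed
    print(round(mean_filter(noisy)[2][2]))       # mean blurs across the edge
    print(round(bilateral_filter(noisy)[2][2]))  # bilateral keeps the edge dark
```

The contrast between these baselines is the motivation for distance-aware filtering: the mean filter smears light background into dark strokes next to an edge, while the colour term of the bilateral filter keeps character boundaries sharp, which matters for how Tesseract segments glyphs.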

Author Biographies

  • Mihai-Lucian Voncilă, National University of Science and Technology Politehnica, Bucharest, Romania

    Computer Science and Engineering Department
    Faculty of Automatic Control and Computers
    National University of Science and Technology Politehnica, Bucharest, Romania

  • Nicolae Tarbă, National University of Science and Technology Politehnica, Bucharest, Romania

    Computer Science and Engineering Department
    Faculty of Automatic Control and Computers
    National University of Science and Technology Politehnica, Bucharest, Romania

  • Cosmin-Dumitru Oprea, National University of Science and Technology Politehnica, Bucharest, Romania

    Computer Science and Engineering Department
    Faculty of Automatic Control and Computers
    National University of Science and Technology Politehnica, Bucharest, Romania

  • Costin Anton Boiangiu, National University of Science and Technology Politehnica, Bucharest, Romania

    Computer Science and Engineering Department
    Faculty of Automatic Control and Computers
    National University of Science and Technology Politehnica, Bucharest, Romania

  • Nicolae Goga, National University of Science and Technology Politehnica, Bucharest, Romania

    Faculty of Engineering in Foreign Languages
    Faculty of Automatic Control and Computers
    National University of Science and Technology Politehnica, Bucharest, Romania

Published

2025-06-01

Section

Artificial Intelligence