Document Analysis And Classification Based On Passing Window

Publication Year: 1399
Document Type: Journal article
Language: English

This paper is 8 pages long and is available for download in PDF format.

National Scientific Document ID:

JR_JACET-6-1_006

Indexing Date: 24 Tir 1399

Abstract:

In this paper we present a document analysis and classification system that segments and classifies the contents of Arabic document images. The system comprises four stages: preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing the image, and detecting and correcting skew. In document segmentation, an algorithm is proposed to segment a document image into homogeneous regions. In document classification, a neural network classifier (multilayer perceptron trained with backpropagation) labels each region as text or non-text, based on a number of features computed in the feature extraction stage. These features are collected from the works of other researchers. Experiments were conducted on 398 document images selected randomly from the Printed Arabic Text Database (PATDB), which covers a variety of printed forms: advertisements, book chapters, magazines, newspapers, letters, and reports. The proposed segmentation algorithm yielded a ratio of overlapping areas of merged zones to the total size of zones of only 0.814%, and a ratio of missed areas to the total size of zones of 1.938%. The features that individually show the best accuracy are Background Vertical Run Length (RL) Mean and Standard Deviation of foreground.
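
To make the run-length feature named in the abstract more concrete, here is a minimal sketch of how a background vertical run-length mean could be computed for a binarized region. The abstract does not give the paper's exact definition, so the details are assumptions: the region is taken to be a list of rows with 0 for background and 1 for foreground, a "vertical run" is taken to be a maximal stretch of consecutive background pixels within one column, and the function names are hypothetical.

```python
def vertical_background_runs(region):
    """Return the lengths of all vertical runs of background (0) pixels.

    Assumption: `region` is a list of equal-length rows, 0 = background,
    1 = foreground. This is an illustrative definition, not necessarily
    the one used in the paper.
    """
    runs = []
    for column in zip(*region):        # transpose: iterate column by column
        length = 0
        for pixel in column:
            if pixel == 0:
                length += 1            # extend the current background run
            elif length > 0:
                runs.append(length)    # a foreground pixel closes the run
                length = 0
        if length > 0:
            runs.append(length)        # run that reaches the bottom edge
    return runs


def background_vertical_rl_mean(region):
    """Mean length of vertical background runs; 0.0 if there are none."""
    runs = vertical_background_runs(region)
    return sum(runs) / len(runs) if runs else 0.0


# Hypothetical 4x3 region used only for illustration.
sample = [
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(background_vertical_rl_mean(sample))
```

Intuitively, text regions tend to break each column into many short background runs between lines of glyphs, while large graphics or empty areas produce fewer, longer runs, which is one plausible reason such a feature separates text from non-text.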

Authors

ZAHER BAMASOOD

Computer Science Department, Hadhramout University
