Text Detection and Recognition for Robot Localization

Publication year: 1403 (Solar Hijri)
Document type: journal article
Language: English

The full paper is 12 pages and is available for download in PDF format.


National document ID: JR_JECEI-12-1_011

Indexing date: 5 Dey 1402

Abstract:

Background and Objectives: Signage is everywhere, and a robot should be able to take advantage of signs to help it localize (including Visual Place Recognition (VPR)) and map. Robust text detection and recognition in the wild is challenging due to pose, irregular text instances, illumination variations, viewpoint changes, and occlusion.

Methods: This paper proposes an end-to-end scene text spotting model that simultaneously outputs the text string and the bounding boxes. The model leverages a pre-trained Vision Transformer (ViT) based architecture combined with a multi-task transformer-based text detector better suited to the VPR task. Our central contribution is an end-to-end scene text spotting framework that adequately captures irregular and occluded text regions across challenging places. To address the occlusion problem, we first equip the ViT backbone with a masked autoencoder (MAE) so that partially occluded characters can still be captured. We then attach a multi-task prediction head so that the model can handle arbitrarily shaped text instances with polygon bounding boxes.

Results: The performance of the proposed architecture for VPR was evaluated through several experiments on the challenging Self-Collected Text Place (SCTP) benchmark dataset, using the standard Precision-Recall metric. On this benchmark, the final model achieved Recall = 0.93 and Precision = 0.8.

Conclusion: The initial experimental results show that the proposed model outperforms state-of-the-art (SOTA) methods on the SCTP dataset, confirming the robustness of the proposed end-to-end scene text detection and recognition model.
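The abstract gives only a high-level description of the architecture, so the following minimal PyTorch sketch is included purely for illustration: a ViT-style patch encoder (standing in for the MAE-pretrained backbone) feeds a transformer decoder whose learnable queries are mapped by a multi-task head to polygon control points and a character sequence per text instance. Every concrete choice here (layer sizes, the number of queries and polygon points, the vocabulary size, the plain patch embedding, and the class name TextSpotterSketch) is an assumption for illustration, not the authors' configuration.

```python
# Hypothetical sketch only: all sizes and names below are illustrative
# assumptions, not the paper's actual configuration (which uses an
# MAE-pretrained ViT backbone).
import torch
import torch.nn as nn


class TextSpotterSketch(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_queries=25,
                 num_points=16, vocab_size=97, max_len=25):
        super().__init__()
        # Stand-in for the MAE-pretrained ViT backbone: 16x16 patch embedding
        # followed by a plain transformer encoder.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=6)
        # Learnable text-instance queries decoded against the image features.
        self.queries = nn.Embedding(num_queries, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=6)
        # Multi-task prediction head: polygon control points + character logits.
        self.point_head = nn.Linear(d_model, num_points * 2)         # (x, y) pairs
        self.char_head = nn.Linear(d_model, max_len * vocab_size)
        self.max_len, self.vocab_size = max_len, vocab_size

    def forward(self, images):
        feats = self.patch_embed(images).flatten(2).transpose(1, 2)  # B x N x D
        memory = self.encoder(feats)
        q = self.queries.weight.unsqueeze(0).expand(images.size(0), -1, -1)
        hs = self.decoder(q, memory)                                 # B x Q x D
        polygons = self.point_head(hs).sigmoid()      # normalized polygon coords
        chars = self.char_head(hs).view(hs.size(0), hs.size(1),
                                        self.max_len, self.vocab_size)
        return polygons, chars                        # per-query boxes and text


if __name__ == "__main__":
    model = TextSpotterSketch()
    polygons, chars = model(torch.randn(1, 3, 224, 224))
    print(polygons.shape, chars.shape)  # (1, 25, 32) and (1, 25, 25, 97)
```

In the pipeline described in the abstract, the encoder would be initialized from MAE-pretrained ViT weights rather than trained from scratch, and the two heads would be supervised jointly so that detection (polygon bounding boxes) and recognition (text strings) are produced in a single forward pass.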

Authors

Z. Raisi

University of Waterloo, Waterloo, Canada and Chabahar Maritime University, Chabahar, Iran.

J. Zelek

Systems Design Engineering Department, University of Waterloo, Canada.
