CIVILICA We Respect the Science
(Specialized publisher of national conferences / Publishing license number from the Ministry of Culture and Islamic Guidance: 8971)

PTokenizer: POS Tagger Tokenizer

Paper title: PTokenizer: POS Tagger Tokenizer
National paper ID: JR_JKBEI-2-7_006
Published in Issue 7, Volume 2, October, 1395 (Solar Hijri)
Authors:

Saeed Rahmani - Department of Computer and IT Engineering, Shiraz University, Shiraz, Iran
Seyyed Mostafa Fakhrahmad - Department of Computer and IT Engineering, Shiraz University, Shiraz, Iran
Mohammad Hadi Sadredini - Department of Computer and IT Engineering, Shiraz University, Shiraz, Iran

Abstract:
With the advent of new information sources and the growth of text data, natural language processing (NLP) has become one of the key parts of systems dealing with human-written text, and part-of-speech (POS) tagging is an inseparable part of NLP tasks. As a result, it is of paramount importance to improve the accuracy of POS tagging. In this paper, we apply a language model and statistical information to introduce a new approach for tokenizing sentences and preparing them to be labeled by POS taggers. An evaluation shows that the proposed method yields a precision of 98 percent for tokenizing, and

Keywords:
Tokenizer, Part of Speech Tagging, Probabilistic Model, Compound Tokens

Paper page and full-text download: https://civilica.com/doc/589369/