Investigating Hostile Post Detection in Gujarati: A Machine Learning Approach

B. J. Rameshbhai; K. Rana

Published in: International Journal of Engineering (IJE), Vol. 37, Issue 7

Publication year: 1403 (Solar Hijri calendar)

Document type: Journal article

Language: English

Length: 12 pages (PDF)

Permanent link to this paper:

https://civilica.com/doc/1965652

National scientific document ID:

JR_IJE-37-7_008

Indexing date: 9 Ordibehesht 1403 (Solar Hijri calendar)

Abstract:

Hostile posts on social media are a serious problem for individuals, governments and organizations. There is a critical need for an automated system that can investigate and identify hostile posts in large-scale data. Gujarati is the sixth most spoken language in India. In this work, we construct a large hostile-post dataset in the Gujarati language. The data are collected from Twitter, Instagram and Facebook. Our dataset consists of 151,000 distinct comments, of which 10,000 posts are manually annotated. These posts are labeled as Hostile or Non-Hostile. We use the dataset in two forms: (i) the original Gujarati text and (ii) English text translated from the Gujarati text. We also compare performance with and without pre-processing, where pre-processing removes extra symbols and replaces emojis with their textual descriptions. We conduct experiments with supervised machine learning models (Support Vector Machine, Decision Tree, Random Forest, Gaussian Naive Bayes, Logistic Regression and K-Nearest Neighbor) and with an unsupervised model (k-means clustering). We evaluate these models with Bag-of-Words and TF-IDF feature extraction methods. We observe that classification using TF-IDF features is efficient. Among these methods, Logistic Regression performs best, with an accuracy of 0.68 and an F1-score of 0.67. The purpose of this research is to create a benchmark dataset and provide baseline results for detecting hostile posts in the Gujarati language.
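
The pre-processing and feature-extraction steps named in the abstract can be illustrated with a minimal sketch in standard-library Python. This is not the authors' code. The function names, the two example posts, the use of Unicode character names as emoji descriptions, and the smoothed IDF formula are all assumptions made for illustration; the paper may use different choices.

```python
import math
import re
import unicodedata
from collections import Counter


def preprocess(text):
    """Replace emoji with a textual description and drop extra symbols.

    Assumption: the Unicode character name stands in for the emoji description.
    """
    pieces = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "So":
            # "Symbol, other" covers most emoji; substitute the character name.
            pieces.append(" " + unicodedata.name(ch, "").lower() + " ")
        elif category[0] in ("L", "M", "N") or ch.isspace():
            # Keep letters, combining marks (needed for Gujarati vowel signs), digits.
            pieces.append(ch)
        else:
            # Punctuation and other symbols become a space.
            pieces.append(" ")
    return re.sub(r"\s+", " ", "".join(pieces)).strip()


def tokenize(text):
    """Whitespace tokenization of the pre-processed, lower-cased text."""
    return preprocess(text).lower().split()


def bag_of_words(tokens):
    """Bag-of-Words: raw term counts for one document."""
    return Counter(tokens)


def tf_idf(token_lists):
    """TF-IDF weights per document, using one common smoothed IDF variant."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


# Hypothetical example posts (not taken from the dataset).
posts = ["You are awful!!! 😡", "You are kind 😀"]
token_lists = [tokenize(p) for p in posts]
bow = [bag_of_words(t) for t in token_lists]
weights = tf_idf(token_lists)
```

In this sketch, a term that appears in only one post (such as "awful") receives a higher TF-IDF weight than a term shared by both posts (such as "you"), while Bag-of-Words counts them equally. The resulting vectors would then be passed to a classifier such as Logistic Regression.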

Keywords:

Hostile Text Detection, Machine Learning, Hate Text Detection, Text Classification, Gujarati Text Dataset

Authors

B. J. Rameshbhai

Department of Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

K. Rana

Department of Computer Engineering, Sarvajanik College of Engineering and Technology, Gujarat Technological University, Ahmedabad, Gujarat, India
