AUTOMATIC ARABIC KEYWORD EXTRACTION USING LOGISTIC REGRESSION

Authors:

Noor T. Jabury, Nada A.Z. Abdullah

DOI NO:

https://doi.org/10.26782/jmcms.2020.10.00002

Keywords:

Arabic keywords, keyword extraction, logistic regression

Abstract

Keywords express the main content of a document or article and are an important component because they summarize the article's content. Keywords also play an important role in information retrieval systems, bibliographic databases, and search engine optimization. The manual assignment of high-quality keywords is expensive, time-consuming, and error-prone. In this paper, an automatic keyword extraction model based on the logistic regression algorithm is proposed and implemented. The model consists of three main stages: preprocessing, feature extraction, and classification to select the keywords. In the experiments, 40 Arabic documents from two Arabic journals (AJSP and JJSS) are used. The results are promising: the average accuracy is 0.91 with an average precision of 0.86 for the AJSP dataset, and the average accuracy is 0.90 with an average precision of 0.83 for the JJSS dataset.
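
The three stages named in the abstract, together with the accuracy and precision measures it reports, can be illustrated with a minimal Python sketch. Nothing below is taken from the paper itself: the stop-word list, the normalization rules, the three features (relative frequency, relative first position, presence in the title), and all function names are assumptions chosen only to make the pipeline concrete, and the logistic regression is a plain stochastic gradient descent version rather than the authors' implementation.

```python
import math
import re

# Hypothetical, tiny stop-word list (already normalized); not from the paper.
STOP_WORDS = {"في", "من", "على", "الى", "عن", "هذا", "هذه", "و"}


def preprocess(text):
    """Stage 1 (assumed rules): strip diacritics, unify alef, tokenize, drop stop words."""
    text = re.sub(r"[\u064B-\u0652\u0640]", "", text)  # diacritics and tatweel
    text = re.sub(r"[أإآ]", "ا", text)                  # alef variants -> bare alef
    tokens = re.findall(r"[\u0621-\u064A]+", text)      # runs of Arabic letters
    return [t for t in tokens if t not in STOP_WORDS]


def extract_features(tokens, title_tokens):
    """Stage 2 (assumed features): [relative frequency, first position, in title]."""
    n = len(tokens)
    feats = {}
    for i, tok in enumerate(tokens):
        if tok not in feats:
            feats[tok] = [0.0, i / n, 1.0 if tok in title_tokens else 0.0]
        feats[tok][0] += 1.0 / n
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=2000):
    """Stage 3: fit logistic regression by stochastic gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # gradient of the log loss with respect to the logit
            for j, xj in enumerate(xi):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def predict(w, x, threshold=0.5):
    """Label a candidate word as keyword (1) or non-keyword (0)."""
    p = sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))
    return 1 if p >= threshold else 0


def accuracy_precision(y_true, y_pred):
    """Accuracy = correct / total; precision = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return correct / len(y_true), precision


if __name__ == "__main__":
    title = preprocess("استخراج الكلمات المفتاحية")
    body = preprocess("استخراج الكلمات المفتاحية من النص العربي في هذا البحث")
    features = extract_features(body, set(title))
    # Toy labels (an assumption): words that appear in the title count as keywords.
    X = list(features.values())
    y = [int(f[2]) for f in X]
    w = train(X, y)
    y_pred = [predict(w, x) for x in X]
    print(accuracy_precision(y, y_pred))
```

In practice the labels would come from author-assigned keywords in the training documents rather than from the title heuristic used in this toy run, and a library implementation of logistic regression would normally replace the hand-written training loop.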

