Improving Kurdish Web Mining through Tree Data Structure and Porter’s Stemmer Algorithms
Stemming is one of the most important preprocessing techniques for improving the accuracy of text classification. Its main purpose is to conflate words that share the same stem into a single feature, which reduces the high dimensionality of the feature space. A smaller feature space in turn reduces the time needed to build a model and the memory it requires. In this paper, a new stemming approach is explored for improving the performance of Kurdish text classification. The proposed approach combines a tree data structure with Porter’s stemmer algorithm. The system is evaluated with Support Vector Machine (SVM) and Decision Tree (C4.5) classifiers, comparing classification performance before and after applying the proposed stemmer. Furthermore, the usefulness of stop-word removal is examined before and after applying the proposed approach.
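The abstract does not spell out how the tree and Porter’s algorithm are combined, so the following is only a minimal sketch under stated assumptions. It assumes the tree is a suffix trie (a trie built over reversed suffixes, so a word can be matched from its end) and borrows Porter’s convention of ordered steps, in which only the longest matching suffix within a step is considered and it is removed only if a condition on the remaining stem holds. Here that condition is simplified to a minimum stem length of 3. The names `SuffixTrie`, `stem` and `STEPS`, the suffix groups, and the romanized example tokens are hypothetical illustrations, not the paper’s rule set.

```python
from typing import Dict, Iterable, List, Optional


class _Node:
    """One trie node: children keyed by character, plus the suffix ending here."""

    __slots__ = ("children", "suffix")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.suffix: Optional[str] = None


class SuffixTrie:
    """Trie over reversed suffixes, so lookups walk a word from its last character."""

    def __init__(self, suffixes: Iterable[str]) -> None:
        self.root = _Node()
        for suffix in suffixes:
            node = self.root
            for ch in reversed(suffix):
                node = node.children.setdefault(ch, _Node())
            node.suffix = suffix  # mark the end of a complete suffix

    def longest_suffix(self, word: str) -> Optional[str]:
        """Return the longest stored suffix that `word` ends with, or None."""
        node, best = self.root, None
        for ch in reversed(word):
            node = node.children.get(ch)
            if node is None:
                break
            if node.suffix is not None:
                best = node.suffix  # deeper matches overwrite shorter ones
        return best


# Hypothetical, illustrative suffix groups in a romanized spelling (NOT the paper's rules).
STEPS: List[List[str]] = [
    ["ekan", "eke", "an"],  # step 1: illustrative definite/plural endings
    ["êk"],                 # step 2: illustrative indefinite ending
]


def stem(word: str, tries: List[SuffixTrie], min_stem_len: int = 3) -> str:
    """Run each step once, in order. As in Porter's scheme, only the longest
    matching suffix of a step is considered; it is stripped only if at least
    `min_stem_len` characters remain (an assumed, simplified condition)."""
    for trie in tries:
        suffix = trie.longest_suffix(word)
        if suffix and len(word) - len(suffix) >= min_stem_len:
            word = word[: len(word) - len(suffix)]
    return word


if __name__ == "__main__":
    tries = [SuffixTrie(group) for group in STEPS]
    tokens = ["kitêb", "kitêbeke", "kitêbekan", "kitêban", "kitêbêk", "nan"]
    stems = [stem(t, tries) for t in tokens]
    print(stems)
    # Distinct features before -> after stemming.
    print(len(set(tokens)), "->", len(set(stems)))
```

With these illustrative inputs the script prints `6 -> 2`: the five inflected forms collapse to one stem, while the short token `nan` is left intact because stripping `an` would leave fewer than three characters. This is the kind of feature-space reduction the abstract refers to. Walking the reversed word through the trie finds the longest applicable suffix in time proportional to that suffix’s length, instead of testing every rule in a step one by one.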
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.