MATEC Web Conf.
Volume 135, 20178th International Conference on Mechanical and Manufacturing Engineering 2017 (ICME’17)
|Number of page(s)||10|
|Published online||20 November 2017|
A Comparative Study of Open-Domain and Specific-Domain Word Sense Disambiguation Based on Quranic Information Retrieval
Faculty of Information Science and Technology, UKM, 43000 Bangi, Malaysia
* Corresponding author: email@example.com
Information retrieval is the process of analysing typed query as well as to retrieve relevant document according to the user query. Several issues can significantly affect the effectiveness of information retrieval. One of the common issue is the ambiguity lies on the words where a single word could yield several meanings. The process of identifying the exact sense of word is called Word Sense Disambiguation (WSD). Quran is the holly book for nearly 1.5 billion Muslims around the world. In particularly, Quran contains numerous words that can undergone multiple meanings. Therefore, there is a vital demand to apply WSD approach on Quran, in order, to improve the information retrieval. Several WSD approaches have been proposed for Quranic retrieval. However, these approaches are divided into two main categories; open-domain WSD approach and specific-domain WSD approach. Open-domain WSD is an approach that utilizes an open-domain dictionary such as WordNet, that is exploited to provide the exact sense. Whereas, domain-specific WSD approach aims to utilize a restricted training data that contain specific senses related to the domain of Quran. Hence, this study aims to establish a comparative study to investigate the two WSD categories including domain-specific and open-domain. For the domain-specific approach, a predefined example data has been collected to train Yarwosky algorithm which is a semisupervised machine learning technique. Then, based on the training, such algorithm can classify the exact sense for the words. In contrast, WordNet which is an open-domain dictionary has been used in this study with semantic distances, in order, to identify the similarity between the query word and the results of WordNet’s concepts. That dataset that has been used in this study is a Quranic translation. The experimental results have shown the mixed superiority of Yarwosky algorithm and WordNet WSD approach.
© The Authors, published by EDP Sciences, 2017
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (http://creativecommons.org/licenses/by/4.0/).
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.