Multi-Language Sentiment Analysis for Hotel Reviews

Tourists and travelers use a variety of information sources (e.g., travel portals, blogs, or social networking sites such as Twitter) to help them decide on a hotel room. These sources all contain highly subjective text that expresses the opinions of many. We took a preliminary look at user-generated hotel reviews from two travel portals in English and Thai. We developed a taxonomy of features and specifically investigated how accurately they can be predicted with three classification methods. The results indicate that support vector machines perform best for this specific domain.


Introduction
People clearly enjoy traveling. Travel and tourism has been continuously on the rise over the past six decades and has become one of the biggest and fastest-growing economic sectors on the planet, regardless of any economic crisis. According to the World Tourism Organization (UNWTO), international tourist arrivals have increased from 25 million globally in 1950 to 278 million in 1980 and 527 million in 1995, and reached an impressive 1,133 million in 2014. Likewise, tourist spending has risen from US$ 2 billion in 1950 to US$ 1,245 billion in 2014, based on the 2015 edition of the World Tourism Highlights report [1]. The amount of information about hotels has increased with it, and travellers have never had better and easier access to hotel reviews than today. This, however, also means that travellers have to base their decisions on far more detailed information about hotels and on an even larger amount of generally very subjective information from previous guests about their quality. Decision making has therefore become more challenging, and information systems use sentiment analysis to assist in the analysis of human opinion and help users make a sensible choice. In [4], for example, supervised learning with two types of information (frequency and TF-IDF) is used to perform polarity classification of reviews.
In this paper we took a preliminary look at user-generated hotel reviews from two travel portals. We developed a taxonomy of features for these reviews and specifically investigated how accurately they can be predicted with three different classification methods.
We selected reviews in English, as the major language spoken in the travel domain, and in Thai as a local comparison. After reviewing related work in the next section, we describe our evaluation methodology and our data set in Section 3. Results are presented and discussed in Section 4, followed by our conclusions.

Related work
This section gives an overview of disciplines related to sentiment analysis, such as text mining, feature extraction/identification, sentiment classification techniques, and performance evaluation. Text mining is conventional data mining performed on text features. Text features are usually keywords that are extracted based on the language model. In this paper, a keyword is a word that can be used to classify reviews in terms of polarity. The accuracy of classification depends on the text features extracted from the reviews [2]. The goal of sentiment analysis is to detect opinions and identify the sentiments people express by classifying the contextual polarity of a given text. The polarity can be positive, negative, or neutral. In general, there are three levels of sentiment analysis: document, sentence, and feature/aspect level. Document-level sentiment analysis detects the polarity of a whole product or service review [3], [8]. Sentence-level analysis determines the polarity of a given text unit, usually a sentence or phrase, while feature/aspect-level analysis determines the opinion expressed on different features or aspects (attributes or components) of entities [3]. The results of sentence-level sentiment analysis depend on the definition of subjectivity used when annotating texts [5]. The advantage of feature-based sentiment analysis is that it can extract information about the object of interest and determine whether the opinion expressed on each feature/aspect is positive, negative, or neutral [4].
In this paper, we applied three classification approaches, Naive Bayes (NB) [9], Decision Trees (DT) [10], and Support Vector Machines (SVM) [11], in an experimental comparison for Thai and English. These methods were selected because they are very common and universally applicable approaches, in an attempt to get initial results on our data.
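To make the first of these approaches concrete, the following is a minimal sketch of a multinomial Naive Bayes polarity classifier with add-one smoothing, written in plain Python. The whitespace tokenizer, the function names `train_nb` and `predict_nb`, and the four toy reviews are assumptions made for illustration only; they are not the implementation or data used in the experiments, and Thai text in particular would need a proper word segmenter.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (not suitable for Thai).
    return text.lower().split()


def train_nb(labelled_reviews):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_reviews:
        class_docs[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing of the word likelihood.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this example.
toy_reviews = [
    ("clean room and friendly staff", "Positive"),
    ("great location friendly service", "Positive"),
    ("dirty room and rude staff", "Negative"),
    ("noisy location poor service", "Negative"),
]
model = train_nb(toy_reviews)
print(predict_nb(model, "friendly staff and clean room"))
print(predict_nb(model, "rude staff and dirty room"))
```

Decision trees and support vector machines would consume the same kind of word-count features, which is one reason the three methods can be compared on identical data.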

Evaluation methodology
This section explains the main modules used to evaluate reviews. The first is review weight analysis. The second is the classifier module, which compares three classification algorithms: Naïve Bayes, decision tree, and Support Vector Machine (SVM). The third is the testing module, in which the best classifier is applied.
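The interplay of the classifier and testing modules can be sketched as follows: each candidate classifier is scored on held-out validation data, and the one with the highest accuracy is passed on to testing. In this Python sketch, `accuracy`, `select_best`, and the two rule-based stand-in classifiers are hypothetical names and placeholders; they are not the trained Naïve Bayes, decision tree, or SVM models.

```python
def accuracy(classifier, examples):
    """Fraction of (text, label) pairs that the classifier labels correctly."""
    correct = sum(1 for text, label in examples if classifier(text) == label)
    return correct / len(examples)


def select_best(classifiers, validation_set):
    """Return the (name, classifier) pair with the highest validation accuracy."""
    return max(
        classifiers.items(),
        key=lambda item: accuracy(item[1], validation_set),
    )


# Placeholder classifiers (assumptions): trivial rules standing in for trained models.
def always_positive(text):
    return "Positive"


def keyword_rule(text):
    return "Negative" if "dirty" in text or "rude" in text else "Positive"


# Toy validation and test data, invented for this example.
validation_set = [
    ("clean room", "Positive"),
    ("dirty room", "Negative"),
    ("rude staff", "Negative"),
    ("nice pool", "Positive"),
]
test_set = [("dirty bathroom", "Negative"), ("great view", "Positive")]

candidates = {"always_positive": always_positive, "keyword_rule": keyword_rule}
best_name, best_classifier = select_best(candidates, validation_set)
print(best_name, accuracy(best_classifier, test_set))
```

Keeping the test set out of the selection step means the reported accuracy is measured on data that played no part in choosing the model.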

Proposed framework
There are three main parts in the framework for sentiment analysis of hotel reviews. Part 1 is review analysis, whose input is both Thai and English reviews and whose output is the reviews separated into positive and negative. Weight calculation is applied in this part in order to assign weights to all properties (Room, Location, Service, Price, and Facilities). Part 2 is the training module, which takes the separated reviews as input and outputs the best model of the classification process. Part 3 is the testing module, in which the model is applied to classify a hotel review as positive or negative, as shown in Fig. 1.

Table 1. Taxonomy of categories and sub-categories.

Data collection and process
We collected a total of 3,000 hotel reviews in English and Thai from the two well-known websites TripAdvisor and Agoda between January and March 2015. We had an equally sized set for each language (1,500 reviews each). The data was collected with import.io, a free web site scraping tool, to transform the review pages into text. A domain expert then manually assigned the text of each review to the five categories Room, Location, Service, Price, and Facilities, and all reviews were labelled with a polarity class (Positive or Negative) by experts. The data was split into training, validation, and test sets of 60%, 20%, and 20%, respectively. The three classifiers were then tested on the same data in order to compare their performance in terms of accuracy. Fig. 2 and Fig. 3 show examples of Thai and English reviews, respectively.
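As an illustration of the 60/20/20 partition, the Python sketch below shuffles a list of labelled reviews and cuts it into training, validation, and test sets. The function name `split_dataset`, the fixed random seed, and the placeholder review strings are assumptions; whether the original split was shuffled or stratified by polarity is not stated.

```python
import random


def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of items and split it into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, validation, test


# Placeholder stand-ins for the 1,500 reviews of one language.
reviews = [f"review_{i}" for i in range(1500)]
train, validation, test = split_dataset(reviews)
print(len(train), len(validation), len(test))
```

For the 1,500 reviews of one language this yields 900 training, 300 validation, and 300 test items, with no review appearing in more than one set.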

Weight calculation
In order to calculate the property weights, five properties (Room, Location, Service, Price, and Facilities) and fourteen sub-properties need to be extracted from the reviews. The weights of all sub-properties under a core property sum to 1. The weight ratio of a sub-property can be calculated as follows:
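One simple scheme that satisfies the sum-to-1 constraint is to set each sub-property's weight to its share of the mentions under its core property. The Python sketch below illustrates that idea only: the frequency-based rule, the function name `sub_property_weights`, the sub-property names, and the mention counts are all assumptions made for this example and are not taken from the taxonomy in Table 1 or from the weight ratio used in the experiments.

```python
def sub_property_weights(mention_counts):
    """Normalize mention counts so the weights under each property sum to 1."""
    weights = {}
    for prop, counts in mention_counts.items():
        total = sum(counts.values())
        if total == 0:
            # Assumption: fall back to uniform weights when nothing is mentioned.
            weights[prop] = {sub: 1 / len(counts) for sub in counts}
        else:
            weights[prop] = {sub: n / total for sub, n in counts.items()}
    return weights


# Hypothetical sub-properties and counts, invented for illustration.
mention_counts = {
    "Room": {"cleanliness": 30, "bed": 15, "size": 15},
    "Service": {"staff": 8, "check-in": 2},
}
weights = sub_property_weights(mention_counts)
print(weights["Room"])
print(sum(weights["Service"].values()))
```

Under this assumed rule, a sub-property that reviewers rarely mention receives a small weight, which is consistent with the observation that sparse reviews leave some sub-properties poorly covered.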

Results and discussion
The classification results are summarized in Table 4 and Fig. 4. The results show the classification accuracy for the categories described in Table 1. We have separated them by property and base classifier and indicated the language for each group in the last column. The highest accuracy for each partition and language is highlighted in bold. The performance in terms of accuracy depends on the text features used in text processing. The accuracy for Thai is lower than for English for all properties because Thai reviews contain few comments and do not mention all sub-properties, whereas English reviews cover almost all of the sub-properties that we used to calculate the statistical weights.

Conclusion
This paper proposed sentiment analysis based on statistical weights calculated from hotel reviews in both Thai and English. Three classification methods (naïve Bayes, decision tree, and support vector machine) were used to classify the hotel reviews into positive and negative polarities. The five experiments were conducted using hotel reviews obtained from the TripAdvisor and Agoda portals. The results revealed that support vector machines gave the best accuracy, outperforming naïve Bayes and decision tree classification. However, the property weights, which were calculated using the training data, affected the performance of the system in terms of accuracy.

Figure 1. Proposed framework of sentiment analysis.

Figure 4. Performance comparison of the three classifiers for Thai and English.

Table 4. Performance comparison in terms of accuracy.