Automatic Essay Scoring in E-learning System Using LSA Method with N-Gram Feature for Bahasa Indonesia

In the world of education, an e-learning system can be used to support the educational process. Educators typically use e-learning systems to evaluate learners' learning outcomes. When learning outcomes are evaluated in an e-learning system, the exam question types most often used are multiple choice and short fill-in answers. Essay questions are rarely used in the evaluation process because their assessment is subjective and time-consuming. This design aims to create an automatic essay scoring feature in an e-learning system that can be used to support the learning process. The method used for automatic essay scoring is Latent Semantic Analysis (LSA) with an n-gram feature. The evaluation of the automatic essay scoring feature showed average accuracies of 78.65 %, 58.89 %, 14.91 %, 71.37 % and 64.49 % for LSA with unigram, bigram, trigram, unigram + bigram, and unigram + bigram + trigram features, respectively.


Introduction
Along with the rapid advancement of internet technology, technology can now be applied in many fields, including education. One technology that can support the learning process is the e-learning system. Many features are developed in e-learning systems to support learning, such as online exams that educators give to learners to evaluate their learning outcomes.
In an e-learning system, evaluating learning outcomes in essay form is important because educators can use essay results as an indicator of how well learners understand the material being examined. However, essays are rarely used because grading them is subjective and time-consuming. To address this problem, this work designs an e-learning system with an automatic essay scoring feature that can support the learning process between educators and learners.
Several methods can be used to score essays automatically, such as string matching and string similarity algorithms like Rabin-Karp, Knuth-Morris-Pratt and Levenshtein distance. This design, however, uses Latent Semantic Analysis (LSA), which can measure the similarity between a student's essay answer and the lecturer's model answer key.
LSA is an information retrieval method that represents words and sentences in a matrix and processes them with mathematical calculations. The matrix records the presence or absence of words from a word group in each context. LSA uses a linear algebra technique called Singular Value Decomposition (SVD) to find patterns of relationships between words and the contexts that contain them in a semantic matrix.
Existing Automatic Essay Scoring (AES) techniques based on LSA generally do not consider the order of words in a sentence. To address this, the LSA method used for automatic essay scoring in this design is extended with an n-gram feature so that word order within a sentence can be taken into account.

Literature Review
In this work, the Latent Semantic Analysis (LSA) method with an n-gram feature is used to create automatic essay scoring in an e-learning system.

Pre-processing
Pre-processing is the early stage that prepares the input data before it enters the main processing stage. The pre-processing stages performed by the system are case folding, lexical analysis and tokenization:
i) Case folding converts all letters in the corpus to lowercase. Its purpose is to make the input data uniform, because the input may contain both lowercase and uppercase letters.
ii) Lexical analysis removes numbers, punctuation and characters that cannot be read by the system.
iii) Tokenization cuts or separates the words of a text collection or corpus into individual tokens.
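The three pre-processing steps can be sketched in Python as follows. This is a minimal illustration, not the system's actual implementation (which was built with ASP.NET C#, Perl and Matlab); the character set kept by lexical analysis (ASCII letters a to z and whitespace) and the function names are assumptions.

```python
import re

def case_fold(text: str) -> str:
    """Case folding: convert every letter to lowercase."""
    return text.lower()

def lexical_analysis(text: str) -> str:
    """Lexical analysis: replace digits, punctuation and any other character
    outside a-z and whitespace with a space (assumed character set)."""
    return re.sub(r"[^a-z\s]", " ", text)

def tokenize(text: str) -> list[str]:
    """Tokenization: split the cleaned text into words on whitespace."""
    return text.split()

def preprocess(text: str) -> list[str]:
    # Apply the steps in the order described above
    return tokenize(lexical_analysis(case_fold(text)))

print(preprocess("Perangkat keras (hardware) terdiri dari 4 jenis."))
# ['perangkat', 'keras', 'hardware', 'terdiri', 'dari', 'jenis']
```

Because lexical analysis removes the full stops, any splitting of an answer into sentences has to happen before this step.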

Latent semantic analysis (LSA)
Latent Semantic Analysis (LSA) is an information retrieval method that works on a set of texts, in which words are represented together with their contexts in a matrix and processed with mathematical calculations. The matrix maps the presence or absence of a word or term from a particular group of words or terms. The context used can be a list of keywords, sentences or even paragraphs.
Latent Semantic Analysis (LSA) was proposed and patented in 1988 by Scott Deerwester, Thomas Landauer and Susan Dumais, among others [1]. The method is widely used in information retrieval and natural language processing; one application is scoring essay answers. LSA uses a linear algebra technique called Singular Value Decomposition (SVD) to form the semantic matrix.
The resulting semantic matrix contains the patterns of relationship between words and their contexts, which are used to match an answer against the answer key [2]. LSA generally works on unigrams that co-occur in particular contexts, so it focuses on the keywords in a sentence and ignores word order and grammar. To address this, an n-gram feature is added to the LSA method so that it works on the occurrence of both words and phrases in a given context, which allows word order within a sentence to be considered [3].
In the design of automatic essay scoring with LSA, the first step is to represent the student answer and the lecturer answer key as matrices, by forming a term frequency matrix of the student answer and a term frequency matrix of the lecturer answer key. A term frequency matrix is formed by counting how many times each term of the lecturer answer key appears in each sentence. The terms counted can be unigrams, bigrams, trigrams, or a combination of them.
Two term frequency matrices are formed: one for the lecturer answer key and one for the student answer. In the matrix for the lecturer answer key, each row represents a unique term in the lecturer answer key, each column represents a sentence of the lecturer answer key, and each cell holds the number of times that term occurs in that sentence.
The term frequency matrix for the student answer is built the same way: each row represents a unique term of the lecturer answer key, each column represents a sentence of the student answer, and each cell holds the number of times that answer-key term occurs in that student sentence. In both matrices, the cell weighting is the raw term frequency (tf).
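A minimal Python/NumPy sketch of building the two term frequency matrices, assuming the answer key defines the vocabulary (the matrix rows). The helper names, the sentence splitting on '.', '!' and '?', and the sample sentences are hypothetical; sentences are split before punctuation is removed, since lexical analysis would otherwise erase the sentence boundaries. The `sizes` parameter selects unigrams, bigrams, trigrams or a combination.

```python
import re
import numpy as np

def split_sentences(text: str) -> list[str]:
    # Assumption: sentences end with '.', '!' or '?'
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def tokens(sentence: str) -> list[str]:
    # Case folding, lexical analysis and tokenization in one step
    return re.sub(r"[^a-z\s]", " ", sentence.lower()).split()

def ngrams(words: list[str], n: int) -> list[str]:
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def terms(sentence: str, sizes: tuple[int, ...]) -> list[str]:
    # All n-grams of the requested sizes, e.g. (1,) unigram or (1, 2) unigram + bigram
    words = tokens(sentence)
    return [g for n in sizes for g in ngrams(words, n)]

def key_vocabulary(key: str, sizes: tuple[int, ...]) -> list[str]:
    # Unique answer-key terms in order of first appearance (the matrix rows)
    return list(dict.fromkeys(t for s in split_sentences(key) for t in terms(s, sizes)))

def tf_matrix(text: str, vocabulary: list[str], sizes: tuple[int, ...]) -> np.ndarray:
    # Rows = answer-key terms, columns = sentences of `text`, cells = raw tf
    sentences = split_sentences(text)
    row = {t: i for i, t in enumerate(vocabulary)}
    A = np.zeros((len(vocabulary), len(sentences)))
    for j, s in enumerate(sentences):
        for t in terms(s, sizes):
            if t in row:  # terms that are not in the answer key are ignored
                A[row[t], j] += 1
    return A

key = "Perangkat keras adalah bagian fisik komputer. Perangkat keras dibagi empat jenis."
answer = "Perangkat keras adalah bagian fisik dari komputer."
vocab = key_vocabulary(key, sizes=(1,))
key_tf = tf_matrix(key, vocab, (1,))
answer_tf = tf_matrix(answer, vocab, (1,))
print(key_tf.shape, answer_tf.shape)   # (9, 2) (9, 1)
```

Both matrices share the same rows, so a column of one can be compared directly with a column of the other.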
The next stage is to compute the Singular Value Decomposition (SVD) of the two term frequency matrices formed in the previous stage, that is, the matrix for the lecturer answer key and the matrix for the student answer. The purpose of SVD is to discover new, "latent" patterns of relationship between words and the sentences containing them. SVD produces three component matrices: two orthogonal matrices and one diagonal matrix. The SVD of a t × d matrix A, illustrated in Figure 1, is:
$$A = U S V^{T} \qquad (1)$$
The columns of U are the orthonormal eigenvectors of A A^T, the columns of V are the orthonormal eigenvectors of A^T A, and S is a diagonal matrix whose entries (the singular values) are the square roots of the nonzero eigenvalues of A A^T (equivalently, of A^T A). After the SVD, the next step is to reduce the dimension of the SVD result, a step also called smoothing. Reducing the dimension removes noise, that is, words considered unimportant, without eliminating the correlation between words and sentences [4]. The reduction is done by reducing the dimension of the diagonal matrix S, the second matrix produced by SVD, which contains the singular values.
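The properties of Equation 1 can be checked numerically with NumPy's `numpy.linalg.svd`. The matrix below is a hypothetical 4 × 3 term frequency matrix, and the thin SVD (`full_matrices=False`) is used so that S is square.

```python
import numpy as np

# Hypothetical term-by-sentence frequency matrix (t = 4 terms, d = 3 sentences)
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)

# Thin SVD: U is t x r, s holds the r singular values (descending), Vt is r x d
U, s, Vt = np.linalg.svd(A, full_matrices=False)
S = np.diag(s)

# 1) The three factors reproduce A
print(np.allclose(A, U @ S @ Vt))  # True
# 2) U and V have orthonormal columns
print(np.allclose(U.T @ U, np.eye(3)), np.allclose(Vt @ Vt.T, np.eye(3)))  # True True
# 3) Singular values are square roots of the eigenvalues of A^T A
eig = np.linalg.eigvalsh(A.T @ A)[::-1]  # eigvalsh returns ascending order
print(np.allclose(s, np.sqrt(np.clip(eig, 0, None))))  # True
```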
To reduce the size of the diagonal matrix S, a value of k is chosen first, where k ≤ n and n is the number of singular values in S. The chosen k is then applied to the SVD result by keeping the first k rows and the first k columns of S [5]. The singular values of the dimensions that are not selected are set to zero. Reducing S also affects the two orthogonal matrices U and V, of which only the first k columns are kept.
The reduced SVD matrices are used to reconstruct the matrix in a lower dimension, A_k:
$$A_k = U_k S_k V_k^{T} \qquad (2)$$
The reconstructed matrix A_k has the same size as the original matrix A. However, A_k is a lower-rank approximation of A that can reveal hidden relationships between the words in a sentence, so that it captures the important words and ignores the unimportant ones.
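The reduction and reconstruction steps can be sketched as follows, assuming NumPy; `reduce_and_reconstruct` is a hypothetical helper, and k = 2 is an arbitrary choice for the example. Because `numpy.linalg.svd` returns the singular values in descending order, keeping the first k means keeping the k largest, and setting the rest to zero gives the same A_k as multiplying the first k columns of U and V with the leading k × k block of S.

```python
import numpy as np

def reduce_and_reconstruct(A: np.ndarray, k: int) -> np.ndarray:
    """Rank-k reconstruction A_k = U_k S_k V_k^T (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    if not 1 <= k <= len(s):
        raise ValueError(f"k must be between 1 and {len(s)}")
    s_k = s.copy()
    s_k[k:] = 0.0                  # unselected singular values are set to zero
    return U @ np.diag(s_k) @ Vt   # same shape as A

A = np.array([[2, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)
A2 = reduce_and_reconstruct(A, k=2)
print(A2.shape)                    # (4, 3): same size as A
print(np.linalg.matrix_rank(A2))   # 2
print(np.round(A2, 2))
```

The reconstructed cells are no longer integer counts, and a term can receive a nonzero weight in a sentence where it never occurred, which is the "hidden relationship" the method relies on.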
The reconstructed matrix can also estimate new, hidden relationships between a group of words or terms and a particular sentence, so that terms may lie close to or relate to a sentence in the k-dimensional space even if they never actually appear in that sentence. The reconstruction of the SVD result is shown in Equation 2 and Figure 2. Once the reconstructed student answer matrix and lecturer answer key matrix have been formed, the next step is to form answer vectors and answer key vectors from these lower-dimensional reconstructions. The lower-dimensional reconstruction produces a new representation of the relationships in the student answer and the lecturer answer key [6]. Here, each answer vector represents a student answer sentence and each answer key vector represents a lecturer answer key sentence. The final stage of automatic essay scoring with LSA is to measure the similarity between the student answer vectors and the lecturer answer key vectors.

Cosine similarity
Cosine similarity measures the similarity between a document vector and a query vector as the cosine of the angle between them. Here, the document vector represents a sentence of the student answer and the query vector represents a sentence of the lecturer answer key. Once the document vector (d) and the query vector (q) have been formed, their similarity is calculated using Equation (3):
$$\mathrm{sim}(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert} = \frac{\sum_{i} q_i d_i}{\sqrt{\sum_{i} q_i^{2}}\,\sqrt{\sum_{i} d_i^{2}}} \qquad (3)$$
where q is the lecturer answer key vector and d is the student answer vector.
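Equation (3) translates directly into code. In this sketch, returning 0 for an all-zero vector (for example, a sentence that contains none of the answer-key terms) is an assumption, since the formula is undefined in that case; the two vectors are hypothetical.

```python
import numpy as np

def cosine_similarity(q: np.ndarray, d: np.ndarray) -> float:
    """sim(q, d) = (q . d) / (|q| |d|); returns 0.0 if either vector is all zeros (assumption)."""
    denom = np.linalg.norm(q) * np.linalg.norm(d)
    return float(q @ d / denom) if denom > 0 else 0.0

q = np.array([1, 1, 1, 0])  # hypothetical answer key sentence vector
d = np.array([1, 1, 0, 0])  # hypothetical student answer sentence vector
print(round(cosine_similarity(q, d), 4))  # 0.8165
```

Because the cosine is symmetric, swapping the roles of q and d does not change the result.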

N-gram
The n-gram model is a probabilistic model devised by mathematicians in the early 20th century and later developed to predict the next item in a sequence of items. An item can be a letter/character or a word [7]. For words, an n-gram is a sequence of n consecutive words in a sentence, so n-grams can be used to extract pieces of n words from a sentence or paragraph. An n-gram of size 1 is called a unigram, of size 2 a bigram and of size 3 a trigram; larger sizes are simply called n-grams. For example, the word sequence "perangkat keras komputer" (computer hardware) yields the following n-grams:
Unigram: perangkat, keras, komputer (device, hard, computer)
Bigram: perangkat keras, keras komputer (hardware, hard computer)
Trigram: perangkat keras komputer (computer hardware)
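Word n-gram extraction can be sketched in a few lines of Python; `ngrams` is a hypothetical helper that reproduces the "perangkat keras komputer" example above. A combined feature set such as unigram + bigram is simply the concatenation of the two lists.

```python
def ngrams(words: list[str], n: int) -> list[str]:
    """Contiguous word n-grams of size n, joined with a single space."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "perangkat keras komputer".split()
print(ngrams(words, 1))  # ['perangkat', 'keras', 'komputer']
print(ngrams(words, 2))  # ['perangkat keras', 'keras komputer']
print(ngrams(words, 3))  # ['perangkat keras komputer']
# unigram + bigram feature set
print(ngrams(words, 1) + ngrams(words, 2))
# ['perangkat', 'keras', 'komputer', 'perangkat keras', 'keras komputer']
```

A sentence shorter than n words yields no n-grams of that size, which is one reason bigram and trigram matrices tend to be sparser than unigram matrices.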

Evaluation
The automated essay scoring system is evaluated to assess how well the Latent Semantic Analysis method scores essay answers. The evaluation compares the scores generated by the system with human scores, in this case those of the lecturer. Once all data have been obtained, the mean absolute error (MAE) and the mean absolute percentage error (MAPE) are analyzed. The MAE is the average absolute difference between the lecturer's scores and the system's scores:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert X_i - Y_i \rvert \qquad (4)$$
where MAE is the mean absolute error between the lecturer and system scores, X_i is the score given by the lecturer, Y_i is the score given by the system and n is the number of data.
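The MAPE is the average absolute difference between the lecturer's and the system's scores, expressed as a percentage of the lecturer's score:
$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{\lvert X_i - Y_i \rvert}{X_i} \,\% \qquad (5)$$
where MAPE is the mean absolute percentage error, X_i is the score given by the lecturer, Y_i is the score given by the system and n is the number of data. The MAPE can also be used to measure the success rate of the automatic essay scoring system in scoring student essay answers, with the accuracy taken as 100 % minus the MAPE.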
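MAE, MAPE and the derived accuracy can be computed as below. The score lists are made-up illustrative values, not data from this study, and the MAPE divides by the lecturer's score X_i, so this sketch assumes every X_i is nonzero.

```python
def mae(lecturer: list[float], system: list[float]) -> float:
    """Mean absolute error: (1/n) * sum |X_i - Y_i|."""
    n = len(lecturer)
    return sum(abs(x - y) for x, y in zip(lecturer, system)) / n

def mape(lecturer: list[float], system: list[float]) -> float:
    """MAPE in percent: (100/n) * sum |X_i - Y_i| / X_i (requires X_i != 0)."""
    n = len(lecturer)
    return 100 * sum(abs(x - y) / x for x, y in zip(lecturer, system)) / n

X = [80, 70, 90]  # hypothetical lecturer scores
Y = [72, 77, 90]  # hypothetical system scores
print(mae(X, Y))                    # 5.0
print(round(mape(X, Y), 2))         # 6.67
print(round(100 - mape(X, Y), 2))   # accuracy: 93.33
```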
Planning and implementation

System plan
This stage describes the planning needed for the system. It begins by determining the purpose and use of the system to be designed. The designed system is an e-learning system with an automatic essay scoring feature, intended for students and lecturers in the learning process: the lecturer gives essay exams to students, students take these exams, and their answers are then scored automatically by the system against the answer key entered by the lecturer.
The first stage of the automatic essay scoring system is preparing the required data: student answers, questions and the lecturer's model answer keys. The questions and answer keys are taken from the book "Discovering Computers 2010: Living in a Digital World" and from websites, PowerPoint slides and PDF files on the course material "Introductory Information Technology".
The student answer data are collected by distributing questionnaires containing questions on the course material "Introductory Information Technology" to students of the Faculty of Information Technology, Tarumanagara University. To simplify the design of the system, the collected data (student answers, questions and the lecturer's model answer keys) are stored in a database. The automatic essay scoring system takes questions, answer keys and essay answers as input; the questions and answer keys are provided by the lecturer and the essay answers by the students. The answer keys and answers are pre-processed, and the system then calculates the similarity between the student's essay answer and each of the lecturer's model answer keys using Latent Semantic Analysis (LSA). The highest similarity value obtained with cosine similarity is used to produce the student's score.
In the automated essay scoring system using LSA shown in Figure 3, the system first manages the data entered by its users, who are lecturers and students. The system input consists of the lecturer answer key data and the student answer data. Once the system receives this input, it proceeds through pre-processing and the Latent Semantic Analysis calculation, and finally produces the student's answer score.
As shown in the flow in Figure 3, the first stage is pre-processing, a series of steps that prepare the input data before the next stage. Pre-processing consists of case folding, lexical analysis and tokenization.
The pre-processed data then enter the Latent Semantic Analysis process. Its first step is to form the term frequency matrix for the answer and the term frequency matrix for the answer key by counting how many times each term appears in each sentence. SVD is then calculated on each term frequency matrix, producing the matrices U, S and V. These matrices are reduced and then used to reconstruct the matrix in a lower dimension. When forming the term frequency matrix of the answer, the set of terms whose frequencies are counted in each answer sentence is the set of terms obtained from the pre-processed answer key; the same term set is used for the term frequency matrix of the answer key. Once the reconstructed answer matrix and answer key matrix have been formed, the next process is to form the answer vectors and answer key vectors.
MATEC Web of Conferences 164, 01037 (2018) https://doi.org/10.1051/matecconf/201816401037 ICESTI 2017
The similarity between each answer vector and each answer key vector is calculated using cosine similarity, where each vector represents a sentence, that is, a column of the matrix. The value used to evaluate each answer sentence is its highest similarity with any of the answer key vectors.
To score a student answer, the output value is the average of these highest similarity values: the highest similarities of all student answer sentences, each compared with all lecturer answer key sentences, are summed and divided by the number of student answer sentences. The average similarity, which ranges from 0 to 1, is then multiplied by 100.
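The scoring rule above can be sketched as follows, assuming NumPy; `cosine` and `essay_score` are hypothetical helpers. The matrices are small made-up examples; in the system they would be the rank-k reconstructions of the student answer and answer key term frequency matrices, with one column per sentence and the answer-key terms as rows.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    # Cosine similarity; 0.0 for an all-zero vector (assumption)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def essay_score(answer_matrix: np.ndarray, key_matrix: np.ndarray) -> float:
    """100 * mean over student sentences of the best cosine similarity
    against any answer-key sentence (columns are sentence vectors)."""
    best = [max(cosine(a, k) for k in key_matrix.T) for a in answer_matrix.T]
    return 100 * sum(best) / len(best)

key = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)     # 2 key sentences
answer = np.array([[1, 0], [1, 0], [0, 1], [0, 0]], dtype=float)  # 2 student sentences
print(round(essay_score(answer, key), 2))  # 85.36
```

Here the first student sentence matches a key sentence exactly (similarity 1) and the second matches only half of one (similarity 1/√2), so the score is 100 × (1 + 0.7071) / 2. Note that entries of a rank-k reconstruction can be negative, so in principle a cosine value computed on reconstructed vectors can fall below 0.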

System implementation
After the system design is complete, the next stage is to build the software according to that design. The first consideration is the system requirements, consisting of the server hardware and software. The server acts as a mediator for data transfer over the internet connection, and the system is deployed on a hosting server.
The system was developed on an IBM server computer acting as the server. On the client side, any computer with a web browser such as Internet Explorer, Google Chrome or Mozilla Firefox can be used. The server software consists of IIS Manager for the web server, SQL Server Management Studio for the central data storage server, Matlab for the linear algebra calculations, and ASP.NET C# and Perl as the back-end programming languages.
Once all the required hardware and software are available, development begins by setting up the database that stores data before and after processing. The database is created on the SQL Server database server.
After the database is created, the next stage is to create the interface for each module designed previously. The interface is kept as simple as possible so that users can use the program easily. Once the interface has been created, the program code behind it is written so that the interface works properly.
Once the program code has been written, each module is tested to determine whether the program runs as expected and whether it still contains errors.
After the interface is complete, the automatic essay scoring feature is coded using the algorithm and theoretical basis described earlier. When coding is complete, the feature is tested to determine whether the program produces output according to the design.

Result and discussion
Testing consists of two stages: black box testing and testing of the automatic essay scoring feature. Black box testing checks whether each function in each module works properly. Testing of the automatic essay scoring feature determines whether the system scores essay answers in agreement with the judgments made by humans.

Modules testing
This testing is performed on all modules of the system. Broadly, the system is divided into two main modules: the student module and the lecturer module.

Automatic essay scoring modules testing
The testing found that all functions of the automatic essay scoring module run according to their intended function. The test results of the automatic essay scoring module are shown in the figures below.

Automatic essay scoring feature testing
Testing of the automated essay scoring feature determines whether the system scores essay answers in agreement with the scores given by humans (lecturers), and measures the average accuracy of the system. The testing is divided into two parts: testing by the researchers and testing by respondents.

Automatic essay scoring feature testing by researchers
In the testing conducted by the researchers, a set of questions and answers is first collected; these data are then used to test whether the system scores essay answers automatically in agreement with human (lecturer) scores. The collected data are validated by lecturers of the Faculty of Technology, Tarumanagara University. The researchers' testing has two parts: testing a perfect answer with the same word order as the answer key, and testing a perfect answer with a different word order.

Automatic essay scoring feature testing by respondents
In the testing by respondents, questionnaires containing questions on the course material "Introduction to Information Technology" are first distributed to students of the Faculty of Technology, Tarumanagara University. Six respondents filled in the questionnaire, and a total of 30 student answers were collected. Before testing, the questionnaires were scored manually by experts, namely lecturers of the Faculty of Technology, Tarumanagara University. One of the questions on the questionnaire is: "Apa yang dimaksud dengan sistem operasi pada komputer? Jelaskan beserta fungsinya?" (What is an operating system on a computer? Describe it along with its function.) The collected questionnaire data are tested to find the average mean absolute error between the system's scores and the manual human scores, and the average mean absolute percentage error of the automatic essay scoring. The average percentage error is also used to measure the accuracy of the system.
Testing of the automatic essay scoring feature is conducted through five scenarios, namely LSA with 1) unigram, 2) bigram, 3) trigram, 4) unigram + bigram and 5) unigram + bigram + trigram terms. Based on the test results, the average accuracy of the automatic essay scoring system is fairly good, with scores close to those given by the lecturer. Of the five scenarios, the unigram test gives the best result, with an accuracy of 78.65 % (a system error rate of 21.35 %) and a mean absolute difference of 16.672 between the manual and system scores. The bigram, trigram, unigram + bigram and unigram + bigram + trigram tests give accuracies of 58.89 %, 14.91 %, 71.37 % and 64.79 %, with mean absolute differences of 45.067, 63.710, 22.420 and 27.380, respectively. The low accuracy of the bigram and trigram tests is caused by how rarely the bigram and trigram terms of the lecturer answer key occur. Other factors that affect the accuracy of the automatic essay scoring system are:
i) Many answers still contain misspelled words, and the system cannot correct spelling errors.
ii) Too few alternative answer keys are entered into the system.
iii) Students often use abbreviations, for example "IP" for "Internet Protocol".
iv) Some answer-key terms share a meaning with other words, such as synonyms or equivalent words in a foreign language, which the system does not recognize as similar.
v) The system's score is strongly affected by the number of words in each sentence. The system score is the average similarity of the term-sentence relationships between the student answer and the lecturer answer key, so the system judges the completeness of an answer through each term-sentence relationship, whereas in manual assessment a single correct sentence can already earn a high score. This is because manual assessment judges the completeness of the answer as a whole rather than sentence by sentence.

Conclusion
Based on the tests performed, the following conclusions can be drawn about the system: i) Each module of the website runs well.

Fig. 4. Testing results of the automatic essay scoring module when the student chooses the exam.

Fig. 5. Testing results of the automatic essay scoring module when the student submits the exam.

Fig. 6. Testing results of the automatic essay scoring module when the lecturer gives the value manually.

Fig. 7. Graph of the average system accuracy percentage by respondents.

Table 1. Some of the data for questions on the questionnaire.

Table 2. The results of testing scenarios by respondents.
Based on the results of the testing conducted by the researchers, the system is able to score essay answers in the range 1 to 100. The system can produce a unigram score, a bigram score, a trigram score, a unigram + bigram score and a unigram + bigram + trigram score. The following is one example answer used in the testing conducted by the researchers:
Answer: Perangkat keras merupakan bagian fisik dari komputer yang berfungsi untuk memberi masukan, menampilkan dan mengolah keluaran. Perangkat keras ini digunakan oleh sistem untuk menjalankan suatu perintah yang telah diprogramkan artinya berhubungan dengan perangkat lunak. Perangkat keras ini dibedakan menjadi empat jenis yaitu perangkat input, pemrosesan, output dan penyimpanan.
(Hardware is the physical part of a computer that provides input, displays and processes output. It is used by the system to run programmed commands, which means it is related to software. Hardware is divided into four types: input, processing, output and storage devices.)

Table 3. The results of the automatic essay scoring system by respondents.