THE IMPLEMENTATION OF CACHING DATABASE TO REDUCE QUERY’S RESPONSE TIME

Performance is an imperative aspect of a web-based application. Response time is one of the most important performance parameters. Most of web-based applications are driven by database system. Reducing response time of its database system will improve the performance of a web-based application. Caching is one of techniques that can be implemented to reduce response time. The implementation of caching can be conducted in the application or database side. Caching is a technique to keep the executed query and hence the query will not be executed when called in the later execution. The result of an executed query will be stored in the cache therefore when that query is called, the database will retrieve the stored result without re-executing query. This mechanism will reduce the time to re-execute queries. However, this technique may be not suitable for all queries. This research is intended to investigate the effect of the implementation of caching into performance of a database. A series of experiments have been conducted by applying query cache. The response time of the application of query cache is compared with the response time of database without query. The results show that the implementation of query cache reduces response time significantly. However, this technique is more suitable for large database with many frequently asked queries. The number of clients accessing the database is found to be influencing the performance of caching. The more clients access the database, the more improvement is provided by query cache.


INTRODUCTION
Many web-based applications are built using a multi-tier environment (Larson et al., 2004). Multi-tier client environment involves the use of client-tier, middletier and database-tier. Client-tier is the part that relates directly to the client, one of which is the browser. Through a browser, the client can connect with middle-tier. Middletier components relate directly to the database tier. Middletier contains business process that serves to process client requests that come through the client-tier to retrieve the data in the database-tier. Middle-tier itself can be divided into several types in use, such as web-based application or desktop-based. By the time the client is using a browser, an application that is used is a web-based application consisting of a web server and web client. Results from the web client request will be returned through the web server.
Each user who accesses the application will make a request through the browser, which is then forwarded to the web server. On the web server, the request is processed and routed to the database. Then, the application returns the response to the user via the browser. Web servers always retrieve data from the database to process each request that comes, regardless of whether a new request comes together with a previous request. The heavy workload of web servers can cause a problem into the network traffic , Charles et al. 2008, Paweł and Krzysztof, 2011.
Caching is used to overcome the workload problem including Web Caching (Jesse et al., 2002), Web Databases Caching (Alexandros, 2010) and Database Caching , Charles et al. 2008, Paweł and Krzysztof, 2011. This research investigates the affect the Query Cache into Database. Database Caching is an imperative method to overcome the problem in high-traffic web-based applications (Priya et al., 2011).
With the implementation of Query Cache, is expected to reduce the response time in executing the query. On applying the Query Cache database, user queries are executed in cache on the local database or the database contained on the database server.
This research is intended to examine how the effect of the application of caching in the database of the performance, especially in query response time. From this study, it is expected to obtain the guidance of applying caching in database. This research is conducted by using empirical approach. A series of experiments are held that is by applying the Query Cache in a subject application.

LITERATURE REVIEW A. Performance Tuning Overview
Database performance tuning consist of three steps: Performance Planning, Instance Tuning and SQL Tuning. Performance tuning needs to be well planned involving all resources. Maintenance is important to decrease the problem in performance. Good management of all components of database in a system will reduce or avoid the problems during operation (Immanuel and Lance, 2014). Instance Tuning is involved when a database initialized that is to prevent bottlenecks. In the initial configuration of a system, resource allocation will be involved. The most important of instance tuning is to identify bottlenecks and make appropriate changes to eliminate the problem of the bottleneck. Tuning is usually done repeatedly, that is when the system starts to operate or while running. In general, tuning implement improvements in performance problems. In other words, tuning should be the part of the life cycle of an application system. Typically, tuning phase will be ignored until the database is generated. Tuning can be a reactive process, where most major bottleneck problems will be identified and repaired (Immanuel and Lance, 2014).
When the SQL statements executed on the database, query optimizer will determine the exact execution plan and factors relating to the object as well as the conditions specified in the query. The main purpose of query optimizer is to process the SQL statement and reduce the execution time. Throughout the process of execution, the query optimizer statistics gathered and used in reviewing the system to determine the best address data access and other considerations. We can reuse the (override) execution plan of the query optimizer by giving different input into the SQL statement.

B. Caching
Caching is a technique that can be used to improve the performance of a system. The cache can be interpreted as part of a computer's memory where information is stored so that the computer can find the information faster (http://www.merriam-webster.com/dictionary/cache, 2014). The stored information can be a result of the previous computation process or a duplication of the original information. There are some caching techniques that can be done to improve the performance of web-based applications that implement multi-tier concept. Caching can be performed on the database tier, the middle-tier and the client tier.
Caching technique in MySQL is named as Query Cache. The architecture of MySQL is shown in Figure 1.  (Schwartz B. et al., 2012) The upper layer of MySQL architecture contains one or more services that are needed by client/server (Schwartz B. et al., 2012), such as connection handling, authentication, security and so on. Important processes are held in the second layer, including code for query parsing, analysis, optimization, caching and all built-in function. The third layer contains storage engine that responsible for storing and retrieving data. The followings are steps of query execution (Schwartz B. et al., 2012): 1. Before parsing a query, MySQL checks whether Query Cache is active or not. If active, MySQL checks if the query has been stored in cache. MySQL the checks the privilege, if allowed MySQL returns Query Cache results to the client. When query is not in the cache, Query Optimization process will be activated to obtain the result of the query. 2. In Query Optimization process, MySQL parser divides a query into tokens and builds parse tree of those tokens. Parser ensures that the token is valid, and every part is in proper order. 3. Preprocessor checks related tables and columns and make sure the referenced name or alias is not ambiguous. Furthermore, the preprocessor checks privileges. 4. After the process of the parser and preprocessor, parse tree is ready to be converted into execution plans by the optimizer. The query is executed on the Query Execution Engine in accordance with the execution plan that has been generated previously.
Query Cache is a technique that can be performed on systems with multi-tier concept. The purposes of applying the Query Cache is to increase performance and scalability of applications with distributed query processing . At the time of Query Cache is placed on the middle-tier along with the application server, the Query Cache works like a database server. Query Cache can keep most of the existing data on the database server . Thus, the server workload can be shared on the Query Cache. With the division of the workload of the database server, the response time to the user can be reduced, especially when the database server does the heavy work .
To ensure the beneficial use of Query Cache, server operations have to be tested both with the cache on and off (http://dev.mysql.com/doc/refman/5.5/en/querycache.html, 2014). The efficiency of Query Cache can be changed when the workload of the server is changed.
Query to be executed is compared with the correspondent query in Query Cache (http://dev.mysql.com/doc/refman/5.5/en/query-cacheoperation.html, 2014). They have to be exactly the same. The value of variable Qcache_hits is incremented each time the query cache is being used. When a table has been changed/updated, all related queries in the cache will be deleted including all queries that have join operation with that deleted table.
MySQL stores query cache in the memory. The stored information in query is not just the result set, but also all the structure of the query including the relation between all involved tables with its query result and query text. Besides the housekeeping (base structure), query cache memory pool is provided in variable-sized block (Schwartz B. et al., 2012). When server is run, memory for query cache is initialized. Memory pool is initialized as an empty single block. When the server caches a result set, the size of allocated block in memory pool is set to query_cache_min_res_unit (Schwartz B. et al., 2012).
There are some constraints of Query Cache. Query Cache is not applicable to the following cases: query is a sub-query, the executed query is in a function, trigger or event, referring to the user or local stored program variables, referring to INFORMATION_SCHEMA, or PERFORMANCE_SCHEMA, referring to partition table, query uses temporary tables, query without table and user without any privilege to access the correspond tables (http://dev.mysql.com/doc/refman/5.5/en/query-cacheoperation.html, 2014).

BISSTECH 2015
Query Cache store the text of the SELECT statement and the results of the statement are sent to the client (http://dev.mysql.com/doc/refman/5.5/en/query-cache-inselect.html, 2014). If the server receives the same statement/query as the previous statement, then the server will take the result of Query Cache and will not re-execute the statement. Query Cache is divided into sessions, so that a result set resulting from the query execution carried out by a client, can be reused as a response to another client (http://dev.mysql.com/doc/refman/5.5/en/query-cache-inselect.html, 2014). Data stored in the cache is always the last data and trustworthy. All statements INSERT, UPDATE, DELETE or other modifications at a table will make the relevant data on the Query Cache eliminated (flushed) (http://dev.mysql.com/doc/refman/5.5/en/query-cache.html, 2014).

THE MATERIAL AND METHODOLOGY
This research is conducted by using empirical approach. A series of experiments are held that is by applying the Query Cache in a subject application. Del's library database system is used for the experiments. It is analyzed in order to design a caching application on the database. Data and running queries on the database will be reviewed if it is adequate for use in this study. Dummy data and the query will be added to improve the accuracy of the results.
A web-based application is used to simulate the application of the studied method. There are two types of environment being applied: standalone and client-server system. The experiments performed on a standalone system involve the database server and web server. Application on a web server executes queries in the database server. While the client-server system involves five clients that simultaneously execute queries in the database server. The response time of query execution with caching and without caching will be compared in both environments. The less response time the better method.
There are 17 queries involved in the experiments. Those queries use three tables in the database. Table 1 lists the used tables with their size. There are four phases (P1, P2, P3 and P4) of implementation that are intended to observe the influence of the size of the database. The phases are respectively involving the original tables, tables with 200% additional records of P1, tables with 300% additional records of P2, and tables with 400% additional records of P3.

RESULTS AND DISCUSSIONS
In this section, the results of the experiments are presented. Table 2 presents the response time of query execution in the first two phases (P1 and P2) of standalone system.

Table-2. The result of P1 and P2 in standalone system
The response time of query execution in the phases P3 and P4 of standalone system are presented in Table 3.
The results in Table 2 and Table 3 show that response time of query execution for all queries with caching is lower than without caching. The improvement is significant with all types of queries. The bigger the involved data the higher improvement is resulted by applying caching.   Table 4 and Table 5 present the results for clientserver system.  The experiments in client-server system indicate the same results. The response time of query execution for all queries with caching is lower than without caching. The results with bigger data (P4) indicate that caching produce bigger saving. In this case, saving is the ratio of the response time of the system with caching and without caching. For example, saving for Q1 with P1 in clientserver system is 94.20% ((1 -(0.0019607 / 0.0338124)) * 100). The average saving for all queries in both environment is presented in Table 6.  Table 6 shows that the saving for all phases are mostly more than 90%, this indicates that the implementation of caching can reduce response time significantly. The highest improvement is achieved when it is applied into big size of data in client-server system A statistical analysis is conducted to find out the difference of the two studied methods. The response time of each query in all phases is compared by using Wilcoxon Signed Test. This non-parametric test is chosen since the number of sample data is small (17). The pvalues resulted from all comparison (all phases in both environment are less than 0.05. It indicates that the two methods are significantly different.

CONCLUSIONS AND SUGGESTIONS
The implementation of query cache into a database has been investigated experimentally in this research. The response time of query execution in database with caching is compared with database without caching. The two methods are implemented and compared in two environments, standalone and client-server system. The results show that the application of caching reduce the response time of query execution with big saving. The results also indicate that the application of caching in bigger data will produce bigger benefit. It has been analyzed statistically that the query caching can significantly improve the response time of a database.
However, the validity of this experiment needs to be enhanced in the next research. The application and data under tests with real operation need to be applied. It is also suggested to investigate the methods in more complex database with varies of query.