The research of regression model in machine learning field

The paper herein will analyze the sale of iced products affected by variation of temperature. Firstly, we will collect the data of the forecast temperature last year and the sale of iced products and then conduct data compilation and cleansing. Finally, we will set up the mathematical regression analysis model based on the cleansed data by means of data mining theory. Regression analysis refers to the method of studying the relationship between independent variable and dependent variable. Linear regression model that corresponds to the practical situation is proposed in the paper, which is to set up simple linear regression model based on practical problem and then to implement the following with the help of the latest and most popular Python3.6. Python3.6 boasts the features of pure object-oriented, platform independence and concise and elegant language. So we will call the corresponding library function to predict the sale of iced products according to the variation of temperature, which will provide the foundation for the company to adjust its production each month, or even each week and each day. As a result, the situation of overproduction can be avoided. Moreover, the other situation as the profit will be affected by the lack of production since the rise of temperature will also be avoided. So the regression model also has reference value for the other fields of marketing.


INTRODUCTION
With the increasing growth of the Internet, various kinds of data have been increasing in an explosive way.For example, twitter produced the data at the volume of 8TB each day [1] in 2011, Facebook produced the data at the volume more than 500TB a day in 2012 [2] and moreover as the biggest Internet company, Google might work on the data at the volume of 20PB each day in 2008 [3].So, it will be so important to collect, analyze and model a large amount of data in order to conduct valuable prediction in various fields.Nowadays many machine learning systems running in data center have emerged continuously, such as Graph Lab [4] , Mahout [5] , Mad LINQ [6] and so on.Among them, Mahout based on distributed machine learning library has been widely used by lots of corporations such as Yahoo, Twitter, Linked In and so on.Under this background, Python, as the most popular programming language in the field of machine learning, has been used more and more widely.The paper herein will conduct the programming analysis on the model based on the latest Python3.6.
Python3.6 is a generalized programming language with strong function, that has been widely used and is easy to learn at the same time.Moreover, it fits for software programming of various scale [7].Nowadays Python3.6 has mainly been used in fields of data mining, analysis and some others.Python boasts the features of unified writing form ， strong readability, diverse algorithms and rich data structures, which means it can package both data structures and general algorithms.With its various advantages, it can not only integrate with other programming languages but also maximum its advantage of mixed programming language, leading to its great convenience of utility.Prediction analysis is one problem that has to be solved in data mining field.Generally speaking, we can choose the most suitable algorithm and model to conduct the analysis according to the practical case.The paper herein will mainly study on how to define the fitting relations between independent and dependent variables and then confirm parameters of the model, finally go back to the assumed equation to predict the variation tendency of dependent variable based on the typical linear regression model of predicting the sale of iced products according to the temperature variation.In the paper, we will establish linear regression model of one-variable with the help of Python3.6, which may predict the effect of temperature variation on the sale of iced products ideally.

PYTHON 3.6 2.1 Introduction
In recent years, further study [8][9] has gradually been the research focus of machine learning field, which may give rise to the prevailing of the object-oriented programming, Python3.6 in the field of machine learning and being well-received by learners.
There have been many new features added to Python3.6 on the basis of Python 2.7, such as formatting string and literals, variable comment syntax, underlining literals, asynchronous generator, asynchronous derivation and so on.Python3.6 has showed its great advantage on data analysis and data mining.Python3.6 has attracted great attention because of its convenience of learning and strong functions and boasted great advantage in data analysis and data mining.

Advantage
Python3.6 is an object-oriented language design program with pure script, which combines essence and designing rules of various design languages showing the features of interactive connections and type of explanation.Python has been focused by researchers due to its concise, elegant and clear language.Google was the first to use the Python as the development tool of network applications.
Python3.6 is a pure object-oriented language that can be used in the development of large-scale software due to its object-oriented mechanism, efficient execution and platform independence.It can express almost every comprehensive object with advanced data of tuple, list, dictionary and so on.
Python3.6 can write codes while running and each code can be tested immediately after being finished, which can dramatically improve the efficiency of engineers.Moreover, procedures written by c/c++ can be easily changed into the expansion mode of Python due to its strong expansibility.Python3.6 simplifies the duplicated codes in the software giving rise to the feature of permitting dynamic construction and execution of procedure.
The paper herein will predict the sale of iced products affected by the variation of temperature based on the library function such as linear regression by concise programming language.

Introduce
The Nowadays commonly used algorithms of data mining can be divided into several kinds of classification, cluster, association rules and time series prediction.Data mining may mainly be used in aspects of banking service, telecommunication, information security and scientific study.Furthermore, the popular tools of data mining are Weka, statistical analysis software Spass, Clementine, Rapidminer, Orange, Knime, Keel, Tanagra and so on.
Sorting algorithms that are often used in data mining are mainly linear regression algorithm, logistic regression algorithm, Bayesian decision theory and classifier, Support Vector Mouhime proposed by Cortes and Vupnik in 1995.Among these SVM algorithms, they can also be divided into several kinds according to its linear situation and the kernel function is used in the widest way.
Clustering algorithms commonly used in data mining are mainly hierarchical clustering algorithm, partition clustering algorithm, clustering algorithm based on density, clustering algorithm based on network, clustering algorithm based on model which may also include statistical method and neural network method.

Trend
Data The paper herein may mainly propose the linear regression algorithm belonging to sorting algorithm, including data collecting, data cleansing.We will set the forecast temperature as the independent variable and the sale of iced products as the dependent variable.The results of data analysis show that the factors which may influence the sale of the company are chosen totally correct.

Theoretical
Linear regression analysis can be divided into simple linear regression and multiple linear regression.The paper will mainly analyze simple linear regression model that is the analysis method of studying the relations between independent variable and dependent variable.We will set the model of dependent variable y and the independent variable  � (i=1,2,3……) that will influence the variable y and the predict the development trend of y ,Simple linear regression model will be expressed as followed: 0 1 y a a x e    y is the dependent variable and x is the independent variable.0 a , the constant term, is the intercept of the regression line on the vertical axis and 1  a is regression coefficient that is the slope of the regression line.e is the random error which will be used to express the effect of random factors on dependent variable.

Methods
Regression analysis will evaluate 0 a and 1 a by observing the sample , ) i i x y ( in practical application.The paper will draw the scatter diagrams of dependent variable and independent variable by Python3.6 to evaluate the model parameter and establish regression model.After that, the regression equation and accuracy will be defined by the judgement of coefficient.For simple linear regression, coefficient of determination will be defined as: ŷ is the original value and y is the predicted value, so the equation will be showed here TSS is sum of squares for total and ESS is regression sum of squares.Moreover, RSS is residual sum of squares.

Algorithm implementation
The paper will use Python3.6 to set up linear regression analysis model targeting at the effect of temperature variation on the sale of company.We introduce the unified operating system interface function in Python3.6 and all the functions in the numpy library that stores object variable.Moreover, we will also introduce Pandas analysis package and establish more advanced data structure and data analysis package of tool.The importing statements of Python3.6 will be showed as followed: Import os; Import numpy; From pandas import read_csv; From matplotlib import pyplot as plt; From sklearn linear_model import lineatRegression;

The comparison results
Draw the scatter diagram of the forecast temperature and sale of iced products after data cleansing, which will be showed as followed: .We can predict the linear relations between x and y by means of the sample of corresponding relations of variable x and variable y.The training model will be showed as followed: lrModel.fit(x,y) and lrModel.score(x,y)，Test the regression model and the result will be showed in Fig. 2.  The result shows when the temperature is 24 degrees, the predicted value of sale will be 959.25;when the temperature is 28 degrees, the predicted value of sale will be 1128.977;when the temperature is 30 degrees, the sale will be 1213.837.So, the company can adjust the production according to the predicted value of sale, which will have great effect on the company.At the same time, the linear regression model used in the paper can obtain greater imitative effect of the predicted value of product sale based on the temperature variation and boast strong practicality.The latest Python3.6used in the paper also shows its concise and elegant statements that fits for mathematics model.Python3.6 in data mining will be the first choice of learners of machine learning field.

CONCLUSION
The paper herein introduces the algorithm and model of the field of machine learning.Linear regression model is used to analyze the sale of iced products of company and the effect of temperature variation on the sale.Firstly, we cleanse the data collected one year ago and analyze data at the same time.Then we choose forecast temperature as the independent variable and the sale of iced products as the dependent variable to establish simple linear regression model for analysis.We use the object-oriented programming language Python3.6 and introduce the linear regression function.Programming language can also make data analysis in the field of data mining easier.The final result correctly leads the company to adjust the production and sale of iced products flexibly according to the variation of temperature, which definitely provides great commercial value and offers crucial theoretical foundation for the sale of other companies who produce iced products.

Fig. 1
Fig.1 Scatter Diagram Plt.ScatterEstimate model parameter andset up regression model, then introduce Linear regression function: LrModel=LinearRegression().We can predict the linear relations between x and y by means of the sample of corresponding relations of variable x and variable y.The training model will be showed as followed: lrModel.fit(x,y) and lrModel.score(x,y)，Test the regression model and the result will be showed in Fig.2.