Study of New Materials Design based on Hadoop

With the rapid development of information technology, scientific research has shown that data mining and other information technologies can be used in the design of new materials. It is clear that Intelligent Materials research focuses on using physical and chemical principles combined with computer techniques, such as Big Data, cloud computing, and intelligent modeling and simulation, to solve chemical problems. Taking a cluster-based outlier algorithm as its main body, this paper discusses new materials research on the Hadoop cloud platform and the parallel processing of the Map-Reduce model. A performance prediction model for new materials was established using the Map-Reduce method, which provides a basis for performance optimization.


Introduction
In recent years, scientific research has shown that data mining and other information technologies can be used in the design of new materials. On one hand, this new research method helps make new materials much more intelligent and useful than traditional methods; on the other hand, it can obtain more ideal materials with fewer experiments, achieving the desired result with half the effort. From the first International Conference on Computer Aided Design of New Materials in 1990 to the Intelligent Materials developed in the Big Data era, it is clear that Intelligent Materials research focuses on using physical and chemical principles combined with computer techniques, such as Big Data, cloud computing, and intelligent modeling and simulation, to solve chemical problems [1][2][3][4].
Nowadays there are several international journals devoted to this interdisciplinary research, such as Modelling and Simulation in Materials Science and Engineering and Computational Materials Science. Nongnuch Artrith and Alexander Urban [5] have shown that machine-learning interpolation of atomic potential energy surfaces enables the nearly automatic construction of highly accurate atomic interaction potentials. Mansouri Iman and Ozbakkaloglu Togay [6] studied the ability of artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), multivariate adaptive regression splines (MARS), and M5 model tree (M5Tree) techniques to predict the ultimate conditions of FRP-confined concrete. The performances of the proposed models were also compared with those of existing conventional and evolutionary-algorithm models, and the comparison indicates that the proposed ANN, ANFIS, MARS, and M5Tree models exhibit improved accuracy over the existing models. Cabaleiro Manuel, Riveiro Belen, et al. [7] have researched how to represent a significant lack of material in a structural member; all changes in the cross section of the beam must be considered in any kind of strength calculation.

Computer Aided Material Design
Computer Aided Material Design is divided into two parts, first-principle calculations and data mining, both widely applied in materials research areas [8][9][10]. Through data mining, one can summarize the laws and methods needed to obtain the required materials. In this paper, based on the Hadoop cloud platform, a performance prediction model for new materials was established using data reduction combined with a support vector machine or an artificial neural network, which provides a basis for performance optimization.
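As a minimal sketch of the prediction pipeline just described, the snippet below uses a PCA-style projection for the data-reduction step and an ordinary least-squares fit as a simple stand-in for the support vector machine or neural-network regressor; the descriptor data and target property are synthetic placeholders, not the paper's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 8))                      # hypothetical material descriptors
y = 2.0 * X[:, 0] + X[:, 1] - 0.5 * X[:, 2]   # hypothetical target property

# Data reduction: center the data and project onto the top principal components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:4].T                             # keep 4 components

# Fit a simple regressor on the reduced features and predict.
A = np.c_[Z, np.ones(len(Z))]                 # add an intercept column
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ w
print(pred.shape)  # (100,)
```

In the paper's workflow the least-squares fit would be replaced by the SVM or ANN model, and the descriptors by measured material features.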

Data Mining and Hadoop
Although theoretical calculation and design can provide many useful clues, the complexity of real materials calls for data-driven platforms. HDFS is a file system that provides massive data storage and high-throughput data access to applications; at the same time, like a common file system, it supports convenient file operations. HDFS adopts a master/slave architecture: an HDFS cluster contains one name node and a series of data nodes. The name node, acting as the master, is mainly responsible for managing the HDFS file system and accepting requests sent by clients. The data nodes, acting as slaves, mainly store the data: a file stored in HDFS is cut into one or more data blocks, each data block is stored on one or more data nodes, and the logical file storage structure is recorded on the name node. To achieve high fault tolerance, HDFS stores several copies of each data block on different data nodes.
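The block-splitting and replication behaviour described above can be illustrated with a toy sketch; the block size, node names, and round-robin placement below are deliberate simplifications, not real HDFS internals.

```python
# Toy model of HDFS storage: a file is cut into fixed-size blocks, each
# block is replicated onto several data nodes, and the name node keeps
# only the logical block map (block id -> list of data nodes).
BLOCK_SIZE = 4          # bytes per block here (the HDFS default is 128 MB)
REPLICATION = 3
DATA_NODES = ["dn1", "dn2", "dn3", "dn4"]

def put_file(name, data):
    block_map = {}      # the name node's view of the file
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for idx, _block in enumerate(blocks):
        # Place each replica on a different data node (round robin here;
        # real HDFS uses rack-aware placement).
        nodes = [DATA_NODES[(idx + r) % len(DATA_NODES)]
                 for r in range(REPLICATION)]
        block_map[f"{name}_blk{idx}"] = nodes
    return block_map

print(put_file("sample.dat", b"abcdefghij"))
```

A 10-byte file yields three blocks, each mapped to three distinct data nodes, so the loss of any single node leaves every block recoverable.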

Map-Reduce Model and Experiment
In the 1980s, the first-principle calculation method played an important role in the exploration and research of new materials; it was important in discovering materials such as high-temperature superconducting materials, super-hard materials, nano materials, and artificial low-dimensional quantum structure materials. First-principle calculation is also called design calculation based on quantum theory, and its basic methods are solid-state quantum theory and quantum chemistry theory. It is especially suitable for the calculation and design of materials at the atomic scale, nano-scale engineering materials, and materials for many devices, including electronic devices. The main research tasks concerning material surfaces and interfaces are as follows: revealing the physical meaning of phenomena occurring at material surfaces and interfaces, and using first-principle methods to calculate and design the physical, chemical, and dynamical processes of surfaces and interfaces. At present, the most powerful theoretical methods are molecular dynamics simulation and Monte Carlo simulation; the key to these techniques lies in the accurate calculation of the interaction potential between atoms. Data mining, however, is quite different from first-principle calculations and can be viewed as an inductive method. Data mining is mainly defined as ''a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data'', or ''the analysis of observational datasets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner''. As such, data mining and knowledge discovery are typically considered knowledge-intensive tasks. Thus, knowledge plays a crucial role in Intelligent Materials research.
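The abstract names a cluster-based outlier algorithm as the main body of the approach. As a hedged sketch of that general idea (not the paper's exact algorithm), the snippet below clusters a small synthetic dataset with plain k-means iterations and flags points lying unusually far from their cluster centroid.

```python
import numpy as np

# Two tight synthetic clusters plus one obvious outlier at index 6.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])
k = 2
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # fixed seeds for determinism

for _ in range(10):                       # plain k-means iterations
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = d.argmin(axis=1)             # assign each point to its nearest centroid
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

# Cluster-based outlier score: distance of each point to its own centroid;
# flag points beyond mean + 2 standard deviations (an illustrative threshold).
dist = np.linalg.norm(X - centroids[labels], axis=1)
outliers = np.where(dist > dist.mean() + 2 * dist.std())[0]
print(outliers)  # [6]
```

In a materials setting the rows of `X` would be material descriptors, and the flagged points would be candidate anomalies worth re-examining before model fitting.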

Fig 1 Hadoop Architecture

The Hadoop Distributed File System (HDFS) is the bottom-level Hadoop file system, an open-source implementation of Google's distributed file system GFS. As one of the core components of Hadoop, HDFS stores file data in blocks across the cluster. With its superior fault tolerance, HDFS can run on large numbers of commodity storage machines and meet the data needs of Map-Reduce. Hadoop Map-Reduce, an open-source implementation of Google's Map-Reduce, is a parallel computing software framework with high reliability and high fault tolerance. Map-Reduce based applications can run on large clusters and process large data sets in parallel. Map-Reduce simplifies the concurrent programming model and provides an application programming interface (API) so that users unfamiliar with parallel computing can easily develop Map-Reduce applications and reduce repeated work. The Map-Reduce execution flow is shown in Fig 2.

Fig 2 Map-Reduce structural work

Map-Reduce mainly consists of two core operations: mapping (Map) and reduction (Reduce). The Map operation takes the input data and produces a set of intermediate key-value pairs, which are then grouped by key and passed to the Reduce operation.
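The Map and Reduce operations above can be sketched in a single process; word count is used here only as the standard illustration of the key-value flow, not the paper's materials workload.

```python
from collections import defaultdict

def map_fn(line):
    for word in line.split():
        yield word, 1                   # Map: emit (key, value) pairs

def reduce_fn(word, counts):
    return word, sum(counts)            # Reduce: combine all values for a key

lines = ["hadoop map reduce", "map reduce map"]

# Shuffle phase: group all mapped values by key, as the framework does
# between the Map and Reduce stages.
groups = defaultdict(list)
for line in lines:
    for k, v in map_fn(line):
        groups[k].append(v)

result = dict(reduce_fn(k, vs) for k, vs in groups.items())
print(result)  # {'hadoop': 1, 'map': 3, 'reduce': 2}
```

In real Hadoop, the Map and Reduce functions run on many nodes and the shuffle moves data across the cluster, but the per-key grouping contract is the same as in this sketch.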