A study on the unstructured music database—Taking the Bo people’s music and its music iconography database as an example

An unstructured music iconography data system constructed with key technologies such as Dublin Core, Lucene and the MVC framework is studied in this paper. The results indicate that the traditional directory tree and the existing indexing and search tools are severely inadequate for organizing and managing massive unstructured data. Relevant documents can be found effectively and rapidly through the index established by <keywords, files> provided by BeFS. Key technologies such as Dublin Core, Lucene and the MVC framework can be applied to the construction of a large unstructured database of music and image resources. Testing of the database system can be divided into two parts: functional testing and performance testing. The test results of the Bo people's music and image database system, obtained with the designed test scheme, indicate that the system performs relatively well and can support concurrent access to massive data with an excellent user experience.


INTRODUCTION
Long Wen (2009) pointed out that information is generally divided into structured data and unstructured data [1]. Structured data refers to numbers and symbols, whereas unstructured data refers to texts, images, sounds and so on. The Bo people's music iconography database is a thematic database whose objects are the Bo people's music and the musical images in the Bo people's cliff paintings in Gong County, Sichuan Province. This paper studies these unstructured data resources, aiming to explore the key technologies and system test methods applied in the construction of the thematic database.
Many researchers have studied unstructured databases and thematic music data. Teng Teng (2015) pointed out that in the Internet era, big data on music can not only "predict the future" but also support deeper analysis and mining of how music resources are used [2]. Haiyuan Wu (2014) discussed a construction mode for a northeastern folk music database serving social functions such as modern education and resource sharing [3]. Peishun Ye (2015) studied two basic routing strategies and the heuristic search strategy of unstructured P2P networks, analyzed and compared seven heuristic search strategies for such networks, and discussed the implementation mechanism as well as the merits and demerits of each strategy [4]. Ying Yang (2013) constructed an XML-based data warehouse model for unstructured data and a video retrieval platform on an unstructured database [5]. These studies suggest that music and image resources constitute massive unstructured data, that databases of this kind have so far been built only on a small scale, and that traditional methods have difficulty handling massive unstructured data. Building on previous studies, this paper analyzes the applicability of Dublin Core, Lucene and the MVC framework to a system for massive unstructured music and music iconography data, in order to explore effective construction methods and test means for such a database system.

UNSTRUCTURED DATA ORGANIZATION
With the development of science and the progress of society, network technologies have become ubiquitous. Against this background, the disadvantages of file-based databases have become increasingly prominent, most severely in the processing of semi-structured and unstructured data. Methods for organizing and managing unstructured data include the file directory tree, indexing and retrieval, the semantic file system, and so on [6]. To study the unstructured thematic music database, this paper first analyzes these three methods of organization and management, so as to provide a theoretical basis for the research object.

The method of file directory tree
Inside a computer, data and information are usually stored and organized as files, so the organization and management of data [7] can also be regarded as the organization and management of files. At present, the directory tree is the most commonly used structure for file management. This method classifies and manages files according to their path names. Its advantage is that users can choose a path name based on their understanding of the file content and store the file accurately under that path.
The path name of a file carries both semantic meaning and the function of a physical address: users manage files logically through their path names, while the operating system performs block-level operations on files through the same path names [8]. As the number of files grows, the management efficiency of the directory tree method decreases. When the number of files becomes massive, the traditional directory tree method has the following disadvantages:
1) A file can be classified into only one category. As classification becomes finer, it becomes increasingly difficult to choose a single exclusive category for each file. Although links allow a file to appear in multiple categories, this scheme does not fully satisfy users' practical needs, and it makes reclassification more complex: when the source file is moved or deleted, the link does not reflect the change.
2) Without a good naming convention, much detailed information about a file is lost. For example, the name of an MP3 music file cannot contain all of its information, such as the singer, the genre and the rhythm.
3) Related files must belong to the same subdirectory; otherwise their relationship cannot be expressed.
4) To retrieve a file, a user must remember its full path name, which is rarely possible in practice.
When a large amount of unstructured data is managed with the traditional file directory tree, users easily fall into a dilemma: they want to classify files in detail, but they cannot remember the absolute path names that encode such detailed classifications.
To sum up: the traditional method of file directory tree is not applicable in the management and organization of massive unstructured data.
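The first and fourth disadvantages can be illustrated with a toy model. The following Python sketch (class name and all paths are hypothetical) stores each file under exactly one full path, represents a second category as a symbolic-link-like reference, and shows that the reference silently breaks when the source file is moved and that a bare file name is not enough to retrieve anything.

```python
class DirectoryTree:
    """Toy model of directory-tree file management (hypothetical, in memory).

    Each file lives at exactly one full path; a second "category" for the
    same file can only be a link that stores the target's path.
    """

    def __init__(self):
        self.files = {}  # full path -> content
        self.links = {}  # link path -> target path

    def store(self, path, content):
        self.files[path] = content

    def link(self, link_path, target_path):
        self.links[link_path] = target_path

    def move(self, old_path, new_path):
        # Moving the source does not update links that point at it.
        self.files[new_path] = self.files.pop(old_path)

    def retrieve(self, path):
        """Retrieval needs an exact full path (or a link whose target still exists)."""
        if path in self.files:
            return self.files[path]
        target = self.links.get(path)
        return self.files.get(target) if target is not None else None


tree = DirectoryTree()
tree.store("/music/instruments/bronze_drum.mp3", "audio data")
tree.link("/images/cliff_paintings/bronze_drum.mp3", "/music/instruments/bronze_drum.mp3")
print(tree.retrieve("/images/cliff_paintings/bronze_drum.mp3"))  # audio data
tree.move("/music/instruments/bronze_drum.mp3", "/music/percussion/bronze_drum.mp3")
print(tree.retrieve("/images/cliff_paintings/bronze_drum.mp3"))  # None: the link is stale
print(tree.retrieve("bronze_drum.mp3"))                          # None: a name alone is not enough
```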

Index and retrieval
Indexes first appeared in bibliographic systems. In this sense, an index is a systematic guide to the items contained in a collection of literature, or to concepts extracted from that collection [9]. These items or concepts are arranged in a known or stated searchable order. With the advent of the computer, indexing technology, especially in database systems, has developed rapidly.
"Index", as a database term, refers to an ordering of the data in a database by a specific field (or attribute), called the key field [10]. The corresponding indexing service extracts information from the data according to the index, organizes and analyzes it effectively, and then provides it to users. Much like consulting a catalog, indexing lets a database locate the required content rapidly without scanning the whole database. In practice, users cannot remember the absolute paths of all their data, but they can describe some features of the data they need, and they hope to use this small amount of feature information to narrow down the set of files and find the required data quickly. At present, most file systems provide users with retrieval tools built on specific indexes, such as UNIX/Linux [11], Windows, Google Desktop [12], Yahoo Desktop [13], Windows Desktop [14], and so on. The index of a search engine differs from the index of a database, but both serve the same function. The index of a search engine not only provides the interface between unstructured data and users but also provides technical support for finding data rapidly and accurately [15,16]. The diagram of the index of a search engine is shown in Figure 1.
The advantage of existing indexing and retrieval tools is that users can find the required data rapidly with a few simple operations. However, because their indexes are predefined or derived from analysis of the files' absolute paths, neither their accuracy nor their usability meets the requirements of massive unstructured data.
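The difference between scanning the whole database and consulting an index on a key field can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions (records held in memory, a sorted list of (key, position) pairs standing in for the index, hypothetical record IDs), not how any particular database engine implements its indexes.

```python
import bisect

# Hypothetical records; "id" plays the role of the key field.
RECORDS = [
    {"id": "BO-0042", "title": "Bronze drum pattern"},
    {"id": "BO-0007", "title": "Cliff painting: dancers"},
    {"id": "BO-0019", "title": "Suona melody"},
]


def full_scan(records, key, value):
    """Without an index: every record must be examined."""
    return [r for r in records if r[key] == value]


class KeyIndex:
    """(key value, record position) pairs kept sorted by key value, so a
    lookup is a binary search rather than a scan of the whole table."""

    def __init__(self, records, key):
        self.records = records
        self.entries = sorted((r[key], pos) for pos, r in enumerate(records))
        self.keys = [k for k, _ in self.entries]

    def lookup(self, value):
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return [self.records[pos] for _, pos in self.entries[lo:hi]]


index = KeyIndex(RECORDS, "id")
print(index.lookup("BO-0019"))  # [{'id': 'BO-0019', 'title': 'Suona melody'}]
```

Note that this sorted list would have to be rebuilt or updated whenever records change; production databases typically maintain such indexes incrementally, for example as B-trees.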

Method of semantic file system
Many institutions have developed attribute-based indexing methods to overcome the defects of the hierarchical directory tree [17]. BeFS provides a method of indexing a set of files through their attributes: with the index structure established by <keywords, files>, users can find relevant files by keyword.
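A conceptual sketch of the <keywords, files> mapping in Python: each keyword points to the set of files that carry it, so a file can sit under any number of keywords, and a multi-keyword query becomes a set intersection. The class name, file names and keywords are hypothetical, and the sketch models only the idea, not how BeFS actually stores its attribute indexes on disk.

```python
from collections import defaultdict


class KeywordIndex:
    """Inverted <keyword, files> index: each keyword maps to a set of files."""

    def __init__(self):
        self.postings = defaultdict(set)

    def tag(self, filename, *keywords):
        # A file may be tagged with any number of keywords ("categories").
        for keyword in keywords:
            self.postings[keyword.lower()].add(filename)

    def search(self, *keywords):
        """Return the files carrying all of the given keywords (AND query)."""
        if not keywords:
            return set()
        matches = [self.postings.get(k.lower(), set()) for k in keywords]
        return set.intersection(*matches)


index = KeywordIndex()
index.tag("drum_01.mp3", "bronze drum", "percussion", "ritual")
index.tag("cliff_07.jpg", "bronze drum", "cliff painting")
index.tag("suona_03.mp3", "suona", "ritual")
print(sorted(index.search("bronze drum")))            # ['cliff_07.jpg', 'drum_01.mp3']
print(sorted(index.search("Bronze Drum", "ritual")))  # ['drum_01.mp3']
```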
A semantic file system [18] normally uses <classification, value> pairs to give files retrievable mappings. The classification, also called the file's attribute, can be obtained from user input or by other methods, such as full-text analysis or extracting data from file paths. Once an attribute is established, users can set up a virtual folder for it, and all files with this attribute are linked to that virtual folder. If there is an inheritance relationship between two attributes, it can be expressed through virtual subfolders, with the child attribute's folder appearing under the parent attribute's folder.
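The virtual-folder idea can likewise be sketched in Python. In this hypothetical model, each file carries a set of <classification, value> pairs, listing a virtual folder means collecting the files that carry that pair, and a declared inheritance relationship makes the child attribute appear as a virtual subfolder of the parent. The class, method and file names are illustrative only.

```python
from collections import defaultdict


class SemanticFS:
    """Toy semantic file system built on <classification, value> attributes."""

    def __init__(self):
        self.attributes = defaultdict(set)  # file name -> {(classification, value)}
        self.parent_of = {}                 # child attribute -> parent attribute

    def set_attribute(self, filename, classification, value):
        self.attributes[filename].add((classification, value))

    def inherit(self, child, parent):
        # Declare that attribute `child` inherits from attribute `parent`.
        self.parent_of[child] = parent

    def list_folder(self, attribute):
        """Contents of the virtual folder for `attribute`: files carrying it,
        plus child attributes shown as virtual subfolders."""
        files = sorted(f for f, attrs in self.attributes.items() if attribute in attrs)
        subfolders = sorted(c for c, p in self.parent_of.items() if p == attribute)
        return files, subfolders


fs = SemanticFS()
fs.set_attribute("drum_01.mp3", "instrument", "bronze drum")
fs.set_attribute("cliff_07.jpg", "instrument", "bronze drum")
fs.set_attribute("gong_02.mp3", "instrument", "percussion")
fs.inherit(("instrument", "bronze drum"), ("instrument", "percussion"))
print(fs.list_folder(("instrument", "percussion")))
# (['gong_02.mp3'], [('instrument', 'bronze drum')])
print(fs.list_folder(("instrument", "bronze drum")))
# (['cliff_07.jpg', 'drum_01.mp3'], [])
```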

KEY TECHNOLOGIES OF ESTABLISHING THE UNSTRUCTURED MUSIC DATABASE
In terms of resource volume, the Bo people's music and music iconography database is a massive unstructured database. Aiming at a study of unstructured music databases, this paper discusses the key technologies for constructing a highly extensible, general-purpose unstructured database system based on a unified data storage standard. The music database system discussed in this paper is based on the Dublin Core metadata set. A unified resource metadata standard is formulated by combining Dublin Core with the characteristics of the Bo people's music and music iconography resources. After the resources are encapsulated according to this metadata standard, Lucene is used to create the index for data retrieval, and the MVC framework is used to handle the interactions between operations and data.

Dublin Core Metadata Set
Dublin Core is a concise cataloguing model used to describe electronic resources [22]. It is formulated by the international Dublin Core Metadata Initiative (DCMI) and is a common core standard that should be followed by all Web resources [23]. Dublin Core defines and describes only a minimal core set of resource properties, consisting of the 15 core metadata elements listed in Table 1. Metadata is the collection of attributes related to the content and format of digital resources, and it supports functions such as retrieval, location and evaluation [25]. Metadata serves as a catalog of electronic resources and can represent their content and other attributes. As electronic resources multiply, metadata becomes ever more important for organizing them: on the one hand, it stores information such as the format and content of electronic resources; on the other hand, it records their exact storage locations [26]. The design principle of Dublin Core is that the core metadata set should be extensible, minimal in scale and precise in meaning. Its design follows four principles: simplicity of creation and maintenance, commonly understood semantics, international (multilingual and multicultural) scope, and strong extensibility.
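For reference, the 15 elements of the Dublin Core Metadata Element Set are title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage and rights. The following Python sketch serializes a resource record with these elements in the standard DC namespace, plus local extension elements in a separate namespace. The extension namespace URI, the element name `instrument`, the function name and the sample record are hypothetical; they only illustrate how a project-specific metadata standard can extend the core set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
BO_NS = "http://example.org/bo-music/terms/"  # hypothetical local extension namespace

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

ET.register_namespace("dc", DC_NS)
ET.register_namespace("bo", BO_NS)


def to_dc_xml(record, extensions=None):
    """Serialize a record as XML.

    `record` maps DC element names to a string or a list of strings (every DC
    element is optional and repeatable); `extensions` maps hypothetical local
    element names to strings.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")
    for name in DC_ELEMENTS:
        values = record.get(name, [])
        for value in ([values] if isinstance(values, str) else values):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    for name, value in (extensions or {}).items():
        ET.SubElement(root, f"{{{BO_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_dc_xml(
    {"title": "Bronze drum rhythm", "type": "Sound", "format": "audio/mpeg",
     "subject": ["Bo people", "bronze drum"], "coverage": "Gong County, Sichuan"},
    extensions={"instrument": "bronze drum"},
))
```

Keeping local elements in their own namespace leaves the 15 core elements untouched, so records stay readable by any Dublin Core consumer.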

Lucene technology
Lucene is an open-source project of the Jakarta project group of the Apache Software Foundation. It is a full-text search engine toolkit that provides data indexing and full-text retrieval [27].
Lucene's source code consists of seven components, namely query analyzer, language analyzer, lexicon, data storage, index and retrieval. Their organization is shown in Figure 2. A Lucene index consists of multiple files that are stored in groups by segment; the files of one group share the same file name but have different extensions [28]. The structure of a Lucene index is shown in Figure 3. Each segment contains field information and term information. Index files correspond to information files: the entries in an information file are ordered in accordance with its index file. In addition, the correspondence between field information and term information is realized through the field numbers recorded in the field record files.
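The segment organization can be pictured with a small in-memory model. In the Python sketch below (a simplification, not Lucene's file format or API), each segment is a self-contained inverted index over a batch of documents and receives a Lucene-style name such as `_0` or `_1`; a search visits every segment and converts each segment's local document numbers into global ones by adding an offset. The class names, the segment size limit and the sample documents are hypothetical.

```python
from collections import defaultdict


class Segment:
    """One index segment: a self-contained inverted index over a batch of documents."""

    def __init__(self, name):
        self.name = name                   # Lucene-style segment name, e.g. "_0"
        self.stored = []                   # stored documents, by local doc number
        self.postings = defaultdict(list)  # term -> local doc numbers (ascending)

    def add(self, doc):
        local_id = len(self.stored)
        self.stored.append(doc)
        for term in set(doc["text"].lower().split()):
            self.postings[term].append(local_id)


class SegmentedIndex:
    def __init__(self, max_docs_per_segment=2):
        self.max_docs = max_docs_per_segment  # toy flush policy
        self.segments = []

    def add(self, doc):
        # Start a new segment once the current one is full.
        if not self.segments or len(self.segments[-1].stored) >= self.max_docs:
            self.segments.append(Segment(f"_{len(self.segments)}"))
        self.segments[-1].add(doc)

    def search(self, term):
        """Search each segment and merge the hits as global document numbers."""
        hits, base = [], 0
        for segment in self.segments:
            hits.extend(base + local for local in segment.postings.get(term.lower(), []))
            base += len(segment.stored)
        return hits


index = SegmentedIndex()
for text in ("bronze drum rhythm", "suona melody", "drum and gong"):
    index.add({"text": text})
print([s.name for s in index.segments])  # ['_0', '_1']
print(index.search("drum"))              # [0, 2]
```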
Normalization-factor files are used by the scoring and ranking mechanism for search results, and deletion files support pseudo-delete operations, in which documents are marked as deleted rather than physically removed. Both provide basic functions, and together these files constitute the index information of a segment [29]. The whole retrieval process of Lucene is as follows [30]:
1) Create a text collection that stores all the information users might retrieve, and determine the text model of the index, i.e., the information format accepted by the system. Once determined, the text model should not be changed drastically.
2) Create an index from the data text. In this step, an indexing method suited to the scale of the retrieval system should be selected.
3) Retrieve. The user submits a retrieval request, which the system analyzes and then processes through text operations.
4) Filter and rank the search results according to certain rules, then return them to the user. This completes one Lucene retrieval operation.
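The four steps can be traced in a compact Python sketch. Assumptions: the "text model" is a fixed tuple of field names, tokenization is lowercase word splitting, and scoring loosely follows Lucene's classic TF-IDF similarity in using the square root of term frequency and a length normalization factor of 1/sqrt(number of terms), with IDF and boosts omitted. Deleted documents are only marked, as in pseudo-deletion, and filtered out at query time. The class, function and field names are hypothetical.

```python
import math
import re
from collections import Counter


def analyze(text):
    """Text operations: lowercase and split into word tokens (a stand-in for an analyzer)."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    def __init__(self, fields=("title", "description")):
        self.fields = fields  # step 1: the fixed text model (accepted fields)
        self.docs = []        # (document, term frequencies, token count)
        self.deleted = set()  # pseudo-deletion marks

    def add(self, doc):
        """Step 2: index the fields of the text model; returns the doc number."""
        tokens = [t for field in self.fields for t in analyze(doc.get(field, ""))]
        self.docs.append((doc, Counter(tokens), len(tokens)))
        return len(self.docs) - 1

    def delete(self, doc_id):
        self.deleted.add(doc_id)  # marked only; the entry is not removed

    def search(self, query, limit=10):
        terms = analyze(query)  # step 3: analyze the request
        scored = []
        for doc_id, (doc, tf, length) in enumerate(self.docs):
            if doc_id in self.deleted:  # step 4: filter out deleted documents
                continue
            norm = 1 / math.sqrt(length) if length else 0.0
            score = sum(math.sqrt(tf[t]) for t in terms) * norm
            if score > 0:
                scored.append((-score, doc_id))
        scored.sort()  # step 4: rank by descending score, then by doc number
        return [self.docs[doc_id][0] for _, doc_id in scored[:limit]]


index = TinyIndex()
drum = index.add({"title": "Bronze drum", "description": "ritual drum rhythm"})
index.add({"title": "Drum pattern in cliff paintings",
           "description": "Gong County site with many painted figures"})
index.add({"title": "Suona melody", "description": "wind instrument"})
print([d["title"] for d in index.search("drum")])
# ['Bronze drum', 'Drum pattern in cliff paintings']
index.delete(drum)
print([d["title"] for d in index.search("drum")])
# ['Drum pattern in cliff paintings']
```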

MVC framework
MVC (Model View Controller) is a design pattern for program development. It separates business logic from data display [30], improves the maintainability, portability, extensibility and reusability of programs, and reduces the difficulty of program development.
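As a minimal illustration of this separation (the class and function names and the sample record are hypothetical), the Python sketch below keeps data and business rules in a Model, HTML generation in a View, and request handling in a Controller that only delegates to the other two.

```python
import html


class ResourceModel:
    """Model: data plus business logic (an in-memory dict stands in for the database)."""

    def __init__(self):
        self._titles = {}

    def add(self, item_id, title):
        if not title.strip():
            raise ValueError("title is required")  # a business rule lives in the Model
        self._titles[item_id] = title.strip()

    def get(self, item_id):
        return self._titles.get(item_id)


def render_item(item_id, title):
    """View: turns data into HTML and contains no business logic."""
    return f"<h1>{html.escape(title)}</h1><p>ID: {html.escape(item_id)}</p>"


def render_not_found(item_id):
    return f"<p>No resource with ID {html.escape(item_id)}</p>"


class ResourceController:
    """Controller: receives a request, calls the Model and chooses a View."""

    def __init__(self, model):
        self.model = model

    def handle(self, request):
        item_id = request["id"]
        title = self.model.get(item_id)
        if title is None:
            return render_not_found(item_id)
        return render_item(item_id, title)


model = ResourceModel()
model.add("BO-0007", "Bronze drum & gong")
controller = ResourceController(model)
print(controller.handle({"id": "BO-0007"}))
# <h1>Bronze drum &amp; gong</h1><p>ID: BO-0007</p>
```

Because the View never touches the data store and the Model never produces HTML, either can be replaced (for example, a JSP page for the View) without changing the other.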
The MVC framework has three core parts, namely Model, View and Controller, whose structure is shown in Figure 4. As the main part of an MVC application, the Model consists of a business logic module and a data module. The View is the interface for user interaction and consists of JSP and HTML pages. The Controller performs no processing itself: it only receives requests from users and invokes the Model and the View to fulfil them. Struts2 is among the most popular MVC frameworks. The structure of the Struts2 controller, shown in Figure 5, consists of a scheduler (dispatcher), interceptors and a service controller. It operates as follows: (1) after receiving an HTTP request from the page, the scheduler selects the corresponding interceptor group for preprocessing and then invokes the corresponding service controller; (2) the service controller processes the request by calling the service logic layer; (3) after processing, the result is passed back through the interceptor group and returned to the page for rendering.

Excerpt from Table 2 (manual test cases):
Test ID | Test item | Operation steps
… | … | "title" --> "bronze drum" --> "musical instruments" --> "indexing"
Test_0105 | Single category query | "article" --> "add search criteria" --> "swf" --> "searching"
Test_0106 | Detailed information browsing | Click one of the search results above, enter the detailed resource information page, and click the attachment.
Test_0204 | Subclass deletion | Choose a parent class containing subclasses; after the subclasses are displayed, delete those to be deleted.
Test_0205 | New field | Choose a category without subclasses; after the relevant configuration fields are displayed, click "new-built" --> "store".
Test_0206 | Field deletion | Choose a category without subclasses; after the relevant configuration fields are displayed, click "delete".
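The three-step Struts2 controller flow described in this section can be sketched in plain Python. The sketch only mirrors the steps as described here; it does not use Struts2's real classes, and all names (the dispatcher, the two interceptors, the search action and its catalogue) are hypothetical. Each interceptor wraps the rest of the chain, so it can preprocess the request on the way in and see the outcome on the way back out.

```python
import html

CATALOGUE = ["Bronze drum", "Suona melody", "Cliff painting: drummers"]  # hypothetical data


def logging_interceptor(log):
    """Interceptor factory: records entry and exit around the rest of the chain."""
    def intercept(request, proceed):
        log.append(f"before {request['path']}")
        outcome = proceed(request)
        log.append(f"after {request['path']}")
        return outcome
    return intercept


def trim_params_interceptor(request, proceed):
    """Preprocessing: normalise the query parameter before the action sees it."""
    return proceed(dict(request, query=request.get("query", "").strip()))


def search_action(request):
    """Service controller: calls a stand-in logic layer, returns (result code, data)."""
    hits = [t for t in CATALOGUE if request["query"].lower() in t.lower()]
    return ("success" if hits else "none"), hits


RESULTS = {
    "success": lambda hits: "<ul>" + "".join(f"<li>{html.escape(h)}</li>" for h in hits) + "</ul>",
    "none": lambda hits: "<p>No results</p>",
}


class Dispatcher:
    def __init__(self):
        self.routes = {}  # path -> (interceptors, action, result views)

    def register(self, path, interceptors, action, results):
        self.routes[path] = (interceptors, action, results)

    def handle(self, request):
        interceptors, action, results = self.routes[request["path"]]

        def invoke(i, req):
            # (1) run the interceptors in order; after the last one, (2) call the action
            if i == len(interceptors):
                return action(req)
            return interceptors[i](req, lambda r: invoke(i + 1, r))

        code, data = invoke(0, request)  # (3) the outcome returns through the interceptors
        return results[code](data)       # and is rendered for the page


log = []
dispatcher = Dispatcher()
dispatcher.register("/search", [logging_interceptor(log), trim_params_interceptor],
                    search_action, RESULTS)
print(dispatcher.handle({"path": "/search", "query": "  drum "}))
# <ul><li>Bronze drum</li><li>Cliff painting: drummers</li></ul>
print(log)  # ['before /search', 'after /search']
```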

SYSTEM TESTING OF THE UNSTRUCTURED MUSIC DATABASE AND ANALYSIS OF THE TEST RESULTS
The Bo people's music and music iconography data system is composed of a global resource search module, a resource classification management module and a resource item management module. The global resource search module has two major functions: creating the Lucene index and performing global search. The resource classification management module is mainly responsible for database and category management and for the field configuration of the database, which fundamentally ensures the system's universality and high extensibility. The resource item management module is responsible for metadata input and management of all music resources and contains three sub-modules: new item, item input and item operation. Other issues of the system, such as music iconography data processing [32] and numbering [33], are not discussed in this paper. The database was tested in an Internet environment, with the music resources, the system application and the database deployed on the same server. The server ran CentOS Linux, the client host ran Windows 7, and the system was accessed through a browser. The interfaces, links and forms of the system were tested, so as to provide theoretical guidance for the application and research of unstructured music databases.
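The resource classification management module can be approximated with a small category tree. In the hypothetical Python sketch below, categories nest as subclasses, and fields are configured only on categories without subclasses; that rule is an assumption inferred from test cases Test_0205 and Test_0206, which both begin by choosing a category without subclasses. The class name, category names and field names are illustrative.

```python
class Category:
    """A resource category; categories nest as subclasses (hypothetical model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.fields = []  # metadata fields configured for items of this category
        if parent is not None:
            parent.children.append(self)

    def _require_leaf(self):
        # Assumption: fields are configured only on categories without subclasses.
        if self.children:
            raise ValueError(f"{self.name!r} has subclasses; choose a leaf category")

    def add_field(self, field):
        self._require_leaf()
        if field in self.fields:
            raise ValueError(f"field {field!r} already exists")
        self.fields.append(field)

    def delete_field(self, field):
        self._require_leaf()
        self.fields.remove(field)  # raises ValueError if the field is absent

    def delete_subclass(self, name):
        for child in [c for c in self.children if c.name == name]:
            child.parent = None
            self.children.remove(child)


root = Category("Bo people's music and iconography")
instruments = Category("Musical instruments", root)
bronze_drum = Category("Bronze drum", instruments)
bronze_drum.add_field("decoration pattern")
try:
    instruments.add_field("material")
except ValueError as err:
    print(err)  # 'Musical instruments' has subclasses; choose a leaf category
instruments.delete_subclass("Bronze drum")
instruments.add_field("material")  # allowed now that it has no subclasses
print(instruments.fields)  # ['material']
```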

Test content design
The objectives of the functional tests are as follows:
1) According to the UI design, test whether the system interface is consistent with the design and whether the system is operable, reasonable, aesthetically pleasing and coordinated. These tests mainly include the navigation test, the form test and the overall interface test.
2) Test whether every link and function button on each page jumps to the target page as designed and whether the linked pages exist, so as to make sure that all links are valid and correct.
3) Test the character strings, numerical values, required items and keywords of the input content, so as to check whether the parameters of submitted forms are correct and whether boundary values are handled correctly.
Based on these objectives, the functional tests were carried out as manual test cases. The specific test procedures for the portal subsystem, the classification management module and the item management module are given in Table 2.
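Objective 3), testing required items and boundary values of form input, is easy to make concrete. The following Python sketch validates a hypothetical "new item" form and runs a table of boundary cases against it; the function names, field names, the 100-character title limit and the 1 to 2100 year range are invented for illustration and are not the system's actual form rules.

```python
MAX_TITLE_LENGTH = 100  # hypothetical limit
YEAR_RANGE = (1, 2100)  # hypothetical valid range


def validate_item_form(form):
    """Return error messages for a hypothetical "new item" form (empty list = valid)."""
    errors = []
    title = form.get("title", "").strip()
    if not title:
        errors.append("title is required")
    elif len(title) > MAX_TITLE_LENGTH:
        errors.append("title is too long")
    year = form.get("year", "").strip()
    if year:  # optional numeric field
        if not year.isdecimal():
            errors.append("year must be a number")
        elif not YEAR_RANGE[0] <= int(year) <= YEAR_RANGE[1]:
            errors.append("year out of range")
    return errors


# Boundary cases: values just inside and just outside each limit.
CASES = [
    ({"title": ""}, ["title is required"]),
    ({"title": "   "}, ["title is required"]),
    ({"title": "a" * 100}, []),
    ({"title": "a" * 101}, ["title is too long"]),
    ({"title": "Drum", "year": "0"}, ["year out of range"]),
    ({"title": "Drum", "year": "1"}, []),
    ({"title": "Drum", "year": "2100"}, []),
    ({"title": "Drum", "year": "2101"}, ["year out of range"]),
    ({"title": "Drum", "year": "abc"}, ["year must be a number"]),
]


def failing_cases(cases):
    """Return (form, expected, actual) for every case whose result differs."""
    failures = []
    for form, expected in cases:
        actual = validate_item_form(form)
        if actual != expected:
            failures.append((form, expected, actual))
    return failures


print(failing_cases(CASES))  # [] when every boundary case passes
```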
The objectives of the performance test are as follows:
1) Collect statistics on performance parameters, including response speed, throughput and so on.
2) The portal subsystem interacts directly with users, and its many users operate on the database frequently, so the retrieval pages carry the heaviest load in the system. The performance test of the single category query page of the portal subsystem was therefore carried out with JMeter. The parameter configuration of the performance test is shown in Table 3.
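The following Python sketch illustrates the kind of measurement such a load test produces: it calls a request function from several threads and summarizes the samples with statistics similar to the columns of JMeter's Aggregate Report (samples, average, median, 90% line, min, max, error %, throughput). It does not reproduce the Table 3 configuration; the function names and the stand-in request are hypothetical, and the 90% line here uses the nearest-rank method, which may differ slightly from JMeter's own calculation.

```python
import statistics
import threading
import time


def aggregate(samples, duration_s):
    """Summarise (elapsed_ms, ok) samples with Aggregate-Report-style statistics."""
    elapsed = sorted(ms for ms, _ in samples)
    n = len(elapsed)
    errors = sum(1 for _, ok in samples if not ok)
    return {
        "samples": n,
        "average_ms": statistics.mean(elapsed),
        "median_ms": statistics.median(elapsed),
        "p90_ms": elapsed[(9 * n + 9) // 10 - 1],  # nearest rank: ceil(0.9 * n)
        "min_ms": elapsed[0],
        "max_ms": elapsed[-1],
        "error_pct": 100.0 * errors / n,
        "throughput_per_s": n / duration_s,
    }


def run_load(request_fn, threads, loops):
    """Call request_fn `loops` times from each of `threads` concurrent threads."""
    samples, lock = [], threading.Lock()

    def worker():
        for _ in range(loops):
            start = time.perf_counter()
            try:
                request_fn()
                ok = True
            except Exception:  # any failure counts as an error sample
                ok = False
            elapsed_ms = (time.perf_counter() - start) * 1000
            with lock:
                samples.append((elapsed_ms, ok))

    started = time.perf_counter()
    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return aggregate(samples, time.perf_counter() - started)


# Usage with a stand-in for the single category query page (hypothetical):
report = run_load(lambda: time.sleep(0.005), threads=10, loops=5)
print(report["samples"], report["error_pct"])  # 50 0.0
```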

Test results analysis
Functional test results analysis: The database system was tested according to the functional test cases, and the results are shown in Table 4. Table 4 shows that the database system is in good condition: it meets the design requirements, its interface style and colors are well unified, users have a good experience, the functional logic is implemented correctly, and compatibility is good except for IE6/IE7.
Performance test results analysis: The aggregated performance test data of the portal subsystem, obtained with the JMeter parameter configuration in Table 3, are listed in Table 5.
Table 5 shows that the system still guarantees short response times and high throughput under heavy load.
In summary, the performance of the unstructured Bo people's music and music iconography database system designed in this paper meets the requirements of the database construction and supports concurrent access to massive data with an excellent user experience.

CONCLUSIONS
Based on an explanation of unstructured data organization and management, this paper analyzes the key technologies for establishing the unstructured Bo people's music and music iconography database system, and carries out functional and performance tests on the database system built with these technologies. The conclusions are as follows:
1) The traditional file directory tree method is not suitable for organizing and managing massive unstructured data, and existing indexing and retrieval tools cannot be fully applied to it either. Users can find relevant documents effectively and rapidly through the index established by <keywords, files> provided by BeFS.
2) Dublin Core, Lucene and MVC framework can be regarded as the key to the construction of the unstructured database system.
3) The unstructured database system is composed of a global resource search module, a resource classification management module and a resource item management module.
4) Testing of this kind of database system should be divided into two parts: functional testing and performance testing. Functional tests cover three aspects, namely interface style, links and forms. Performance tests mainly involve response time and throughput.
5) The thematic Bo people's music and music iconography database system constructed with key technologies such as Dublin Core, Lucene and the MVC framework performs relatively well. It supports concurrent access to massive data and offers an excellent user experience.

Figure 1. Indexing diagram of the search engine

Figure 5. Structure of the Struts2 controller

College of Music and the Performing Arts, Yibin University, Yibin, Sichuan, China
Owned by the authors, published by EDP Sciences, 2015
Keywords: unstructured data; image database of the Bo people's music; Dublin Core; Lucene technology; MVC framework; system test
DOI: 10.1051/ C

Table 2. Manual test cases of the unstructured music database system

Table 3. Parameter configuration of JMeter

Table 4. Functional test results of the portal subsystem of the unstructured Bo people's music and its music iconography database system

Table 5. Aggregated data of the performance test of the portal subsystem