MATEC Web Conf.
Volume 277, 2019
2018 International Joint Conference on Metallurgical and Materials Engineering (JCMME 2018)
Number of page(s): 7
Section: Data and Signal Processing
Published online: 02 April 2019
Content enrichment with expressive document modelling to leverage the understanding of unstructured data
All in All Analytics Limited, New Zealand
* Corresponding author: firstname.lastname@example.org
Most information in an enterprise is in the form of unstructured data, which is usually managed using a document database. One of the key challenges is to define a generalized data model for this unstructured data and for any information extracted from it using content enrichment algorithms. It is more challenging still to incorporate provenance and temporal capabilities into such data models. Semantic databases use ontologies such as PROV-O to represent their provenance information expressively, and relational databases use concepts such as Slowly Changing Dimensions (SCDs) to represent temporal information. In this paper, we present a document model with features inspired by Dublin Core, PROV-O and temporal methodologies to generalize information extracted from unstructured data using content enrichment algorithms. Provenance information enables comparison of enrichment models, allows reproducibility and facilitates complex filtering on the enriched data. Temporal metadata helps in versioning the document and makes point-in-time and history queries convenient.
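To make the idea concrete, the following is a minimal Python sketch of what such a document model might look like. It is not the model defined in the paper: the class names, the in-memory store and the sample field choices are all assumptions made for illustration. Descriptive fields borrow Dublin Core element names (identifier, title), provenance fields mirror the PROV-O properties wasGeneratedBy and wasAttributedTo, and versioning uses validity intervals in the spirit of a Type 2 Slowly Changing Dimension.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Enrichment:
    """One extracted value with PROV-O-style provenance (illustrative only)."""
    value: str               # e.g. an extracted entity or label
    was_generated_by: str    # hypothetical identifier of the enrichment run
    was_attributed_to: str   # hypothetical identifier of the model and version


@dataclass
class DocumentVersion:
    """One version of a document, valid over [valid_from, valid_to)."""
    identifier: str          # Dublin Core-style descriptive fields
    title: str
    content: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version
    enrichments: List[Enrichment] = field(default_factory=list)


class DocumentStore:
    """Hypothetical in-memory store; a real system would use a document database."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[DocumentVersion]] = {}

    def put(self, doc: DocumentVersion) -> None:
        """Append a new version and close the validity interval of the previous one."""
        versions = self._versions.setdefault(doc.identifier, [])
        if versions:
            versions[-1].valid_to = doc.valid_from
        versions.append(doc)

    def history(self, identifier: str) -> List[DocumentVersion]:
        """History query: all versions, oldest first."""
        return list(self._versions.get(identifier, []))

    def as_of(self, identifier: str, when: datetime) -> Optional[DocumentVersion]:
        """Point-in-time query: the version that was valid at `when`, if any."""
        for version in self._versions.get(identifier, []):
            if version.valid_from <= when and (
                version.valid_to is None or when < version.valid_to
            ):
                return version
        return None

    def by_agent(self, identifier: str, agent: str) -> List[Enrichment]:
        """Provenance filter: enrichments attributed to a given model, across versions."""
        return [
            e
            for version in self._versions.get(identifier, [])
            for e in version.enrichments
            if e.was_attributed_to == agent
        ]
```

In this sketch, storing a second version of a document closes the validity interval of the first, so as_of can answer point-in-time queries and history returns every version. Because each enrichment records which run and which model produced it, the outputs of two enrichment models over the same document can be filtered and compared.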
© The Authors, published by EDP Sciences, 2019
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.