Header

UZH-Logo

Maintenance Infos

“Semantics Inside!” But let’s not tell the Data Miners: Intelligent Support for Data Mining


Kietz, Jörg-Uwe; Serban, Floarea; Fischer, Simon; Bernstein, Abraham (2014). “Semantics Inside!” But let’s not tell the Data Miners: Intelligent Support for Data Mining. In: European Semantic Web Conference ESWC 2014, Crete, Greece, 25 May 2014 - 29 May 2014. Springer, 706-720.

Abstract

Knowledge Discovery in Databases (KDD) has evolved significantly over the past years and reached a mature stage offering plenty of operators to solve complex data analysis tasks. User support for building data analysis workflows, however, has not progressed sufficiently: the large number of operators currently available in KDD systems and interactions between these operators complicates successful data analysis. To help Data Miners we enhanced one of the most used open source data mining tools—RapidMiner—with semantic technologies. Specifically, we first annotated all elements involved in the Data Mining (DM) process—the data, the operators, models, data mining tasks, and KDD workflows—semantically using our eProPlan modelling tool that allows to describe operators and build a task/method decomposition grammar to specify the desired workflows embedded in an ontology. Second, we enhanced RapidMiner to employ these semantic annotations to actively support data analysts. Third, we built an Intelligent Discovery Assistant, eIda, that leverages the semantic annotation as well as HTN planning to automatically support KDD process generation. We found that the use of Semantic Web approaches and technologies in the KDD domain helped us to lower the barrier to data analysis. We also found that using a generic ontology editor overwhelmed KDD-centric users. We, therefore, provided them with problem-centric extensions to Protege. Last and most surprising, we found that our semantic modeling of the KDD domain served as a rapid prototyping approach for several hard-coded improvements of RapidMiner, namely correctness checking of workflows and quick-fixes, reinforcing the finding that even a little semantic modeling can go a long way in improving the understanding of a domain even for domain experts.

Abstract

Knowledge Discovery in Databases (KDD) has evolved significantly over the past years and reached a mature stage offering plenty of operators to solve complex data analysis tasks. User support for building data analysis workflows, however, has not progressed sufficiently: the large number of operators currently available in KDD systems and interactions between these operators complicates successful data analysis. To help Data Miners we enhanced one of the most used open source data mining tools—RapidMiner—with semantic technologies. Specifically, we first annotated all elements involved in the Data Mining (DM) process—the data, the operators, models, data mining tasks, and KDD workflows—semantically using our eProPlan modelling tool that allows to describe operators and build a task/method decomposition grammar to specify the desired workflows embedded in an ontology. Second, we enhanced RapidMiner to employ these semantic annotations to actively support data analysts. Third, we built an Intelligent Discovery Assistant, eIda, that leverages the semantic annotation as well as HTN planning to automatically support KDD process generation. We found that the use of Semantic Web approaches and technologies in the KDD domain helped us to lower the barrier to data analysis. We also found that using a generic ontology editor overwhelmed KDD-centric users. We, therefore, provided them with problem-centric extensions to Protege. Last and most surprising, we found that our semantic modeling of the KDD domain served as a rapid prototyping approach for several hard-coded improvements of RapidMiner, namely correctness checking of workflows and quick-fixes, reinforcing the finding that even a little semantic modeling can go a long way in improving the understanding of a domain even for domain experts.

Statistics

Citations

Dimensions.ai Metrics

Altmetrics

Downloads

226 downloads since deposited on 05 Jun 2014
28 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Scopus Subject Areas:Physical Sciences > Theoretical Computer Science
Physical Sciences > General Computer Science
Language:English
Event End Date:29 May 2014
Deposited On:05 Jun 2014 07:24
Last Modified:24 Jan 2022 04:16
Publisher:Springer
Series Name:Lecture Notes in Computer Science
Number:8465
ISSN:0302-9743
ISBN:978-3-319-07443-6
Additional Information:The original publication is available at link.springer.com
OA Status:Green
Publisher DOI:https://doi.org/10.1007/978-3-319-07443-6_47
Related URLs:http://2014.eswc-conferences.org/
Other Identification Number:merlin-id:9300
  • Content: Accepted Version