An approach on ETL attached data quality management

Authors: Christian Lettner, Reinhard Stumptner, Karl-Heinz Bokesch
Editors: L. Bellatreche, M. K. Mohania
Title: An approach on ETL attached data quality management
Book title: Data Warehousing and Knowledge Discovery - Proc. DaWaK 2014
Type: In conference proceedings
Publisher: Springer
Series: Lecture Notes in Computer Science
Volume: 8646
ISBN: 978-3-319-10159-0
Month: September
Year: 2014
Pages: 1-8
SCCH ID: #1406
Abstract

This contribution introduces an approach to ETL-attached data quality management by means of an autonomous Data Quality Monitoring System. The Data Quality Monitor can be attached, via lightweight connectors, to already implemented ETL processes. It quantifies data quality and suggests measures if, for instance, the quality of a particular data package falls below a certain limit. The long-term vision of this approach is to correct corrupted data (semi-)automatically according to user-defined Data Quality Rules. The monitor is attached to an ETL process in two ways: by defining "snapshot points", where the data samples to be validated are collected, and by introducing "approval points", where the ETL process can be interrupted if the input data is corrupted. Because the Data Quality Monitor is an autonomous module that is attached to ETL processes rather than embedded in them, this approach supports the division of work between ETL developers and dedicated data quality engineers.
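
The interplay of snapshot points and approval points can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `DataQualityMonitor`, its methods `snapshot`, `quality` and `approve`, the exception `ApprovalRejected`, the pass-rate quality measure, and the toy `run_etl` process are all hypothetical names and assumptions introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]  # returns True when a record passes the rule


class ApprovalRejected(Exception):
    """Raised at an approval point when sampled quality is below the limit."""


@dataclass
class DataQualityMonitor:
    # Hypothetical monitor; rules and threshold are illustrative assumptions.
    rules: List[Rule]
    threshold: float = 0.9
    samples: Dict[str, List[Record]] = field(default_factory=dict)

    def snapshot(self, point: str, records: List[Record]) -> None:
        """Snapshot point: collect a copy of the data sample for validation."""
        self.samples.setdefault(point, []).extend(dict(r) for r in records)

    def quality(self, point: str) -> float:
        """Fraction of sampled records at `point` that satisfy every rule."""
        records = self.samples.get(point, [])
        if not records:
            return 1.0  # assumption: an empty sample is treated as acceptable
        passed = sum(1 for r in records if all(rule(r) for rule in self.rules))
        return passed / len(records)

    def approve(self, point: str) -> None:
        """Approval point: interrupt the ETL process if quality is too low."""
        score = self.quality(point)
        if score < self.threshold:
            raise ApprovalRejected(
                f"{point}: quality {score:.2f} < {self.threshold:.2f}"
            )


def run_etl(rows, monitor):
    """Toy ETL process; the monitor is attached through two calls only."""
    extracted = list(rows)                        # extract
    monitor.snapshot("after_extract", extracted)  # snapshot point
    monitor.approve("after_extract")              # approval point, may raise
    # transform: convert the (assumed) "amount" field to a float
    return [{**r, "amount": float(r["amount"])} for r in extracted]
```

In this sketch the ETL code only calls `snapshot` and `approve`, while the rules and the threshold live in the monitor. That separation is one way to picture the division of work described in the abstract: an ETL developer places the two calls, and a data quality engineer maintains the rules independently.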