Data Quality Management

Some recent news / activities:

In response to several requests, I finally completed the very first blog post on data quality management. In this post, I’ve mainly “set the scene” with the intention of publishing several more in-depth blog posts related to specific areas of my interest within data quality management, including but not limited to AI-augmented data quality management.


This November I was glad to deliver a keynote at the 3rd International Workshop on Democracy and AI (DemocrAI2023), held in conjunction with the 20th Pacific Rim International Conference on Artificial Intelligence (PRICAI2023). My keynote was titled “Unlocking the symbiotic relationship of Artificial Intelligence, Data Intelligence, and Collaborative Intelligence for Innovative Decision-Making and Problem Solving” (read more…)


Our paper “ISO/IEC 25012-based methodology for managing data quality requirements in the development of information systems: Towards data quality by design” became the most downloaded article from Data & Knowledge Engineering (Elsevier), holding this position from July 2023 to the present (December). It presents DAQUAVORD, a methodology for the project management of data quality requirements specification, aimed at eliciting DQ requirements arising from different users’ viewpoints. These specific requirements should serve as typical requirements, both functional and non-functional, during the development of information systems that take data quality into account by default, leading to smarter and more collaborative development (read more…)


On June 5, 2023, I was delighted to be invited to deliver a keynote at the HackCodeX Forum, titled “Data Quality as a prerequisite for your business success: when should I start taking care of it?”, in my hometown – Riga, Latvia. HackCodeX Forum is a one-day event where international experts share their experience and knowledge about emerging technologies and areas such as Artificial Intelligence, Security, Data Quality, Quantum Computing, Sustainability, Open Data, Privacy, Ethics, and Digital Services, with a keynote from the CEO of SK ID Solutions (read more…)


The new paper “Overlooked aspects of data governance: workflow framework for enterprise data deduplication” has just been presented at the IEEE-sponsored International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023), at which I also served on the Steering Committee and the Technical Program Committee, as well as acting as Publicity Chair (read more…)


Glad to announce the release of “Towards a data quality framework for EOSC”, which we have been hard at work on for several months as the Data Quality subgroup of the “FAIR Metrics and Data Quality” Task Force of the European Open Science Cloud (EOSC) Association – Carlo Lacagnina, Romain David, Anastasija Nikiforova, Mari Elisa Kuusniemi, Cinzia Cappiello, Oliver Biehlmaier, Louise Wright, Chris Schubert, Andrea Bertino, Hannes Thiemann, Richard Dennis (read more…)


Last week (November 2022) I had the pleasure of taking part in a Data Science Seminar titled “When, Why and How? The Importance of Business Intelligence”. In this seminar, organized by the Institute of Computer Science (University of Tartu) in cooperation with Swedbank, we (M. Gharib, J. Koitsalu, I. Artemtsuk, and I) discussed the importance of BI with some focus on data quality. (read more…)


The EOSC Task Force on FAIR Metrics and Data Quality presents the whitepaper “Community-driven Governance of FAIRness Assessment: An Open Issue, an Open Discussion” (Mark D. Wilkinson, Susanna-Assunta Sansone, Eva Méndez, Romain David, Richard Dennis, David Hecker, Mari Kleemola, Carlo Lacagnina, Anastasija Nikiforova, Leyla Jael Castro), published by the European Commission – in open access, of course – here. (read more…)


This June I was glad and honored to take part in the four-day ONE Conference 2022 (co-organised by EFSA, ECDC, ECHA, EEA, EMA, and the European Commission’s Joint Research Centre (JRC)) as one of the panelists of the “Turning open science into practice: causality as a showcase” thematic session, part of the “ONE society” thematic track, where I served as an open data expert with a substantial background in the data quality area (read more…)


The topic of data quality has developed over the years and resulted in a series of articles, as well as a doctoral thesis (in fact, my master’s thesis, defended with distinction and recognized by ZIBIT as the best Master’s Thesis in Computer Science of the year, was also devoted to it), in which a user-centered, data object-driven approach to data quality assessment was developed and applied to open data, with open government data (OGD) in particular serving as the domain of application.

Later, data quality analysis became an integral part of my activities, including my experience with the Latvian Biomedical Research and Study Centre (BBMRI-ERIC Latvian National Node), where I inspected the current data ecosystem of the Latvian Biomedical Research and Study Centre and related data artifacts, including the Latvian Genome database. This resulted in a set of guidelines towards efficient data management for heterogeneous data holders and exchangers, developed as a deliverable of the HORIZON2020 INTEGROMED project (Deliverable 2.1 “Guidelines for the maintenance of efficient biobank, health register and research associated data“) and presented during the European Biobank Week 2021 (read more about it here…). It also serves as an input for DECIDE – Development of a dynamic informed consent system for biobank and citizen science data management, quality control and integration.

In addition, as part of the digitization of processes running within the LBMC, I developed and introduced a set of e-surveys to substitute the paper-based surveys and questionnaires that were typically completed by hand by patients, donors, or doctors, and then inserted into the database by manually rewriting the answers from the paper forms. As a result, data tended to be incomplete, non-compliant with the database design, and inaccurate, since errors were introduced when a person re-entered the data into the database or interpreted an answer to make it compliant with the schema. E-surveys ensure data integrity through built-in data quality checks and compliance with the actual database design, reducing the number of errors made during data collection and insertion.
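
To give a flavour of the kind of built-in checks involved, here is a minimal sketch in Python; the field names, allowed values, and validation rules are purely hypothetical and are not those of the actual LBMC surveys.

    # Hypothetical validation rules for one e-survey response.
    # Field names and constraints are illustrative only.
    RULES = {
        "age":          lambda v: v is not None and 0 <= int(v) <= 120,
        "blood_group":  lambda v: v in {"A", "B", "AB", "O"},
        "consent_date": lambda v: bool(v),  # completeness check
    }

    def validate_response(response: dict) -> list:
        """Return the list of data quality violations for one survey response."""
        violations = []
        for field, rule in RULES.items():
            value = response.get(field)
            try:
                if not rule(value):
                    violations.append(f"{field}: invalid or missing value {value!r}")
            except (TypeError, ValueError):
                violations.append(f"{field}: non-conforming type {value!r}")
        return violations

    # A response is stored only if it passes all checks, so defects are caught
    # at collection time instead of after manual re-entry.
    print(validate_response({"age": "47", "blood_group": "AB", "consent_date": "2021-03-02"}))  # []
    print(validate_response({"age": "-3", "blood_group": "Z"}))  # three violations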

In September 2021, I joined the European Open Science Cloud (EOSC) Association Task Force on FAIR Metrics and Data Quality, which oversees the implementation of FAIR metrics for the EOSC, testing them with research communities to ensure they are fit for purpose.

Very briefly on the data object-driven approach to data quality evaluation mentioned above: it consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. As data quality is relative in nature, the data object and quality requirements are (a) use-case dependent and (b) defined by the user in accordance with their needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). The data quality model is executable, enabling scanning of the data object and detection of data quality defects and anomalies.
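
The actual model is expressed with graphical DSLs rather than code, but the following Python sketch is a rough illustrative analogue, using a made-up data object and requirements, of how an executable quality model scans a data object and reports defects.

    # Illustrative analogue of the three-component model (the real approach
    # uses graphical DSLs, not Python). All names and rules are hypothetical.

    # (1) Data object: the use-case-dependent slice of data the user cares about.
    data_object = [
        {"title": "Open dataset A", "published": "2021-05-01", "format": "CSV"},
        {"title": "",               "published": None,         "format": "PDF"},
    ]

    # (2) Data quality requirements: defined by the user for this use case.
    requirements = {
        "title is not empty":      lambda r: bool(r["title"].strip()),
        "published date present":  lambda r: r["published"] is not None,
        "machine-readable format": lambda r: r["format"] in {"CSV", "JSON", "XML"},
    }

    # (3) Evaluation process: scan the data object and report every violation.
    def evaluate(records, reqs):
        defects = []
        for i, record in enumerate(records):
            for name, check in reqs.items():
                if not check(record):
                    defects.append((i, name))
        return defects

    for index, requirement in evaluate(data_object, requirements):
        print(f"record {index}: violates '{requirement}'")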

The proposed approach served as an input for several projects, including DQMBT – a data quality model-based testing approach for information systems, developed and applied in real-life projects in which an e-scooter system was the central object, thereby demonstrating the proposed approach in the context of the IoT, or more precisely the Internet of Vehicles (IoV). This example, in turn, served as an input to the current study on car-sharing services in Latvia, more specifically on their optimization processes (read more…)
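
As a rough illustration of the model-based-testing idea (a sketch of the general technique, not the published DQMBT method itself), test cases for an information system can be derived directly from a data quality model; the e-scooter fields and bounds below are invented for the example.

    # Hypothetical sketch: deriving boundary test cases from a data quality model.
    quality_model = {
        "battery_level": {"min": 0.0, "max": 100.0},
        "latitude":      {"min": -90.0, "max": 90.0},
        "longitude":     {"min": -180.0, "max": 180.0},
    }

    def generate_test_cases(model):
        """For each field, emit boundary (accept) and out-of-range (reject) inputs."""
        for field, spec in model.items():
            yield field, spec["min"], True        # lower boundary: accept
            yield field, spec["max"], True        # upper boundary: accept
            yield field, spec["min"] - 1, False   # below range: reject
            yield field, spec["max"] + 1, False   # above range: reject

    def system_accepts(field, value):
        """Stand-in for the validation logic of the system under test."""
        spec = quality_model[field]
        return spec["min"] <= value <= spec["max"]

    for field, value, expected in generate_test_cases(quality_model):
        outcome = "PASS" if system_accepts(field, value) == expected else "FAIL"
        print(f"{outcome}: {field}={value} (expect accept={expected})")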

[Image source: European Open Science Cloud (EOSC), https://eosc-portal.eu/]

The list of articles (both journal articles and conference papers) is presented below.

I have also delivered some talks on this and related topics, including: