Data Quality Management

Some recent news / activities:

In response to several requests, I have finally completed my very first blog post on data quality management. In it, I mainly “set the scene”, intending to publish several more in-depth posts on specific areas of data quality management that interest me, including, but not limited to, AI-augmented data quality management.

This November, I was glad to deliver a keynote entitled “Unlocking the symbiotic relationship of Artificial Intelligence, Data Intelligence, and Collaborative Intelligence for Innovative Decision-Making and Problem Solving” at the 3rd International Workshop on Democracy and AI (DemocrAI2023), held in conjunction with the 20th Pacific Rim International Conference on Artificial Intelligence (PRICAI2023) (read more…)

Our paper “ISO/IEC 25012-based methodology for managing data quality requirements in the development of information systems: Towards data quality by design” became one of the most downloaded articles from Data & Knowledge Engineering (Elsevier) from July 2023 to date (December 2023). It presents DAQUAVORD, a methodology for the Project Management of Data Quality Requirements Specification, aimed at eliciting data quality (DQ) requirements arising from different users’ viewpoints. These requirements should be treated like any other functional and non-functional requirements during the development of an information system (IS), so that data quality is taken into account by default, leading to smarter and more collaborative development (read more…)

On June 5, 2023, I was delighted to be invited as a keynote speaker at the HackCodeX Forum in my hometown, Riga, Latvia, where I delivered a talk titled “Data Quality as a prerequisite for your business success: when should I start taking care of it?”. HackCodeX Forum is a one-day event where international experts share their experience and knowledge about emerging technologies and areas such as Artificial Intelligence, Security, Data Quality, Quantum Computing, Sustainability, Open Data, Privacy, Ethics, and Digital Services (with a keynote from the CEO of SK ID Solutions) (read more…)

The new paper “Overlooked aspects of data governance: workflow framework for enterprise data deduplication” has just been presented at the IEEE-sponsored International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023), where I also served as a member of the Steering Committee and the Technical Program Committee, as well as Publicity Chair (read more…)

Glad to announce the release of “Towards a data quality framework for EOSC”, which we, the Data Quality subgroup of the “FAIR Metrics and Data Quality” Task Force of the European Open Science Cloud (EOSC) Association, have been working hard on for several months: Carlo Lacagnina, Romain David, Anastasija Nikiforova, Mari Elisa Kuusniemi, Cinzia Cappiello, Oliver Biehlmaier, Louise Wright, Chris Schubert, Andrea Bertino, Hannes Thiemann, Richard Dennis (read more…)

Last week (November 2022), I had the pleasure of taking part in a Data Science Seminar titled “When, Why and How? The Importance of Business Intelligence”. In this seminar, organized by the Institute of Computer Science (University of Tartu) in cooperation with Swedbank, we (M. Gharib, J. Koitsalu, I. Artemtsuk and I) discussed the importance of BI, with some focus on data quality (read more…)

The EOSC Task Force on FAIR Metrics and Data Quality presents the whitepaper “Community-driven Governance of FAIRness Assessment: An Open Issue, an Open Discussion” (Mark D. Wilkinson; Susanna-Assunta Sansone; Eva Méndez; Romain David; Richard Dennis; David Hecker; Mari Kleemola; Carlo Lacagnina; Anastasija Nikiforova; Leyla Jael Castro), published by the European Commission, in open access of course (read more…)

This June, I was glad and honored to take part in the four-day ONE Conference 2022 (co-organised by EFSA, ECDC, ECHA, EEA, EMA, and the European Commission’s Joint Research Centre (JRC)) as one of the panelists of the “Turning open science into practice: causality as a showcase” thematic session within the “ONE society” thematic track, where I served as an open data expert with a substantial background in data quality (read more…)

I have been developing the topic of data quality over the years, which resulted in a series of articles, as well as a doctoral thesis (in fact, my master’s thesis, defended with distinction and recognized by ZIBIT as the best Master’s Thesis in Computer Science of the year, was also devoted to it). In the thesis, a user-centered, data object-driven approach to data quality assessment was developed and applied, with open data, and open government data (OGD) in particular, serving as the domain of application. Later, data quality analysis became an integral part of my activities, including my work with the Latvian Biomedical Research and Study Centre (BBMRI-ERIC Latvian National Node), where I inspected the current data ecosystem of the Centre and its related data artifacts, including the Latvian Genome database. This resulted in a set of guidelines towards efficient data management for heterogeneous data holders and exchangers, developed as a deliverable of the HORIZON2020 INTEGROMED project (Deliverable 2.1 “Guidelines for the maintenance of efficient biobank, health register and research associated data”) and presented during the European Biobank Week 2021 (read more about it here…). These guidelines also serve as an input for DECIDE (Development of a dynamic informed consent system for biobank and citizen science data management, quality control and integration). In addition, as part of the digitization of processes within the LBMC, I developed and introduced a set of e-surveys to replace paper-based surveys and questionnaires, which were typically completed by hand by patients, donors or doctors and then entered into the database by manually rewriting the answers from paper.
As a result, the data tended to be incomplete, non-compliant with the database design, and inaccurate, since errors were made whenever a person re-entered the data from a paper-based survey into the database or interpreted an answer to make it compliant with the database. E-surveys, with their built-in data quality checks, help to ensure data integrity and compliance with the actual database, and reduce the number of errors made during data collection and insertion.
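To illustrate what such built-in checks can look like, here is a minimal Python sketch. The field names, value ranges and the `FieldRule` / `validate_response` helpers are hypothetical and chosen purely for illustration; they are not the actual LBMC e-survey design. Each answer is checked for completeness, expected type, a closed list of options and a permitted range before it is allowed into the database, and answers to questions that do not exist in the database design are rejected.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldRule:
    """Constraints for one survey question (hypothetical schema)."""
    name: str
    kind: type                      # expected Python type of the answer
    required: bool = True
    allowed: Optional[set] = None   # closed list of permitted answers, if any
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate_response(response: dict[str, Any], rules: list[FieldRule]) -> list[str]:
    """Return the data quality violations found in one survey response.

    An empty list means the response is consistent with the (hypothetical)
    database design and can be stored.
    """
    errors = []
    for rule in rules:
        value = response.get(rule.name)
        if value is None or value == "":
            # Completeness: mandatory questions must be answered
            if rule.required:
                errors.append(f"{rule.name}: missing answer")
            continue
        if not isinstance(value, rule.kind):
            # Type compliance with the corresponding database column
            errors.append(f"{rule.name}: expected {rule.kind.__name__}")
            continue
        if rule.allowed is not None and value not in rule.allowed:
            errors.append(f"{rule.name}: {value!r} is not an allowed option")
        if rule.min_value is not None and value < rule.min_value:
            errors.append(f"{rule.name}: below {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            errors.append(f"{rule.name}: above {rule.max_value}")
    # Answers to questions that do not exist in the database design are rejected
    unknown = set(response) - {rule.name for rule in rules}
    errors.extend(f"{name}: not part of the survey schema" for name in sorted(unknown))
    return errors


# Hypothetical rules for a few questions of a donor questionnaire
rules = [
    FieldRule("age", int, min_value=0, max_value=120),
    FieldRule("smoker", str, allowed={"yes", "no"}),
    FieldRule("comment", str, required=False),
]
```

With these rules, `validate_response({"age": 150, "smoker": "maybe"}, rules)` returns two violations (an age outside the permitted range and an answer outside the allowed options) instead of letting the response be silently stored, which is exactly the kind of error that slipped in when answers were retyped from paper.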

In September 2021, I joined the European Open Science Cloud (EOSC) Association Task Force on FAIR Metrics and Data Quality, which oversees the implementation of FAIR metrics for the EOSC, testing them with research communities to ensure they are fit for purpose.

A few words on the data object-driven approach to data quality evaluation mentioned above. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. As data quality is relative by nature, the data object and the quality requirements are (a) use-case dependent and (b) defined by the user according to their needs. All three components of the data quality model are described using graphical Domain Specific Languages (DSLs). The data quality model is executable, which makes it possible to scan data objects and detect data quality defects and anomalies.
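To make the three components a bit more tangible, here is a minimal sketch of how they fit together. It is written in plain Python rather than in the graphical DSLs used in the original approach, and the class names, the “Company” data object, its two requirements and the sample rows are hypothetical examples. The data object defines which attributes matter, the requirements are user-supplied predicates, and the evaluation process scans every record and reports the requirements it violates.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Record = dict[str, object]


@dataclass
class DataObject:
    """(1) Data object: the entity of interest and the attributes that describe it."""
    name: str
    attributes: list[str]

    def project(self, row: Record) -> Record:
        # Keep only the attributes that belong to this data object
        return {a: row.get(a) for a in self.attributes}


@dataclass
class Requirement:
    """(2) A user-defined, use-case-dependent quality requirement."""
    description: str
    check: Callable[[Record], bool]


@dataclass
class Defect:
    row_index: int
    requirement: str
    record: Record


def evaluate(obj: DataObject, requirements: list[Requirement],
             rows: Iterable[Record]) -> list[Defect]:
    """(3) Evaluation process: scan every instance of the data object
    and report each requirement it violates."""
    defects = []
    for i, row in enumerate(rows):
        record = obj.project(row)
        for req in requirements:
            if not req.check(record):
                defects.append(Defect(i, req.description, record))
    return defects


# Hypothetical usage: a "Company" data object with two illustrative requirements
company = DataObject("Company", ["reg_no", "name"])
requirements = [
    Requirement("name is filled", lambda r: bool(r["name"])),
    Requirement("reg_no has 11 digits",
                lambda r: isinstance(r["reg_no"], str)
                and len(r["reg_no"]) == 11 and r["reg_no"].isdigit()),
]
rows = [  # made-up sample rows, not real register data
    {"reg_no": "40003000001", "name": "Example SIA", "address": "Riga"},
    {"reg_no": "4000300", "name": ""},
]
for defect in evaluate(company, requirements, rows):
    print(defect.row_index, defect.requirement)
```

Here the second row is reported twice (empty name, malformed registration number), while the first passes. Keeping the requirements as a separate, user-supplied list is what makes the evaluation use-case dependent: different users can evaluate the same data against different requirement sets without touching the evaluation process.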

The proposed approach served as an input for several projects, including DQMBT, a data quality model-based testing approach for information systems, which was developed and applied in real-life projects with an e-scooter system as the central object, thereby demonstrating the proposed approach in the context of IoT or IoV (Internet of Vehicles). This example, in turn, has served as an input to the current study on car-sharing services in Latvia, more specifically on optimization processes (read more…)

Source: European Open Science Cloud (EOSC), https://eosc-portal.eu/

The list of articles (both journal articles and conference papers) is presented below.

Definition and Evaluation of Data Quality: User-Oriented Data Object-Driven Approach to Data Quality Assessment (Nikiforova)

ISO/IEC 25012-based methodology for managing data quality requirements in the development of information systems: Data quality by design (García, Nikiforova, Jiménez, Gonzalez, Torres, García)

Overlooked Aspects of Data Governance: Workflow Framework For Enterprise Data Deduplication (Azeroual, Nikiforova, Sha)

Towards a data quality framework for EOSC (Lacagnina, David, Nikiforova, Kuusniemi, Cappiello, Biehlmaier, Wright, Schubert, Bertino, Thiemann, Dennis)

Putting FAIR principles in the context of research information: FAIRness for CRIS and CRIS for FAIRness (Azeroual, Schöpfel, Pölönen, and Nikiforova) – best paper award

FAIRification of CRIS: A Review (Azeroual, Schöpfel, Pölönen, and Nikiforova)

A Record Linkage-Based Data Deduplication Framework with DataCleaner Extension (Azeroual, Jha, Nikiforova, Sha, Alsmirat, Jha)

Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS (Azeroual, Schöpfel, Ivanovic, Nikiforova)

Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business (Nikiforova & Kozmina)

Data Quality Model-based Testing of Information Systems: the Use-case of E-scooters (Nikiforova, Bicevskis, Bicevska, Oditis) – best paper award nominee

Data Quality Model-based Testing of Information Systems (Bicevskis, Bicevska, Nikiforova, Oditis) – best paper award

User-Oriented Approach to Data Quality Evaluation (Nikiforova, Bicevskis, Bicevska, Oditis)

A Step Towards a Data Quality Theory (Bicevskis, Nikiforova, Bicevska, Oditis, Karnitis)

Towards Data Quality Runtime Verification (Bicevskis, Bicevska, Nikiforova, Oditis)

Analysis of Open Health Data Quality Using Data Object-Driven Approach to Data Quality Evaluation: Insights from a Latvian Context (Nikiforova)

An Extended Data Object-driven Approach to Data Quality Evaluation: Contextual Data Quality Analysis (Nikiforova, Bicevskis)

An Approach to Data Quality Evaluation (Bicevskis, Bicevska, Nikiforova, Oditis)

Data quality evaluation: a comparative analysis of company registers’ open data in four European countries (Bicevskis, Bicevska, Nikiforova, Oditis)

Open Data Quality Evaluation: A Comparative Analysis of Open Data in Latvia (Nikiforova)

Open Data Quality (Nikiforova)

I have also delivered some talks on this and related topics, including:

ICCNS2023 – International Conference on Intelligent Computing, Communication, Networking and Services – “Overlooked aspects of data governance: workflow framework for enterprise data deduplication”, June 19-22, 2023, Valencia, Spain
HackCodeX Forum – keynote with the talk “Data Quality as a prerequisite for your business success: when should I start taking care of it?”, June 5, 2023, Riga, Latvia
14th International Conference on Knowledge Management and Information Systems (KMIS2022) – Putting FAIR principles in the context of research information: FAIRness for CRIS and CRIS for FAIRness (authors: Azeroual, Schöpfel, Pölönen, Nikiforova), October 24-26, Malta – best paper award
15th International Conference on Current Research Information Systems (CRIS2022) – Linking Research Information across data spaces: “Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS”, May 11-14, Croatia
The International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021) – “Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business”, November 15-16, 2021, Tartu, Estonia (web-based)
European Biobank Week 2021, Section “Novel IT solutions, effective data storage, processing and analysis” – “Towards efficient data management of biobank, health register and research data: the use-case of BBMRI-ERIC Latvian National Node”
The Fourth International Conference on Multimedia Computing, Networking and Applications (MCNA2020), Spain (web-based): “Timeliness of Open Data in Open Government Data Portals Through Pandemic-related Data: a long data way from the publisher to the user”
The 7th International Conference on Internet of Things: Systems, Management and Security (IoTSMS 2020): “Data Quality Model-based Testing of Information Systems: the Use-case of E-scooters” (web-based) – best paper award nominee
The Third International Workshop on Data Science Engineering and its Applications (DSEA 2019) in conjunction with The Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS-2019): “A step towards a Data Quality Theory”, Spain
The 79th International Conference of the University of Latvia: “DQMBT or data quality model-based testing of information systems”
11th International Conference on e-Health in conjunction with 13th Multi Conference on Computer Science and Information Systems: “Analysis Of Open Health Data Quality Using Data Object-Driven Approach To Data Quality Evaluation: Insights From A Latvian Context”, Portugal
21st International Conference on Enterprise Information Systems (ICEIS 2019): “An Extended Data Object-driven Approach to Data Quality Evaluation: Contextual Data Quality Analysis”, Crete
The Second International Workshop on Data Science Engineering and its Applications (DSEA 2018) in conjunction with The Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS-2018): “An approach to Data Quality Evaluation”, Spain
13th International Baltic Conference on Databases and Information Systems (Baltic DB&IS 2018), Lithuania.
The 13th conference of the Latvian Association of Open Technologies “Data-driven nation” – “Open data quality”

Anastasija Nikiforova, PhD

Public & Open data ecosystems and Data Quality Management researcher, Assistant Professor of Information Systems, European Open Science Cloud (EOSC) Task Force
