I have worked on data quality for years, which resulted in a series of articles and a doctoral thesis in which a user-centered, data object-driven approach to data quality assessment was developed and applied to open data as the domain of application. My master's thesis, defended with distinction and recognized by ZIBIT as the best Master’s Thesis in Computer Science of the year, was also devoted to this topic.
The proposed approach served as an input for several projects, including DQMBT, a data quality model-based testing approach for information systems. DQMBT was developed and applied to real-life projects with an e-scooter system as the central object, thereby demonstrating the proposed approach in the context of IoT or IoV (Internet of Vehicles). This example, in turn, has served as an input to the current study on car-sharing services in Latvia, more specifically on optimization processes (read more…).
Later, data quality analysis became an integral part of my work, including my time at the Latvian Biomedical Research and Study Centre (BBMRI-ERIC Latvian National Node), where I inspected the current data ecosystem of the Centre and related data artifacts, including the Latvian Genome database. This resulted in a set of guidelines for efficient data management for heterogeneous data holders and exchangers, developed as a deliverable of the HORIZON2020 INTEGROMED project (Deliverable 2.1 “Guidelines for the maintenance of efficient biobank, health register and research associated data”) and presented during Europe Biobank Week 2021 (read more about it here…). These guidelines, in turn, serve as an input for DECIDE – Development of a dynamic informed consent system for biobank and citizen science data management, quality control and integration. In addition, as part of the digitization of processes within the LBMC, I developed and introduced a set of e-surveys to replace paper-based surveys and questionnaires, which were typically completed by hand by patients, donors or doctors, after which the answers were manually retyped into the database. As a result, data tended to be incomplete, non-compliant with the database design, or inaccurate, because errors occur when a person re-enters data from a paper-based survey or interprets an answer to make it compliant with the database. E-surveys make it possible to ensure data integrity through built-in data quality checks and compliance with the actual database, and they reduce the number of errors made during data collection and entry.
In September 2021, I joined the European Open Science Cloud (EOSC) Association Task Force “FAIR Metrics and Data Quality”, which oversees the implementation of FAIR metrics for the EOSC, testing them with research communities to ensure they are fit for purpose.
A brief note on the data object-driven approach to data quality evaluation mentioned above. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. Since data quality is relative in nature, the data object and the quality requirements are (a) use-case dependent and (b) defined by the user in accordance with their needs. All three components of the data quality model are described using graphical Domain Specific Languages (DSLs). The data quality model is executable, which makes it possible to scan data objects and detect data quality defects and anomalies.
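To make the three components more tangible, here is a minimal sketch in Python. It is not the graphical DSL implementation described above: plain Python structures stand in for the diagrams, and all names (`Requirement`, `evaluate`, the `name` and `year` parameters, the sample records) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# (1) Data object: assumed here to be a flat record of named parameters.
DataObject = Dict[str, Any]


# (2) Data quality requirement: a user-defined condition on one parameter.
@dataclass(frozen=True)
class Requirement:
    parameter: str                 # name of the parameter being checked
    description: str               # human-readable statement of the requirement
    check: Callable[[Any], bool]   # returns True when the value is acceptable


# (3) Evaluation process: scan every data object against every requirement
# and collect the defects as (record index, parameter, description) tuples.
def evaluate(
    objects: List[DataObject], requirements: List[Requirement]
) -> List[Tuple[int, str, str]]:
    defects = []
    for index, obj in enumerate(objects):
        for req in requirements:
            # A missing parameter is passed to the check as None.
            if not req.check(obj.get(req.parameter)):
                defects.append((index, req.parameter, req.description))
    return defects


# Hypothetical use case: the user decides which parameters matter and how.
requirements = [
    Requirement(
        "name",
        "must be a non-empty string",
        lambda v: isinstance(v, str) and v.strip() != "",
    ),
    Requirement(
        "year",
        "must be an integer between 1990 and 2022",
        lambda v: isinstance(v, int) and 1990 <= v <= 2022,
    ),
]

records = [
    {"name": "Company A", "year": 2015},   # satisfies both requirements
    {"name": "", "year": 2015},            # empty name
    {"name": "Company C", "year": 1800},   # year out of range
    {"year": 2001},                        # name missing entirely
]

defects = evaluate(records, requirements)
for record_index, parameter, description in defects:
    print(f"record {record_index}: '{parameter}' {description}")
```

Because the requirements are passed in rather than hard-coded, a different user or use case can evaluate the same records against a different list, which mirrors the idea that data quality is relative.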
The list of articles (both journal articles and conference papers) is presented below.
I have also delivered some talks on this and related topics, including:
- 15th International Conference on Current Research Information Systems (CRIS2022) – Linking Research Information across data spaces: “Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS”, May 11-14, Croatia
- The International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021) – “Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business”, November 15-16, 2021. Tartu, Estonia (web-based)
- Europe Biobank Week 2021, Section “Novel IT solutions, effective data storage, processing and analysis” – “Towards efficient data management of biobank, health register and research data: the use-case of BBMRI-ERIC Latvian National Node”
- The Fourth International Conference on Multimedia Computing, Networking and Applications (MCNA2020), Spain (web-based): “Timeliness of Open Data in Open Government Data Portals Through Pandemic-related Data: a long data way from the publisher to the user”
- The 7th International Conference on Internet of Things: Systems, Management and Security (IoTSMS 2020): “Data Quality Model-based Testing of Information Systems: the Use-case of E-scooters” (web-based)
- The Third International Workshop on Data Science Engineering and its Applications (DSEA 2019) in conjunction with The Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS-2019): “A step towards a Data Quality Theory”, Spain
- The 79th International Conference of the University of Latvia: “DQMBT or data quality model-based testing of information systems”
- 11th International Conference on e-Health in conjunction with 13th Multi Conference on Computer Science and Information Systems: “Analysis Of Open Health Data Quality Using Data Object-Driven Approach To Data Quality Evaluation: Insights From A Latvian Context”, Portugal
- 21st International Conference on Enterprise Information Systems (ICEIS 2019): “An Extended Data Object-driven Approach to Data Quality Evaluation: Contextual Data Quality Analysis”, Crete
- The Second International Workshop on Data Science Engineering and its Applications (DSEA 2018) in conjunction with The Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS-2018): “An approach to Data Quality Evaluation”, Spain
- 13th International Baltic Conference on Databases and Information Systems (Baltic DB&IS 2018), Lithuania.
- The 13th conference of Latvian Association of Open Technologies “Data-driven nation” – “Open data quality”
In addition, in 2021 I was honored to deliver two guest lectures, one of them closely related to data quality. It was given to students of the University of South-Eastern Norway (USN) and focused mainly on open data: its ecosystem, barriers, and current and future trends in both the global and the Norwegian context (see slides here), with a strong emphasis on data quality, which the audience found especially interesting (read more…).