Towards data quality by design – ISO/IEC 25012-based methodology for managing DQ requirements in the development of IS

Users should be able to trust the data managed by the software applications that make up Information Systems (IS). This means that organizations should ensure an appropriate level of quality of the data they manage in their IS. Therefore, an adequate level of quality of the data managed by IS must be an essential requirement for every organization. Many advances have been made in recent years in software quality management at both the process and product level. This is also reflected in the number of global standards that have been developed and have evolved to address specific issues: quality models (ISO 25000, ISO 9126), process maturity models (ISO 15504, CMMI), and standards focused mainly on software verification and validation (ISO 12207, IEEE 1028, etc.). These standards have been in use worldwide for over 15 years.

However, awareness of software quality depends on other variables, such as the quality of the information and data managed by the application. This is recognized by the SQuaRE standards (ISO/IEC 25000), which highlight the need to deal with data quality as part of the assessment of the quality level of the software product, stating that “the target computer system also includes computer hardware, non-target software products, non-target data, and the target data, which is the subject of the data quality model”. This means that organizations should take data quality concerns into account when developing software, as data is a key factor. To this end, we stress that such data quality concerns should be considered at the initial stages of software development, following the “data quality by design” principle. The name refers to “quality by design”, which is discussed relatively often, but with far less attention (if any) paid to “data quality” as a subset of the “quality” concept when it comes to data and information artifacts.

The “data quality” concept is considered to be multidimensional and largely context-dependent. For this reason, the management of specific data quality requirements is a difficult task. Thus, the main objective of our new paper, titled “ISO/IEC 25012-based methodology for managing data quality requirements in the development of information systems: Towards data quality by design”, is to present DAQUAVORD, a methodology for Project Management of Data Quality Requirements Specification, aimed at eliciting DQ requirements arising from different users’ viewpoints. These specific requirements should be treated as ordinary requirements, both functional and non-functional, during the development of an IS that takes Data Quality into account by default, leading to smarter and more collaborative development.

In a bit more detail, we introduce the concept of a Data Quality Software Requirement as a way to implement a Data Quality Requirement in an application. A Data Quality Software Requirement is a software requirement aimed at satisfying a Data Quality Requirement. The justification for this concept is that we want to capture the Data Quality Requirements that best match the data used by a user in each usage scenario, and then derive from them the Data Quality Software Requirements that will complement the normal software requirements linked to each of those scenarios. Addressing multiple Data Quality Software Requirements is indisputably a complex process, given strong dependencies such as internal constraints and interaction with external systems, and the diversity of users. As a result, these requirements tend to overlap in contradictory ways, with consequences for both process and data models.
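To make the relationship between the two concepts a bit more tangible, here is a minimal sketch of how they could be represented in code. It is not taken from the paper: the class names, fields, and the two helper functions are assumptions made purely for illustration, and only a handful of the ISO/IEC 25012 characteristics are listed.

```python
from dataclasses import dataclass
from enum import Enum


class DQCharacteristic(Enum):
    """A few of the data quality characteristics named in ISO/IEC 25012."""
    ACCURACY = "accuracy"
    COMPLETENESS = "completeness"
    CONSISTENCY = "consistency"
    CURRENTNESS = "currentness"
    TRACEABILITY = "traceability"


@dataclass(frozen=True)
class DataQualityRequirement:
    """What a given viewpoint (user role) expects of a data item in a usage scenario.

    Hypothetical structure, used only for this illustration.
    """
    viewpoint: str
    scenario: str
    data_item: str
    characteristic: DQCharacteristic


@dataclass(frozen=True)
class DataQualitySoftwareRequirement:
    """A software requirement aimed at satisfying one Data Quality Requirement."""
    satisfies: DataQualityRequirement
    description: str
    functional: bool  # True for functional, False for non-functional


def group_by_scenario(dqsrs):
    """Group software requirements by the usage scenario they complement."""
    grouped = {}
    for dqsr in dqsrs:
        grouped.setdefault(dqsr.satisfies.scenario, []).append(dqsr)
    return grouped


def shared_data_items(dqrs):
    """Return data items targeted by more than one viewpoint.

    Such items are candidates for a closer look, since requirements from
    different viewpoints may overlap or contradict each other.
    """
    viewpoints_per_item = {}
    for dqr in dqrs:
        viewpoints_per_item.setdefault(dqr.data_item, set()).add(dqr.viewpoint)
    return {item: vps for item, vps in viewpoints_per_item.items() if len(vps) > 1}


if __name__ == "__main__":
    # Invented example data: two roles care about the same field.
    clerk = DataQualityRequirement(
        "clerk", "register order", "order.delivery_date", DQCharacteristic.COMPLETENESS
    )
    auditor = DataQualityRequirement(
        "auditor", "review order", "order.delivery_date", DQCharacteristic.TRACEABILITY
    )
    print(shared_data_items([clerk, auditor]))
```

The point of keeping the link `satisfies` explicit is that every software requirement stays traceable to the viewpoint and scenario that motivated it, which is the part that gets lost when data quality is bolted on after the fact.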

Given this complexity, and in an attempt to reduce development effort, we introduce DAQUAVORD, a Methodology for Project Management of Data Quality Requirements Specification, which is based on the Viewpoint-Oriented Requirements Definition (VORD) method and the latest and most generally accepted ISO/IEC 25012 standard. It is universal and easily adaptable to different information systems in terms of their nature, the number and variety of actors, and other aspects. The paper presents both the concept of the methodology and an example of its application, which serves as step-by-step guidance on how to use it to achieve smarter software development with data quality by design. This paper is a continuation of our previous study. It addresses the following research questions (RQs):

RQ1: What is the state of the art regarding the “data quality by design” principle in the area of software development? What are (if any) current approaches to data quality management during the development of IS?

RQ2: How should the concepts of the Data Quality Requirements (DQR) and the Viewpoint-Oriented Requirements Definition (VORD) method be defined and implemented in order to promote the “data quality by design” principle?

This paper presents the first comprehensive approach to this problem, setting out a methodology for project management of the specification of data quality requirements. Given the relative nature of the concept of “data quality” and the active discussions on a universal view of data quality dimensions, we have based our proposal on the latest and most generally accepted ISO/IEC 25012 standard, seeking better integration of this methodology with the documentation, systems, and projects that already exist in the organization. We expect that this methodology will help Information System developers to plan and execute a proper elicitation and specification of the data quality requirements expressed by the different roles (viewpoints) that interact with the application. It can be seen as a guide that analysts can follow when writing a Requirements Specification Document supplemented with Data Quality management. Identifying and classifying data quality requirements at the initial stage makes it easier for developers to be aware of the quality of data to be implemented for each function throughout the development process of the application.

As future work, we plan to consider the advantages provided by the Model Driven Architecture (MDA), focusing mainly on its abstraction and modelling capabilities. This should make it much easier to integrate our results into the development of “Data Quality aware Information Systems” (DQ-aware-IS) with other software development methodologies and tools. This, however, is expected to expand the scope of the developed methodology and to require considering various features related to data quality, including the development of a conceptual measure of data value, i.e., intrinsic value, as proposed in.

14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K): how it was and who got the Best Paper Award?

In this post I would like to briefly elaborate on the truly insightful 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), where I was honored to participate as a speaker, presenting our paper “Putting FAIR principles in the context of research information: FAIRness for CRIS and CRIS for FAIRness” (authors: Otmane Azeroual, Joachim Schopfel, Janne Polonen, and Anastasija Nikiforova), and as a chair of two absolutely amazing sessions, where lively and fruitful discussions took place, which is a real indicator of the success of such sessions! And spoiler, our paper was recognized as the Best Paper! (i.e., best paper award goes to… :))

IC3K consists of three subconferences, namely the 14th International Conference on Knowledge Discovery and Information Retrieval (KDIR), the 14th International Conference on Knowledge Engineering and Ontology Development (KEOD), and the 14th International Conference on Knowledge Management and Information Systems (KMIS). The latter is the one to which our paper was accepted and where it won the Best Paper Award. I know, this is a repetition, but I am glad to receive it, just as the euroCRIS community is proud of us – read more here…!

Briefly about our study, with which we mainly wanted to issue a call for action in the area of CRIS and their FAIRness. Of course, this is all about digitization, which takes place in various domains, including but not limited to the research domain, where it refers to the increasing integration and analysis of research information as part of the research data management process. However, it is not clear whether this research information is actually used and, more importantly, whether this information and data are of sufficient quality for value and knowledge to be extracted from them. FAIR principles (Findability, Accessibility, Interoperability, Reusability) are considered a promising asset to achieve this. Since their publication (by one of the colleagues I work with in the European Open Science Cloud), they have rapidly proliferated and have become part of both national and international research funding programs. A special feature of the FAIR principles is the emphasis on the legibility, readability, and understandability of data. At the same time, they are a prerequisite for the reliability, trustworthiness, and quality of data. In this sense, the subject of our study is the importance of applying FAIR principles to research information and the respective systems, such as Current Research Information Systems (CRIS, also known as RIS, RIMS), which is an underrepresented research topic. What should be kept in mind is that research information is not just research data, and research information management systems such as CRIS are not just repositories for research data. They are much more complex, alive, dynamic, interactive and multi-stakeholder objects. However, in the real world they are not directly subject to the FAIR research data management guiding principles.
Thus, supporting the call for a “one-stop-shop and register-once use-many approach”, we argue that CRIS is a key component of the research infrastructure landscape / ecosystem, directly targeted and enabled by operational application and the promotion of FAIR principles. We hypothesize that the improvement of FAIRness is a bidirectional process, where CRIS promotes FAIRness of data and infrastructures, and FAIR principles push further improvements to the underlying CRIS. All in all, the three propositions on which we elaborate in our paper, and which we invite everyone representing this domain to think about, are:

1. research information management systems (CRIS) are helpful to assess the FAIRness of research data and data repositories;

2. research information management systems (CRIS) contribute to the FAIRness of other research infrastructure;

3. research information management systems (CRIS) can be improved through the application of the FAIR principles.

Here, we have raised a discussion on this topic showing that the improvement of FAIRness is a dual or bidirectional process, where CRIS promotes and contributes to the FAIRness of data and infrastructures, and FAIR principles push for further improvement in the underlying CRIS data model and format, positively affecting the sustainability of these systems and underlying artifacts. CRIS are beneficial for FAIR, and FAIR is beneficial for CRIS. Nevertheless, as pointed out by (Tatum and Brown, 2018), the impact of CRIS on FAIRness is mainly focused on the (1) findability (“F” in FAIR) through the use of persistent identifiers and (2) interoperability (“I” in FAIR) through standard metadata, while the impact on the other two principles, namely accessibility and reusability (“A” and “R” in FAIR) seems to be more indirect, related to and conditioned by metadata on licensing and access. Paraphrasing the statement that “FAIRness is necessary, but not sufficient for ‘open’” (Tatum and Brown, 2018), our conclusion is that “CRIS are necessary but not sufficient for FAIRness”.

This study differs significantly from what I typically talk about, but I was glad to contribute to it, sharing the experience I have gained in the European Open Science Cloud (EOSC) and the respective Task Force I am involved in, “FAIR metrics and data quality”. It also allowed me to provide some insights into what we are dealing with within this domain and how our activities contribute to the currently limited body of knowledge on this topic.

A bit about the sessions I chaired and topics raised within them, which were very diverse but equally relevant and interesting. I was kindly invited to chair two sessions, namely “Big Data and Analytics” and “Knowledge management Strategies and Implementations”, where the papers on the following topics were presented:

  • Decision Support for Production Control based on Machine Learning by Simulation-generated Data (Konstantin Muehlbauer, Lukas Rissmann, Sebastian Meissner, Landshut University of Applied Sciences, Germany);
  • Exploring the Test Driven Development of a Fraud Detection Application using the Google Cloud Platform (Daniel Staegemann, Matthias Volk, Maneendra Perera, Klaus Turowski, Otto-von-Guericke University Magdeburg, Germany) – this paper was also recognized as the best student paper;
  • Decision Making with Clustered Majority Judgment (Emanuele D’ajello, Davide Formica, Elio Masciari, Gaia Mattia, Arianna Anniciello, Cristina Moscariello, Stefano Quintarelli, Davide Zaccarella, University of Napoli Federico II, Copernicani, Milano, Italy);
  • Virtual Reality (VR) Technology Integration in the Training Environment Leads to Behaviour Change (Amy Rosellini, University of North Texas, USA);
  • Innovation in Boutique Hotels in Valletta, Malta: A Multi-level Investigation (Kristina, University of Malta, Malta).

And, of course, as is the case for each and every conference, the keynote speakers and panelists are those who gather the highest number of attendees, which is no surprise, considering the topics they raise and discuss. IC3K is not an exception, and the conference started with a very insightful panel discussion on Current Data Security Regulations and whether they Serve or rather Restrict the Application of the Tools and Techniques of AI. Each of the three speakers, namely Catholijn Jonker, Bart Verheijen, and Giancarlo Guizzardi, presented their views from the perspective of the domain they represent. As a result, all three were very different, but at the same time left you with an “I cannot agree more” feeling!

One of the panelists, Catholijn Jonker (TU Delft), then delivered an absolutely exceptional keynote speech on Self-Reflective Hybrid Intelligence: Combining Human with Artificial Intelligence and Logic. I enjoyed not only the content, but also the style, where the propositions were critically elaborated on, pointing out that they are not intended to serve as a silver bullet, and that their scope, as well as side-effects, should be determined and considered. A truly insightful and, I would say, inspiring talk.

All in all, thank you, organizers – INSTICC (Institute for Systems and Technologies of Information, Control and Communication), for bringing us together!

Europe Biobank Week 2021

This is a short note about Europe Biobank Week 2021, which took place online this year on 8-10 November. EBW is jointly organized by ESBB (European, Middle Eastern & African Society for Biopreservation and Biobanking) and BBMRI-ERIC, with this year’s theme being “Biobanking for our Future – Opportunities Unlocked”. The programme was full of very different events and opportunities, including a rich programme of live presentations from high-level experts and a collection of selected posters across 14 main topics. I was honored to be represented in two of these posters as part of the Latvian Biomedical Research & Study centre team, where I work as an IT-expert.

One of them, authored by me, was presented in the “Novel IT solutions, effective data storage, processing and analysis” section. This poster, titled “Towards efficient data management of biobank, health register and research data: the use-case of BBMRI-ERIC Latvian National Node” (authors: Anastasija Nikiforova, Vita Rovīte, Laura Ansone), was devoted to the ongoing project INTEGROMED – Integration of knowledge and biobank resources in comprehensive translational approach for personalized prevention and treatment of metabolic disorders, funded by the European Union under Horizon 2020. It presented some preliminary results of my work on inspecting and improving the ecosystem of the Latvian Biomedical Research and Study centre, summarized and transformed into a set of guidelines towards efficient data management for heterogeneous data holders and exchangers.

Poster, Europe Biobank Week 2021: “Towards efficient data management of biobank, health register and research data: the use-case of BBMRI-ERIC Latvian National Node” (authors: A. Nikiforova, V. Rovīte, L. Ansone)

Another poster, titled “Development of a dynamic informed consent system for Latvian national biobank and citizen science data management, quality control and integration” (authors: Kante N., Nikiforova A., Kalēja J., Svandere A., Mezinska S., Rovīte V.), was presented in the “Population-based cohorts – addressing global challenges for future generations” section and was dedicated to another project, funded by the European Regional Development Fund (ERDF): “DECIDE – Development of a dynamic informed consent system for biobank and citizen science data management, quality control and integration”.

Poster, Europe Biobank Week 2021: “Development of a dynamic informed consent system for Latvian national biobank and citizen science data management, quality control and integration” (authors: Kante N., Nikiforova A., Kalēja J., Svandere A., Mezinska S., Rovīte V.)

This was another very nice experience!