UT & Swedbank Data Science Seminar “When, Why and How? The Importance of Business Intelligence”

Last week I had the pleasure of taking part in a Data Science Seminar titled “When, Why and How? The Importance of Business Intelligence”. In this seminar, organized by the Institute of Computer Science (University of Tartu) in cooperation with Swedbank, we (me, Mohammad Gharib, Jurgen Koitsalu and Igor Artemtsuk) discussed the importance of BI with some focus on data quality. More precisely, two of the four talks were delivered by representatives of the University of Tartu and were more theoretical in nature – we both decided to focus on data quality (although for my talk this was not the main focus this time) – while the other two talks were delivered by representatives of Swedbank, who elaborated on BI: what it can give, what it already gives, how it is achieved and much more. The talks were followed by a panel moderated by prof. Marlon Dumas.

In a bit more detail… In my presentation I talked about:

  • “Data warehouse vs. data lake – what are they and what is the difference between them?” – in a very few words: structured vs unstructured, static vs dynamic (real-time) data, schema-on-write vs schema-on-read, ETL vs ELT. With further elaboration on: what are their goals and purposes? What is their target audience? What are their pros and cons?
  • “Is the data warehouse the only data repository suitable for BI?” – no, (today) data lakes can also be suitable. Even more, both are considered the key to “a single version of the truth”. Although, if descriptive BI is the only purpose, it might still be better to stay with a data warehouse. But if you want to have predictive BI or use your data for ML (or do not yet have a specific idea of how you want to use the data, but want to be able to explore them effectively and efficiently), a data warehouse might not be the best option.
  • “So, the data lake will save me a lot of resources, because I do not have to worry about how to store / allocate the data – I just put it all in one storage and voila?!” – no, in this case your data lake will turn into a data swamp! And you are forgetting about the data quality you should (must!) be thinking of!
  • “But how do you prevent the data lake from becoming a data swamp?” – in short and simple terms, proper data governance & metadata management is the answer (but it is not as easy as it sounds – do not forget about your data engineer and be friendly with him [always… literally always :D]), and also think about the culture in your organization.
  • “So, the use of a data warehouse is the key to high-quality data?” – no, it is not! Having ETL does not guarantee the quality of your data (transform & load is not data quality management). Think about data quality regardless of the repository!
  • “Are data warehouses and data lakes the only options to consider, or are we missing something?” – true, we are missing something – the data lakehouse!
  • “If a data lakehouse combines the benefits of a data warehouse and a data lake, is it a silver bullet?” – no, it is not! It is another option (a relatively immature one) to consider that may be the best fit for you, but not a panacea. Dealing with data is (still) not easy…
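To make the schema-on-write vs schema-on-read distinction from the first point a bit more tangible, here is a minimal Python sketch; the data, function names and the dirty record are all hypothetical, and real warehouses / lakes obviously do far more than this:

```python
# Toy illustration: schema-on-write (warehouse / ETL style) rejects or fixes
# records BEFORE loading; schema-on-read (lake / ELT style) stores everything
# raw and applies an interpretation only at query time.

RAW_EVENTS = [
    {"user": "alice", "amount": "42.50", "ts": "2022-11-01"},
    {"user": "bob", "amount": "n/a", "ts": "2022-11-02"},  # dirty record
]

def schema_on_write(records):
    """ETL: validate and transform before loading; dirty rows never enter."""
    table = []
    for r in records:
        try:
            table.append({"user": r["user"], "amount": float(r["amount"])})
        except ValueError:
            continue  # rejected at load time
    return table

def parse_amount(r):
    """One possible read-time schema; another consumer may parse differently."""
    try:
        return {"user": r["user"], "amount": float(r["amount"])}
    except ValueError:
        return None  # interpretation (and rejection) decided at read time

def schema_on_read(store, parse):
    """ELT: everything was loaded raw; apply a schema only when querying."""
    return [row for row in (parse(r) for r in store) if row is not None]

warehouse = schema_on_write(RAW_EVENTS)        # only the clean row is stored
lake = list(RAW_EVENTS)                        # both rows stored as-is
queried = schema_on_read(lake, parse_amount)   # cleaning happens at query time

print(len(warehouse), len(lake), len(queried))
```

Note that the lake keeps the dirty record (which is exactly why, without governance, it drifts towards a swamp), while the warehouse silently dropped it at load time – neither behaviour is data quality management by itself.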

In addition, in this talk I also briefly introduced our ongoing research on integrating the data lake as a data repository with data wrangling, seeking increased data quality in information systems. In short, this is somewhat like an improved data lakehouse, where we emphasize that data governance and data wrangling need to be integrated to really get the benefits that data lakehouses promise (although we still call it a data lake, since the data lakehouse, while not a super new concept, is still heavily debated, including, but not limited to, its very definition).

My colleague Mohammad Gharib, in turn, discussed what data quality (DQ) and, more specifically, data quality requirements are, why they really matter, and provided a very interesting perspective on how to define high-quality data, which could further serve as the basis for defining these requirements.

All in all, although we did not know each other before and had a very limited idea of what each of us would talk about, we all admitted that the seminar turned out to be very coherent: our talks complemented each other, extending points previously touched upon but not thoroughly elaborated. This allowed us not only to make the seminar a success, but also to establish a very lively discussion (although, as usually happens, the prevailing part of this discussion took place during the coffee break, so, unfortunately, it is not available in the recordings, the link to which is below).

The recordings are available here.

Latvian Open Data Hackathon for pupils 2022 – winners are announced!

During the last month, I have been a mentor of the Latvian Open Data Hackathon and Idea Generator for pupils, organized by the Latvian Open Technologies Association with the support of DATI Group, E-Klase, Latvijas Kultūras akadēmija / Latvian Academy of Culture, Vides aizsardzības un reģionālās attīstības ministrija (VARAM) / Ministry of Environmental Protection and Regional Development of the Republic of Latvia, and others.

This year the main topic of the hackathon was cultural heritage! Within a month, 36 teams comprising 126 participants from all over Latvia developed their ideas and prototypes; 10 teams reached the final after a round of semi-final presentations of their solutions to us, the mentor team (of course, we had worked with our assigned teams in the previous weeks as well). Here, we not only evaluated the ideas, but also provided the teams with yet another portion of feedback and suggestions for improving their idea or prototype for its further presentation in the final, where the jury would decide who the winner is.

Here I should note that, as usual (I am a permanent mentor of this hackathon), the participants surprised me very much, both with the diversity of their ideas and, in very many cases, with their technical knowledge and skills (AI, crowdsourcing, gamification, to name just a few) – just wow!

And last week it happened – we finally found out who the winners are: Kultūrkults as the best idea in the respective category (idea generator), and 417 Expectations Failed as the hackathon winner!

Congratulations to everyone on the successful efforts to promote cultural and historical heritage! Also, congratulations to the whole society on having youth so responsible and passionate about their culture and history!

Repeating what I already said at the closing ceremony: I really want to believe that all the teams that participated in the hackathon will develop and implement their ideas regardless of the outcome – you are all winners for us! Keep going towards your goal!

14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K): how it was and who got the Best Paper Award?

In this post I would like to briefly elaborate on the truly insightful 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), where I was honored to participate as a speaker, presenting our paper “Putting FAIR principles in the context of research information: FAIRness for CRIS and CRIS for FAIRness” (authors: Otmane Azeroual, Joachim Schopfel, Janne Polonen and Anastasija Nikiforova), and as a chair of two absolutely amazing sessions, where lively and fruitful discussions took place – a real indicator of their success! And, spoiler: our paper was recognized as the Best Paper! (i.e., the best paper award goes to… :))

IC3K consists of three subconferences, namely the 14th International Conference on Knowledge Discovery and Information Retrieval (KDIR), the 14th International Conference on Knowledge Engineering and Ontology Development (KEOD), and the 14th International Conference on Knowledge Management and Information Systems (KMIS). The latter is the one to which my paper was accepted and where it won the Best Paper Award – I know, this is a repetition, but I am glad to receive it, just as the euroCRIS community is proud of us – read more here…!

Briefly about our study, with which we mostly wanted to issue a call for action in the area of CRIS and their FAIRness. Of course, this is all about digitization, which takes place in various domains, including, but not limited to, the research domain, where it refers to the increasing integration and analysis of research information as part of the research data management process. However, it is not clear whether this research information is actually used and, more importantly, whether this information and data are of sufficient quality and whether value and knowledge can be extracted from them. The FAIR principles (Findability, Accessibility, Interoperability, Reusability) are considered a promising asset to achieve this. Since their publication (by one of the colleagues I work with in the European Open Science Cloud), they have rapidly proliferated and have become part of both national and international research funding programs. A special feature of the FAIR principles is the emphasis on the legibility, readability, and understandability of data. At the same time, they pose a prerequisite for data and their reliability, trustworthiness, and quality. In this sense, the importance of applying the FAIR principles to research information and respective systems such as Current Research Information Systems (CRIS, also known as RIS or RIMS) – an underrepresented subject of research – is the subject of our study. What should be kept in mind is that research information is not just research data, and research information management systems such as CRIS are not just repositories for research data. They are much more complex, alive, dynamic, interactive and multi-stakeholder objects. However, in the real world they are not directly subject to the FAIR research data management guiding principles.
Thus, supporting the call for a ”one-stop-shop and register-once use-many approach”, we argue that CRIS are a key component of the research infrastructure landscape / ecosystem, directly targeted and enabled by the operational application and promotion of the FAIR principles. We hypothesize that the improvement of FAIRness is a bidirectional process, where CRIS promote the FAIRness of data and infrastructures, and the FAIR principles push further improvements to the underlying CRIS. All in all, the three propositions on which we elaborate in our paper and invite everyone representing this domain to think about are:

1. research information management systems (CRIS) are helpful to assess the FAIRness of research data and data repositories;

2. research information management systems (CRIS) contribute to the FAIRness of other research infrastructure;

3. research information management systems (CRIS) can be improved through the application of the FAIR principles.

Here, we have raised a discussion on this topic showing that the improvement of FAIRness is a dual or bidirectional process, where CRIS promotes and contributes to the FAIRness of data and infrastructures, and FAIR principles push for further improvement in the underlying CRIS data model and format, positively affecting the sustainability of these systems and underlying artifacts. CRIS are beneficial for FAIR, and FAIR is beneficial for CRIS. Nevertheless, as pointed out by (Tatum and Brown, 2018), the impact of CRIS on FAIRness is mainly focused on the (1) findability (“F” in FAIR) through the use of persistent identifiers and (2) interoperability (“I” in FAIR) through standard metadata, while the impact on the other two principles, namely accessibility and reusability (“A” and “R” in FAIR) seems to be more indirect, related to and conditioned by metadata on licensing and access. Paraphrasing the statement that “FAIRness is necessary, but not sufficient for ‘open’” (Tatum and Brown, 2018), our conclusion is that “CRIS are necessary but not sufficient for FAIRness”.

This study differs significantly from what I typically talk about, but I was glad to contribute to it, thereby sharing the experience I am gaining in the European Open Science Cloud (EOSC) and the respective Task Force I am involved in – “FAIR metrics and data quality”. It also allowed me to provide some insights into what we are dealing with within this domain and how our activities contribute to the currently limited body of knowledge on this topic.

A bit about the sessions I chaired and the topics raised within them, which were very diverse but equally relevant and interesting. I was kindly invited to chair two sessions, namely “Big Data and Analytics” and “Knowledge Management Strategies and Implementations”, where papers on the following topics were presented:

  • Decision Support for Production Control based on Machine Learning by Simulation-generated Data (Konstantin Muehlbauer, Lukas Rissmann, Sebastian Meissner, Landshut University of Applied Sciences, Germany);
  • Exploring the Test Driven Development of a Fraud Detection Application using the Google Cloud Platform (Daniel Staegemann, Matthias Volk, Maneendra Perera, Klaus Turowski, Otto-von-Guericke University Magdeburg, Germany) – this paper was also recognized as the best student paper;
  • Decision Making with Clustered Majority Judgment (Emanuele D’ajello, Davide Formica, Elio Masciari, Gaia Mattia, Arianna Anniciello, Cristina Moscariello, Stefano Quintarelli, Davide Zaccarella, University of Napoli Federico II, Copernicani, Milano, Italy);
  • Virtual Reality (VR) Technology Integration in the Training Environment Leads to Behaviour Change (Amy Rosellini, University of North Texas, USA);
  • Innovation in Boutique Hotels in Valletta, Malta: A Multi-level Investigation (Kristina, University of Malta, Malta).

And, of course, as is the case for each and every conference, the keynotes and panels are what gather the highest number of attendees, which is obvious, considering the topics their speakers elaborate on, raise and discuss. IC3K is not an exception, and the conference started with a very insightful discussion on Current Data Security Regulations and whether they Serve or rather Restrict the Application of the Tools and Techniques of AI. Each of the three speakers, namely Catholijn Jonker, Bart Verheijen, and Giancarlo Guizzardi, presented their views from the perspective of the domain they represent. As a result, the views were very different, but at the same time each led you to an “I cannot agree more” feeling!

One of the panelists, Catholijn Jonker (TU Delft), then delivered an absolutely exceptional keynote speech on Self-Reflective Hybrid Intelligence: Combining Human with Artificial Intelligence and Logic. I enjoyed not only the content, but also the style, where the propositions were critically elaborated on, pointing out that they are not intended to serve as a silver bullet, and that their scope, as well as side effects, should be determined and considered. A truly insightful and, I would say, inspiring talk.

All in all, thank you, organizers – INSTICC (Institute for Systems and Technologies of Information, Control and Communication), for bringing us together!

ICEGOV2022: 4 insightful days and four roles – participant / attendee, author / presenter, workshop chair and Best Paper Awards nominee (part 2)

In the previous post I already shared my impressions of the ICEGOV2022 conference – the 15th International Conference on Theory and Practice of Electronic Governance, which took place in a very special place – Guimarães, considered the birthplace of Portugal. There I provided some general insight into how it went and elaborated a bit on the paper I presented and the fact that it was nominated for the Best Paper Awards. Thus, this post I dedicate to one particular role I played, i.e. workshop chair.

If you actively follow me, you probably remember that some time ago I posted that our workshop titled “Identification of high-value dataset determinants: is there a silver bullet?”, organized by me, Charalampos Alexopoulos, Nina Rizun and Magdalena Ciesielska, was accepted for ICEGOV2022. So now we finally brought it to life! Our workshop was among the 7 accepted workshops, organized by such prominent organizations as UNDESA – United Nations Department of Economic and Social Affairs, the European Commission, UNESCO – United Nations Educational, Scientific and Cultural Organization – and several more, which is just “wow”!

It took some time, since, as you might remember, I took the very first steps in this topic a year ago, i.e. this workshop is a continuation of the paper I presented at ICEGOV2021 – Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia – which, in turn, was my response to a request from my government, which was curious about this topic in the light of the Open Data Directive (previously the Public Sector Information (PSI) Directive). That was a simple study, based on a survey of individual users and SMEs of Latvia, aimed at clarifying their level of awareness of the existence of OGD, their usage habits, as well as the overall level of satisfaction with the value of OGD and their potential. Now the topic has become even more topical, considering that even the European Open Data Maturity Report is updating its methodology, including HVD as one of the new aspects. This, in turn, led to our workshop even being included in the list of upcoming events they suggest considering and attending (see screenshots of the schedule :)).

In a few words, in this workshop we initiated a discussion about the value of open data and, more precisely, the concept of high-value data(sets) and the determinants / indicators that would allow not only identifying them among existing data, but also identifying them even if they have not been previously published.

Together with our participants, we spoke about:
💡How can the “value” of open data be defined?
💡What are the current indicators for determining the value of data? Can they be used to identify valuable datasets to be opened? What are the country-specific high-value determinants (aspects)?
💡How can high-value datasets be identified? What mechanisms and / or methods should be put in place to allow their determination?
💡Could there be an automated way to gather information on HVD?
💡Can they be identified by third parties, e.g. researchers and enthusiasts, AND potential data publishers, i.e. data owners?
💡What should be the scope of the framework?

The main point to note about the mode of this workshop is that it was a community-based, participatory, interactive workshop. Although it is clear that on the last day of the conference (the day after the official closing ceremony) not all those registered for this workshop could gather, it was still a valuable experience and we managed to have a nice event full of discussions with the participants, who got actively involved, which is especially pleasant, i.e. we managed to avoid sit-and-listen mode!!!

All in all, we had a nice event, where at least the first steps in the direction we selected were taken and some initial feedback was gathered.

Again, thank you, Guimarães, Portugal, and, of course, organizers of ICEGOV2022 – the University of Minho (Universidade do Minho) and UNU-eGOV – United Nations University!

ICEGOV2022: 4 insightful days and four roles – participant / attendee, author / presenter, workshop chair and Best Paper Awards nominee (part 1)

This October (2022) I had the pleasure and honor to spend four absolutely insightful days at the ICEGOV2022 conference – the 15th International Conference on Theory and Practice of Electronic Governance, which took place in a very special place – Guimarães, considered the birthplace of Portugal. The conference took place under the motto “Digital Governance for Social, Economic, and Environmental Prosperity” and was organized by the University of Minho (Universidade do Minho) and UNU-eGOV – United Nations University.

Just to start, the conference was indeed something very special from the very first seconds, since it was opened by United Nations Secretary-General António Guterres. Six exceptional keynote talks were delivered during the three days of the conference, while the last day was fully dedicated to the workshops, of which there were seven – one of them organized by me and my colleagues, triggered by the study I presented at ICEGOV in 2021 (although only in a virtual / online mode).

All in all, this made it exceptionally honorable to play not one, not two or even three, but four roles. In other words, I was not only a regular participant / attendee or even just an author presenting the paper, but also a workshop chair and even a Best Paper Awards nominee.

Let me start with a reflection on the paper I presented here, since it is also very special for me, considering how it was developed. It was a joint paper written together with my colleague Anneke Zuiderwijk as part of my six-month research visit to the Delft University of Technology, Faculty of Technology, Policy and Management. Even more, it was the topic we discussed with many colleagues from TU Delft when I was kindly invited to participate in an ICT colloquium, where we had an opportunity to talk about it (among other things, such as my general research interests). Moreover, we got this paper accepted exactly at the time when my role of visiting researcher at Delft University of Technology was coming to an end.

And before I briefly elaborate on it, just to make you curious, let me mention that it was recognized as one of the best papers and nominated for the ICEGOV2022 Best Paper Awards in its category. More precisely, it was in the top 3 among 61 papers! And although another paper got the award, we still consider it a small victory! Citing the awards committee, “the goal of ICEGOV Best Paper Awards is to acknowledge excellent research” – isn’t that a win? ✌️✌️✌️

So… let me now (finally) provide a brief insight into the paper. The paper itself is a reflection on current ongoing research and is titled “Barriers to openly sharing government data: towards an open data-adapted innovation resistance theory”.

In a few words, the study aims to develop an Open Government Data-adapted Innovation Resistance Theory model to empirically identify predictors affecting public agencies’ resistance to openly sharing government data. Here we want to understand:
💡what are the functional and behavioural factors that facilitate or hamper the opening of government data by public organizations?
💡does IRT provide a new and more complete insight compared to the more traditional UTAUT and TAM? – IRT has not been applied in this domain yet, so we are checking whether it should be considered, or whether the models we are so familiar with are the best ones;
💡and additionally – did the COVID-19 pandemic have an [obvious / significant] effect on public agencies in terms of their readiness or resistance to openly share government data?
Based on a review of the literature on both IRT research and the barriers associated with open data sharing by public agencies, we developed an initial version of the model, which was presented here at ICEGOV2022. Here I should immediately express my huge gratitude to the audience for the very positive feedback I received after the session. At the same time, considering that many of these compliments came from people representing TU Delft, I feel an even stronger sense of belonging to it, despite not even having been there physically during the above-mentioned stay (thanks to the pandemic).

Taking a step back to the research – now, we plan to conduct exploratory interviews in multiple countries, preferably of different maturity levels (Estonia, Latvia, Netherlands, Italy (?), Belgium (?) – who else?), to refine the model. And once the model is refined, we will validate it to study the resistance of public authorities to openly sharing government data in a quantitative study.

📢📢📢 By the way, in case you are interested in this research and would like to get involved, we are now seeking people who could conduct exploratory interviews in their countries, so if you are such a person, let us know – even if your country is listed above. The interview protocol is already ready and we are about to start these interviews. The more countries are involved, the more “universal” a model we will be able to bring to this world!

In the meantime, find the paper (and cite as) -> Nikiforova A., Zuiderwijk A. (2022) Barriers to openly sharing government data: towards an open data-adapted innovation resistance theory, In 15th International Conference on Theory and Practice of Electronic Governance (ICEGOV 2022). Association for Computing Machinery, New York, NY, USA, 215–220, https://doi.org/10.1145/3560107.3560143

And just to close this post, and considering the very positive impressions I had of them, let me at least list the keynotes we had, since the organizers paid very special attention to them, as did we, the participants, to the actual talks. All in all, ICEGOV2022 invited six keynote speakers we truly enjoyed – starting with the opening keynote delivered by the Portuguese Secretary of State for Digitalisation and Administrative Modernisation Mário Campolargo, and continuing with the following keynotes, which in many cases were later complemented by plenaries. These keynotes were very diverse in nature, more precisely: “Harnessing multilateralism for digital governance development?” by Cristina Duarte (Under-Secretary-General and Special Adviser on Africa to the United Nations Secretary-General), “Digital transformation across countries and continents” by Tony Shannon (Department of Public Expenditure & Reform, Dublin, Ireland), “AI+X: building a transformation agenda” by Theresa A. Pardo (University at Albany (SUNY)), “What is digital humanism?” by Walter Gehr (Federal Ministry for European and International Affairs), and “Reinforcing Interoperability Policy in the EU: Interoperable Europe” by Veronica Gaffey (European Commission (DG DIGIT)). A truly massive set of high-quality keynotes!

Thank you, Guimarães, Portugal, and, of course, organizers of ICEGOV2022 – the University of Minho (Universidade do Minho) and UNU-eGOV – United Nations University!

P.S. do not forget to read another post on the workshop I had at ICEGOV2022!