24th Annual International Conference on Digital Government Research: from the former President of Poland & Nobel Prize laureate Lech Wałęsa to the 3rd edition of our workshop on HVD

The week was full of (positive) impressions from the 24th Annual International Conference on Digital Government Research in charming Gdańsk (Poland), from the start with our workshop, followed by the keynote talk delivered by the former President of Poland & Nobel Prize laureate Lech Wałęsa, to the very last session, and then coming to my hometown and meeting John Malkovich there! Although this was already my 3rd working trip in the last 4 weeks and the 5th since the very beginning of June, during which I delivered two keynote lectures, presented two papers (with two more papers presented by my colleagues at other conferences), chaired a workshop and took part in several more events & activities, it was still an absolutely great experience. There, we finally held the 3rd edition of our workshop “Identification of high value dataset determinants: is there a silver bullet for efficient sustainability-oriented data-driven development?” as part of dg.o2023, which brought together more than 20 participants, with whom we jointly tried to understand:

💡What are the country-specific HVD determinants (aspects)? This includes: who should be the expected beneficiaries of HVD availability? What are the current approaches to HVD determination?

💡What mechanisms or methods can be put in place to determine them?

💡Can this be done (semi-)automatically?

💡What could a framework for determining country-specific HVD look like?

As part of the workshop, we also validated the results of our paper “Towards High-Value Datasets determination for data-driven development: a systematic literature review” (Anastasija Nikiforova, Nina Rizun, Magdalena Ciesielska, Charalampos Alexopoulos, Andrea Miletič), which we expect to present to the EGOV community and which has already been called a “signal in the noise” (we work hard to live up to this characterization!). To this end, we verified whether the findings from the literature are relevant, valid & complete by discussing:

💡What can be the data-centric characteristics of HVD (and should they be predefined)?
💡What are the expected characteristics of the HVD determination indicators, i.e., are they (1) ex-ante / ex-post / both? (2) qualitative / quantitative / both? (3) internal (such as usage statistics) / external (e.g., reports, indices, charters) / both? (4) SMART / not necessarily? We then dived into these questions and evaluated the relevance of the indicators identified previously (from the literature review, current practices / ad-hoc approaches, and previous workshops).
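To make the four indicator dimensions above more tangible, they can be encoded as a small data structure. The sketch below is purely illustrative and is not a result of the workshop or of the literature review: the class `HVDIndicator`, the helper `select` and the three example indicators are hypothetical, and treating “both” as matching either value is our own assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Timing(Enum):
    EX_ANTE = "ex-ante"
    EX_POST = "ex-post"
    BOTH = "both"


class Nature(Enum):
    QUALITATIVE = "qualitative"
    QUANTITATIVE = "quantitative"
    BOTH = "both"


class Source(Enum):
    INTERNAL = "internal"  # e.g. usage statistics
    EXTERNAL = "external"  # e.g. reports, indices, charters
    BOTH = "both"


@dataclass(frozen=True)
class HVDIndicator:
    """Hypothetical record describing one candidate HVD determination indicator."""
    name: str
    timing: Timing
    nature: Nature
    source: Source
    smart: bool  # Specific, Measurable, Achievable, Relevant, Time-bound?


def select(indicators, timing=None, source=None, smart_only=False):
    """Keep indicators matching the requested dimensions; None means 'any', BOTH matches either."""
    selected = []
    for ind in indicators:
        if timing is not None and ind.timing not in (timing, Timing.BOTH):
            continue
        if source is not None and ind.source not in (source, Source.BOTH):
            continue
        if smart_only and not ind.smart:
            continue
        selected.append(ind)
    return selected


catalogue = [  # illustrative entries only, not findings of the study
    HVDIndicator("portal download count", Timing.EX_POST, Nature.QUANTITATIVE, Source.INTERNAL, True),
    HVDIndicator("referenced in a national strategy", Timing.EX_ANTE, Nature.QUALITATIVE, Source.EXTERNAL, False),
    HVDIndicator("user demand expressed in surveys", Timing.BOTH, Nature.BOTH, Source.EXTERNAL, False),
]
print([ind.name for ind in select(catalogue, timing=Timing.EX_POST)])
```

Filtering by ex-post timing returns both the usage-based indicator and the one flagged as “both”, which is just one possible way to operationalize the “ex-ante / ex-post / both” question.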

Many thanks to everyone who participated in the ICEGOV, ICOD or DGO workshops (more than 60 people in total), as well as thanks to Maria, who was part of the ICOD workshop!

Thanks to the organizers, including but not limited to the local organizing committee – Gdansk University, Digital Government Society, Emerald Publishing – for the opportunity to have a good time and to finally meet colleagues in person (some of whom I had never met before despite a relatively long collaboration)!

Keynote at the 5th International Conference on Advanced Research Methods and Analytics (CARMA 2023)

On June 28, I had the honor to take part in the opening of CARMA2023, the 5th International Conference on Advanced Research Methods and Analytics “Internet and Big Data in Economics and Social Sciences”, by delivering my keynote “Public data ecosystems in and for smart cities: how to make open / Big / smart / geo data ecosystems value-adding for SDG-compliant Smart Living and Society 5.0?” in the spectacular city of Sevilla, Spain 🇪🇸 🇪🇸 🇪🇸. What an honor to open the conference immediately after the inaugural speeches by the organizers and sponsors, including representatives of the Joint Research Center, European Commission (JRC), who even mentioned the topics I covered in my keynote (among others, of course) as those that make this conference an event to attend and learn from!!!

In this talk, as the title suggests, I:

  • elaborated on the concepts of public / open data (incl. OGD), smart city and SDG, and how they are related;
  • introduced the concept of Society 5.0 and how it is related to open data;
  • and finally, and more importantly, the public / open data ecosystem: what it is and what it consists of.

I then dived into (1) data-related aspects of the public data ecosystem, i.e., what are the data-related prerequisites for a sustainable and resilient data ecosystem? (2) data portals / platforms as entry points, and how to make them sufficiently attractive for the target audience? (3) stakeholder engagement: how to involve the target audience, and what are the benefits of their involvement? and a few more things.

The public data ecosystem part was built around our “Transparency of open data ecosystems in smart cities: Definition and assessment of the maturity of transparency in 22 smart cities”, with some references to other studies such as “Transparency-by-design: What is the role of open data portals?”, “Timeliness of Open Data in Open Government Data Portals Through Pandemic-related Data: A long data way from the publisher to the user” and “Open government data portal usability: A user-centred usability analysis of 41 open government data portals”, which had previously been highlighted by the Open Data Institute and by the Living Library (which recommends studies it sees as the “signal in the noise”).

For the data part, apart from the almost “classical” things, I referred to the topic of “high-value datasets” and dived into the taxonomy we presented in “Towards High-Value Datasets determination for data-driven development: a systematic literature review” (also recommended by the Living Library as a “signal in the noise”), enriched by the results of my earlier study “Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia” as well as the results of two international workshops we organized.

The part on public / open data, smart city, SDG and Society 5.0 and how they are interrelated was, in turn, based on our chapter “The Role of Open Data in Transforming the Society to Society 5.0: A Resource or a Tool for SDG-Compliant Smart Living?”, which FIT Academy called “a groundbreaking research”.

And the engagement part was mostly about workshops, datathons, hackathons and data competitions, as well as co-creation: how the co-creation ecosystem emerges, what its prerequisites are, etc., including references to “Open data hackathon as a tool for increased engagement of Generation Z: to hack or not to hack?” and “The Role of Open Government Data and Co-creation in Crisis Management: Initial Conceptual Propositions from the COVID-19 Pandemic”.

CARMA is a forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of the social sciences, as well as to discuss current and future challenges. Its main topics include:

  • Internet and Big Data sources in economics and social sciences: Social media and public opinion mining, Web scraping, Google Trends and Search Engine data, Geospatial and mobile phone data, Open data and public data;
  • Big Data methods in economics and social sciences: Sentiment analysis, Internet econometrics, AI and Machine learning applications, Statistical learning, Information quality and assessment, Crowdsourcing, Natural Language processing, Explainability and interpretability;
  • applications of the above, including but not limited to: Politics and social media, Sustainability and development, Finance applications, Official statistics, Forecasting and nowcasting, Bibliometrics and scientometrics, Social and consumer behaviour, mobility patterns, eWOM and social media marketing, Labor market, Business analytics with social media, Advances in travel, tourism and leisure, Digital management, Marketing Intelligence analytics, Data governance;
  • Digital transition and global society, with contributions expected on Privacy and legal aspects, Electronic Government, Data Economy, Smart Cities, Industry adoption.

In addition to the regular sessions, a poster session and two keynotes, a special JRC (EC) session took place, during which Luca Barbaglia, Nestor Duch Brown, Matteo Sostero and Paolo Canfora presented the projects they work on.

Many thanks go to the organizers and sponsors of CARMA2023 (Universidad de Sevilla, Cátedra Metropol Parasol, Cátedra Digitalización Empresarial, IBM, Universitat Politècnica de València, Joint Research Center – European Commission and Coca-Cola), who made this event a true success. I enjoyed this experience very much! Excellent venue! Great audience! ¡Muchas gracias!


📢🚨⚠️Paper alert! Overlooked aspects of data governance: workflow framework for enterprise data deduplication

This time I would like to recommend reading the new paper “Overlooked aspects of data governance: workflow framework for enterprise data deduplication”, which has just been presented at the IEEE-sponsored International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023). This “just”, by the way, means June 19, the day after my birthday, i.e., I decided to start my new year with one more conference and paper. And yes, this means that finding time for myself and reaching work-life balance, as many of those who congratulated me wished, is still something I have to try to achieve, but this time I decided to give preference to my career over my personal life (what a surprise, isn’t it?) 🙂 Moreover, at this conference I am also a member of the Steering Committee and the Technical Program Committee, as well as the publicity chair. During the conference, I also chaired its first session, which I consider a special honor. For me, the session was very smooth, interactive and insightful, thanks, of course, to its participants & authors and their studies, which allowed us to have a fruitful discussion and get some insights for our further studies (yes, I also got one very useful idea for further investigation). Thank you to all contributors, with special thanks to Francisco Bonilla Rivas, Bruck Wubete, Reem Nassar, Haitham Al Ajmi.

And I am also proud of having secured one of the four keynotes for this conference: prof. Eirini Ntoutsi from the Bundeswehr University Munich (UniBw-M), Germany, who delivered the keynote “Bias and Discrimination in AI Systems: From Single-Identity Dimensions to Multi-Discrimination”. I had heard it at one of the previous conferences I attended and decided that it was a “must” for our conference as well, so I am super glad that Eirini accepted our invitation! I should immediately mention that the other keynotes were excellent as well: Giancarlo Fortino (University of Calabria, Italy), Dofe Jaya (Computer Engineering Department, California State University, Fullerton, California, USA), Sandra Sendra (Polytechnic University of Valencia, Spain).

The paper I presented was co-authored by a team of three: Otmane Azeroual, German Centre for Higher Education Research and Science Studies (DZHW), Germany; myself, Anastasija Nikiforova, Faculty of Science and Technology, Institute of Computer Science, University of Tartu, Estonia & Task Force “FAIR Metrics and Data Quality”, European Open Science Cloud; and Kewei Sha, College of Science and Engineering, University of Houston Clear Lake, USA. A very international team!

So, what is the paper about? It is (or should be) clear that data quality in companies is decisive and critical to the benefits their products and services can provide. However, in heterogeneous IT infrastructures where, e.g., different applications for Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), product management, manufacturing, and marketing are used, duplicates occur, e.g., multiple entries for the same customer or product in a database or information system. There can be several reasons for this (including, but not limited to, the growing volume of data driven by the adoption of cloud technologies, the use of multiple different sources, and the proliferation of connected personal and work devices in homes, stores, offices and supply chains), but the result of non-unique or duplicate records is degraded data quality. This, in turn, ultimately leads to inaccurate analysis; poor, distorted or skewed decisions; distorted insights provided by Business Intelligence (BI) or machine learning (ML) algorithms, models, forecasts, and simulations where the data form the input; and poorer outcomes of other data-driven activities such as service personalisation (in terms of accuracy, trustworthiness and reliability, user acceptance / adoption and satisfaction), customer service, risk management, crisis management, and resource management (time, human, and fiscal). Not to mention wasted resources and employees who are less likely to trust the data and associated applications, which affects the company image. All of this can lead to the failure of a project, if not of a business. At the same time, the amount of data that companies collect is growing exponentially, making it difficult to manage effectively. Thus, both ex-ante and ex-post deduplication mechanisms are critical in this context to ensure sufficient data quality, and they are usually integrated into a broader data governance approach. In this paper, we develop such a conceptual data governance framework for effective and efficient management of duplicate data and improvement of data accuracy and consistency in medium to large data ecosystems. We present methods and recommendations for companies to deal with duplicate data in a meaningful way, while the presented framework is integrated into one of the most popular data quality tools, DataCleaner.
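The difference between ex-ante and ex-post deduplication can be illustrated with a minimal Python sketch. It is not the framework from the paper (which is integrated into DataCleaner): the matching key (normalized name plus e-mail), the normalization rules and the `CustomerRegistry` class are hypothetical assumptions chosen for illustration. The ex-ante part rejects a duplicate at the moment of entry, while the ex-post part scans records that are already stored.

```python
import re
import unicodedata


def normalize(value: str) -> str:
    """Canonical form for comparison: strip accents, lowercase, collapse punctuation and spaces."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def record_key(name: str, email: str) -> tuple:
    """Hypothetical matching key: normalized name plus lowercased, trimmed e-mail."""
    return (normalize(name), email.strip().lower())


class CustomerRegistry:
    """Hypothetical in-memory store that rejects duplicates at entry time (ex-ante)."""

    def __init__(self):
        self._records = {}

    def add(self, name: str, email: str) -> bool:
        key = record_key(name, email)
        if key in self._records:
            return False  # duplicate stopped before it reaches the data store
        self._records[key] = {"name": name, "email": email}
        return True


def find_duplicates(records):
    """Ex-post check: group already-stored records that share the same key."""
    groups = {}
    for rec in records:
        groups.setdefault(record_key(rec["name"], rec["email"]), []).append(rec)
    return [group for group in groups.values() if len(group) > 1]


registry = CustomerRegistry()
print(registry.add("José Pérez", "jose@example.com"))    # True: first entry
print(registry.add("jose  PEREZ", "JOSE@example.com "))  # False: same key after normalization

legacy = [  # records loaded from an older system without ex-ante checks
    {"name": "José Pérez", "email": "jose@example.com"},
    {"name": "Jose Perez", "email": "Jose@Example.com"},
    {"name": "Ana López", "email": "ana@example.com"},
]
print(len(find_duplicates(legacy)))  # 1 group of duplicates
```

Exact matching on a normalized key only catches the simplest duplicates; near-duplicates with typos need similarity-based matching on top of it.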

In short, in this paper we:

  • first, present methods for how companies can deal meaningfully with duplicate data. Initially, we focus on data profiling using several analysis methods applicable to different types of datasets, incl. analysis of different types of errors, structuring, harmonizing, & merging of duplicate data;
  • second, propose methods for reducing the number of comparisons and for matching attribute values based on similarity (in medium to large databases). The focus is on easy integration and duplicate detection configuration, so that the solution can be easily adapted by different users in companies without domain knowledge. These methods are domain-independent and can be transferred to other application contexts to evaluate the quality, structure, and content of duplicate / repetitive data;
  • finally, integrate the chosen methods into the framework of Hildebrandt et al. [ref 2]. We also explore some of the most common data quality tools used in practice, into which we integrate this framework.
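The idea of reducing the number of comparisons and matching attribute values by similarity can be sketched with standard blocking: records are compared only within groups that share a cheap key, and the remaining candidate pairs are scored with a string similarity. This is a generic technique, not necessarily the one used in the paper; the blocking key (first three letters of the name plus postcode), the 0.85 threshold and the sample records are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever similarity measure a production tool would use.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking key: first three letters of the name (no spaces) plus postcode."""
    return (record["name"].lower().replace(" ", "")[:3], record["postcode"])


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] from difflib; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_pairs(records):
    """Yield index pairs only within blocks instead of all n*(n-1)/2 pairs."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    for members in blocks.values():
        yield from combinations(members, 2)


def match(records, threshold=0.85):
    """Return candidate pairs whose name similarity reaches the threshold."""
    return [
        (i, j)
        for i, j in candidate_pairs(records)
        if similarity(records[i]["name"], records[j]["name"]) >= threshold
    ]


records = [
    {"name": "Anna Berzina", "postcode": "LV-1050"},
    {"name": "Anna Berzia", "postcode": "LV-1050"},   # typo variant
    {"name": "Andris Kalnins", "postcode": "LV-1050"},
    {"name": "Anna Berzina", "postcode": "EE-50090"},
]
print(list(candidate_pairs(records)))  # [(0, 1)]: 1 comparison instead of 6
print(match(records))                  # [(0, 1)]
```

With these four records, blocking reduces the six possible pairs to a single comparison. The trade-off is visible in the last record: it has the same name but a different postcode, so it is never compared, which is why real deployments often combine several blocking keys.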

After that, we test and validate the framework. The final refined solution provides the basis for subsequent use. It consists of detecting and visualizing duplicates, presenting the identified redundancies to the user in a user-friendly manner to enable and facilitate their further elimination.
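Presenting identified redundancies to the user usually requires turning pairwise matches into groups. A common way to do this is a union-find (disjoint-set) structure that computes the transitive closure of the matched pairs; the minimal sketch below assumes that approach and prints a plain-text summary in place of a visualization. The function names and sample data are hypothetical and not taken from the paper.

```python
def cluster_duplicates(n, pairs):
    """Group record indices 0..n-1 so that matched pairs end up in the same group (union-find)."""
    parent = list(range(n))

    def find(x):
        # Walk up to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Singletons are not duplicates, so only groups with 2+ members are returned.
    return [g for g in groups.values() if len(g) > 1]


def report(records, groups, field="name"):
    """Plain-text summary of each duplicate group for a reviewer to confirm or reject."""
    lines = []
    for k, group in enumerate(groups, start=1):
        values = ", ".join(records[i][field] for i in group)
        lines.append(f"Group {k} ({len(group)} records): {values}")
    return "\n".join(lines)


# Hypothetical sample: indices of pairs that a matcher flagged as similar.
records = [
    {"name": "Anna Berzina"},
    {"name": "Anna Berzia"},
    {"name": "Andris Kalnins"},
    {"name": "A. Berzina"},
    {"name": "Maris Ozols"},
]
groups = cluster_duplicates(len(records), [(0, 1), (1, 3)])
print(report(records, groups))  # Group 1 (3 records): Anna Berzina, Anna Berzia, A. Berzina
```

Because transitive closure can chain records that are only indirectly similar (here, “A. Berzina” joins the group via “Anna Berzia”), such groups are better treated as suggestions for a reviewer than as automatic merges.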

With this paper we aim to support research in data management and data governance by identifying duplicate data at the enterprise level and meeting today’s demands for increased connectivity / interconnectedness, data ubiquity, and multi-data sourcing. In addition, the proposed conceptual data governance framework aims to provide an overview of data quality, accuracy and consistency to help practitioners approach data governance in a structured manner.

In general, what is needed is not only technological solutions that identify / detect poor-quality data and allow its examination and correction, or ensure its prevention by integrating controls into the system design, striving for “data quality by design” [ref3, ref4], but also cultural changes related to data management and governance within the organization. These two perspectives form the basis of a healthy business data ecosystem. Thus, the presented framework describes the hierarchy of people who are allowed to view and share data, rules for data collection, data privacy, data security standards, and channels through which data can be collected. Ultimately, this framework will help users be more consistent in data collection and data quality for reliable and accurate results of data-driven actions and activities.

Sounds interesting? Read the paper here (to be cited as: Azeroual, O., Nikiforova, A., Sha, K. (2023, June). Overlooked aspects of data governance: workflow framework for enterprise data deduplication. In 2023 International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023). IEEE (in print)).

The International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023) is co-located with the International Conference on Multimedia Computing, Networking and Applications (MCNA2023); both are sponsored by IEEE (IEEE España Sección), Universitat Politècnica de València and Al Ain University. Many thanks to the organizers: Jaime Lloret, Universitat Politècnica de València, Spain; Yaser Jararweh, Jordan University of Science and Technology, Jordan; Marios C. Angelides, Brunel University London, UK; and Muhannad Quwaider, Jordan University of Science and Technology, Jordan.

References:

Azeroual, O., Nikiforova, A., Sha, K. (2023, June). Overlooked aspects of data governance: workflow framework for enterprise data deduplication. In 2023 International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023). IEEE (in print).

Hildebrandt, K., Panse, F., Wilcke, N., & Ritter, N. (2017). Large-scale data pollution with Apache Spark. IEEE Transactions on Big Data, 6(2), 396-411.

Guerra-García, C., Nikiforova, A., Jiménez, S., Perez-Gonzalez, H. G., Ramírez-Torres, M., & Ontañon-García, L. (2023). ISO/IEC 25012-based methodology for managing data quality requirements in the development of information systems: Towards Data Quality by Design. Data & Knowledge Engineering, 145, 102152.

Corrales, D. C., Ledezma, A., & Corrales, J. C. (2016). A systematic review of data quality issues in knowledge discovery tasks. Revista Ingenierías Universidad de Medellín, 15(28), 125-150.

💬💬💬 Contributed talk for QWorld Quantum Science Days 2023 (QSD 2023)

In the very last days of May 2023, I had yet another experience: I delivered a contributed talk at QWorld Quantum Science Days 2023 (QSD 2023) titled “Framework for understanding quantum computing use cases from a multidisciplinary perspective and future research directions” (Ukpabi, D.C., Karjaluoto, H., Botticher, A., Nikiforova, A., Petrescu, D.I., Schindler, P., Valtenbergs, V., Lehmann, L., & Yakaryılmaz, A). The talk is based on a paper we made publicly available some time ago, and which we developed even earlier, when together with partners from Germany, Spain, Finland, Romania, and Latvia we built a consortium and submitted a project proposal to the CHANSE call “Transformations: Social and Cultural Dynamics in the Digital Age”. We got much further than I expected: only at the very last stage, after going through all the intermediate evaluation rounds, were we notified that this time we would not be granted funding, and getting that far was already fascinating news (at least for me). While working on the proposal and building our network, we conducted a preliminary analysis of the area, which, regardless of the outcome of the application, we decided to continue and bring to at least some logical end. We liked the result, so we decided to make it publicly available. And now, a few years later, we submitted our work to QSD 2023 and it was accepted. It was a big surprise, and I, as the person delegated by our team to present our study, delivered this talk, finally familiarizing the audience with our findings. Imagine my surprise when my talk, which immediately followed the keynote “Let’s talk about Quantum; Societal readiness through science communication research” delivered on behalf of Quantum DELTA NL by Julia Cramer, turned out to go in a very similar direction!
It is also worth mentioning a very interesting coincidence: while the keynote elaborated on DELTA, which stands for five major quantum hubs, namely Delft, Eindhoven, Leiden, Twente, Amsterdam, I was making the final preparations for my presentation in the Delta building, which is the name of the building my office is located in. In both cases, no connection with COVID-19 😀

🤔 What is the paper about?

There has been increasing awareness of the tremendous opportunities inherent in quantum computing. It is expected that the speed and efficiency of quantum computing will significantly impact the Internet of Things, cryptography, finance, and marketing. Accordingly, there has been increased quantum computing research funding from national and regional governments and private firms. However, ❗❗❗ critical concerns regarding legal, political, and business-related policies germane to quantum computing adoption exist ❗❗❗

Since this is an emerging and highly technical domain, most existing studies focus heavily on the technical aspects of quantum computing. In contrast, our study highlights its practical and social use cases, which are needed given the increased interest of governments. More specifically, our study offers a multidisciplinary review of quantum computing, drawing on the expertise of scholars from a wide range of disciplines whose insights coalesce into a framework that simplifies the understanding of quantum computing, identifies possible areas of market disruption and offers empirically based recommendations that are critical for forecasting, planning, and strategically positioning QCs for accelerated diffusion.

"Framework for understanding quantum computing use cases from a multidisciplinary perspective and future research directions" (Ukpabi, D.C., Karjaluoto, H., Botticher, A., Nikiforova, A., Petrescu, D.I., Schindler, P., Valtenbergs, V., Lehmann, L., & Yakaryılmaz, A)

To this end, we conducted a gray literature review, whose outputs were then structured in accordance with Dwivedi et al., 2021 (Dwivedi et al. (2021). Setting the future of digital and social media marketing research: Perspectives and research propositions. International Journal of Information Management, 59, 102168), which embodies three broad areas (environment, users, and application areas) and the dominant sub-themes presented in the figure below. We found that among application areas, business and finance, renewable energy, medicine & pharmaceuticals, and manufacturing are currently the hottest, while for the environment we found subdomains such as ecosystem, security, jurisprudence, institutional change & geopolitics. And for the users, nothing surprising: as usual, customers, firms, and countries. We then dive into each of these areas and later identify the most popular, the most promising, and the overlooked topics.

Sounds interesting? Read the paper here, find slides here, watch video here.

Quantum Science Days is an annual, international, and virtual scientific conference organized by QWorld (Association) to provide opportunities for the quantum community to present and discuss their research results at all levels (from short projects to thesis work to research publications), and to get to know each other. The third edition (QSD2023) included 7 invited speakers, 10 thematic talks on “Building an Open Quantum Ecosystem”, 31 contributed talks, an industrial demo session by Classiq, and a career talk on quantum. QSD2023 was sponsored by Unitary Fund & Classiq and supported by the Latvian Quantum Initiative.


🔖🔖🔖NEW book chapter: The Role of Open Data in Transforming the Society to Society 5.0: A Resource or a Tool for SDG-Compliant Smart Living?

 

This time I am glad to announce that the chapter “The Role of Open Data in Transforming the Society to Society 5.0: A Resource or a Tool for SDG-Compliant Smart Living?” (Nikiforova, Alor Flores, Lytras), which is part of the book “Smart Cities and Digital Transformation: Empowering Communities, Limitless Innovation, Sustainable Development and the Next Generation”, is finally publicly available! Indeed finally this time, since the chapter was ready and accepted in 2021, if I remember correctly… So it took some time, but hopefully, it is like a wine 🍷🍷🍷 or cognac 🥃🥃🥃, becoming better with time! In fact, just a few weeks after it appeared online, it was already included in FIT Academy’s recommended reading as one of “…five outstanding articles by top experts from around the world”, which referred to it as “a groundbreaking research”. Thanks a lot for this!!!

It was hard work, especially for the editors of the book, who did a great job adapting to the situation and taking care of us authors as much as possible! Kudos!

So, almost traditionally, a few words about the content, or “what is it about?”. Today we all know that open data and open government data are characterized by a number of economic, environmental, technological, innovative, and social benefits. They are seen as a significant contributor to the transformation of cities into smart cities. This is all the more so when society is on the threshold of Society 5.0, that is, when the shift from the Information Society (Society 4.0) to a “super smart society” or “society of imagination” (Society 5.0) takes place. However, the question constantly asked by open data experts is: what are the key factors to be met and satisfied in order to achieve the promised benefits? The current trend of openness suggests that the principle of openness should be followed not only by data but also by research, education, software, standards, hardware, etc.; it should become a philosophy followed at different levels and in different domains. This should ensure greater transparency, eliminate inequalities, and promote and achieve the sustainable development goals (SDGs). That is, openness in data, science, and technology (software or hardware) is considered one of the keys to meeting the SDGs, while simultaneously supporting some of them “by default”, both through the general principles of open data (covered by the Open Data Charter and applicable to all data perceived and treated as open data) and through the domain the open data represents. This was also emphasized at the 76th session of the United Nations General Assembly, which highlighted that openness contributes to the attainment of the United Nations Sustainable Development Goals (SDGs). Therefore, many agendas (sustainable development strategies, action plans) now have openness as a prerequisite.
This chapter deals with the concepts of open (government) data and Society 5.0, pointing to their interconnection, i.e., common objectives, and provides some success stories of open data use in smart cities or in the transformation of cities into smart cities, mapping them to the features of Society 5.0. We believe that this trend develops a new form of society, which we refer to as an “open data-driven society”. It forms a bridge from Society 4.0 to Society 5.0. This chapter attempts to identify the role of openness in promoting a human-centric smart society, smart city, and smart living, including identifying and elaborating on both the determinants or prerequisites capable of promoting the development of Society 5.0 by means of open data, and the barriers that stakeholders of different types may face on the way towards a sustainable smart city and super smart society.

Sounds catchy? Hope so! If yes, read the chapter here or find its preprint here.

Citation

Nikiforova, A., Flores, M.A.A. and Lytras, M.D. (2023), “The Role of Open Data in Transforming the Society to Society 5.0: A Resource or a Tool for SDG-Compliant Smart Living?”, Lytras, M.D., Housawi, A.A. and Alsaywid, B.S. (Ed.) Smart Cities and Digital Transformation: Empowering Communities, Limitless Innovation, Sustainable Development and the Next Generation, Emerald Publishing Limited, Bingley, pp. 219-252. https://doi.org/10.1108/978-1-80455-994-920231011