ICEGOV2022 workshop: Identification of high-value dataset determinants: is there a silver bullet?

This year the 15th International Conference on Theory and Practice of Electronic Governance, known as ICEGOV2022, will focus on “Digital Governance for Social, Economic, and Environmental Prosperity”. Together with Charalampos Alexopoulos, Nina Rizun and Magdalena Ciesielska, I am glad to announce our own community-based, participatory, interactive workshop aimed at identifying High-Value Dataset (HVD) determinants towards efficient sustainability-oriented data-driven development.

Briefly about the workshop, our motivation, our objective and why we want to make you a part of it…

Today, Open Government Data (OGD) are seen as one of the trends that can potentially benefit the economy, improve the quality, efficiency, and transparency of public services, and transform our lives by contributing to efficient sustainability-oriented data-driven development. Neither their scope nor the actors who can work with them are restricted. In addition to these “classical” benefits, they are considered drivers and promoters of Industry 4.0 and Society 5.0 [1,2], including Smart City trends. OGD is also a driver of economic growth: according to [3], the open data market size in 2020 was estimated at €184 billion, and it is expected to grow in the coming years, reaching between €199.51 and €334.21 billion in 2025. However, the achievement of these benefits is closely linked to the “value” of the data, i.e. the extent to which the data provided by public agencies are interesting, useful and valuable for re-use, creating value for society and the economy. High data availability, however, can disorient users when deciding which sources are best suited to their needs [4]. Practice demonstrates that the majority of datasets available on OGD portals are not used, and only a few datasets create value for users [5], [6]. This is in line with Quarati and Martino [4], who provided a snapshot of the use of 15 OGD portals based on the usage indicators available. It also applies to Latvia [7,8]. In other words, to benefit from OGD, countries should open data wisely, prioritising quality and data value over quantity, since the benefits of OGD can only be obtained if the data are re-used and transformed into value.

This is where the concept of “high-value datasets” comes into play, pointing to the data that would create the highest value for society and the economy. High-value data are defined as the data “the re-use of which is associated with important benefits for society, the environment and the economy, in particular because of their suitability for the creation of value-added services, applications and new, high-quality and decent jobs, and of the number of potential beneficiaries of the value-added services and applications based on those datasets” [9]. Although the PSI Directive is a step in this direction by announcing six categories [9], these appear to be generic and do not take into account the national perspective, i.e. the nature of these datasets will depend to a large extent on the country concerned [10,11].
It is therefore important to support the identification of high-value datasets, which would increase the interest of OGD users in transforming data into innovative solutions and services. Research suggests that the literature offers different perspectives on identifying “high-value datasets”, with no consensus on the most comprehensive one, so these perspectives will first be identified within the workshop and then covered by a number of activities.

This workshop aims to raise a discussion on the identification of high-value datasets in order to reach a common understanding of how this could be done in general terms, i.e. which activities would lead to a better understanding and a clearer vision of which datasets are most valuable for the society and economy of a particular country, and how they can be identified (how? by whom? etc.). The topic is very important these days, given that opening up datasets with high potential for use and re-use is expected to facilitate the creation of new products and services with positive economic and social impact [12]. However, identifying these data is a complicated task, particularly where country-specific datasets are concerned.

This workshop is a step in this direction and a continuation of the paper presented at ICEGOV2021 [13], which surveyed individual users and SMEs in Latvia to clarify their level of awareness of the existence of OGD, their usage habits, and their overall level of satisfaction with the value of OGD and their potential. This time we aim to develop a framework for the identification of high-value datasets (and their determinants) as the result of a comprehensive study conducted jointly with the participants of ICEGOV. All in all, the objective of the workshop is to raise awareness of the HVD issue and establish a network of the major stakeholders around it, and to allow each participant to think about whether and how the determination of HVD is taking place in their country and how this can be improved with the help of portal owners, data publishers, data owners and citizens. Our main motivation is that, as members of the ICEGOV community, we could jointly answer the following questions, which represent the objectives of the workshop:

  1. How can the “value” of open data be defined?
  2. What are the current indicators for determining the value of data? Can they be used to identify valuable datasets to be opened? What country-specific high-value determinants (aspects) can participants think of?
  3. How can high-value datasets be identified? What mechanisms and/or methods should be put in place to allow their determination? Could there be an automated way to gather information for HVD? Can they be identified by third parties, e.g. researchers and enthusiasts, AND by potential data publishers, i.e. data owners?
  4. What should be the scope of the framework, i.e. who is the target audience that should be made aware of the HVD by applying this framework? Public officials / servants? Data owners? Intermediaries? (A discussion with participants OR a direction for our discussion, depending on the participants and their profile.)

More precisely, the following “procedure” is expected to be followed:

  • STEP 0 (carried out by participants, not mandatory): participants are invited to get familiar with the open data portals of their country (higher coverage, i.e. of more than their own country, is welcome) by inspecting their current state in terms of both content (the data available) and functionality, with particular interest in features related to HVD determination (if any), including citizen-engagement-oriented features, features for tracking the current interest of users etc.
  • STEP 1: A brief introduction to the current state of the art [approximately 45 minutes]: How are HVD seen by the PSI Directive and what tasks are set for countries regarding the determination and opening of HVD? How are countries coping with this (both from grey literature and from personal experience in Latvia)? What approaches and methods for determining HVD are known, and why is there no uniform method / framework? A brief overview of the results of a survey of individual users and small and medium-sized enterprises (SMEs) in Latvia on their view of the current state of the data, i.e. to what extent the data meet their needs, what data might be useful for them, and how the availability of such data would affect their willingness to use them. An overview of the Deloitte report on HVD: What methodology is used? What indicators are used? What are the results of the study?
  • STEP 2: Considering the diversity of perceptions of the term “value” (depending on the domain, actor etc.), a discussion in the form of brainstorming (idea generation) is expected to produce as many definitions as possible, which are then used to provide more comprehensive definition(s) considering different perspectives (domain- and actor-related) [approximately 30-45 minutes]
  • STEP 3: Discussion on current methods / mechanisms to determine the current value of the data and determining HVD in the form of brainstorming [approximately 20-30 minutes]
  • STEP 4: Idea generation on potential methods / mechanisms to determine the current value of the data and determining HVD in the form of brainstorming [approximately 20-30 minutes]
  • STEP 5: Iterative filtering of features, methods, approaches that could constitute the framework for determination of high value datasets in the form of DELPHI-like analysis [approximately 45 minutes]
  • STEP 6: Agenda for future research, networking [approximately 30 minutes]
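
To make the DELPHI-like filtering in STEP 5 a bit more tangible, here is a minimal sketch of a single filtering round in Python. It is only an illustration under my own assumptions, not the procedure that will actually be used in the workshop: the 1-5 rating scale, the thresholds and the candidate determinants are all hypothetical.

```python
from statistics import median


def delphi_round(ratings, keep_at=4.0, drop_at=2.0):
    """Sort candidates into kept / dropped / undecided by their median rating.

    ratings maps a candidate determinant to the list of scores (1-5) given
    by participants. The scale and both thresholds are assumptions made
    for this sketch.
    """
    kept, dropped, undecided = [], [], []
    for item, scores in ratings.items():
        m = median(scores)
        if m >= keep_at:
            kept.append(item)        # broad agreement that it matters
        elif m <= drop_at:
            dropped.append(item)     # broad agreement that it does not
        else:
            undecided.append(item)   # goes back to the group for re-rating
    return kept, dropped, undecided


# Hypothetical candidates and scores, invented for illustration only.
example = {
    "usage statistics": [5, 4, 4, 5],
    "user requests": [3, 4, 2, 3],
    "file size": [1, 2, 1, 2],
}
kept, dropped, undecided = delphi_round(example)
print(kept, dropped, undecided)
```

In a real DELPHI-like process the undecided items would be discussed and rated again, round after round, until the list stabilises.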

This is a community-based, participatory, interactive workshop aimed at engaging participants: instead of asking participants to write a paper to be presented later in a sit-and-listen format, we expect to establish a lively and interesting discussion of novel ideas, answering existing questions and raising new ones. The audience of the workshop is ICEGOV participants, without restriction on the domain they represent, their affiliation, interests, knowledge or experience. Both OGD experts and those who are not familiar with OGD are welcome.

Join us this October (4 – 7 October 2022)!

References:

  1. Bargiotti, L., De Keyzer, M., Goedertier, S., & Loutas, N. (2014). Value based prioritisation of Open Government Data investments. European Public Sector Information Platform.
  2. Bertot, J. C., McDermott, P., & Smith, T. (2012, January). Measurement of open government: Metrics and process. In 2012 45th Hawaii International Conference on System Sciences (pp. 2491-2499). IEEE.
  3. Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information
  4. European Commission, The Digital Economy and Society Index (DESI), online, https://ec.europa.eu/digital-single-market/en/digital-economy-and-society-index-desi, last accessed: 7.04.2021
  5. Gagliardi, D., Schina, L., Sarcinella, M. L., Mangialardi, G., Niglia, F., & Corallo, A. (2017). Information and communication technologies and public participation: interactive maps and value added for citizens. Government Information Quarterly, 34(1), 153-166.
  6. Huyer, E., Blank, M. (2020). Analytical Report 15: High-value datasets: understanding the perspective of data providers. Luxembourg: Publications Office of the European Union, 2020 doi:10.2830/363773
  7. Kampars, J., Zdravkovic, J., Stirna, J., & Grabis, J. (2020). Extending organizational capabilities with Open Data to support sustainable and dynamic business ecosystems. Software and Systems Modeling, 19(2), 371-398.
  8. Kotsev, A., Cetl, V., Dusart, J., & Mavridis, D. (2018). Data-driven Economies in Central and Eastern Europe
  9. Kucera, J., Chlapek, D., Klímek, J., & Necaský, M. (2015). Methodologies and Best Practices for Open Data Publication. In DATESO (pp. 52-64).
  10. McBride, K., Toots, M., Kalvet, T., & Krimmer, R. (2019). Turning Open Government Data into Public Value: Testing the COPS Framework for the Co-creation of OGD-Driven Public Services. In Governance Models for Creating Public Value in Open Data Initiatives (pp. 3-31). Springer, Cham.
  11. Nikiforova, A., & Lnenicka, M. (2021). A multi-perspective knowledge-driven approach for analysis of the demand side of the Open Government Data portal. Government Information Quarterly, 101622
  12. Ruijer, E., Détienne, F., Baker, M., Groff, J., & Meijer, A. J. (2020). The politics of open government data: Understanding organizational responses to pressure for more transparency. The American review of public administration, 50(3), 260-274
  13. Nikiforova, A. (2021, October). Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia. In 14th International Conference on Theory and Practice of Electronic Governance (pp. 367-372).


Research and Innovation Forum 2022: panel organizer, speaker, PC member, moderator and Best panel moderator award

As I wrote earlier, this year I was invited to organize my own panel session within the Research and Innovation Forum (Rii Forum). This invitation was a follow-up to several articles that I have recently published (article#1, article#2, article#3) and a chapter I was developing at that time, to be published in “Big data & decision-making: how big data is relevant across fields and domains” (Emerald Studies in Politics and Technology). I was glad to accept this invitation, but I did not even think about how many roles I would play at the Rii Forum and how many emotions I would experience. So, how was it?

First, what was my panel about? It was dedicated to data security and entitled “Security of data storage facilities: is your database sufficiently protected?”, as part of the track called “ICT, safety, and security in the digital age: bringing the human factor back into the analysis”.

My own talk was titled “Data security as a top priority in the digital world: preserve data value by being proactive and thinking security first”, which made it a natural part of the panel described above. In this talk I elaborated on the main idea of the panel, referring to a study I recently conducted. In short, today, in the age of information and Industry 4.0, billions of data sources, including but not limited to interconnected devices (sensors, monitoring devices) forming Cyber-Physical Systems (CPS) and the Internet of Things (IoT) ecosystem, continuously generate, collect, process, and exchange data. With the rapid increase in the number of devices and information systems in use, the amount of data is increasing. Moreover, due to digitization and the variety of data being continuously produced and processed, with reference to Big Data, their value is also growing. As a result, so is the risk of security breaches and data leaks. The value of data, however, depends on several factors, of which the most vital are data quality and data security, since the latter can affect data quality if the data are accessed and corrupted. Data serve as the basis for decision-making and as input for models, forecasts, simulations etc., which can be of high strategic and commercial / business value. This has become even more relevant with the COVID-19 pandemic, which, in addition to affecting the health, lives, and lifestyle of billions of citizens globally and making them even more digitized, has had a significant impact on business. This is especially the case because of the challenges companies have faced in maintaining business continuity in this so-called “new normal”. However, in addition to the cybersecurity threats caused by changes directly related to the pandemic and its consequences, many previously known threats have become even more attractive to intruders and hackers. Every year millions of personal records become available online.
Moreover, the popularity of Internet of Things search engines (IoTSE) has made searching for connected devices on the internet less complex and accessible even to novices, thanks to widely available step-by-step guides on how to use an IoT search engine to find and gain access to insufficiently protected webcams, routers, databases and other artifacts. Recent research demonstrated that weak data protection, and weak database protection in particular, is one of the key security threats. Various measures can be taken to address the issue. The aim of the study to which this presentation refers is to examine whether “traditional” vulnerability registries provide a sufficiently comprehensive view of DBMS security, or whether DBMS should be intensively and dynamically inspected by their holders using Internet of Things search engines, moving towards a sustainable and resilient digitized environment. The study draws attention to this problem and makes you think about data security before looking for and introducing more advanced security and protection mechanisms, which, in the absence of the above, may bring no value.

Other presentations delivered during this session were “Information Security Risk Awareness Survey of non-governmental Organization in Saudi Arabia”, “Fake news and threats to IoT – the crucial aspects of cyberspace in the times of cyber war” and “Minecraft as a Tool to Enhance Engagement in Higher Education”. All of them were incredibly interesting, and all three talks were delivered by women, with only the moderator of the session being a male researcher, which he found rather unusual given the topic and ICT orientation, not a very typical case 🙂 Nevertheless, we managed to have a great session and a very lively and fruitful discussion, mostly around GDPR-related questions, which seem to be one of the hottest areas of discussion for people representing different ICT “subbranches”. The main question we discussed was whether the GDPR is more of a supportive tool and a “great thing”, or rather a “headache” that sometimes even interferes with development.

In addition, shortly before the start of the event, I was asked to moderate the panel “Business in the era of pervasive digitalization”. Although, as you may know, this is not exactly in line with my area of expertise, it is in line with what I am interested in. This is not surprising, since management, business and economics are very closely connected to and dependent on ICT. Moreover, they affect ICT, thereby pointing out the critical areas that we as IT people need to address. All in all, we had a great session with excellent talks and a lively discussion at the end, where we discussed different session-related topics and shared our experience, thoughts etc. Although it was a brilliant experience, there is one thing that made it even better… A day later, a ceremony was held where the best contributions of the forum were announced, and I was named the best panel moderator in recognition of “the academic merit, quality of moderation, scheduling, and discussion held during the panel”!!!

These were three wonderful days of the forum, with very positive emotions and so many roles (panel organizer, speaker / presenter, program committee member and panel moderator), with the cherry on the cake making for such a great end to the event. Thank you, Research and Innovation Forum!!! Even though we were at home and participating online, you managed to give us an absolutely amazing experience and even the feeling that we were all together in Athens!

Google.org-supported programme by Riga TechGirls and my participation in it as a speaker and lead mentor for the digital development workshop on Information and data literacy

This February I gained yet another experience by participating in a programme launched by Riga TechGirls and supported by Google.org (“Google Impact challenge” grant), in addition to local supporters such as the Ministry of Education and Science of Latvia, the Ministry of Culture and Riga city council (Rīgas Dome). The programme, titled “Human on technology”, was intended for more than 2000 Latvian teachers, with the aim of disrupting technophobia and providing them with the digital skills that are a “must-have” in this digital world / era. I acted as both the lecturer and the lead mentor for the digital development workshop held as part of the “Information and data literacy” module.

I hope the knowledge and experience I have shared will be beneficial and improve their lives and their work-related and daily activities, and that they will now be able “to train the “Google””. This is a reference to the title of my talk (and to the animated film “How to Train Your Dragon”): I was asked to speak about working with information and entitled my lecture “Work with information or how to train your “Google”?”. During this lecture we discussed tips & tricks for data and information search, data analysis and fact checking in the digital environment, and spoke about the “art” of searching for information in different browsers: how to create search requests and how to make searches more precise, including but not limited to building queries with search operators for advanced search. And of course, we spoke about how a search engine works, i.e. the process from crawling to indexing and sorting of the results returned to the end user, and the different algorithms it is based on. We also covered a rich list of search engines (Google, Bing, Yahoo, Ecosia, DuckDuckGo, Yandex, Baidu, Seznam and many more), trying to understand the differences between them and how they can be classified (e.g. security- and safety-oriented, regional search engines etc.).
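
For readers who like to see things written down, here is a minimal Python sketch of what “building a query with search operators” can look like. It is only an illustration and not material from the lecture: the helper name `build_query` and the example values are my own assumptions, and while the operators shown (quoted phrases, `site:`, `filetype:` and `-` for exclusion) are supported by Google, support varies between search engines.

```python
from urllib.parse import urlencode


def build_query(words, exact=None, site=None, filetype=None, exclude=()):
    """Compose a search string from plain words and common operators.

    Hypothetical helper for illustration; operator support differs
    between search engines.
    """
    parts = list(words)
    if exact:
        parts.append(f'"{exact}"')            # exact phrase match
    if site:
        parts.append(f"site:{site}")          # restrict to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to a file type
    parts.extend(f"-{word}" for word in exclude)  # leave out unwanted terms
    return " ".join(parts)


# Example values are invented for illustration.
query = build_query(
    ["lesson", "plan"],
    exact="data literacy",
    site="europa.eu",
    filetype="pdf",
    exclude=["advert"],
)
url = "https://www.google.com/search?" + urlencode({"q": query})
print(query)
print(url)
```

Each operator narrows the result set a little further, which is exactly the point: the more precisely the request is phrased, the less time is spent scrolling through irrelevant results.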

The second meeting with the audience took place a bit later, as the last activity of the “Information and data literacy” module, where I acted as the lead mentor of the digital development workshop. I delivered a lecture (mostly practical in nature), after which participants were split into rooms, where they completed the tasks we had prepared for them and discussed any questions they had (both related and unrelated to these tasks, so that we could also help with their daily problems). 11 female mentors, including myself, were actively involved. We worked hard, but it seems we were able to help teachers tackle the challenges they face in their daily activities (from data sorting and email organization to creating and maintaining backups), and that is great!!!

During this workshop we discussed how to structure your data more wisely, effectively and efficiently for further re-use in both local and cloud environments, with a focus on OneDrive and Google Drive, which teachers use most frequently. We covered a lot of tips & tricks for creating their own pre-defined systems, which I recommended developing with the FAIR principles in mind (this was an attempt to give them a brief overview of these principles, when and how they should be used, and why both FAIR and openness, including but not limited to open science, matter). We also covered filling these systems with information found by applying the knowledge acquired during my earlier lecture (using different search engines, at least to test them, and building queries with operators to limit the result set and make it match the search request more closely). Finally, we touched on other curious methods and techniques and took a brief look at things like browser extensions, including those allowing video download from YouTube, Google Lens, data export and import “from-to and between” cloud storage and the local system, image and video search in different search engines, verification of facts, searching for academic and scientific literature and many more…

It was an amazing experience to work with teachers, the target audience of this programme, and even more pleasant to hear that they will (continuously) apply the knowledge and practical experience gained during this weekend at their workplace (and outside it). We keep receiving comments that they started doing this during the workshop itself and continue doing so today, as the first thing they do when arriving at the workplace, AND everything works as expected or even better!!! It is the best assessment we could hope for!

I am very grateful for all the comments left for the Riga TechGirls team, and even more grateful for those left for me. I am very happy that both my lecture and the hand-outs that followed were easy to understand & follow and valuable at the same time, since this audience was something really new for me in terms of both its nature (teachers) and its size (around 1 000 people during the day in three time slots, with 90 Zoom rooms for practical assignments and discussions)!!! I am very glad to hear that teachers would like to meet me again. I will be glad to meet you again, too!!!

Riga TechGirls, mentors and supportive words for attendees

Thank you, Riga TechGirls team – the first community in Latvia aimed at educating and inspiring women and girls in IT, empowering them to be architects of the future – for your invitation and this valuable experience!