CFP: 3rd International Workshop on Advanced Data Systems Management, Engineering, and Analytics (MegaData): where the Edge meets the Cloud

December 15, 2022, by Anastasija Nikiforova

On behalf of the organizers, I sincerely invite you to consider submitting the results of your recent research to the Third International Workshop on Advanced Data Systems Management, Engineering, and Analytics (MegaData), which will be held in conjunction with the 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID2023).

The MegaData objective is to bring together researchers, practitioners, system administrators, system programmers, and others interested in sharing and presenting their perspectives on the effective management of big data systems. The workshop focuses on novel and practical systems-oriented work. MegaData, whose motto this year is “MegaData: where the Edge meets the Cloud”, offers an opportunity to showcase the latest advances in this area and to discuss and identify future directions and challenges in the management and engineering of big data systems.

MegaData covers the area of Big Data operations (management, engineering, and analytics) within Cloud and Edge computing models. It aims to report on the advances and trends in Big Data deployment architectures from both the infrastructure and application levels. Papers presenting recent results, research issues, practical applications, case studies, and industrial implementations are welcome.

Specific topics of interest include, but are not limited to, the following:

Resource management and scheduling mechanisms for data systems
Auto Scaling and elastic scaling approaches and mechanisms
Data governance and privacy of “data in motion” and “data at rest” over edge/cloud
Emerging Data deployment models in IoT, IoT-to-Cloud, Edge/fog
Federated Learning and edge intelligence for big data systems
Advanced data storage models, including object stores and key-value stores
Techniques for data integrity, availability, reliability, and fault tolerance
Big Data workflows (data management, data wrangling, automated workflows)
Data pipeline (data lake to analytics, new data stream architectures, edge/fog, cloud enabled solutions)
High-performance Data Analytics applications
Adaptive offloading techniques among Fog, Edge, and Cloud Computing

Important dates

Submission open: December 1st, 2022
Paper submission Deadline: January 15th, 2023
Acceptance notification: February 10th, 2023
Camera-ready submission: March 17th, 2023
Conference Dates: May 1st, 2023

We invite original research papers that have not been previously published and are not currently under review for publication elsewhere. Submitted papers should be no longer than 8 pages (including references and appendices) in two-column IEEE template format. Papers need to be submitted through the EasyChair submission portal.

The best papers presented at the workshop will be selected, and the corresponding authors will be invited to submit an extended version of their papers for possible publication in the Special Issue on Emerging Topics in Big Data and Edge Intelligence of Springer Cluster Computing (IF 2.3).

Submit your paper and meet our team in Bangalore, India, in May 2023!

With best wishes,

MegaData organizers – Yaser Jararweh, Feras M. Awaysheh, Moath Jarrah, Anastasija Nikiforova, Sadi Alawadi

Our – EOSC TF “FAIR Metrics and Data Quality” – whitepaper “Community-driven Governance of FAIRness Assessment: An Open Issue, an Open Discussion” is published by European Commission!

December 9, 2022 (updated December 28, 2022), by Anastasija Nikiforova

Proud to be part of the EOSC Task Force on FAIR Metrics and Data Quality and to present our whitepaper “Community-driven Governance of FAIRness Assessment: An Open Issue, an Open Discussion” (Mark D. Wilkinson; Susanna-Assunta Sansone; Eva Méndez; Romain David; Richard Dennis; David Hecker; Mari Kleemola; Carlo Lacagnina; Anastasija Nikiforova; Leyla Jael Castro), published by the European Commission, in open access of course, here.

“Although FAIR Research Data Principles are targeted at and implemented by different communities, research disciplines, and research stakeholders (data stewards, curators, etc.), there is no conclusive way to determine the level of FAIRness intended or required to make research artefacts (including, but not limited to, research data) Findable, Accessible, Interoperable, and Reusable.

The FAIR Principles cover all types of digital objects, metadata, and infrastructures. However, they focus their narrative on data features that support their reusability. FAIR defines principles, not standards, and therefore they do not propose a mechanism to achieve the behaviours they describe in an attempt to be technology/implementation neutral.

FAIR is evolving in some expected and some unexpected ways. FAIR “Reusability” sub-principle R1.3 states that “(meta)data should meet domain-relevant community standards,” which predicts a proliferation of FAIR interpretations by individual communities as they select their preferred approach to FAIRness. Similarly, as expected, there is an active movement around the adaptation of the FAIR Principles to digital objects other than data (e.g., software and workflows), again with individual communities interpreting what FAIRness means in these expanded contexts. However, there have also been attempts to expand the FAIR Principles themselves in recent years, including features of digital objects beyond reusability, including popularity (reuse/citation), reproducibility, reliability, data quality, etc. All of this is occurring with no overall coordination or planning.

A range of FAIR assessment metrics and tools have been designed that measure FAIRness. Unfortunately, the same digital objects assessed by different tools often exhibit widely different outcomes because of these independent interpretations of FAIR. This results in confusion among the publishers, the funders, and the users of digital research objects. Moreover, in the absence of a standard and transparent definition of what constitutes FAIR behaviours, there is a temptation to define existing approaches as being FAIR-compliant rather than having FAIR define the expected behaviours. While it is anticipated that communities will define domain-specific FAIR metrics and tests, it is desirable to avoid “gaming the system” and have broadly agreed-upon approaches to FAIRness that do not favour a specific implementation of technology.

These observations suggest a growing need to align the different interpretations of the FAIR Principles. However, this whitepaper does not suggest that the FAIR Principles themselves require governance. Indeed, the document argues that the Principles should remain untouched. Specialised communities should extend/edit those Principles to adapt and make them more relevant to their community and their specific research outcome intended to be FAIR.

This whitepaper identifies three high-level stakeholder categories – FAIR decision and policymakers, FAIR custodians, and FAIR practitioners – and provides examples outlining specific stakeholders’ (hypothetical but anticipated) needs. It also examines possible models for governance based on the existing peer efforts, standardisation bodies, and other ways to acknowledge specifications and potential benefits. This whitepaper can serve as a starting point to foster an open discussion around FAIRness governance and the mechanism(s) that could be used to implement it, to be trusted, broadly representative, appropriately scoped, and sustainable.”

Cite as: Mark D. Wilkinson, Susanna-Assunta Sansone, Eva Méndez, Romain David, Richard Dennis, David Hecker, Mari Kleemola, Carlo Lacagnina, Anastasija Nikiforova, & Leyla Jael Castro. (2022). Community-driven Governance of FAIRness Assessment: An Open Issue, an Open Discussion [version 1; peer review: awaiting peer review]. Open Res Europe 2022, 2:146 (https://doi.org/10.12688/openreseurope.15364.1)

Want to know more? Read more here!


UT & Swedbank Data Science Seminar “When, Why and How? The Importance of Business Intelligence”

November 16, 2022 (updated October 18, 2023), by Anastasija Nikiforova

Last week I had the pleasure of taking part in a Data Science Seminar titled “When, Why and How? The Importance of Business Intelligence”. In this seminar, organized by the Institute of Computer Science (University of Tartu) in cooperation with Swedbank, we (Mohamad Gharib, Jurgen Koitsalu, Igor Artemtsuk and I) discussed the importance of BI with some focus on data quality. More precisely, two of the four talks were delivered by representatives of the University of Tartu and were more theoretical in nature; we both decided to focus our talks on data quality (although for my talk this was not the main focus this time). The other two talks were delivered by representatives of Swedbank, who elaborated mainly on BI: what it can give, what it already gives, how it is achieved and much more. These talks were followed by a panel moderated by Prof. Marlon Dumas.

[Images: Data Science Seminar - UT&Swedbank - GIGO; Data Science Seminar - UT&Swedbank - panel]

In a bit more detail… In my presentation I talked about:

“Data warehouse vs. data lake – what are they and what is the difference between them?” – in very few words: structured vs unstructured, static vs dynamic (real-time data), schema-on-write vs schema-on-read, ETL vs ELT. With further elaboration on: What are their goals and purposes? What is their target audience? What are their pros and cons?
“Is the data warehouse the only data repository suitable for BI?” – no, (today) data lakes can also be suitable. Moreover, both are considered the key to “a single version of the truth”. If descriptive BI is the only purpose, it might still be better to stay with a data warehouse. But if you want predictive BI, or want to use your data for ML (or do not yet have a specific idea of how you want to use the data, but want to be able to explore it effectively and efficiently), a data warehouse might not be the best option.
“So, the data lake will save me a lot of resources, because I do not have to worry about how to store/allocate the data – just put it in one storage and voila?!” – no, in this case your data lake will turn into a data swamp! And you are forgetting about the data quality you should (must!) be thinking of!
“But how do you prevent the data lake from becoming a data swamp?” – in short and simple terms, proper data governance & metadata management is the answer (but it is not as easy as it sounds – do not forget about your data engineer and be friendly with him (always… literally always :D)) and also think about the culture in your organization.
“So, the use of a data warehouse is the key to high-quality data?” – no, it is not! Having ETL does not guarantee the quality of your data (transform & load is not data quality management). Think about data quality regardless of the repository!
“Are data warehouses and data lakes the only options to consider or are we missing something?” – yes, we are missing something: the data lakehouse!
“If a data lakehouse is a combination of the benefits of a data warehouse and a data lake, is it a silver bullet?” – no, it is not! It is another (relatively immature) option to consider that may be the best fit for you, but it is not a panacea. Dealing with data is (still) not easy…
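
To make the schema-on-write vs schema-on-read distinction from the list above a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration and not something taken from the talk: the `SCHEMA` dictionary, the `conforms` helper and the toy `Warehouse` and `Lake` classes are hypothetical names, and real repositories are far richer than two in-memory lists.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"customer_id": int, "amount": float}


def conforms(record):
    """Return True if the record has every schema field with the expected type."""
    return all(isinstance(record.get(field), kind) for field, kind in SCHEMA.items())


class Warehouse:
    """Schema-on-write: a record is checked against the schema before it is stored."""

    def __init__(self):
        self.rows = []

    def write(self, record):
        if not conforms(record):
            raise ValueError(f"record rejected at write time: {record!r}")
        self.rows.append(record)


class Lake:
    """Schema-on-read: raw payloads are stored as-is and interpreted only when read."""

    def __init__(self):
        self.blobs = []

    def write(self, payload):
        self.blobs.append(payload)  # no checks at all: anything can land here

    def read(self):
        """Apply the schema at read time; return (usable records, unusable payloads)."""
        good, bad = [], []
        for blob in self.blobs:
            try:
                record = json.loads(blob)
            except json.JSONDecodeError:
                bad.append(blob)
                continue
            if isinstance(record, dict) and conforms(record):
                good.append(record)
            else:
                bad.append(blob)
        return good, bad


if __name__ == "__main__":
    lake = Lake()
    lake.write('{"customer_id": 1, "amount": 19.99}')
    lake.write('{"customer_id": "unknown", "amount": 5.0}')
    lake.write("not even json")
    good, bad = lake.read()
    print(len(good), "usable,", len(bad), "unusable")

    # Passing the schema says nothing about whether the value makes sense:
    # a negative amount is still accepted by the toy warehouse.
    warehouse = Warehouse()
    warehouse.write({"customer_id": 7, "amount": -5.0})
    print(len(warehouse.rows), "row stored in the warehouse")
```

In this toy setup the lake accepts everything and only discovers the unusable payloads when someone reads them, which is roughly how a lake without governance drifts towards a swamp. The warehouse rejects malformed records up front, yet still stores a record with a suspicious negative amount, which is the point made above: a schema check (or ETL) alone is not data quality management.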

Data Lake or Data Warehouse? Data cleaning or data wrangling? How to ensure the quality of your data?

In addition, in this talk I briefly introduced our ongoing research into integrating the data lake as a data repository with data wrangling, aiming at increased data quality in IS. In short, this is somewhat like an improved data lakehouse, where we emphasize that data governance and data wrangling need to be integrated to really get the benefits that data lakehouses promise (although we still call it a data lake, since the data lakehouse, although not a brand-new concept, is still debated a lot, including, but not limited to, its very definition).

My colleague Mohamad Gharib, in turn, discussed what DQ and, more specifically, data quality requirements are and why they really matter, and provided a very interesting perspective on how to define high-quality data, which would then serve as the basis for defining these requirements.

All in all, although we did not know each other before and had a very limited idea of what each of us would talk about, we all agreed that the seminar turned out to be very coherent: our talks complemented each other, extending points that had previously been touched on but not thoroughly elaborated. This not only made the seminar a success, but also led to a very lively discussion (although most of this discussion took place during the coffee break, as usually happens, so unfortunately it is not available in the recordings, the link to which is available below).

Latvian Open Data Hackathon for pupils 2022 – winners are announced!

November 16, 2022 (updated July 27, 2023), by Anastasija Nikiforova

During the last month, I have been a mentor at the Latvian Open Data Hackathon and idea generator for pupils, organized by the Latvian Open Technologies Association with the support of DATI Group, E-Klase, Latvijas Kultūras akadēmija / Latvian Academy of Culture, Vides aizsardzības un reģionālās attīstības ministrija (VARAM) / Ministry of Environmental Protection and Regional Development of the Republic of Latvia and others.

This year the main topic of the hackathon was cultural heritage! Within a month, 36 teams made up of 126 participants from all over Latvia developed their ideas and prototypes, and 10 teams reached the final after a round of semi-final presentations of their solutions to us – the mentor team (of course, we worked with our assigned teams in the previous weeks as well). Here, we not only evaluated these ideas, but also gave the teams yet another portion of feedback and suggestions for improving their idea or prototype before presenting it in the final, where the jury would decide who the winner is.

Here I should note that, as usual (I am a permanent mentor of this hackathon), the participants surprised me very much, both with the diversity of their ideas and, very often, with their technical knowledge and skills (AI, crowdsourcing, gamification, to name just a few) – just wow!

And last week it happened – we finally found out who the winners are – Kultūrkults as the best idea in the respective category (idea generator), and 417 Expectations Failed as the hackathon winner!

Congratulations to everyone on the successful efforts to promote cultural and historical heritage! Also, congratulations to society as a whole on having young people who are so responsible and passionate about their culture and history!

Repeating what I already said at the closing ceremony, I really want to believe that all the teams that participated in the hackathon will develop and implement their ideas regardless of the outcome of the hackathon – you are all winners for us! Keep going towards your goal!

Watch the closing ceremony (in Latvian) here.

ICEGOV2022: 4 insightful days and four roles – participant / attendee, author / presenter, workshop chair and Best Paper Awards nominee (part 2)

October 13, 2022, by Anastasija Nikiforova

In the previous post I already shared my impressions of the ICEGOV2022 conference – the 15th International Conference on Theory and Practice of Electronic Governance, which took place in a very special place – Guimarães, which is considered the birthplace of Portugal. There I gave some general insight into how it was, and elaborated a bit on the paper I presented there and the fact that it was nominated for the Best Paper Awards. This post I dedicate to one particular role I played, i.e. workshop chair.

If you actively follow me, you probably remember that some time ago I posted that our workshop titled “Identification of high-value dataset determinants: is there a silver bullet?”, organized by me, Charalampos Alexopoulos, Nina Rizun and Magdalena Ciesielska, was accepted for ICEGOV2022. So now we have finally brought it to life! Our workshop was one of 7 accepted workshops, alongside those organized by such prominent organizations as UNDESA – United Nations Department of Economic and Social Affairs, European Commission, UNESCO – United Nations Educational, Scientific and Cultural Organization and several more, which is just “wow”!

It took some time since, as you might remember, I took the very first steps in this topic a year ago, i.e. this workshop is a continuation of the paper I presented at ICEGOV2021 – “Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia”, which, in turn, was a response to a request I received from my government, which was curious about this topic in the light of the Open Data Directive (previously the Public Sector Information (PSI) Directive). It was a simple study based on a survey of individual users and SMEs in Latvia, aimed at clarifying their level of awareness of the existence of OGD, their usage habits, as well as their overall level of satisfaction with the value of OGD and its potential. Now the topic has become even more relevant, considering that even the European Open Data Maturity Report is updating its methodology, with HVD as one of the new aspects. This, in turn, led to our workshop even being included in the list of upcoming events they suggest considering and attending (see screenshots of schedule :)).

In a few words, in this workshop we initiated a discussion about the value of open data and, more precisely, the concept of high-value data(sets) and the determinants / indicators that could allow us not only to identify them among existing data, but also to identify them even if they have not yet been published.

Together with our participants, we spoke about:
💡How can the “value” of open data be defined?
💡What are the current indicators for determining the value of data? Can they be used to identify valuable datasets to be opened? What are the country-specific high-value determinants (aspects)?
💡How can high-value datasets be identified? What mechanisms and/or methods should be put in place to allow their determination?
💡Could there be an automated way to gather information for HVD?
💡Can they be identified by third parties, e.g. researchers, enthusiasts AND potential data publishers, i.e. data owners?
💡What should be the scope of the framework?

The main point to make about the format of this workshop is that it was a community-based, participatory, interactive workshop. Although, understandably, not all of those registered for the workshop could gather on the last day of the conference (the day after the official closing ceremony), it was still a valuable experience, and we managed to have a nice event full of discussions with participants who got actively involved, which is especially pleasant, i.e. we managed to avoid sit-and-listen mode!!!

All in all, we had a nice event, where at least the first steps in the direction we selected were taken and some initial feedback was gathered.

Again, thank you, Guimarães, Portugal, and, of course, the organizers of ICEGOV2022 – University of Minho (Universidade do Minho) and UNU-eGOV – United Nations University!

Anastasija Nikiforova, PhD

Associate Professor of Applied AI and Information Systems; Data ecosystems, data governance and responsible technology adoption researcher

