CFP: 3rd International Workshop on Advanced Data Systems Management, Engineering, and Analytics (MegaData): where the Edge meets the Cloud

On behalf of the organizers, I sincerely invite you to consider submitting the results of your recent research to the Third International Workshop on Advanced Data Systems Management, Engineering, and Analytics (MegaData), which will be held in conjunction with the 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID2023).

The objective of MegaData is to bring together researchers, practitioners, system administrators, system programmers, and others interested in sharing and presenting their perspectives on the effective management of big data systems. The workshop focuses on novel and practical systems-oriented work. MegaData, whose motto this year is “MegaData: where the Edge meets the Cloud”, offers an opportunity to showcase the latest advances in this area and to discuss and identify future directions and challenges in the management and engineering of big data systems.

MegaData covers the area of Big Data operations (management, engineering, and analytics) within Cloud and Edge computing models. It aims to report on the advances and trends in Big Data deployment architectures from both the infrastructure and application levels. Papers presenting recent results, research issues, practical applications, case studies, and industrial implementations are welcome.

Specific topics of interest include, but are not limited to, the following:

  • Resource management and scheduling mechanisms for data systems
  • Auto Scaling and elastic scaling approaches and mechanisms
  • Data governance and privacy of “data in motion” and “data at rest” over edge/cloud
  • Emerging Data deployment models in IoT, IoT-to-Cloud, Edge/fog
  • Federated Learning and edge intelligence for big data systems
  • Advanced data storage models, including object stores and key-value stores
  • Techniques for data integrity, availability, reliability, and fault tolerance
  • Big Data workflows (data management, data wrangling, automated workflows)
  • Data pipeline (data lake to analytics, new data stream architectures, edge/fog, cloud enabled solutions)
  • High-performance Data Analytics applications
  • Adaptive offloading techniques among Fog, Edge, and Cloud Computing

Important dates

  • Submission open: December 1st, 2022
  • Paper submission Deadline: January 15th, 2023
  • Acceptance notification: February 10th, 2023
  • Camera-ready submission: March 17th, 2023
  • Conference Dates: May 1st, 2023

We invite original research papers that have not been previously published and are not currently under review for publication elsewhere. Submitted papers should be no longer than 8 pages (including references and appendices) in two-column IEEE template format. Papers need to be submitted through the EasyChair submission portal.

The best papers presented at the workshop will be selected, and the corresponding authors will be invited to submit an extended version of their papers for possible publication in the Special Issue on Emerging Topics in Big Data and Edge Intelligence in Springer Cluster Computing (IF 2.3).

Submit your paper and meet our team in Bangalore, India, in May 2023!

With best wishes,

MegaData organizers – Yaser Jararweh, Feras M. Awaysheh, Moath Jarrah, Anastasija Nikiforova, Sadi Alawadi

UT & Swedbank Data Science Seminar “When, Why and How? The Importance of Business Intelligence”

Last week I had the pleasure of taking part in a Data Science Seminar titled “When, Why and How? The Importance of Business Intelligence”. In this seminar, organized by the Institute of Computer Science (University of Tartu) in cooperation with Swedbank, we (me, Mohammad Gharib, Jurgen Koitsalu, Igor Artemtsuk) discussed the importance of BI with some focus on data quality. More precisely, two of the four talks were delivered by representatives of the University of Tartu and were more theoretical in nature; we both decided to focus our talks on data quality (although for my talk this was not the main focus this time). The other two talks were delivered by representatives of Swedbank, who mainly elaborated on BI – what it can give, what it already gives, how it is achieved and much more. These talks were followed by a panel moderated by Prof. Marlon Dumas.

In a bit more detail, in my presentation I talked about:

  • “Data warehouse vs. data lake – what are they and what is the difference between them?” – in very few words: structured vs unstructured, static vs dynamic (real-time data), schema-on-write vs schema-on-read, ETL vs ELT. I further elaborated on their goals and purposes, their target audiences, and their pros and cons.
  • “Is the data warehouse the only data repository suitable for BI?” – no, (today) data lakes can also be suitable. Even more, both are considered the key to “a single version of the truth”. However, if descriptive BI is the only purpose, it might still be better to stay with a data warehouse. But if you want predictive BI, or want to use your data for ML (or do not have a specific idea of how you want to use the data, but want to be able to explore it effectively and efficiently), a data warehouse might not be the best option.
  • “So, the data lake will save me a lot of resources, because I do not have to worry about how to store/allocate the data – just put it in one storage and voilà?!” – no, in this case your data lake will turn into a data swamp! And you are forgetting about the data quality you should (must!) be thinking of!
  • “But how do you prevent the data lake from becoming a data swamp?” – in short and simple terms, proper data governance & metadata management is the answer (but it is not as easy as it sounds – do not forget about your data engineer and be friendly with him [always… literally always :D]), and also think about the culture in your organization.
  • “So, the use of a data warehouse is the key to high-quality data?” – no, it is not! Having ETL does not guarantee the quality of your data (transform & load is not data quality management). Think about data quality regardless of the repository!
  • “Are data warehouses and data lakes the only options to consider, or are we missing something?” – true, we are: the data lakehouse!
  • “If a data lakehouse combines the benefits of a data warehouse and a data lake, is it a silver bullet?” – no, it is not! It is another (relatively immature) option to consider that may be the best fit for you, but it is not a panacea. Dealing with data is (still) not easy…
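
To make the schema-on-write vs schema-on-read contrast from the list above a bit more tangible, here is a minimal Python sketch. It is not taken from the talk: the two-field schema, the function names and the in-memory lists standing in for a warehouse table and a lake store are all hypothetical, chosen only for illustration.

```python
import json

# Hypothetical schema, for illustration only: field name -> required type.
SCHEMA = {"customer_id": int, "amount": float}


def validate(record):
    """Return True if the record has every schema field with the right type."""
    return all(isinstance(record.get(key), typ) for key, typ in SCHEMA.items())


def warehouse_write(table, record):
    """Schema-on-write: a non-conforming record is rejected at load time."""
    if not validate(record):
        raise ValueError(f"record does not match schema: {record}")
    table.append(record)


def lake_write(store, raw_line):
    """Schema-on-read: raw text is stored untouched, with no checks at load time."""
    store.append(raw_line)


def lake_read(store):
    """Apply the schema only when reading; split usable records from the rest."""
    good, rejected = [], []
    for line in store:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected.append(line)
            continue
        if isinstance(record, dict) and validate(record):
            good.append(record)
        else:
            rejected.append(line)
    return good, rejected


if __name__ == "__main__":
    raw_lines = [
        '{"customer_id": 1, "amount": 9.5}',
        '{"customer_id": "x", "amount": 2.0}',
        "not json at all",
    ]
    lake = []
    for line in raw_lines:
        lake_write(lake, line)  # everything is accepted
    good, rejected = lake_read(lake)
    print(len(lake), len(good), len(rejected))
```

In this sketch the lake accepts all three lines and only discovers at read time that two of them are unusable, which is how a lake without governance drifts towards a swamp. Note also that the type check on the warehouse side only enforces structure: a record with a well-typed but wrong value would pass, so it is not data quality management either.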

In addition, in this talk I briefly introduced our ongoing research into integrating the data lake as a data repository with data wrangling, with the aim of increasing data quality in IS. In short, this is somewhat like an improved data lakehouse, where we emphasize the need for data governance and data wrangling to be integrated in order to really get the benefits that data lakehouses promise (although we still call it a data lake, since the data lakehouse, although not a super new concept, is still debated a lot, including, but not limited to, its very definition).

My colleague Mohamad Gharib, in turn, discussed what DQ and, more specifically, data quality requirements are and why they really matter, and provided a very interesting perspective on how to define high-quality data, which would then serve as the basis for defining these requirements.

All in all, although we did not know each other before and had a very limited idea of what each of us would talk about, we all admitted that this seminar turned out to be very coherent: we and our talks complemented each other, extending some points that had been touched on earlier but not thoroughly elaborated. This allowed us not only to make the seminar a success, but also to have a very lively discussion (although most of this discussion took place during the coffee break – as usually happens – so, unfortunately, it is not available in the recordings, the link to which is available below).

Latvian Open Data Hackathon for pupils 2022 – winners are announced!

During the last month, I have been a mentor of the Latvian Open Data Hackathon and idea generator for pupils, organized by the Latvian Open Technologies Association with the support of DATI Group, E-Klase, Latvijas Kultūras akadēmija / Latvian Academy of Culture, Vides aizsardzības un reģionālās attīstības ministrija (VARAM) / Ministry of Environmental Protection and Regional Development of the Republic of Latvia and others.

This year the main topic of the hackathon was cultural heritage! Within a month, 36 teams comprising 126 participants from all over Latvia developed their ideas and prototypes, and 10 teams reached the final after a round of semi-final presentations of their solutions to us – the mentor team (of course, we had worked with our assigned teams in the previous weeks as well). Here, we not only evaluated these ideas, but also provided the teams with yet another portion of feedback and suggestions for improving the idea or prototype before presenting it in the final, where the jury would decide who the winner was.

Here I should note that, as usual (I am a permanent mentor of this hackathon), the participants surprised me very much both with the diversity of their ideas and, very often, with their technical knowledge and skills (AI, crowdsourcing, gamification, to name just a few) – just wow!

And last week it happened – we finally found out who the winners are – Kultūrkults with the best idea in the respective category (idea generator), and 417 Expectations Failed as the hackathon winner!

Congratulations to everyone on their successful efforts to promote cultural and historical heritage! Also, congratulations to society as a whole on having young people who are so responsible and passionate about their culture and history!

Repeating what I already said at the closing ceremony, I really want to believe that all the teams that participated in the hackathon will develop and implement their ideas regardless of the outcome of the hackathon – you are all winners for us! Keep going towards your goal!

Watch the closing ceremony (in Latvian) here.

ICEGOV2022: 4 insightful days and four roles – participant / attendee, author / presenter, workshop chair and Best Paper Awards nominee (part 2)

In the previous post I already shared my impressions of the ICEGOV2022 conference – the 15th International Conference on Theory and Practice of Electronic Governance, which took place in a very special place – Guimarães, which is considered the birthplace of Portugal. There I provided some general insight into how it went, and elaborated a bit on the paper I presented and the fact that it was nominated for the Best Paper Awards. This post is therefore dedicated to one particular role I played, i.e. workshop chair.

If you actively follow me, you probably remember that some time ago I posted that our workshop titled “Identification of high-value dataset determinants: is there a silver bullet?”, organized by me, Charalampos Alexopoulos, Nina Rizun and Magdalena Ciesielska, was accepted for ICEGOV2022. So now we have finally brought it to life! Our workshop was among 7 accepted workshops, the others being organized by such prominent organizations as UNDESA – United Nations Department of Economic and Social Affairs, the European Commission, UNESCO – United Nations Educational, Scientific and Cultural Organization and several more, which is just “wow”!

It took some time: as you might remember, I took my very first steps in this topic a year ago, i.e. this workshop is a continuation of the paper I presented at ICEGOV2021 – “Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia”, which, in turn, was something I did in response to a request from my government, which was curious about this topic in the light of the Open Data Directive (previously the Public Sector Information (PSI) Directive). It was a simple study based on a survey of individual users and SMEs in Latvia, aimed at clarifying their level of awareness of the existence of OGD, their usage habits, and their overall level of satisfaction with the value of OGD and its potential. Now the topic has become even more topical, considering that even the European Open Data Maturity Report is updating its methodology, including HVD as one of the new aspects. This, in turn, led to our workshop even being included in the list of upcoming events they suggest considering and attending (see screenshots of schedule :)).

In a few words, in this workshop we initiated a discussion about the value of open data and, more precisely, the concept of high-value data(sets) and the determinants / indicators that could allow us not only to identify them among existing data, but also to determine them even if they have not yet been published.

Together with our participants, we spoke about:
💡How can the “value” of open data be defined?
💡What are the current indicators for determining the value of data? Can they be used to identify valuable datasets to be opened? What are the country-specific high-value determinants (aspects)?
💡How can high-value datasets be identified? What mechanisms and/or methods should be put in place to allow their determination?
💡Could there be an automated way to gather information for HVD?
💡Can they be identified by third parties, e.g. researchers, enthusiasts AND potential data publishers, i.e. data owners?
💡What should be the scope of the framework?

The main thing to say about the mode of this workshop is that it was a community-based, participatory, interactive workshop. Although, understandably, not all those registered for this workshop could gather on the last day of the conference (the day after the official closing ceremony), it was still a valuable experience, and we managed to have a nice event full of discussions with the participants, who got actively involved, which is especially pleasant, i.e. we managed to avoid sit-and-listen mode!

All in all, we had a nice event, where at least the first steps in the direction we selected were taken and some initial feedback was gathered.

Again, thank you, Guimarães, Portugal, and, of course, the organizers of ICEGOV2022 – University of Minho (Universidade do Minho) and UNU-eGOV – United Nations University!