Online International Training and Capacity Building Program-2024 (ITCBP-2024) for the School of Planning and Architecture, New Delhi and my talk on “Data Management for AI Cities”

Yesterday, I had the honor of serving as an expert speaker for the Online International Training and Capacity Building Program-2024 (ITCBP-2024) on “Data Management for AI Cities”, organised by the School of Planning and Architecture, New Delhi (SPA FIRST), where I delivered a talk on “Data Visualisation for Cities: City-Based Applications”.

During this talk, we touched on several important aspects surrounding data management and visualization in and for cities, including:

  • Data management, narrowed down to data quality management (DQM) of both internal and external data: from understanding these data to managing their quality throughout the DQM lifecycle (stressing that data cleaning is not the same as DQM). We touched on several approaches, with particular emphasis on AI-augmented data quality management – existing tools, their underlying methods, and the weaknesses to consider when using (semi-)automatic data quality rule recognition, which depend on the method used for this purpose;
  • Data governance was then discussed, stressing how it differs from DQM, and what it consists of and why it is crucial, incl. within the context of this talk;
  • Data visualization & storytelling: role, key principles, common mistakes, best practices. As part of this, we covered strategies for selecting a data visualization type, with tips on how to simplify this process, incl. by referring to chart selectors, but also stressing why “thinking outside the menu” is critical, esp. for city-level data visualization (where your audience is often citizens or policymakers). We looked at the most common and/or successful uses of non-traditional visualization types, incl. tools for these purposes, divided into those that require coding and those that are rather low- or no-code; noise reduction, simplicity, and the strategic use of accents, as well as drill-down (aka roll-down) & roll-up to convey the message you want to deliver without highlighting everything and thereby losing your audience. In addition, a UX perspective was discussed, including but not limited to aspects that are often overlooked when thinking about design and an aesthetic color palette, namely possible color blindness among the audience who will “consume” these visualizations, and tips on how to account for it more easily. Did you know that there are about 300 million color-blind people? And that 98% of those with color blindness have red-green color blindness?
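As a practical aside, accounting for red-green color blindness does not require specialist tooling: a palette known to remain distinguishable under the common deficiencies can simply be hard-coded. A minimal Python sketch using the Okabe-Ito palette (the helper function and series names are my own illustration, not a tool from the talk):

```python
# Okabe-Ito palette: eight colors that remain distinguishable under the
# common red-green color vision deficiencies (deuteranopia, protanopia).
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]

def assign_colors(series_names):
    """Map each data series to a colorblind-safe color, cycling if needed."""
    return {name: OKABE_ITO[i % len(OKABE_ITO)]
            for i, name in enumerate(series_names)}

# Hypothetical city-dashboard series names, purely for illustration.
colors = assign_colors(["traffic", "air quality", "energy"])
```

Any charting library that accepts hex colors (matplotlib, Plotly, Vega-Lite, etc.) can consume such a mapping directly.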

So what was the key message or “takeaway” of this talk? In a few words:

  • Understand your data, your audience, and the story you want to tell! Understand:
    • your data,
    • the story it tells,
    • your target audience’s preferences and needs,
    • the story you want to tell,
    • data suitability,
    • data quality.
  • Attention-grabbing visuals & storytelling are key!
    • reduce noise to avoid audience confusion and distraction
    • use drill-down and roll-up operations to keep visualization simple
    • add context to provide all the necessary information for clear understanding
    • add highlights to focus your audience’s attention – add accents strategically
  • Consider design – the optimal visualisation type, chart design, environment design, potential color-blindness of your audience
  • Keep track of the current advances, but also the challenges and risks, of data visualization in urban settings, incl. but not limited to (1) privacy concerns, (2) data silos, and (3) technological limitations.
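To make the drill-down / roll-up point above concrete, here is a minimal roll-up over fine-grained city data in plain Python; the records and function are illustrative, not from the talk:

```python
from collections import defaultdict

# Illustrative fine-grained records: citizen complaints per district per month.
records = [
    {"district": "North", "month": "2024-01", "complaints": 120},
    {"district": "North", "month": "2024-02", "complaints": 95},
    {"district": "South", "month": "2024-01", "complaints": 40},
    {"district": "South", "month": "2024-02", "complaints": 60},
]

def roll_up(rows, by):
    """Aggregate (roll up) complaint counts along a single dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[by]] += row["complaints"]
    return dict(totals)

# Roll up to district level for the headline chart, keeping the monthly
# records available for drill-down when a viewer asks for detail.
by_district = roll_up(records, "district")  # {'North': 215, 'South': 100}
```

The same idea scales to any aggregation backend (SQL GROUP BY, pandas groupby); the point is that the headline visual shows the rolled-up view and the detail stays one click away.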

All in all, it was quite a rich conversation and I am very grateful to the organizers for the invitation to be part of this event and to the audience for the very positive feedback!

📢New paper alert 📢 “Predictive Analytics intelligent decision-making framework and testing it through sentiment analysis on Twitter data”, or: what do people think, and what will they think, about ChatGPT?

This paper alert is dedicated to the paper “Predictive Analytics intelligent decision-making framework and testing it through sentiment analysis on Twitter data” (authors: Otmane Azeroual, Radka Nacheva, Anastasija Nikiforova, Uta Störl, Amel Fraisse), which is now publicly available in the ACM Digital Library!

In this paper, we present a predictive analytics-driven decision framework based on machine learning and data mining methods and techniques. We then demonstrate it in action by predicting sentiments and emotions in social media posts, choosing perhaps the trendiest topic as a use case: ChatGPT. In other words, we check whether it is eternal love and complete trust, or rather 🤬?
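For readers curious what sentiment scoring looks like in its simplest form, here is a toy lexicon-based sketch in Python. It is only an illustration of the general idea, with made-up word lists, and not the machine learning pipeline used in the paper:

```python
# Toy lexicon-based sentiment scorer; the word lists are illustrative only.
POSITIVE = {"love", "great", "helpful", "amazing", "trust"}
NEGATIVE = {"hate", "wrong", "useless", "scary", "biased"}

def sentiment(text):
    """Return 'positive', 'negative', or 'neutral' for a short post."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love ChatGPT so helpful"))  # positive
```

Real pipelines replace the hand-made lexicon with trained models that also handle negation, sarcasm, and emoji, but the input-to-label shape of the task is the same.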

Why PA?

Predictive Analytics is seen to be useful in business and in the medical/healthcare domain, incl. but not limited to crisis management, where, in addition to health-related crises, Predictive Analytics has proven useful in natural disaster management; in industrial use cases, such as energy, to forecast supply and demand and to predict the impact of equipment costs, downtimes/outages, etc.; and in aerospace, to predict the impact of specific maintenance operations on aircraft reliability, fuel use, and uptime, while the biggest airlines use it to predict travel patterns, set ticket prices and flight schedules, and predict the impact of, e.g., price changes, policy changes, and cancellations. And, of course, in business process management and specifically retail, where Predictive Analytics allows retailers to follow customers in real time, delivering targeted marketing and incentives, forecasting inventory requirements, and configuring their website (or store) to increase sales. In the business process management area, in turn, Predictive Analytics gives rise to what is called predictive process monitoring (PPM). Predictive Analytics has also found uses in the Smart City and Smart Transportation domain, e.g., to support smart transportation services using open data, and in education, e.g., to predict performance in MOOCs.

This popularity can be easily explained by examining its key strategic objectives, which IBM (Siegel, 2015) has summarized as: (1) competition – to secure the most powerful and unique stronghold of competitiveness, (2) growth – to increase sales and retain customers competitively, (3) enforcement – to maintain business integrity by managing fraud, (4) improvement – to advance core business capacity competitively, (5) satisfaction – to meet rising consumer expectations, (6) learning – to employ today’s most advanced analytics, (7) acting – to render business intelligence and analytics truly actionable. Marketing, sales, fraud detection, call centers, and the core businesses of business units, as well as customers and the enterprise as a whole, are expected to gain benefits, which makes PA a “must”.

According to (MicroStrategy, 2020), in 2020, 52% of companies worldwide used predictive analytics to optimize operations as part of a business intelligence platform solution. So far, however, predictive analytics has been used mostly by large companies (65% of companies with $100 million to $500 million in revenue vs. 46% of companies under $10 million in revenue), with less adoption in medium-sized companies, not to mention small ones.

Based on management theory and Gartner’s Business Intelligence and Performance Management Maturity Model, our framework covers four management levels of business intelligence – (a) Operational, (b) Tactical, (c) Strategic and (d) Pervasive. These are the levels that determine the need to manage data in organizations, transform them into information and turn them into knowledge, which is also the basis for making forecasts. The end result of applying it for business purposes is to generate effective solutions for each of these levels.

Sounds catchy? Read the paper here.

Many thanks to my co-authors – Radka and Otmane, who invited me to contribute to this study, and drove the entire process!

Cite paper as:

O. Azeroual, R. Nacheva, A. Nikiforova, U. Störl, and A. Fraisse. 2023. Predictive Analytics intelligent decision-making framework and testing it through sentiment analysis on Twitter data. In Proceedings of the 24th International Conference on Computer Systems and Technologies (CompSysTech ’23). Association for Computing Machinery, New York, NY, USA, 42–53. https://doi.org/10.1145/3606305.3606309

CFP for The IEEE International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023)

On behalf of the organizers (the Technical Program Chair, the Steering Committee, and, finally, the publicity chair) of the IEEE International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023), I invite everyone conducting research in this area to consider submitting a paper to it.

Call for Papers:

New advancements in wireless communication systems such as Fifth-Generation (5G), Beyond Fifth-Generation (B5G), and Sixth-Generation (6G) networks will allow for new and unprecedented services to be made available for users with nearly unlimited capacity. These services will be the core driver for the future digital transformation of our cities and communities. This will be accompanied by a ubiquitous deployment of Internet of Things (IoT) infrastructure and supported by computing capacity that will be available at the edge of the network and in the cloud. This computing infrastructure will handle the processing of data generated by users and services. Such a complex and diverse system will require the applications running on the computing/networking infrastructure to be intelligent, efficient, and sustainable. Additionally, the infrastructure will require smart control and automation systems to integrate and manage its different components. Artificial Intelligence (AI) and its applications will play a significant role in the design, deployment, automation, and management of future services. This will include applications that will be running on the edge and on cloud servers, networking applications to handle the flow of data between the users and the computing system, and intelligent automation and management software operating on the system. The International Conference on Intelligent Computing, Networking, and Services aims to provide an opportunity to present state-of-the-art research at the intersection of Computing, Networking, and Services supported by Artificial Intelligence.

Submission Link: https://easychair.org/conferences/?conf=iccns2023

Researchers from both the industry and academia are encouraged to submit their original research contributions in all major areas, which include, but are not limited to the following main tracks:

💡Track 1: Artificial Intelligence Fundamentals

  • Artificial Intelligence Systems
  • Artificial Intelligence Theory
  • Artificial Intelligence applications in Computers and Communications
  • Artificial Intelligence and Robotics Technologies
  • Artificial Intelligence and cloud computing
  • Artificial Intelligence for Economic paradigms and game theory
  • Machine and Deep Learning of Knowledge
  • Artificial Intelligence-based Distributed Knowledge and Processing
  • Artificial Intelligence for Human-Robot Interactions

💡Track 2: Intelligent Internet of Things and Cyber-Physical Systems

  • Intelligent IoT Applications and Services
  • Intelligent security for the Internet of Things and cyber-physical systems
  • Intelligent Internet of Things architectures and protocols
  • Intelligent Cyber Physical Systems (CPS)
  • Blockchain-based applications in Intelligent Manufacturing: Industrial Internet of Things
  • Blockchain and Secure Critical Infrastructure with Industry 4.0
  • Intelligent manufacture and management
  • Consensus and mining algorithms suited for resource-limited IoTs
  • Blockchain-based Controlled mobility and QoS
  • Blockchain-based energy optimization techniques in WSN
  • Blockchain-based Software defined networks

💡Track 3: Edge Intelligence and Federated Learning

  • Distributed and federated machine learning in edge computing
  • Theory and Applications of Edge Intelligence
  • Middleware and runtime systems for Edge Intelligence
  • Programming models compliant with Edge Intelligence
  • Scheduling and resource management for Edge Intelligence
  • Data allocation and application placement strategies for Edge Intelligence
  • Osmotic computing with edge continuum, Microservices and MicroData architectures
  • ML/AI models and algorithms for load balancing
  • Theory and Applications of federated learning
  • Federated learning and privacy-preserving large-scale data analytics
  • MLOps and ML pipelines at edge computing
  • Transfer learning, interactive learning, and Reinforcement Learning for edge computing
  • Modeling and simulation of EI and edge-to-cloud environments
  • Security, privacy, trust, and provenance issues in edge computing
  • Distributed consensus and blockchains at edge architecture
  • Blockchain networking for Edge Computing Architecture
  • Blockchain technology for Edge Computing Security
  • Blockchain-based access controls for Edge-to-cloud continuum
  • Blockchain-enabled solutions for Cloud and Edge/Fog IoT systems
  • Forensic Data Analytics compliant with Edge Intelligence

💡Track 4: Intelligent Networking in Beyond 5G (B5G) and 6G Wireless Communication

  • Intelligent Networking in Beyond 5G/6G Network Architectures
  • large-scale Internet of Things in B5G/6G
  • Vehicular networks in B5G/6G
  • Blockchain with lightweight computation
  • Service and applications for vehicular clouds in B5G/6G
  • Future internet architectures for B5G/6G
  • Intelligent networking services
  • Emerging networks in B5G/6G
  • Byzantine-tolerant FL
  • Churn-tolerant FL
  • FL for NGN and 6G
  • B5G/6G based IoT healthcare systems

💡Track 5: Intelligent Big Data Management and Processing

  • Intelligent Data Fusion
  • Intelligent Analytics and Data mining
  • Intelligent Distributed data management
  • Distributed transaction for blockchain
  • Intelligent Data Science and Data Engineering
  • Protocols for management and processing of data

💡Track 6: Intelligent Security and Privacy

  • Authentication and authorization
  • Applications of blockchain technologies in digital forensic
  • Privacy technologies
  • Blockchain-based threat intelligence and threat analytics techniques
  • Blockchain-based open-source tools
  • Forensics readiness of blockchain technologies
  • Blockchain Attacks on Existing Systems
  • Blockchain Consensus Algorithms
  • Blockchain-based Intrusion Detection/Prevention
  • Security and Privacy in Blockchain and Critical Infrastructure
  • Attacks on Blockchain and Critical Infrastructure
  • Blockchain and Secure Critical Infrastructure with Smart Grid

💡Track 7: Blockchain Research & Applications for Intelligent Networks and Services

  • State-of-the-art of the Blockchain technology and cybersecurity
  • Blockchain-based security solutions of smart cities infrastructures
  • Blockchain in connected and autonomous vehicles (CAV) and intelligent transportation systems (ITS)
  • Blockchain Technologies and Methodologies
  • Recent developments and emerging trends in Blockchain
  • New models, practical solutions and technological advances related to Blockchain
  • Theory of Blockchain in Cybersecurity
  • Applications of blockchain technologies in computer & hardware security
  • Implementation challenges facing blockchain technologies
  • Blockchain in social networking
  • Performance metric design, modeling and evaluation of blockchain systems
  • Network and computing optimization in blockchains
  • Experimental prototyping and testbeds for blockchains

Two workshops that you cannot miss are also scheduled to take place as part of ICCNS.

🗓️🗓️🗓️ IMPORTANT DATES

  • Full paper submission: April 21st, 2023 (Firm and Final)
  • Full paper acceptance notification: May 6th, 2023
  • Full paper camera-ready submission: May 20th, 2023

For any inquiries, please contact: intelligenttechorg@gmail.com.

Submit the paper and meet our team in Valencia in June, 2023!
 

With best wishes,

ICCNS2023 organizers

CFP: The International Conference on Intelligent Data Science Technologies and Applications (IDSTA2023)

On behalf of the organizers and as a publicity chair, I sincerely invite you to consider submitting the results of your recent research to The International Conference on Intelligent Data Science Technologies and Applications (IDSTA2023), which will be held in conjunction with the Kuwait Fintech and Blockchain Summit.

A huge amount of data is generated and transmitted every day. To be able to deal with this data, i.e., to extract useful information from it, store it, transmit it, and represent it, intelligent technologies and applications are needed. The International Conference on Intelligent Data Science Technologies and Applications (IDSTA) is a peer-reviewed conference whose objective is to advance the Data Science field by giving researchers, engineers, and practitioners an opportunity to present their latest findings. It will also invite key people in the field to share their current knowledge and their expectations for its future. Topics of interest for submission include, but are not limited to:

💡Applied Public Affairs, incl. but not limited to Campaign Management, Mass Communication Politics, Political Analysis, Survey Sampling
💡Business Analytics, incl. but not limited to Stock Market Analysis, Predictive Analytics, Business Intelligence
💡Finance, incl. but not limited to Risk Management, Algorithmic Trading, Fraud Detection, Financial Analysis
💡Computer Science, incl. but not limited to Database Management Systems, Scientific Computing, Computer Vision, Fuzzy Computing, Feature Selection, Neural Networks, Deep Learning, Meta-Learning, Process Mining, Artificial Intelligence, Data Mining, Big Data, Web Analytics, Text Mining, Natural Language Processing, Sentiment Analysis, Social Media Analysis, Data Fusion, Performance Analysis and Evaluation, Evolutionary Computing and Optimization, Hybrid Methods, Granular Computing, Recommender Systems, Data Visualization, Predictive Maintenance, Internet of Things (IoT), Web Scraping
💡Sustainability, incl. but not limited to Datasets on Sustainability, Sustainability Modeling, Energy Sustainability, Water Sustainability, Environmental Sustainability, Risk Analysis
💡Cybersecurity, incl. but not limited to Data Privacy and Security, Network Security, Communication Security, Cryptography, Fraud Detection, Blockchain
💡Environmental Science, incl. but not limited to GIS, Climatographic, Remote Sensing, Spatial Data Analysis, Weather Prediction and Tracking
💡Biotechnologies, incl. but not limited to Genome Analysis, Drug Discovery and Screening and Side Effect Analysis, Structural and Folding Pattern, Disease Discovery and Classification, Bioinformatics, Next-Gen Sequencing
💡Smart City, incl. but not limited to City Data Management, Smart Traffic, Surveillance, Location-Based Services, Robotics
💡Human Behaviour Understanding
💡Semi-Structured and Unstructured Data
💡Pattern Recognition
💡Transparency in Research Data
💡Data and Information Quality
💡GPU Computing
💡Crowdsourcing


🗓️🗓️🗓️ IMPORTANT DATES

  • Paper submission: March 15th, 2023
  • Acceptance notification: May 20th, 2023
  • Full paper camera-ready submission: October 1st, 2023
  • Conference dates: October 24-26, 2023

All papers that are accepted, registered, and presented in IDSTA2023 and the workshops co-located with it will be submitted to IEEEXplore for possible publication. 
For any inquiries, contact intelligenttechorg@gmail.com.

Submit the paper and meet our team in Kuwait in October, 2023!
 

With best wishes,

IDSTA2023 organizers

Our EOSC TF “FAIR Metrics and Data Quality” paper “Towards a data quality framework for EOSC” is released! 🍷🍷🍷

I am glad to announce the release of the document “Towards a data quality framework for EOSC”, which we have been hard at work on for several months as the Data Quality subgroup of the “FAIR Metrics and Data Quality” Task Force of the European Open Science Cloud (EOSC) Association: Carlo Lacagnina, Romain David, Anastasija Nikiforova, Mari Elisa Kuusniemi, Cinzia Cappiello, Oliver Biehlmaier, Louise Wright, Chris Schubert, Andrea Bertino, Hannes Thiemann, Richard Dennis.

This document explains basic concepts to build a solid basis for a mutual understanding of data quality in a multidisciplinary environment such as EOSC. These range from the difference between quality control, assurance, and management to categories of quality dimensions, as well as typical approaches and workflows to curate and disseminate dataset quality information, minimum requirements, indicators, certification, and vocabulary. These concepts are explored considering the importance of evaluating resources carefully when deciding the sophistication of the quality assessments. Human resources, technology capabilities, and capacity-building plans constrain the design of sustainable solutions. Distilling the knowledge accumulated in this Task Force, we extracted cross-domain commonalities (each TF member brings their own experience and knowledge – we all represent different domains and therefore try to make our contributions domain-agnostic, while considering every nuance our specialism can bring that deserves to be heard by others), as well as lessons learned and challenges.

The resulting main recommendations are:

  1. Data quality assessment needs standards; unfortunately, not all communities have agreed on standards, so EOSC should assist and push each community to agree on community standards to guarantee the FAIR exchange of research data. Although we extracted a few examples highlighting this gap, the current situation requires a more detailed and systematic evaluation in each community. Establishing a quality management function can help in this direction because the process can identify which standard already in use by some initiatives can be enforced as a general requirement for that community. We recommend that EOSC considers taking the opportunity to encourage communities to reach a consensus in using their standards.
  2. Data in EOSC need to be served with enough information for the user to understand how to read and correctly interpret the dataset, what restrictions are in place to use it, and what processes participate in its production. EOSC should ensure that the dataset is structured and documented in a way that can be (re)used and understood. Quality assessments in EOSC should not be concerned with checking the soundness of the data content. Aspects like uncertainty are also important to properly (re)use a dataset. Still, these aspects must be evaluated outside the EOSC ecosystem, which only checks that evidence about data content assessments is available. Following stakeholders’ expectations, we recommend that EOSC is equipped with essential data quality management, i.e., it should perform tasks like controlling the availability of basic metadata and documentation and performing basic metadata compliance checks. The EOSC quality management should not change data but point to deficiencies that the data provider or producer can address.
  3. Errors found by the curators or users need to be rectified by the data producer/provider. If not possible, errors need to be documented. Improving data quality as close to the source (i.e., producer or provider) as possible is highly recommended. Quality assessments conducted in EOSC should be shown first to the data provider to give a chance to improve the data and then to the users.
  4. User engagement is necessary to understand the user requirements (needs, expectations, etc.); it may or may not be part of a quality management function. Determining and evaluating stakeholder needs is not a one-time requirement but a continuous and collaborative part of the service delivery process.
  5. It is recommended to develop a proof-of-concept quality function performing basic quality assessments tailored to the EOSC needs (e.g., data reliability and usability). These assessments can also support rewarding research teams most committed to providing FAIR datasets. The proof-of-concept function cannot be a theoretical conceptualization of what is preferable in terms of quality. Still, it must be constrained by the reality of dealing with an enormous amount of data within a reasonable time and workforce.
  6. Data quality is a concern for all stakeholders, detailed further in this document. The quality assessments must be a multi-actor process between the data provider, EOSC, and users, potentially extended to other actors in the long run. The resulting content of quality assessments should be captured in structured, human- and machine-readable, and standard-based formats. Dataset information must be easily comparable across similar products, which calls for providing homogeneous quality information.
  7. A number of requirements valid for all datasets in EOSC (and beyond) and specific aspects of a maturity matrix gauging the maturity of a community when dealing with quality have been defined. Further refinement will be necessary for the future, and specific standards to follow will need to be identified.
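A basic metadata compliance check of the kind recommendation 2 describes can be sketched in a few lines. The required-field list below is an assumption for illustration, not the actual EOSC minimum-metadata profile:

```python
# Illustrative required-metadata check; the field list is an assumption,
# not the EOSC specification.
REQUIRED_FIELDS = {"title", "creator", "license", "description", "identifier"}

def check_metadata(record):
    """Return the set of required fields missing or empty in a dataset record."""
    present = {k for k, v in record.items() if v not in (None, "")}
    return REQUIRED_FIELDS - present

# Hypothetical dataset record with one empty and one absent field.
dataset = {"title": "Air quality 2023", "creator": "City Lab",
           "license": "CC-BY-4.0", "description": ""}
missing = check_metadata(dataset)  # {'description', 'identifier'}
```

Consistent with the recommendation, such a check does not alter the data; it only points the provider to deficiencies they can address.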

We sincerely invite you to take a look at this very concise 76-page overview of the topic and look forward to your recommendations/suggestions/feedback – we hope to provide you with a convenient way to share these very soon, so take your time reading while we make our last preparations 📖 🍷📖🍷📖🍷 But make sure you have a glass of wine at hand while reading, as this will make sense at some point, i.e., when we compare data quality with wine quality, with reference to both flavour type and intensity (intrinsic quality) and brand and packaging (extrinsic quality)… but no more teasers, and bon appétit! 🍷🍷🍷
The document can be found in Open Access here.

We also want to acknowledge the contribution and input of colleagues from several European institutions, the EOSC Association, and several external-to-TF stakeholders who gave feedback based on their own experience, as well as the TF Support Officer Paola Ronzino and our colleagues Sarah Stryeck and Raed Al-Zoubi, and, last but not least, all respondents and everyone involved.