
On June 5, I was delighted to give a keynote at the HackCodeX Forum, titled “Data Quality as a prerequisite for your business success: when should I start taking care of it?”, in my hometown – Riga, Latvia. HackCodeX Forum is a one-day event where international experts share their experience and knowledge about emerging technologies and areas such as Artificial Intelligence, Security, Data Quality, Quantum Computing, Sustainability, Open Data, Privacy, Ethics, and Digital Services (with a keynote from the CEO of SK ID Solutions – one of the solutions that make Estonia the #1 digital nation). This time I was invited to cover the topic of Data Quality, and I was happy to do so, especially since the HackCodeX Forum closes one of the leading hackathons in Europe. Riga was fascinated by and passionate about it, as evidenced by the long list of advertisements we all saw in recent weeks and months (Delfi, Haker.lv, kripto.media, kursors.lv, labsoflatvia.lv, to name just a few). This year the hackathon was held in Latvia and brought together around 500 developers, designers and entrepreneurs to create and innovate, solving this year’s 5 challenges:
- 🏆 ATEA challenge: Minimise manual work and drive data-powered decision-making
- 🏆 Emergn challenge: Improve the quality of life for people with disabilities
- 🏆 UI.COM & Riga TechGirls challenge: Help shoppers make more sustainable purchasing decisions
- 🏆 Game Changer Audio (GCA) challenge: Identify each individual note by listening to notes being played real-time
- 🏆 Ministry of Education and Science challenge: Help make education hackable again!



For me, in turn, it was yet another audience and yet another experience.
In short, in this Star Wars-style presentation (yes, I am a fan, and given the number of DQ memes in this style, I am not an exception, so I would not call myself a geek or a weird person, but rather a normal DQ/IT person), I urged the audience to “help R2D2 save the galaxy!”.


Images from: History in Objects: Death Star Plans Datacard • Lucasfilm, Video Analysis of an Exploding Death Star | WIRED, Post | LinkedIn, Destruction of Despayre | Wookieepedia | Fandom. Special thanks to George Firican for the idea and inspiration!
In a bit more detail, I elaborated on the importance and relevance of data quality despite the age of the topic [it is older than me], on data quality management, and on the factors the DQM approach depends on. The popularity and importance of the topic are undoubtedly due to the amount of data we are dealing with and the fact that we live in a data-driven world, where data are everywhere – they are generated continuously by multiple sources, which include not only our devices and sensors but also ourselves (albeit with the help of the former two). This is why, some time ago, data were claimed to be the new oil. Have you heard this? I am sure you have. But have you thought about this statement? Is it true? False? Something in between? Bingo! While there are commonalities between data and oil, they are rather few. One interesting read devoted to this comes from Forbes. Namely, it acknowledges that both artifacts – oil and data – can be seen as similar since both are “power”, including being the power of those who own them. In other words, it compares data owners such as Alibaba, Google, Twitter, Facebook, etc. to the oil barons of 100 years ago. Otherwise, however, a more in-depth comparative analysis reveals mostly differences. To name just a few:
💡 oil is a finite resource, while data are not. Instead, data are effectively infinitely durable and reusable, and treating them like oil, i.e. storing them in silos, reduces their value, usefulness and potential as a whole;
💡 another difference is in transportation: oil requires huge amounts of resources to be transported to where and when it is needed, while data can be replicated indefinitely and moved around the world at very high speeds and, more importantly, at very low cost;
💡 yet another difference lies in the usability of both – oil and data – once they have been used. When oil is used, its energy is lost (as heat or light) or permanently converted into another form such as plastic. The usefulness of data, in contrast, tends to increase with their actual usage, i.e. new uses arise, data are eventually turned into training data, etc.;
💡 as the world’s oil reserves dwindle, extracting oil becomes increasingly difficult and expensive, while data are becoming increasingly available, thanks to, among other things, technological advances and the large number of data producers;
💡 and last but not least, oil drilling involves damage to the natural environment and exploitation of finite natural resources, while data mining doesn’t – at least there is no intrinsic damage to the environment or exploitation of finite natural resources. Of course, here we do not mention (but should not forget about) the electricity used to run the systems and the relatively low adoption of green computing (aka sustainable computing) for further data processing.
Thus, as Forbes suggests, if we want to talk about data as a power source or fuel, it makes much more sense to compare them with renewable sources 🌎🌎🌎 such as the sun ☀️, wind 💨 and tides 🌊. All in all, data can be seen as more than oil. Hence the popularity and importance of the data quality topic.
The factors that can affect the DQM approach, in turn, vary, starting with those stemming from the relative nature of data quality as a phenomenon, i.e., its definition, the variety (and non-ambiguity) of data quality dimensions, for which data quality metrics are expected to be selected, DQ dynamism, dependence on the user and use-case, etc. (some of the above are discussed in “Towards a data quality framework for EOSC” and “Definition and Evaluation of Data Quality: a user-oriented data object-driven approach to data quality assessment”), as well as the data artifact whose quality is under analysis. In other words, is this about a data object or dataset? A database? A data repository? An information system?
If it is a data object, the next “level” of factors is the data owner – known or unknown (third-party data such as open data) – and the structure of the data: structured, semi-structured, or unstructured?
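To make the dependence on the user and use-case a bit more tangible, here is a minimal Python sketch. It is only an illustration of the general idea and not the approach from the papers mentioned above: the record fields (`customer_id`, `ride_minutes`), the two use cases and the helper names are all hypothetical. The same data object is checked against different requirements depending on who uses it and for what.

```python
from typing import Callable, Dict, List

# A data object is modelled as a plain record; field names are hypothetical.
Record = Dict[str, object]
# A quality requirement is a predicate over a single record.
Requirement = Callable[[Record], bool]


def is_present(field: str) -> Requirement:
    """Completeness-style check: the field exists and is not empty."""
    return lambda rec: rec.get(field) not in (None, "")


def in_range(field: str, low: float, high: float) -> Requirement:
    """Validity-style check: the field is a number within [low, high]."""
    def check(rec: Record) -> bool:
        value = rec.get(field)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        return is_number and low <= value <= high
    return check


# Two hypothetical use cases with different requirements for the same data.
USE_CASES: Dict[str, Dict[str, Requirement]] = {
    "billing": {
        "customer_id present": is_present("customer_id"),
        "ride_minutes valid": in_range("ride_minutes", 0, 24 * 60),
    },
    "fleet_analytics": {
        "ride_minutes valid": in_range("ride_minutes", 0, 24 * 60),
    },
}


def failed_requirements(record: Record, use_case: str) -> List[str]:
    """Return the names of the requirements the record violates for a use case."""
    checks = USE_CASES[use_case]
    return [name for name, check in checks.items() if not check(record)]


# The same record is unfit for billing but good enough for fleet analytics.
ride = {"customer_id": "", "ride_minutes": 17}
billing_issues = failed_requirements(ride, "billing")
analytics_issues = failed_requirements(ride, "fleet_analytics")
```

Here the record with an empty `customer_id` fails the hypothetical billing requirements but passes the analytics ones, which is exactly why “is the data of high quality?” cannot be answered without asking “for whom and for what?”.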
As for Information Systems / Software, I find that “think data quality first” and “data quality by design” are two mantras to be kept in mind. The latter is something we have studied together with my colleagues from Mexico, adapting the “quality by design” principle into “data quality by design”. I reported on the respective study before – “ISO/IEC 25012-based methodology for managing data quality requirements in the development of information systems: Towards data quality by design” (read here), where we proposed DAQUAVORD – a Methodology for Project Management of Data Quality Requirements Specification, which is based on the Viewpoint-Oriented Requirements Definition (VORD) method and the latest and most generally accepted ISO/IEC 25012 standard. Its main idea is to start thinking of data quality as soon as the development of the system starts, to make sure that some level of data quality is ensured by design, i.e. transformed into both functional and non-functional requirements.
Alternatively, this can be done not only before, but also during the development or even when the system is already in production. Some solutions exist here, but I typically use the opportunity to self-advertise previous projects and studies that I worked on, especially this one, since it is based on the results of my PhD thesis, which is summarized in “Definition and Evaluation of Data Quality: a user-oriented data object-driven approach to data quality assessment”. Namely, it is the Data Quality Model-based testing approach (DQMBT) for testing information systems, which uses the data object-driven data quality model as a testing model and was presented in the context of an e-scooter system and an insurance system. Both, however, are rather ad-hoc approaches, whose main value lies in the conceptual idea, not the implementation, at least at this point.
For the repository, in turn, the question is whether it is a data warehouse or a data lake. Or maybe even a data lakehouse? For the latter two, metadata and data governance become a “must” to avoid GIGO (the garbage in – garbage out effect) and to keep the data lake from turning into a data swamp, which is briefly addressed in “Combining data lake and data wrangling for ensuring data quality in CRIS”, including, among other things, an explanation of why data wrangling should be preferred over data cleaning.
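As a rough illustration of why metadata matters here, the sketch below shows one possible way to keep undocumented data out of a lake: a dataset is admitted to the catalog only if a minimal set of metadata is filled in. This is an assumption-laden toy example, not something taken from the paper above; the required keys (`owner`, `source`, `schema_version`, `ingested_at`) and all names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical minimal metadata every dataset must carry to enter the lake.
REQUIRED_METADATA = ("owner", "source", "schema_version", "ingested_at")


@dataclass
class DatasetEntry:
    name: str
    metadata: Dict[str, str]


def missing_metadata(entry: DatasetEntry) -> List[str]:
    """List the required metadata keys that are absent or blank."""
    return [key for key in REQUIRED_METADATA
            if not entry.metadata.get(key, "").strip()]


def admit(catalog: Dict[str, DatasetEntry], entry: DatasetEntry) -> bool:
    """Register the dataset only if its metadata is complete."""
    if missing_metadata(entry):
        return False  # undocumented data stays out of the lake
    catalog[entry.name] = entry
    return True


catalog: Dict[str, DatasetEntry] = {}

documented = DatasetEntry("rides_2023", {
    "owner": "mobility-team",
    "source": "e-scooter backend",
    "schema_version": "1",
    "ingested_at": "2023-06-05",
})
undocumented = DatasetEntry("dump_final_v2", {"source": "unknown export"})

admitted_documented = admit(catalog, documented)
admitted_undocumented = admit(catalog, undocumented)
```

A real governance setup would of course go far beyond a presence check, but even such a simple gate makes it harder for datasets nobody can explain to pile up in the lake.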
The importance of both metadata and data governance was then emphasized, and for the latter I called on Elon Musk for support 😀 More precisely, I mentioned him to back up my claims about the importance of data governance, which he once named as a key to improving the product you are delivering. I just wanted to make my words a bit more authoritative, since he is seen as a more or less successful businessman, isn’t he? 😀
You can find slides here or watch the video 👇
Big thanks to both the organizers – Helve – and the supporters who made both the hackathon and the forum a success. More precisely, Techchill, techhub, Lift 99, #RigaTechGirls, justjoin.it, Oradea.Tech.Hub, RTU Design Factory, Startup Lithuania, Kaunas Technology University, Startup Estonia, Spring Hub, kood / Johvi, Technopol, Enterprise Forum CEE, Slush, Aaltos, AWS (Amazon Web Services), Google for Startups, Junction, Bird Incubator, EdTech Estonia, Sphere.it, Codecamp, Nine brains, Draper Startup House, Eiropas Digitālās inovācijas centrs, 28Stone.
And some more very special actors of the community, who were at the core of this hackathon edition – Emergn, Izglītības un zinātnes ministrija (Ministry of Education and Science), EPAM Systems Latvia, Atea Global Services Ltd., Ubiquiti Inc. & RigaTechGirls, Investment and Development Agency of Latvia (LIAA).