CyberCommando’s meetup and my talk on Internet of Things Search Engines and their role in detecting vulnerable open data sources

October is Cybersecurity Awareness Month, as part of which CyberCommando’s meetup 2023 took place in the very heart of Latvia – Riga, where I was invited to deliver a talk that I devoted to IoTSE and entitled “What do Internet of Things Search Engines know about you? or IoTSE as a vulnerable open data sources detection tool”.

CyberCommando’s meetup organizers describe it as the most anticipated vendor-independent industry event in the realm of cybersecurity: a conference designed to empower local and regional IT security professionals as they face the evolving challenges of the digital age, bringing together high-level ICT professionals from local, regional, and international businesses, governments and government agencies, tech communities, and the financial, public and critical infrastructure sectors. The meetup covered a broad set of topics, from the development of ICT security skills and awareness raising, to modern market developments and numerous technological solutions in Cloud, Data, Mobility, Network, Application, Endpoint, Identity & Access, and SecOps, to corporate and government strategies and the future of the sector. It featured three parallel sessions and numerous talks delivered by 20+ local and international experts, including but not limited to IT-Harvest, Radware, DeepInstinct, Pentera, ForeScout Technologies, CERT.LV, and ESET. It is a great honor to complement this list with the University of Tartu, which I represented by delivering my talk on the main stage 🙂

Let’s turn to my talk – “What do Internet of Things Search Engines know about you? or IoTSE as a vulnerable open data sources detection tool”. Luckily, very few attendees knew or had used OSINT (Open Source INTelligence) or Internet of Things Search Engines (IoTSE) (although perhaps they were just too shy to raise their hands when I asked), so, hopefully, this was a good choice of topic. So, what was it about?

Today, there are billions of interconnected devices that form Cyber-Physical Systems (CPS), Internet of Things (IoT) and Industrial Internet of Things (IIoT) ecosystems. As the number of devices and systems in use and the volume and value of data increase, the risks of security breaches increase as well.

As I discussed previously, this “has become even more relevant in terms of COVID-19 pandemic, when in addition to affecting the health, lives, and lifestyle of billions of citizens globally, making it even more digitized, it has had a significant impact on business [3]. This is especially the case because of challenges companies have faced in maintaining business continuity in this so-called “new normal”. However, in addition to those cybersecurity threats that are caused by changes directly related to the pandemic and its consequences, many previously known threats have become even more desirable targets for intruders, hackers. Every year millions of personal records become available online [4-6]. Lallie et al. [3] have compiled statistics on the current state of cybersecurity horizon during the pandemic, which clearly indicate a significant increase of such. As an example, Shi [7] reported a 600% increase in phishing attacks in March 2020, just a few months after the start of the pandemic, when some countries were not even affected. Miles [8], however, reported that in 2021, there was a record-breaking number of data compromises, where “the number of data compromises was up more than 68% when compared to 2020”, when LinkedIn was the most exploited brand in phishing attacks, followed by DHL, Google, Microsoft, FedEx, WhatsApp, Amazon, Maersk, AliExpress and Apple.”

And while Risk Based Security & Flashpoint (2021) [5] suggest that the vulnerability landscape is returning to normal, including (but not only) due to various activities such as the #WashYourCyberHands INTERPOL campaign and “vaccinate your organization” movements, another trigger closely related to cybersecurity that is now affecting the world is geopolitical upheaval. Additionally, according to Cybersecurity Ventures, by 2025 cybercrime will cost the world economy around $10.5 trillion annually, up from $3 trillion in 2015. Moreover, we are at risk of what is called a Cyber Apocalypse or Cyber Armageddon, as was discussed during the World Economic Forum (and reported by Forbes), which is considered very likely to happen in the coming 2 years (hopefully, it will not).

According to Forbes, the key drivers for this are the ongoing digitization of society, behavioral changes due to the COVID-19 pandemic, political instability such as wars, and the global economic downturn. The WEF relates this to the fact that technology is becoming more complex, in particular breakthrough technologies such as AI (considering the current state of the art, I would stress the role of quantum computing here). I would add that this “complexity” is two-fold: technologies become more advanced while, at the same time, becoming easier to use, including those that can be used to detect and expose vulnerabilities. At the same time, although society is being digitized, it tends to lack digital literacy, data literacy and security literacy.

Hence, when we ask what should be done to tackle the associated issues, the answer is also multi-fold. Some recommendations being actively discussed, including by Forbes and Accenture, are to “secure the core”, which involves ensuring that security and resilience are built into every aspect of the organization; to understand that cybersecurity is not something discussed only within the IT department but rather at all levels of the organization; to address the skills shortage within the cybersecurity domain; and to utilize automation where possible.

To put it simply:

  • (cyber)security governance
  • digital literacy
  • cybersecurity is not a one-time event, but a continuous process
  • automation whenever possible
  • «security first!» as a principle for all artifacts, processes and ecosystems
  • preferably – «security-by-design» and «absolute security», which, of course, is rather a utopia, but still something we have to strive for (despite knowing that this level is impossible to achieve).

Or even simpler, as I typically say – “security to every home!”.

In the light of the above, i.e., “security first!” as a principle for all artifacts and the need to “secure the core” – are our data management systems always protected by default (i.e., secure-by-design)? It may sound surprising and weird in 2023, but the fact is that while various security protection mechanisms have been widely implemented, a “primitive” artifact such as a data management system seems to have been more neglected, and the number of unprotected or insufficiently protected data sources is enormous. Recent research has demonstrated that weak data and database protection in particular is one of the key security threats [4,6,9-11]. According to lists drawn up by Bekker [5] and Identity Force on major security breaches in 2020, a large number of data leaks occur due to unsecured databases. As an example:

  • Estee Lauder – 440 million customer records
  • Prestige Software hotel reservation platform – over 10 million hotel guests, including Expedia, Hotels.com, Booking.com, Agoda etc.
  • U.K.-based security firm – data of Adobe, Twitter, Tumblr, LinkedIn etc. users, with a total of over 5 billion records
  • Marijuana Dispensaries – 85 000 medical patient and recreational user records

to name just a few… At times this is due to their (mis)configuration, at times due to vulnerabilities in products or services, where additional security mechanisms would be required. Sometimes, of course, it is due to very targeted attacks, for which the remainder of this post will have limited value. Let’s rather focus on the former, very critical cases, especially given the above-mentioned fact that recent advances in ICT have made it less complex to search for connected devices on the Internet and to access them, even for novices, thanks to the widespread popularity of step-by-step guides on how to use IoTSE – aka Internet of Everything (IoE) or Open Source Intelligence (OSINT) search engines such as Shodan, BinaryEdge, Censys, ZoomEye, Hunter, Greynoise and IoTCrawler – to find and gain access to insufficiently protected webcams, routers, databases, refrigerators, power plants, and even wind turbines. As a result, OSINT has been recognized as one of the five major categories of CTI (Cyber Threat Intelligence) sources (at times more than five are named, but OSINT remains one of them), along with Human Intelligence (HUMINT), Counter Intelligence, Internal Intelligence and Finished Intelligence (FINTEL).

While these tools may represent a security risk, they also provide many positive and security-enhancing opportunities. They provide an overview of network security, i.e., of the devices connected to the Internet within the company; are useful for market research and adapting business strategies; make it possible to track the growing number of smart devices representing the IoT world and to track ransomware (the number and nature of devices affected by it); and therefore help determine the appropriate actions to protect yourself in the light of current trends. However, almost every one of these white hat-oriented objectives can be exploited by black hats.

In this talk I raised several questions that can be at least partly answered with the help of IoTSE, such as:

  • Is the data source visible and even accessible outside the organization?
  • What data can be gathered from it, and what is their “value” for external actors, such as attackers and fraudsters? I.e., can these data pose a threat to the organization if used to deploy an attack?
  • Are stronger security mechanisms needed? Is the vulnerability related to internal (mis)configuration or to the database in use?

To answer the above questions, I referred to a study conducted some time ago by me and my former student – Artjoms Daškevičs (a very talented student, whose bachelor thesis was even nominated for the best Computer Science thesis in Latvia). As part of that study, an Internet of Things Search Engines- (IoTSE-) based tool called ShoBEVODSDT (Shodan- and Binary Edge-based Vulnerable Open Data Sources Detection Tool) was developed. This “toy example” of IoTSE conducts a passive assessment – it does not harm the databases but rather checks for potentially existing bottlenecks or weaknesses which could be exposed if an attack took place. It allows either a comprehensive analysis of all unprotected data sources falling into the list of predefined data sources – MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch, CouchDB, Cassandra and Memcached – or defining an IP range to examine what can be seen about the data source from outside the organization (read more in (Daskevics and Nikiforova, 2021)).
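To make the idea of a non-intrusive, outside-in check a bit more concrete, here is a minimal Python sketch. It is not the ShoBEVODSDT implementation: it assumes the eight services listen on their commonly documented default ports, and the names `DEFAULT_PORTS`, `expand_range`, `is_port_open` and `visible_services` are hypothetical helpers introduced only for this illustration. It merely tests whether a TCP connection can be opened, and should only be pointed at address ranges you are authorized to assess.

```python
import ipaddress
import socket

# Assumption: each service runs on its commonly documented default port.
DEFAULT_PORTS = {
    "MySQL": 3306,
    "PostgreSQL": 5432,
    "MongoDB": 27017,
    "Redis": 6379,
    "Elasticsearch": 9200,
    "CouchDB": 5984,
    "Cassandra": 9042,
    "Memcached": 11211,
}


def expand_range(cidr):
    """Return the usable host addresses of a CIDR block as strings."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(ip) for ip in network.hosts()]


def is_port_open(host, port, timeout=1.0):
    """Check only whether a TCP connection can be opened; no data is sent."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def visible_services(cidr, probe=is_port_open):
    """Map each host in the range to the services reachable on default ports."""
    report = {}
    for host in expand_range(cidr):
        found = [name for name, port in DEFAULT_PORTS.items() if probe(host, port)]
        if found:
            report[host] = found
    return report
```

Calling `visible_services("192.0.2.0/30")` would probe the two usable hosts of that documentation-only range. The `probe` parameter is injectable so that the reporting logic can be exercised without touching the network.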

The remainder was mostly built around four questions (and articles / book chapters) that we addressed with its help, namely:

  • Which data sources have proven to be the most vulnerable and visible outside the organization?
  • What data can be gathered from open data sources (if any), and what is their “value” for external actors, such as attackers and fraudsters? Can these data pose a threat to the organization if used to deploy an attack?

This part was built around our conference paper and this book chapter. In short (for a longer answer, refer to the article), fewer than 2% of data sources are accessible outside the organization (more than 98% are not accessible via a simple IoTSE tool). However, some data sources may pose risks to organizations: 12% of open data sources, i.e., data sources the IoTSE tool was able to reach, were already compromised or contained data that can be used to compromise them. Elasticsearch and Memcached had the highest ratio of instances to which it was possible to connect, while MongoDB, PostgreSQL and Elasticsearch demonstrated the most negative trend in terms of already compromised databases (not by us, of course).

In addition, we might be interested in comparing SQL and NoSQL databases, where the latter are less likely to provide security measures, including sometimes very primitive and simple ones such as authentication, authorization (Sahafizadeh et al., 2015) and data encryption. This is what we explored in the book chapter. We were not able to find significant differences. From the “most secure” service viewpoint, CouchDB demonstrated very good results as a NoSQL database, and MySQL as a relational database. However, if the developer needs to use Redis or Memcached, additional security mechanisms and/or activities should be introduced to protect them. It must be understood, however, that these results cannot be generalized to the security of the data storage products themselves; they mostly demonstrate how many data storage holders took care of the security of their data storage facilities, since many data storage facilities offer a series of built-in mechanisms that can be applied. As for the “most unsecure” service, Elasticsearch is characterized by weaker and less frequently used security protection mechanisms, which means that the database holder should be wary when using it. A similar conclusion can be drawn for Memcached (although this contradicts CVE Details), where the total number of vulnerabilities found was the highest. However, the risk of these vulnerabilities was lower compared to Elasticsearch, so it can be assumed that CVE Details either does not take such “low-level” weaknesses into account or has not yet identified them. In the future, an in-depth analysis of what CVE Details counts as a vulnerability, and further exploration of the correlation with our results, could be carried out.

The next question we were interested in was:

  • Which Baltic country – Latvia, Lithuania or Estonia – has the most open & vulnerable data sources? And will the technological development of Estonia be visible here as well?

This question was raised and partially answered in another conference paper. It is impossible to give an unambiguous answer here: while Latvia showed the highest ratio of successful connections (and Estonia the lowest), Lithuania showed the most negative result in terms of already compromised data sources, and Estonia – for sensitive and non-sensitive data. Estonia, however, had the largest number of data sources from which data could not be obtained (with Latvia having a slightly lower but still relatively good result in this regard). Based on the average value of the data that could be obtained from these data sources, Lithuania again demonstrated the most negative result, which, however, was only slightly different from the results demonstrated by Estonia and Latvia (this may be a statistical error, since the total number of data sources found by our tool differed significantly between these countries). The specific data sources that are most likely responsible for the lower results vary from one country to another, so it is impossible to single out the most insecure database as the root of all problems.
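A per-country comparison like this one starts from searches restricted by country and by service. As a rough illustration only (this is not how ShoBEVODSDT is implemented, and BinaryEdge uses a different query syntax), the sketch below composes Shodan-style search strings using the `port:` and `country:` filters. The default ports and the helper names `build_query` and `build_queries` are my assumptions for this example; no request is sent to any search engine.

```python
# Assumption: ISO 3166-1 alpha-2 codes are used to restrict a search by country.
BALTIC_COUNTRIES = {"Latvia": "LV", "Lithuania": "LT", "Estonia": "EE"}

# Assumption: services are looked up by their commonly documented default ports.
DEFAULT_PORTS = {
    "MySQL": 3306,
    "PostgreSQL": 5432,
    "MongoDB": 27017,
    "Redis": 6379,
    "Elasticsearch": 9200,
    "CouchDB": 5984,
    "Cassandra": 9042,
    "Memcached": 11211,
}


def build_query(port, country_code):
    """Compose one Shodan-style query string from two search filters."""
    return f'port:{port} country:"{country_code}"'


def build_queries(countries=BALTIC_COUNTRIES, ports=DEFAULT_PORTS):
    """Return a query string for every (country, service) combination."""
    return {
        (country, service): build_query(port, code)
        for country, code in countries.items()
        for service, port in ports.items()
    }


if __name__ == "__main__":
    for key, query in build_queries().items():
        print(key, query)
```

Keeping the query construction separate from any API call makes it easy to review exactly what would be searched before anything is run.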

And one more question I raised was:

  • Do “traditional” vulnerability registries provide a sufficiently comprehensive view of DBMS security, or should they be subject to intensive and dynamic inspection by DBMS owners?

This was covered in the book chapter, which provides a comparative analysis of the results extracted from the CVE database with the results obtained by applying the IoTSE-based tool. Not surprisingly, the results in most cases are rather complementary, and one source cannot completely replace the other. This is not only due to the scope limitations of both sources: CVE Details covers some databases not covered by ShoBEVODSDT and provides insights on a more diverse set of vulnerabilities, while not providing the most up-to-date information and offering very limited insight on MySQL. At the same time, there are cases when both sources point to a security-related issue and its frequency, which can be seen as a trend and treated by users accordingly, by taking action to secure databases that definitely do not comply with the “secure by design” principle. This refers to MongoDB, PostgreSQL and Redis.

All in all, it can be said that the answers to some of those questions may seem obvious or expected, however, as our research has shown, firstly, not all of them are obvious to everyone (i.e., there are no secure-by-design databases/data sources, so the data source owner has to think about its security), and, secondly, not all of these “obvious” answers are 100% correct.

Overall, both the talk and these studies show an obvious reality, which, however, is not always visible to the company. While “this may seem surprisingly in light of current advances, the first step that still needs to be taken thinking about data security is to make sure that the database uses the basic security features […] Ignorance or non-awareness can have serious consequences leading to data leakages if these vulnerabilities are exploited. Data security and appropriate database configuration is not only about NoSQL, which is typically considered to be much less secured, but also about RDBMS. This study has shown that RDBMS are also relatively inferior to various types of vulnerabilities. Moreover, there is no “secure by design” database, which is not surprising since absolute security is known to be impossible. However, this does not mean that actions should not be taken to improve it. More precisely, it should be a continuous process consisting of a set of interrelated steps, sometimes referred to as “reveal-prioritize-remediate”. It should be noted that 85% of breaches in 2021 were due to a human factor, with social engineering recognized as the most popular pattern [12]. The reason for this is that even in the case of highly developed and mature data and system protection mechanism (e.g., IDS), the human factor remains very difficult to control. Therefore, education and training of system users regarding digital literacy, as well as the definition, implementation and maintaining security policies and risk management strategy, must complement technical advances.”

Or, to put it even simpler, once again: digital literacy “to every home”, cybersecurity is not a one-time event but a continuous process, automation whenever possible, cybersecurity governance, “security first!” principle for all artifacts, processes and ecosystems, and, preferably, “security-by-design” principle whenever and wherever possible. Or, as I concluded the talk – “We have got to start locking that door!” (by Ross, F.R.I.E.N.D.S) before we act as Commando.

Big thanks go to the organizers of the event, esp. to Andris Soroka, and to the sponsors who supported such a wonderful event – HeadTechnology, ForeScout, LogPoint, DeepInstinct, IT-Harvest, Pentera, GTB Technologies, Stellar Cyber, Appgate, OneSpan, ESET Digital Security, Veriato, Radware, Riseba, Ministry of Defence of Latvia, CERT.LV, Latvijas Sertificēto Personas Datu Aizsardzības Speciālistu Asociācija, Dati Group, Latvijas Kiberpsiholoģijas Asociācija, Optimcom, Vidzeme University of Applied Sciences, Stallion, ITEksperts, Kingston Technology.

P.S. If, considering the topics I typically cover, you are wondering why I am talking about security this time, let me briefly answer your question. First, for those who know me better, it is a well-known fact that cybersecurity was my first choice in the big IT world – it was, is and will probably remain my passion, although now it is rather a hobby. It was also the central part of my duties in one of my previous workplaces, incl. the one where I worked with the organizer of this event (oh my first honeypot…). Second, and related to the first point, this was the topic that led one of my professors (during the first or the second year of my studies) to tell me that I must become a researcher (“yes, sure 😀 😀 😀 you must be kidding” was my thought at that point, but I do not laugh at this “ridiculous joke” anymore, and am rather grateful that I was noticed so early and was then constantly reminded about this by other colleagues, which resulted in the current version of me). Third, data quality and open data, which I talk about a lot, are all about the value of the data, while the two main prerequisites for this are (1) data quality and (2) data security, so, in fact, data security is an inevitable component that we must think and talk about.

References:

CFP for The IEEE International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023)

On behalf of the organizers (Technical Program Chair, Steering Committee, and finally publicity chair) of the IEEE International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023), I invite everyone who is conducting research in this area to consider submitting a paper.

Call for Papers:

New advancements in wireless communication systems such as Fifth-Generation (5G), Beyond Fifth-Generation (B5G), and Sixth-Generation (6G) networks will allow for new and unprecedented services to be made available for users with nearly unlimited capacity. These services will be the core driver for future digital transformation of our cities and communities. This will be accompanied by a ubiquitous deployment of Internet of Things (IoT) infrastructure and supported by computing capacity that will be available at the edge of the network and in the cloud. This computing infrastructure will handle the processing of data generated by users and services. Such a complex and diverse system will require the applications running on the computing/networking infrastructure to be intelligent, efficient and sustainable. Additionally, the infrastructure will require smart control and automation systems to integrate and manage its different components. Artificial Intelligence (AI) and its applications will play a significant role in the design, deployment, automation, and management of future services. This will include applications that will be running on the edge and on cloud servers, networking applications to handle the flow of data between the users and the computing system, and intelligent automation and management software operating on the system. The International Conference on Intelligent Computing, Networking, and Services aims to provide an opportunity to present state-of-the-art research at the intersection of Computing, Networking, and Services that are supported by Artificial Intelligence.

Submission Link: https://easychair.org/conferences/?conf=iccns2023

Researchers from both the industry and academia are encouraged to submit their original research contributions in all major areas, which include, but are not limited to the following main tracks:

💡Track 1: Artificial Intelligence Fundamentals

  • Artificial Intelligence Systems
  • Artificial Intelligence Theory
  • Artificial Intelligence applications in Computers and Communications
  • Artificial Intelligence and Robotics Technologies
  • Artificial Intelligence and cloud computing
  • Artificial Intelligence for Economic paradigms and game theory
  • Machine and Deep Learning of Knowledge
  • Artificial Intelligence-based Distributed Knowledge and Processing
  • Artificial Intelligence for Human-Robot Interactions

💡Track 2: Intelligent Internet of Things and Cyber-Physical Systems

  • Intelligent IoT Applications and Services
  • Intelligent security for the Internet of Things and cyber-physical systems
  • Intelligent Internet of Things architectures and protocols
  • Intelligent Cyber Physical Systems (CPS)
  • Blockchain-based application in Intelligent Manufacturing: Industrial Internet of Things
  • Blockchain and Secure Critical Infrastructure with Industry 4.0
  • Intelligent manufacture and management
  • Consensus and mining algorithms suited for resource-limited IoTs
  • Blockchain-based Controlled mobility and QoS
  • Blockchain-based energy optimization techniques in WSN
  • Blockchain-based Software defined networks

💡Track 3: Edge Intelligence and Federated Learning

  • Distributed and federated machine learning in edge computing
  • Theory and Applications of Edge Intelligence
  • Middleware and runtime systems for Edge Intelligence
  • Programming models compliant with Edge Intelligence
  • Scheduling and resource management for Edge Intelligence
  • Data allocation and application placement strategies for Edge Intelligence
  • Osmotic computing with edge continuum, Microservices and MicroData architectures
  • ML/AI models and algorithms for load balancing
  • Theory and Applications of federated learning
  • Federated learning and privacy-preserving large-scale data analytics
  • MLOps and ML pipelines at edge computing
  • Transfer learning, interactive learning, and Reinforcement Learning for edge computing
  • Modeling and simulation of EI and edge-to-cloud environments
  • Security, privacy, trust, and provenance issues in edge computing
  • Distributed consensus and blockchains at edge architecture
  • Blockchain networking for Edge Computing Architecture
  • Blockchain technology for Edge Computing Security
  • Blockchain-based access controls for Edge-to-cloud continuum
  • Blockchain-enabled solutions for Cloud and Edge/Fog IoT systems
  • Forensic Data Analytics compliant with Edge Intelligence

💡Track 4: Intelligent Networking in Beyond 5G (B5G) and 6G Wireless Communication

  • Intelligent Networking in Beyond 5G/6G Network Architectures
  • Large-scale Internet of Things in B5G/6G
  • Vehicular networks in B5G/6G
  • Blockchain with lightweight computation
  • Service and applications for vehicular clouds in B5G/6G
  • Future internet architectures for B5G/6G
  • Intelligent networking services
  • Emerging networks in B5G/6G
  • Byzantine-tolerant FL
  • Churn-tolerant FL
  • FL for NGN and 6G
  • B5G/6G based IoT healthcare systems

💡Track 5: Intelligent Big Data Management and Processing

  • Intelligent Data Fusion
  • Intelligent Analytics and Data mining
  • Intelligent Distributed data management
  • Distributed transaction for blockchain
  • Intelligent Data Science and Data Engineering
  • Protocols for management and processing of data

💡Track 6: Intelligent Security and Privacy

  • Authentication and authorization
  • Applications of blockchain technologies in digital forensic
  • Privacy technologies
  • Blockchain-based threat intelligence and threat analytics techniques
  • Blockchain-based open-source tools
  • Forensics readiness of blockchain technologies
  • Blockchain Attacks on Existing Systems
  • Blockchain Consensus Algorithms
  • Blockchain-based Intrusion Detection/Prevention
  • Security and Privacy in Blockchain and Critical Infrastructure
  • Attacks on Blockchain and Critical Infrastructure
  • Blockchain and Secure Critical Infrastructure with Smart Grid

💡Track 7: Blockchain Research & Applications for Intelligent Networks and Services

  • State-of-the-art of the Blockchain technology and cybersecurity
  • Blockchain-based security solutions of smart cities infrastructures
  • Blockchain in connected and autonomous vehicles (CAV) and ITS
  • Blockchain Technologies and Methodologies
  • Recent developments and emerging trends in Blockchain
  • New models, practical solutions and technological advances related to Blockchain
  • Theory of Blockchain in Cybersecurity
  • Applications of blockchain technologies in computer & hardware security
  • Implementation challenges facing blockchain technologies
  • Blockchain in social networking
  • Performance metric design, modeling and evaluation of blockchain systems
  • Network and computing optimization in blockchains
  • Experimental prototyping and testbeds for blockchains
  • Blockchain networking for Edge Computing Architecture
  • Blockchain technology for Edge Computing Security
  • Blockchain-based access controls for Edge-to-cloud continuum
  • Blockchain-enabled solutions for Cloud and Edge/Fog IoT systems
  • Forensic Data Analytics compliant with Edge Intelligence

Two workshops that you cannot miss are also scheduled to take place as part of ICCNS.

🗓️🗓️🗓️ IMPORTANT DATES

  • Full paper submission: April 21st, 2023 (Firm and Final)
  • Full paper acceptance notification: May 6th, 2023
  • Full paper camera-ready submission: May 20th, 2023

For any inquiries, please contact: intelligenttechorg@gmail.com.

Submit the paper and meet our team in Valencia in June, 2023!
 

With best wishes,

ICCNS2023 organizers