Building Your IoT Data Strategy – Why Your Database Matters

The market for services and solutions around the Internet of Things (IoT) continues to grow, driven by the big promise of 5G. Gartner lists the top five market opportunities for service providers over the next five years around IoT and 5G as outdoor surveillance, connected cars, fleet telematics devices, in-vehicle toll devices, and emergency services units. Overall, 5G-connected IoT endpoints will more than triple in a year, from 3.5 million units in 2020 to 11.3 million units in 2021. By 2023, the number of connected devices running on 5G will be up to 49 million units.

However, building the systems that will support all these connected devices – and all the data they create – will require a new approach to database design. Old approaches won’t be able to cope with the sheer volume of updates, or provide value from all that data once it has been gathered.

Here are six key challenges to consider if you are looking at IoT and data.

Time series data needs a new approach to modelling

Since relational databases were first developed in the 1970s, they have been the primary approach to managing data. While this approach won’t be going away any time soon, relational data models were never designed to cope with the specific requirements that IoT devices will create. When every device can be connected and creating data all the time, that information will have to be handled in a different way.

Making data modelling work for IoT deployments requires an approach that fits the kind of data being created and stored. Time-series data models store data in columns rather than rows, a layout better suited to data sets where events are written based on time and then analysed based on when they took place. More importantly, this approach delivers better performance for applications that ingest and consume time-series data. Under the time-series data model, you can capture time-stamped data and track, monitor, and aggregate it. As IoT devices become more common, and use cases for this data grow as well, demand for looking at data in the context of time will increase the need for time-series modelling.
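As a minimal, hypothetical sketch of this idea (not any particular database’s implementation), readings can be grouped into per-sensor, per-day partitions and held as columns, so that a query over a time window only touches the relevant buckets:

```python
from collections import defaultdict
from datetime import datetime, timezone

class TimeSeriesStore:
    """Toy time-series layout: one partition per (sensor, day),
    with each field stored as its own column (a list)."""

    def __init__(self):
        # (sensor_id, date) -> {"ts": [...], "value": [...]}
        self.partitions = defaultdict(lambda: {"ts": [], "value": []})

    def write(self, sensor_id, ts, value):
        part = self.partitions[(sensor_id, ts.date())]
        part["ts"].append(ts)
        part["value"].append(value)

    def query(self, sensor_id, start, end):
        # Scan only partitions whose day falls inside the window,
        # then filter individual readings by timestamp.
        out = []
        for (sid, day), part in self.partitions.items():
            if sid != sensor_id or not (start.date() <= day <= end.date()):
                continue
            out.extend((t, v) for t, v in zip(part["ts"], part["value"])
                       if start <= t <= end)
        return sorted(out)

store = TimeSeriesStore()
t0 = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
store.write("sensor-1", t0, 20.5)
store.write("sensor-1", t0.replace(hour=13), 21.0)
recent = store.query("sensor-1", t0, t0.replace(hour=12, minute=30))
```

The partition key here (sensor plus day) is an illustrative choice; the point is that time-bounded reads never have to scan data outside their window.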

Real-time streaming can provide real-time results

Connected devices can create data constantly, and that information has to be sent and received. These devices can provide information that may need to be acted on rapidly, rather than relying on batch analysis hours or days later. This may mean you need to add streaming data support to your application.

Sensor data is a great example of where data should be streamed for analysis and a potential real-time response. In a sensitive environment such as a water treatment facility, pH monitors can provide real-time readings of acid and alkali levels in various containers so that a technician can ensure they stay within proper ranges. Too far either way, and the result could be costly.

Streaming the sensor data to a central service and using it to automatically alert someone can stop an issue developing into a riskier situation. To support this, you will need a streaming architecture that supports real-time data analysis, with data potentially coming from multiple sources at the same time.
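The alerting step in the water treatment example can be sketched in a few lines. This is an illustrative toy, with an assumed safe pH band, not a real monitoring system: each reading is checked as it arrives, so an out-of-range value raises an alert immediately instead of waiting for a batch job.

```python
# Assumed acceptable pH band, for illustration only.
SAFE_PH = (6.5, 8.5)

def monitor(readings, low=SAFE_PH[0], high=SAFE_PH[1]):
    """Yield an alert for every reading outside [low, high],
    as soon as it is seen in the stream."""
    for container, ph in readings:
        if not (low <= ph <= high):
            yield f"ALERT: {container} pH {ph} outside {low}-{high}"

# In a real deployment this would be fed by a streaming source
# (e.g. a message queue); a list stands in for the stream here.
stream = [("tank-a", 7.0), ("tank-b", 9.1), ("tank-a", 6.2)]
alerts = list(monitor(stream))
```

Because `monitor` is a generator, it processes readings one at a time, which is the essential property of a streaming pipeline: the alert fires on the offending reading, not after the whole batch.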

Data tiering can reduce costs

Data can cost a lot to keep, so it is important to make sure you actually need to retain it rather than holding on to it for its own sake. Alongside real-time data, you may have data sets that have only a short shelf life and quickly become worthless. Holding all this data when you don’t need it is therefore a waste of resources.

A good example here is fleet tracking: for some logistics companies, long-term data analysis can show good performance and opportunities to improve routing. However, other firms in industries like construction may tag their vehicles for different reasons. If you have to make sure assets are in the right place during a job, that is a great use case for IoT and connected data; however, that data has little relevance once the specific job is completed.

Understanding the value that data can offer – and equally where it can’t – involves looking at databases that support data tiering. As time goes on and certain data sets lose their relevance, they can be moved to cheaper storage tiers to lower costs without being deleted outright. Gradually, this data can be moved to long-term storage or deleted, based on your specific requirements.
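A simple tiering policy can be sketched as a function of record age. The thresholds below (30 days "hot", a year before deletion) are assumptions for illustration; real retention rules would come from your cost model and compliance requirements:

```python
from datetime import datetime, timedelta, timezone

def assign_tier(record_ts, now,
                warm_after=timedelta(days=30),
                cold_after=timedelta(days=365)):
    """Pick a storage tier based on how old a record is."""
    age = now - record_ts
    if age < warm_after:
        return "hot"      # fast, expensive storage for recent data
    if age < cold_after:
        return "cold"     # cheaper long-term storage
    return "delete"       # past retention; safe to remove

now = datetime(2021, 6, 1, tzinfo=timezone.utc)
tier = assign_tier(datetime(2021, 5, 25, tzinfo=timezone.utc), now)
```

A background job running this policy over partitions would move or drop data in bulk, which is how tiering lowers cost without any change to the write path.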

Hybrid cloud makes scaling easier

Cloud is becoming more popular with enterprises of all sizes, particularly to cope with the huge amounts of data being created by IoT devices. Moving to hybrid cloud can help you take advantage of the cost-effectiveness of the public cloud, while still retaining control over your approach for the future.

For IoT deployments, this means that data processing can happen closer to where data is being created to improve performance, while it can also be managed in a centralised repository, too. By looking at hybrid cloud, you can solve some of the problems around data management over time in a simpler fashion.

Scaling up will be an important criterion for any IoT application, particularly where data volumes grow in unpredictable ways. While it might be easy to assume that a ten-fold increase in connected devices will lead to ten times more data, that isn’t always the case: different users create data in different ways. Managing this data can be expensive if you can’t scale out your database horizontally, but being able to predict how much data you will have to store, and to flex capacity appropriately over time, will make this much easier.

Data replication should be automated

Without multiple copies of data, you run the risk of losing it. This can be painful from an operations perspective, and it can also break compliance rules in specific industries. Data replication keeps copies of your data on multiple nodes, so the loss of any one node does not mean the loss of data.

However, getting this right can be hard. Database replication can be difficult if you are not an expert, and getting data replicated from devices to the central database is another headache. With the right approach, this can be automated so that IoT edge devices easily replicate their data to a central repository. Using the same approach to data across all your nodes – from individual devices through edge data centres to the central database – can simplify this further, ensuring that your application runs the same way everywhere.
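The edge-to-central pattern described above can be sketched as a buffered replication loop. This is a hypothetical toy, not DataStax’s replication mechanism: each edge node queues writes locally and flushes them to the central store, so data survives intermittent connectivity and the central copy eventually catches up.

```python
class EdgeNode:
    """Toy edge device that buffers writes locally until they can
    be replicated to a central store."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.pending = []          # writes not yet replicated

    def record(self, reading):
        # Always succeeds locally, even if the network is down.
        self.pending.append(reading)

    def replicate(self, central):
        # Push buffered writes to the central store (tagged with
        # their origin), then clear the local buffer.
        central.extend((self.node_id, r) for r in self.pending)
        self.pending.clear()

central_store = []
edge = EdgeNode("edge-1")
edge.record({"temp": 21.4})
edge.replicate(central_store)
```

In a production system, `replicate` would be an automated background task with retries and conflict handling; the sketch only shows why buffering at the edge makes the central copy converge without operator effort.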

Integrating analytics can tell you why things are happening

To get the most out of your IoT application, you will need some form of analytics. This can help you discover usage patterns and identify weaknesses in devices. Alongside supporting storage and retrieval, the central database you replicate data into can provide a great source of information for analysis.

Integrating analytics tools for real-time insight or for spotting relationship trends can be a great way to extend the value that your data can deliver.

However you approach your IoT data, you will need to consider how you manage, store, and interrogate that information over time. Looking closely at your database can help you take advantage of your data and avoid some of the problems around scaling.

With millions of devices due to join the Internet thanks to 5G, the problem won’t be a lack of data – instead, it will be about how to take advantage of this data in the easiest and most productive way for developers.

Written by Patrick Callaghan, Enterprise Architect and Strategic Business Advisor, DataStax