The importance of Big Data and Cloud Computing

Big Data and Cloud Computing are two key drivers of the 21st-century technology landscape, and each complements and amplifies the importance and value of the other in many ways.

As the Big Data and Cloud Computing markets are both expected to rise in the next few years, it is vital to explore their role and future in our current economy.

Hence, we have talked to experts in the industry to shed light on this important topic.


Big Data & Cloud Computing

Over the past 15 years, we have seen an unprecedented and accelerating build-up of massive volumes of data from every imaginable source of data collection or creation, in the most diverse formats.

According to Arman Kamran, CTO of Prima Recon and enterprise-scaled Agile Transition coach, statistics show that we have amassed more data over the past two years than in the entire known history of civilization, going back thousands of years!

It is even estimated that in 2020 we created and collected more than 44 Zettabytes of data, that is, 44,000,000,000,000 (44 trillion) gigabytes, and that by 2025 the amount of data created in a single day will go beyond 460 Exabytes (which sums to around 168 trillion gigabytes a year).
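The unit conversions behind these figures can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, where one gigabyte is 10^9 bytes, one exabyte is 10^18 bytes, and one zettabyte is 10^21 bytes, and it assumes a 365-day year. The constant and function names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes. Assumption: the quoted
# figures use SI units rather than binary ones (such as 2**30 bytes).
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21


def to_gigabytes(amount: int, unit_in_bytes: int) -> int:
    """Convert an amount expressed in some unit into whole gigabytes."""
    return amount * unit_in_bytes // GIGABYTE


# 44 zettabytes created and collected in 2020.
data_2020_gb = to_gigabytes(44, ZETTABYTE)

# 460 exabytes per day, summed over an assumed 365-day year.
daily_2025_gb = to_gigabytes(460, EXABYTE)
yearly_2025_gb = daily_2025_gb * 365

print(f"2020 total: {data_2020_gb:,} GB")
print(f"2025 per year: {yearly_2025_gb / 10**12:.1f} trillion GB")
```

Under these assumptions, 44 Zettabytes comes out at exactly 44 trillion gigabytes, and 460 Exabytes a day comes out at 167.9 trillion gigabytes a year, which matches the rounded figure of around 168 trillion.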

There are quite a few reasons for this historic technological phenomenon, he continues, the most important contributor being the accelerated drop in the cost of computing and storage, which became possible through Cloud Computing.

In fact, the significant and continued drop in the cost of sensing and collecting data, cleaning and transforming it, and storing (or live-streaming) it for processing has enabled many innovative ways of looking at the hidden value of data that can be collected by observing every trend of human life, with value delivery as the definitive driver for the economy. It has also amplified efforts to tap into more sources of data at a global level.

Kenneth Otabor Irabor, Data Science Consultant at Accenture, explains that Big Data is huge. Not only is it huge, it also encompasses data that comes in all forms and sizes, although it can be argued that the term Big Data is relative and contextual.

Thus, as Arman and Kenneth point out, Big Data has a set of important attributes that need to be considered:

  1. Volume: refers to the amount of data being accumulated, which poses a growing challenge when it comes to transferring, storing, and processing it. Typically, this would be terabytes and above.
  2. Velocity: refers to the speed at which new data is generated and the speed at which data moves around. The advent of IoT has made it possible for many organizations to collect data at an unprecedented rate. Self-driving cars, smart meters, smart home appliances, industrial sensors, etc. are generating and sending data at a very fast rate.
  3. Variety: refers to all the structured (like sales data or customer profiles) and unstructured data (like social media posts, emails, voice messages, videos, data coming from telemetry devices on production lines).
  4. Veracity: refers to the provenance or reliability of the data source, its context, and how meaningful it is to the organization. When it comes to big data, quality and accuracy are less controllable.
  5. Value: refers to our ability to turn our data into a valuable asset. It is important that businesses make a case for any attempt to collect and leverage big data.

Hence, as Kenneth says, we can conclude that Big Data, at the very least, means large, complex data that is being generated very quickly. Mobile technologies and the Internet of Things (IoT) have been the major drivers for big data. If you imagine the amount of data that comes from everyone using their mobile phones, it is phenomenally huge: from your location data, which is always on and registering all kinds of GPS-related data, to your activities on various social media sites, including the posts you like, the adverts you click on, and the images and videos you upload. The data comes in all sorts of formats and sizes.

Santhosh Kumar Bhandari, DevOps Engineer at Atos Syntel, adds that Big Data is a term for handling and maintaining huge volumes of data that may be structured or unstructured.

He continues by defining Cloud Computing as a technology for accessing and maintaining IT resources over the internet on demand, instead of buying, owning, and maintaining physical data centers on-premises. Everything in cloud computing follows a pay-per-usage scheme.

Arman also points out that Cloud Computing refers to our ability to process, store and transfer all of our computing and data needs on the “Cloud”.

Indeed, the “Cloud” is essentially a series of data centers around the world, belonging to service providers like Amazon, Microsoft, or Google, which are connected to the internet and also have their own global high-speed communication network. They can provide cheap and scalable resources when we need them for our processing purposes, and they have the elasticity to scale down when we need less of their power or capacity.

This is a great advantage when handling Big Data, as it allows us to match our storage and processing capacity to the volume and speed of the data coming into our systems, and to the demands of processing that data to extract valuable information through a variety of analytical procedures.

Kenneth defines Cloud Computing as a model that allows for pay-per-use access to IT resources. In the traditional model of IT, organizations would typically over-budget and consequently over-provision IT resources. This means a huge capital expenditure (CAPEX) for the organization. The pay-per-use model of the cloud takes away the CAPEX and allows organizations to deal only with the operational expenditure (OPEX).

If you liken cloud computing to how you get power at home, you want to be able to get instant access to electricity and pay for only what you consume. You don’t care if the source of the electricity is hydro, thermal, or gas. You just want to switch on your bulbs, fridges, or TVs and watch them come on. That is how cloud computing works. Your electricity is the IT resources you want. This could be databases, networking, compute, etc. You just want to be able to provision these instantly and pay for only what you consume.

The energy at home is a measured service: every customer can know what the electricity or gas rate is at any given time. Similarly, cloud computing is a measured service. You know the cost associated with running a compute instance or a database server in the cloud. Just as you have certain responsibilities when using electricity, you also have certain responsibilities with cloud computing. For instance, you cannot blame the electricity provider for any damage if you keep your electronics in a water-logged room. Similar responsibilities exist in cloud computing.
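The measured-service idea can be illustrated with a toy bill that charges only for the hours each resource actually ran, much like an electricity meter. This is a minimal sketch: the resource names, hours, and hourly rates are made up for illustration and do not reflect any real provider's pricing.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hours one resource ran during a billing period (hypothetical data)."""
    resource: str
    hours: float
    hourly_rate: float  # illustrative price per hour, not a real tariff


def metered_bill(records: list[UsageRecord]) -> float:
    """Pay-per-use: charge only for the hours each resource actually ran."""
    return sum(r.hours * r.hourly_rate for r in records)


# A compute instance used part-time and a database server running all month.
usage = [
    UsageRecord("compute-instance", hours=200, hourly_rate=0.10),
    UsageRecord("database-server", hours=720, hourly_rate=0.25),
]

print(f"Metered bill: ${metered_bill(usage):.2f}")
```

Because the bill is a function of recorded usage, switching a resource off reduces the cost immediately, which is the property the electricity analogy is pointing at.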


Why is it important to businesses?

According to Arman, data in general is the lifeblood of business decisions in all market sectors and defines the quality of those decisions, which has a direct impact on the market position of any organization.

Hence, Big Data is data at a massive scale that contains valuable information and insights waiting to be extracted and analyzed, and it supports organizations in planning their next best move toward becoming more relevant and successful in value delivery to customers.

Santhosh underlines that data, whether customer data, employee data, or something else, is essential for any business to run. The more data, the more business. Indeed, the world's IT giants run their businesses using customer data, and they generate millions of data points in a single day. To process such huge volumes of customer data, big data technology is an important weapon.

Big Data Analytics uses a set of tools and processes to turn Big Data into strategic assets. They help organizations assess their position in the competitive landscape of their market sector, develop insights about how customers evaluate their products and services, how their competitors are performing, how the market trends and customer demands are moving, and how to plan to respond to them.

Moreover, Arman highlights that Big Data Analytics helps organizations manage their internal pipeline with higher efficiency and predictability, reduce their costs of production and service provisioning, and transfer that value to their customers.

Analytical tools using Artificial Intelligence (AI) can use Big Data to provide predictive insights into how the key factors in sustaining and growing an organization shift over time. They can also generate prescriptive sets of actionable recommendations that would put the organization in a more competitive position in the market. Advances in automation have also made it possible to carry out many of those recommendations automatically in cases where immediate action can prevent cascading damage to the organization’s value streams.

Similarly, Kenneth states that Artificial Intelligence plays a very significant part in transforming the way business is done today. Businesses employ various recommendation systems on e-commerce and movie streaming sites and continue to use predictive analytics to improve customer relationship management and detect fraudulent transactions in finance. Conversational agents of various sorts are playing an active role in many businesses today. In his view, it is safe to assume that over 50% of banks today have deployed a chatbot as part of their business.

Big data is the main driver for all these innovations. Every one of the technologies mentioned above is very data demanding. The quality and performance of these technologies depend on the size and quality of the data used in building them.

These tools also use Big Data to create personalized advertising and product offerings to customers based on the available data on their preferences and service usage history, Arman continues.

Using Big Data, they can recommend to each customer the products and services with the highest probability of being accepted and turning into a sales transaction. They can also track customers’ sentiments and satisfaction rates on social media channels and use that information to re-align their next product and service offering with what would delight customers the most.

The Big Data market mainly consists of Big Data Solutions, which include Discovery, Analytics, Intelligence, Management, and Visualization, and services in Deployment, Integration, and Maintenance. Besides, he says, banking and financial services are the largest participants in this market, closely followed by Government and Defense, Telecom, Transportation, Healthcare, Manufacturing, and Retail.


The roles of Big Data & Cloud Computing

For Arman, Big Data is the energy and lifeblood of our ability to better understand and serve the dynamics of our digital and always-connected society, and cloud computing is the main provider of the computing, storage, and processing power required to turn that data into actionable insights.

The impact of Big Data through Cloud Computing has created an ongoing “datafication” of all aspects of life and business which, thanks to AI-enhanced Analytics, can be converted into numerous actions toward the re-alignment of service provisioning pipelines, marketing campaigns, customer care initiatives, logistics and distribution of resources, education, welfare, healthcare, and innovation.

He points out that cloud computing now runs as the backbone of our digital society, with an ever-increasing role as the platform and network from which all the connected services of that society are provided and maintained as needed. From personal assistant solutions like Alexa or Siri, to browsing products in an online store like Amazon, to looking for something to watch on Netflix or Amazon Prime, to trying to get customer support from a provider, all are now hosted and provisioned by multiple cloud service providers.

Santhosh adds that our generation is partially digital, but the next generation will be completely digitalized (soft data). We are at the stage of moving on-premises servers to the cloud for easy maintenance and cost-effectiveness. In order to achieve that milestone, we need some technology to support our needs.

‘Cloud computing can be the best solution for this digital world and big data to support the processing.’

Kenneth points out that the notion of digital society suggests one that embraces the integration of IT technologies and services as an integral part of our daily lives, be that at work or at home. The main enablers for a truly digital society are mobile technologies, internet of things technologies, artificial intelligence technologies, big data, and cloud computing.

The role of big data in our digital society will be more of a driver for continuous innovation and monitoring. As we collect, curate, store, and analyze more data, we can better understand existing solutions and, consequently, design new and better ones. In healthcare, for example, big data will allow better management of hospitals, especially emergency departments. Challenges like triaging, bed allocation, missed appointments, etc. will be better understood and addressed. Big data will also allow for quicker diagnosis of medical conditions.

The cloud will provide all the necessary platforms and infrastructure for truly big data analytics. The ability of the cloud to scale on-demand will be invaluable for hosting big data. Being able to quickly provision IT resources on the cloud will also reduce the time-to-market for IT solutions built using big data.

Furthermore, he adds that organizations are already reporting massive gains since migrating to the cloud. Any serious organization that is looking to exploit the many benefits big data brings must combine big data and cloud computing as part of its enterprise strategy. A simple analysis of customer interaction on a website, by analyzing clickstreams, can reveal so much about a business’s customers and what their preferences are.

This is possible by combining big data and cloud computing technologies.
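A very small version of the clickstream analysis mentioned above can be sketched as follows. The visitor IDs and page paths are hypothetical, and a real pipeline would read events from logs or a streaming service rather than from an in-memory list.

```python
from collections import Counter

# Hypothetical clickstream: (visitor_id, page) pairs in the order they occurred.
clickstream = [
    ("u1", "/home"), ("u1", "/laptops"), ("u1", "/laptops/model-a"),
    ("u2", "/home"), ("u2", "/phones"),
    ("u3", "/laptops"), ("u3", "/laptops/model-a"), ("u3", "/checkout"),
]

# Count how often each page was viewed across all visitors.
page_views = Counter(page for _, page in clickstream)

# Count how many distinct visitors reached each page; set() removes
# repeated views of the same page by the same visitor.
visitors_per_page = Counter(page for _, page in set(clickstream))

print(page_views.most_common(3))
print(visitors_per_page["/checkout"], "visitor(s) reached checkout")
```

Even at this scale, the counts hint at preferences (laptop pages draw more attention than phone pages in the sample) and at where visitors drop off before checkout.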


Big Data & Cloud Computing: a relationship?

Cloud computing provides the required computing, storage, and transfer capacity to collect, create, clean up, transform, and analyze Big Data in a variety of usage scenarios.

Indeed, as Arman emphasizes, cloud computing allows organizations to scale horizontally (scaling out by adding more similar resources) when they need more resources during the peak of their work, to release those resources when the workload drops, and to pay only for what they need, when they need it. This not only optimizes their spending but also ensures that their service levels are maintained as expected by their customers, regardless of the changes to their workloads.
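As a rough sketch of this elasticity, the function below picks how many identical instances to run for a given workload and releases them again when the load falls. The per-instance capacity, the instance limits, and the function name are hypothetical; real autoscalers rely on richer signals such as CPU usage, queue depth, and cool-down periods.

```python
import math


def instances_needed(requests_per_second: float,
                     capacity_per_instance: float,
                     min_instances: int = 1,
                     max_instances: int = 20) -> int:
    """Return how many identical instances cover the current load.

    Horizontal scaling ("scaling out") adds more instances of the same size
    instead of making a single instance bigger.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    wanted = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp to the allowed range so the fleet never drops below the minimum
    # and never grows past the configured maximum.
    return max(min_instances, min(max_instances, wanted))


# Hypothetical load over a day: quiet night, rising morning, peak, evening.
for load in (50, 900, 2400, 300):
    print(load, "req/s ->", instances_needed(load, capacity_per_instance=250))
```

The instance count follows the workload up and back down, so the organization pays for ten instances only during the peak and for one or two the rest of the time.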

Moreover, it also lowers the cost of innovations and R&D as no capital investments are needed for exploring an idea and teams can experiment with temporarily allocated resources, in their quest for finding better ways to deliver value to customers. Cloud computing uses the internet and its own global network (backbone) to provide low latency data transfer and data redundancy (replication) where needed.

‘Cloud computing and Big Data are two vital elements in successful organizations in the digital age.’

Big data is huge data that is constantly increasing in size, Kenneth adds. Cloud computing provides a home for that, given its elastic nature. But it goes beyond that. The idea is not just to store big data. Businesses want to mine the huge data and derive value from it. Why store big data in the first place if you would not derive any form of benefit? The cloud allows for quick prototyping of IT solutions using big data.

The infrastructure needed for working with big data, including compute, databases, networking, storage, etc., can be provisioned within minutes in the cloud. Big data analytics, artificial intelligence technologies, conversational agents, etc. are some IT solutions that are possible only by truly combining cloud computing and big data.

Besides, Santhosh notes that we use big data to process huge volumes of data, and that cloud computing is the best support for processing this data, as it provides on-demand services like storage for structured and unstructured data. Hence, big data processing becomes quite easy with cloud computing, as it provides pre-existing support for big data.

As we have native big data solutions on the cloud, it is easy for any new organization to adopt cloud computing instead of investing in local hardware. He thinks that organizations that already have their own data centers can implement a hybrid-cloud computing model to ease the transition.

All in all, Arman believes that IT leaders should embrace both Big Data and Cloud Computing. Big Data is a great strategic advantage, waiting to be brought to the surface through extraction and processing, and Cloud Computing is the best available technology solution that can provide us with the most cost-optimized, elastic processing power with the highest possible service availability.

For him, it is no longer a question of “IF” but a question of “HOW” for all organizations in the private and public sectors, to embrace and maximize their benefits from the two!


The benefits…

Arman points out that the cloud’s elasticity and cost efficiency, which allow organizations to maintain their quality of service and availability at a global level, have been a key driver of the growing amount of Big Data that is streamed and uploaded into the storage and real-time analytical pipelines hosted and provisioned by Cloud Service Providers.

Solid security measures and the high reliability and availability of these providers have enabled many organizations to migrate to Cloud services and invest in developing their future state with Cloud-Native solutions at a fraction of the cost of owning the infrastructure needed. Besides, the global network of cloud service providers has also enabled low latency and cost optimization in data ingress and egress by using data centers that are the closest to the source of data or consumption of data and analytical outcomes.

According to Santhosh, we always try to implement technology that is scalable and portable. Indeed, the cloud is a one-stop solution for scalability, portability, high speed, and efficiency. Thus, having big data on the cloud is always an advantage, since we can scale the environment based on the data, whether through horizontal or vertical scaling.

For instance, the major cloud providers offer services such as:

  • On AWS – Lambda, Amazon Redshift, Amazon Kinesis, Amazon Athena
  • On GCP – BigQuery, Dataflow, App Engine, Spanner
  • On Azure – Azure Data Lake, Synapse Analytics, Data Lake Analytics

Kenneth also states that one fundamental benefit is to be able to quickly provision infrastructure necessary for hosting big data. You don’t have to wait months to build a data center, wire up all the networking, order the servers and provide security and access. With cloud computing, all of these can be provisioned in a matter of minutes, without over-provisioning.

The elasticity of the cloud is another benefit for big data. With cloud computing, you don’t have to estimate at the start of the project the extent to which your big data will grow. The cloud is elastic and will continue to grow as your data grows. You can access your data globally over the internet, and cloud computing allows for rapid innovations with Big Data.

…And the drawbacks

Research shows that over 99% of collected data never gets used or even analyzed, and several challenges contribute to that very high number.

Challenges in Data Quality

Indeed, Arman underlines the fact that the high volume of big data and its high velocity lead to many problems with the quality of the data that is being collected (or generated). Data may have inaccuracies, missing parts, or damaged parts, and it needs a considerable amount of processing and interim storage to be cleaned up, enriched, and filled in before it reaches the quality needed to serve as an input to our systems.
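A first-pass quality check of this kind can be sketched in a few lines. The example assumes a stream of temperature readings in which missing values arrive as None and damaged values arrive as text or as implausible numbers; the plausible range and all sample values are assumptions made for illustration.

```python
def is_usable(value, low: float, high: float) -> bool:
    """A reading is usable if it is a real number inside the plausible range."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and low <= value <= high


def clean_readings(readings, low: float = -50.0, high: float = 150.0):
    """Split raw readings into usable values and rejects for later repair."""
    clean, rejected = [], []
    for value in readings:
        if is_usable(value, low, high):
            clean.append(float(value))
        else:
            rejected.append(value)
    return clean, rejected


# Hypothetical raw input: one missing value, one damaged value, one outlier.
raw = [21.5, None, 22.1, "err", 9999.0, 20.8]
clean, rejected = clean_readings(raw)
print(f"kept {len(clean)} of {len(raw)} readings; rejected: {rejected}")
```

Keeping the rejected values instead of discarding them leaves room for the enrichment and fill-in steps described above, for example by interpolating a missing reading from its neighbors.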

Here are a few more challenges that come with Big Data:

Challenges in Data Transmission

The volume and velocity of Big Data are growing exponentially, and a large portion of Big Data is created in real time. The speed and amount of data brought into the solutions that store and process it put mounting pressure on those systems and can push them beyond the capacity they can handle. This can lead to pipelines and processes crashing, which can create more data corruption and inaccuracies.

Challenges in Data Storage and Recovery

Data gathering (through the collection of sensor data or of data generated during interactions) usually involves heterogeneous sources with a variety of data quality levels, and the data needs cleaning before it can be stored in any data storage and analytical solution. Recovery from storage can also pose a big challenge, due to the size of the data that needs to go through regular and ad-hoc backup processes and the growing demand for more capacity to hold data, especially when organizations need to retain it for a certain number of years for record management and regulatory compliance.

Challenges in Data Processing

The extremely large volume of the data that is coming in at high velocity needs to go through a variety of processes, designed for a diverse selection of purposes before it can turn into the asset that organizations can depend on for their decision making. This involves a wide range of transformation and enrichment and then grouping and conversions, tailored for many outcomes.

Providing meaningful information to business leaders, in the form of timely and predictive insights, requires the processing pipelines to address the ever-growing challenges in data processing as they develop, to help the organization stay competitive in the market.

Challenges in Data Security and Privacy

The volume and velocity of data not only pose the ongoing challenges mentioned above but also open the door to a number of challenges in securing data that is in transit, being stored, or going through processing. That work is essentially done through a distributed cloud structure with many points where the data may be exposed to access attempts from the outside. Proper measures are needed to safeguard it: encrypted and secure channels in transit, and proper access management and encryption at rest and through all stages of processing. Data Privacy is another aspect of Data Security, requiring that data be available only to authorized personnel, in line with corporate, regional, and international regulations and compliance guidelines.

Kenneth sees the following challenges around Big Data:

  • Data size, structure, and ingestion: Lots of raw data to store, organize, and ingest. This requires a new way of thinking and new technology, such as data lakes, to address the challenge.
  • Data Evolution: Big data is constantly evolving. Businesses are constantly rethinking what they capture and how they capture data. The cloud infrastructure must also adapt to this change.
  • Cost: The bigger the data, the more expensive it is to store it and to provision analytic resources to work with it.

To address these challenges, there are best practices that help organizations reduce their impact. They include better data architecture design practices, optimization of data usage (eliminating the need for duplicated processes and interim storage), better processing models (using serverless and containerized solutions and more in-memory processes), and improved data streaming and analytics pipeline designs.

Santhosh also points out that overcoming these challenges requires skilled people with expertise in big data and cloud computing, as well as choosing the right tools for a big data solution, since many organizations fail due to insufficient knowledge of the tools and how to use them.

Overall, he continues, security is a big concern. Any data from the organization is critical and hence, it is extremely important to always give first priority to security.


The future of Big Data & Cloud Computing

For Arman, both Big Data and Cloud Computing are vital success factors that would allow organizations to not only stay relevant in the highly competitive market landscape of this decade but also to benefit from this wealth of data through the available processing powers to thrive and move ahead of the curve.

Indeed, he believes that the benefit of the combined power of Big Data and Cloud computing is not limited to the private sector, as Governments have also been tapping into the strategic advantages of the predictive and prescriptive insights generated from Big Data using the power of Cloud computing over the past several years.

It is expected that Cloud computing will continue to grow from $370 billion in 2020 to over $830 billion by 2025. It is also believed that about 90% of organizations are already using Cloud services at some level. The worldwide Big Data market is expected to grow to $230 billion by 2025, which would represent 60% growth over the five years from 2020. Over this decade, we can expect China’s Alibaba Cloud to displace Google Cloud as the number three Cloud service provider, behind Amazon Web Services as number one and Microsoft Azure as number two.
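To compare these two forecasts on the same footing, the market figures quoted above can be turned into an implied compound annual growth rate. This is a back-of-the-envelope sketch that takes the quoted numbers at face value and assumes smooth compounding over five years; the function name is illustrative.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start value and an end value."""
    return (end / start) ** (1 / years) - 1


# Cloud computing: $370 billion (2020) to $830 billion (2025).
cloud_rate = annual_growth_rate(370, 830, years=5)

# Big Data: 60% total growth over five years, ending at $230 billion.
big_data_rate = annual_growth_rate(1.0, 1.6, years=5)
big_data_2020 = 230 / 1.6  # 2020 market size implied by the 60% figure

print(f"Cloud: {cloud_rate:.1%} per year")
print(f"Big Data: {big_data_rate:.1%} per year")
print(f"Implied 2020 Big Data market: ${big_data_2020:.2f} billion")
```

On these figures the cloud market would be growing at roughly 17.5% a year and the Big Data market at roughly 9.9% a year, with an implied 2020 Big Data market of about $144 billion.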

Therefore, he states that this decade belongs to those who can harness the power of Cloud computing in the rapid extraction of the value from their Big Data resources into an ever-growing range of insights that would put them ahead of their competition in the market.

Santhosh states that over 2.5 quintillion bytes of data are created every single day, and that a single individual generates about 1.7 MB of data per second. Hence, to process this huge volume of data we need a solution that can handle the data with ease and process it for storage. He therefore believes that a combination of big data and cloud computing can be the best possible solution.

According to Kenneth, artificial intelligence is not just the future, but the now. The big data that is part of our lives today will continue to shape our future as a species, both near and far. Cloud Computing will continue to serve as an enabler of both big data and artificial intelligence.

We are still scratching the surface as far as what is possible with the trio of artificial intelligence, big data, and cloud computing. Thus, the future is very exciting and some of us will be alive to witness the amazing technologies we will create by combining these.


Special thanks to Arman Kamran, Kenneth Otabor Irabor, and Santhosh Kumar Bhandari for their insights on the topic.