Why Cassandra has grown so much in popularity

In response to the rapid growth of Apache Cassandra, DataStax, a hybrid and multi-cloud firm, is introducing subscription-based support for open source Cassandra. According to a recent survey by DB-Engines, Cassandra is the tenth most popular of the 350 database management systems covered, and it grew by 252% between 2013 and 2019. Mani Srinivasan, Senior Product Director, Management and Development Solutions at DataStax, discusses the results in more detail and explains what is happening in DevOps this year.

The results of the survey show Cassandra to be a very popular database management system. Why do you think this is the case?

Cassandra came out of a problem that the team at Facebook had a decade ago. At the time, they were one of the few developer teams facing that scaling problem. Today, many more companies face the same challenge of scaling up and handling data while keeping their applications available. More developers have to build applications that are continuously available to people around the world, and they cannot afford to take services offline.

Cassandra is built to run at scale and to stay available across multiple regions, geographies or cloud providers. With developers and CIOs alike looking for flexible deployment options, Cassandra meets a major need in how data is managed.

Why do you think people previously struggled so much with scaling?

Previously, developers' main option for handling data was a relational database. These were great for single applications, and they are still good for specific use cases, but relational databases are not built to scale out, to run continuously, or to cope with downtime and failure events. NoSQL databases were developed to solve those problems, and over time we have seen more developers adopt NoSQL to deal with them.

Who does Cassandra benefit the most and how can they get to grips with it?

Cassandra is aimed at all users and developers building modern applications, particularly those that must be fast, resilient and globally distributed.

What do DevOps teams need to know about Cassandra that is different from other NoSQL databases, and from other types of database?

Cassandra is a wide column store, which makes it a cross between a key-value store and a tabular database. It is built to run across data centers, to ride through failure events, and to deliver speed, scale and reliability.
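To make the "cross between a key-value store and a tabular database" idea more concrete, here is a toy in-memory model in Python. It is an illustrative sketch only, not Cassandra's storage engine or any driver API: the class `WideColumnTable` and its methods `upsert`, `get` and `slice` are hypothetical names. Rows are grouped under a partition key (the key-value side), kept sorted by a clustering key inside each partition, and each row holds only the columns that were actually written to it (the tabular side).

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class WideColumnTable:
    """Hypothetical toy model of a wide column table (illustration only).

    Each partition key maps to rows kept sorted by clustering key;
    each row is a sparse dict of column name -> value.
    """

    def __init__(self):
        # partition key -> sorted list of clustering keys in that partition
        self._order = defaultdict(list)
        # (partition key, clustering key) -> {column: value}
        self._rows = {}

    def upsert(self, partition_key, clustering_key, **columns):
        key = (partition_key, clustering_key)
        if key not in self._rows:
            # Keep clustering keys ordered so range scans stay sorted.
            insort(self._order[partition_key], clustering_key)
            self._rows[key] = {}
        # Writes merge into existing columns instead of replacing the row,
        # loosely mirroring upsert semantics.
        self._rows[key].update(columns)

    def get(self, partition_key, clustering_key):
        # Key-value style lookup of a single row; None if absent.
        return self._rows.get((partition_key, clustering_key))

    def slice(self, partition_key, start, end):
        # Tabular-style range scan within one partition, in clustering order.
        keys = self._order.get(partition_key, [])
        lo = bisect_left(keys, start)
        hi = bisect_right(keys, end)
        return [(ck, self._rows[(partition_key, ck)]) for ck in keys[lo:hi]]


# Usage: sensor readings partitioned by sensor, clustered by a reading number.
table = WideColumnTable()
table.upsert("sensor-1", 3, temp=21.5)
table.upsert("sensor-1", 1, temp=20.0, humidity=40)
table.upsert("sensor-1", 2, temp=20.7)
table.upsert("sensor-2", 1, temp=18.2)
print(table.get("sensor-1", 1))                          # {'temp': 20.0, 'humidity': 40}
print([ck for ck, _ in table.slice("sensor-1", 1, 2)])   # [1, 2]
```

In Cassandra itself the partition key also determines which nodes store the data, while the clustering key controls ordering within a partition, which is what makes range queries over a single partition efficient.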

Cassandra also differs from traditional relational databases in how it scales. A relational database normally scales up by running the database instance on more powerful hardware; once it reaches that ceiling, the database has to be sharded or run in a master-slave configuration, which affects service reliability and availability over time. Cassandra instead scales out by adding nodes and runs in a truly ‘masterless’ fashion: every node can accept reads and writes, so there is no single point of failure. This characteristic really sets Cassandra apart and is what allows it to survive those failure events.
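The masterless, scale-by-adding-nodes behavior can be sketched with consistent hashing, the general technique behind Cassandra's token ring. This is a simplified model under stated assumptions: each node owns a single token (real clusters typically use many virtual nodes per machine), tokens come from MD5 rather than the Murmur3 hash Cassandra uses by default, and replicas are simply the next distinct nodes clockwise on the ring, similar in spirit to SimpleStrategy. The names `token`, `TokenRing`, `add_node` and `replicas` are hypothetical.

```python
import hashlib
from bisect import bisect_right


def token(key: str) -> int:
    # Map a key onto the ring. MD5 is used here only for simplicity and
    # determinism; it is an assumption, not Cassandra's default partitioner.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical toy consistent-hashing ring: every node is a peer."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # One (token, node) pair per node, sorted by token around the ring.
        self._ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        # Scaling out just inserts another token; no node is promoted to master.
        self._ring.append((token(node), node))
        self._ring.sort()

    def replicas(self, key):
        # Walk clockwise from the key's token and take the next rf nodes.
        tokens = [t for t, _ in self._ring]
        start = bisect_right(tokens, token(key)) % len(self._ring)
        count = min(self.rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


# Usage: measure how many keys change owner when a fifth node joins.
keys = [f"user-{i}" for i in range(1000)]
ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=1)
before = {k: ring.replicas(k)[0] for k in keys}
ring.add_node("node-e")
after = {k: ring.replicas(k)[0] for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys changed owner")
```

Because ownership depends only on position on the ring, adding `node-e` moves only the keys between its predecessor's token and its own, and all of them move to `node-e`; the rest of the cluster is untouched. With a replication factor above one, the remaining replicas can keep serving a failed node's key ranges, since no node plays a special coordinating role.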

What roadblocks did your DevOps teams overcome last year?

DevOps teams have had to overcome a number of challenges over the past year, from security management issues and operational trends through to data services. Tackling these involves more collaboration and cooperation between developers, operations teams and cloud architects.

Developers will have to work with their operations teams on areas like availability, resiliency, and security for the data their applications create. Getting this right means planning how to support these requirements from the start.

What should DevOps teams looking at microservices be aware of?

Bulk loading and unloading data, and connecting a database to a messaging or event streaming system, are two of the most common developer tasks at the moment. As part of launching Luna, our subscription support offering for open source Cassandra, we have worked on new tooling for bulk loading data into Cassandra and for integrating it with Apache Kafka, to make it easier to fit Cassandra into microservices applications. For developers, these tools can make it much simpler to get initial data into Cassandra and to keep data flowing between streaming and database infrastructure.

The other thing to bear in mind is the kind of use case Cassandra is best at. It excels at high-throughput, low-latency workloads and at storing data at scale across multiple nodes, data centers or even different clouds. That makes it a strong fit for time-series data and for other large data sets: DevOps teams at companies in the retail, finance and media sectors are using data sets like these to deliver better services to customers.

What tech predictions do you have for 2020?

Alongside developers taking on more responsibility for areas like availability, resiliency, and security, DevOps teams will have to look at multi-cloud in more detail. Multi-cloud has been on the CIO wishlist for the past year as a way for CIOs to keep control over their companies’ IT strategies even as they adopt public cloud services. However, many of these projects have been standalone implementations rather than deployments that run across clouds. For some, having some projects running on public cloud and others running internally will be enough to count as ‘multi-cloud’.

For most companies, getting mission-critical applications running independently and at scale across multiple cloud services will be a major challenge in 2020. DevOps teams will look in more detail at how they can use multi-cloud, what it really means for their data, and how to mix and match specific cloud services alongside cross-cloud database designs. Not everything will have to run across multiple clouds or in hybrid cloud installations, but for many teams this will be a key requirement for the future.