Why is DataOps essential to drive business value?
https://devopsnews.online/why-is-dataops-essential-to-drive-business-value/ (Tue, 15 Jun 2021)
As businesses use more and more data with every passing day, new practices and disciplines have become vital to improve coordination between the analysis of that data and the overall operation of the enterprise.

This practice is known as DataOps – Data Operations – and is now of essential value to businesses. We have therefore talked with experts in the industry to shed light on this ever-growing topic and on why DataOps matters so much.

 

What is DataOps?

DataOps is an emerging methodology that intertwines DevOps teams with data engineering and data science roles to provide the tools, processes, and structures needed to support data-focused organizations.

Simon Trewin, DataOps transformation expert and thought leader at Kinaesis, describes DataOps as the art of driving data-driven projects in a continuous integration and continuous deployment (CI/CD) cycle. This enables teams to build solutions incrementally and fast, fail fast where necessary, and get those solutions in front of stakeholders to guide direction and build collaboration and momentum.

By following a structured DataOps process, projects can maintain momentum through the changing requirements that are all but inevitable, and through multiple phases and releases. A good DataOps methodology gives a project team a process for maintaining the velocity of the release train.
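To make the CI idea concrete for data work: a build can run automated checks on the data a pipeline produces, so a broken assumption fails fast inside the pipeline rather than in front of stakeholders. A minimal sketch in Python – the file path, column names, and thresholds are hypothetical, not from the article:

```python
# Minimal data checks that a CI job could run on every change.
# The staging path and expectations below are illustrative, not prescriptive.
import pandas as pd

def test_orders_extract():
    df = pd.read_parquet("staging/orders.parquet")  # hypothetical staging output
    assert len(df) > 0, "extract produced no rows"
    assert df["order_id"].is_unique, "duplicate order_id values"
    assert df["amount"].ge(0).all(), "negative order amounts"

# Wired into CI (e.g. `pytest tests/`), a failing assertion stops the
# release train before a bad number ever reaches a stakeholder.
```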

Neill Cain, Lead Software Engineer at Craneware, adds that DataOps is foremost about how you as an organization enable the democratization of data – how you capture datasets across the enterprise, how you communicate the governance of that data, and how people discover and use it.

Thereafter, it becomes somewhat similar to DevOps [at least the ops side] in that DataOps focuses on the efficient and cost-effective delivery of that data to consumers.

The difference between DevOps and DataOps, Simon continues, is that DataOps incorporates DevOps processes but extends them to cater to the specific nuances of working with data.

These are:

  • data is continually changing, so you need to build patterns to manage this (a minimal sketch follows below),
  • data is owned by the business, so it is crucial to build collaboration,
  • data models need to be understood in order to structure data pipelines and support iteration,
  • requirements processes need to capture additional elements that software engineers do not normally need to consider.

The tooling is varied, and each tool has performance characteristics that suit it to particular problems – problems that can themselves change – so flexibility is key.
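As one way to picture the first nuance above – continually changing data – a lightweight schema contract enforced at ingestion turns silent upstream drift into an explicit, fixable failure. A minimal sketch, with hypothetical column names and types:

```python
import pandas as pd

# A tiny schema contract: accept only what matches, fail loudly on drift.
# Column names and dtypes are placeholders for illustration.
EXPECTED_SCHEMA = {"customer_id": "int64", "signup_date": "datetime64[ns]", "plan": "object"}

def validate_schema(df: pd.DataFrame) -> pd.DataFrame:
    actual = {col: str(dtype) for col, dtype in df.dtypes.items()}
    missing = set(EXPECTED_SCHEMA) - set(actual)
    drifted = {c: (EXPECTED_SCHEMA[c], actual[c])
               for c in EXPECTED_SCHEMA
               if c in actual and actual[c] != EXPECTED_SCHEMA[c]}
    if missing or drifted:
        raise ValueError(f"schema drift detected: missing={missing}, changed={drifted}")
    return df
```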

Neill also points out that DevOps is a blend of tooling, culture, and working practices leveraged to deliver software efficiently, combined with an agile, metrics-driven approach to incrementally enhancing that software.

 

The business value of DataOps

DataOps is a step-change in the productivity of your data and analytics projects in line with the changes that DevOps brought.

Indeed, as Simon states, all businesses are keen to be data-driven and to use the latest AI and ML tools and techniques. Much of this is experimental and evolving, so it is important to iterate fast, learn fast, and be prepared to test theories and throw them away.

DataOps gives you the agility and flexibility to move quickly with controls in place. It establishes reusability and consistency around knowledge and information, supporting decisions and helping you make the right choices rapidly to stay ahead of the competition.

Businesses adopt DataOps to remain competitive, Neill underlines. Every organization wishes to deliver quickly to customers as well as evaluate a hypothesis without having to traverse layer upon layer of bureaucracy and red tape.

Moreover, DataOps will position data in front of the business efficiently and effectively. Hence, as Simon notes, DataOps will:

  • Be responsive to the business’s needs,
  • Enable teams to experiment and gain insight from trusted sources of data, prototype theories rapidly, and then transition them into IT processes,
  • Provide metadata alongside the data to give transparency and trust,
  • Help break down silos,
  • Build collaboration,
  • Significantly reduce costs.

Whilst it would be a challenge to convey effectively the correlation between DataOps adoption and the bottom line, Neill continues, it stands to reason that by introducing agility into the delivery of data – together with data democratization initiatives and data discovery – you will get to market quicker and be able to answer that all-important question: is this of value to our customers?

If it is, the business should prosper and remain a viable alternative to companies that have not adopted these practices.

 

DataOps practices

According to Neill, the best DataOps practices are automation; the cataloging of data assets; applying Agile principles to the delivery of data and to the apps and infrastructure that support it; and enabling data discovery and therefore self-serve access, with layers of governance added as necessary. The key is to make the data governance crystal clear: who the data owner is and how you request access to the data.
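As an illustration of making governance ‘crystal clear’, a data catalog entry can carry the owner and the access route alongside the dataset description. The record shape below is one plausible sketch, not a standard:

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    """One record in a data catalog: what the data is, who owns it,
    and how to request access – answering the governance questions up front."""
    name: str
    description: str
    owner: str            # accountable business owner, not just the engineer
    access_request: str   # e.g. a ticket queue or approval-workflow URL
    classification: str   # e.g. "public", "internal", "restricted"

# Hypothetical entry for illustration only.
orders = CatalogEntry(
    name="sales.orders",
    description="All confirmed customer orders, updated hourly",
    owner="head-of-sales@example.com",
    access_request="https://example.com/data-access",
    classification="internal",
)
```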

In order to have the best DataOps practices, Simon adds that you need to:

  • Define the use case and vision effectively,
  • Instrument your data pipeline all the way along (a sketch follows this list),
  • Capture the right metadata,
  • Build extensibility into your platforms to cope with changing requirements,
  • Build collaboration across business and IT,
  • Govern the projects in a lean, efficient way that adds value to the customer.
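A minimal sketch of the instrumentation and metadata points above: wrap each pipeline step so that every run emits timing and row-count metadata. The step name and dedup logic are hypothetical, chosen only to show the pattern:

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def instrumented(step_name):
    """Wrap a pipeline step so every run logs timing and row counts."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(rows, *args, **kwargs):
            start = time.perf_counter()
            out = fn(rows, *args, **kwargs)
            log.info("step=%s rows_in=%d rows_out=%d seconds=%.3f",
                     step_name, len(rows), len(out), time.perf_counter() - start)
            return out
        return wrapper
    return decorator

@instrumented("deduplicate")  # hypothetical step name
def deduplicate(rows):
    return list(dict.fromkeys(rows))  # order-preserving dedup

deduplicate(["a", "b", "a", "c"])  # logs: step=deduplicate rows_in=4 rows_out=3 ...
```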

 

Implementing DataOps

Firstly, Simon states, you need to start with a good candidate project and establish what works for you as an organization. Bring in experts from the start to guide you, help you get things off the ground, and educate your teams on how to deliver effectively. Then build out your tooling and frameworks to support you, much as with DevOps. Build a thin slice of functionality front to back, learn from it, and then extend out.

‘Start small, build incrementally’, Neill emphasizes. First establish a candidate pilot project; then ensure there is diversity among the project team members and that at least one key stakeholder is involved.

 

The benefits of DataOps…

One of the main advantages of DataOps is the facilitation of conversations around leveraging the company’s data assets, Neill points out.

Another is breaking down any so-called data silos in the company and applying the same tools and approaches to all data. Uniformity brings multiple benefits: efficiency, and a reduction in code rot and in archaic tools that cannot be supported – ETL tooling built in-house by a developer who left two years ago and that everyone is terrified of touching, for example.

Another example is an industry-recognized technology over-engineered by one person on their own PC; that person has since left, and the company cannot figure out how the system works, how to deploy it, or how to support it – so it literally kept the former employee’s workstation running for years!

Simon also offers a few benefits of DataOps, which are:

  • Testing out models and analysis in a sandbox environment,
  • Building collaboration across business and IT teams,
  • Speeding up the cadence of your releases,
  • Reducing the side effects of changes,
  • Empowering both the IT teams and the business teams,
  • Aligning responsibilities around data with the right people,
  • Making the teams more productive.

 

…and the challenges

For Neill, organizations that wish to implement DataOps can find themselves impeded by restrictive data governance policies, and by ETL processes and technologies that are siloed in disparate teams.

For instance, suppose you have a question you want to ask of your data, but you cannot access the data because it is “owned” by a centralized team, such as a dedicated BI team. You have to ask them to ask the question for you, which immediately puts you at the mercy of someone else’s workload – what if they are too busy to perform the task, or even worse, on annual leave for a week?

According to Simon, here are some of the biggest challenges of DataOps:

  • Cultural change – it is new and therefore different from what people know,
  • It is not zero-cost to set up,
  • The tooling, support, and knowledge are not yet fully consistent, and some of the practices lack maturity,
  • Small data gets in the way – data processes start with ingestion from sources and normally end with SMEs making adjustments; without full visibility of this, it is not possible to establish accuracy and consistency,
  • The skills gap – there is a significant knowledge gap in the market and a lack of educational resources for skilling people up.

 

The future of DataOps

Simon believes that DataOps will, like DevOps, become the standard for data and analytics teams.

In the future, the tooling and the support for DataOps will improve and the methods will become more standardized and refined. The approaches will become mainstream. A new generation of data scientists and data engineers will accept it as the way to do things, and thus, this will drive the data-driven world forward rapidly and effectively.

Neill thinks that the forecasted growth in data volumes will warrant the automation of data ingestion and analysis. This is where AI/ML solutions will be developed: to do the heavy lifting that users cannot perform with point-and-click devices and keyboards!

Organizations will also diversify across solutions and build custom code to “glue” them together. There are no silver bullets or multi-purpose tools that can accommodate every need, so be prepared to stitch multiple solutions together. Finally, user interfaces will need to evolve to keep pace with advancing backend technologies. This is vital: a large proportion of data users and experts cannot drive a CLI or write exploratory code, so UI components will need to enable data discovery and exploration.

 

Special thanks to Simon Trewin and Neill Cain for their insights!

Helping DataOps and DevOps teams break through the blockages
https://devopsnews.online/helping-dataops-and-devops-teams-break-through-the-blockages/ (Thu, 28 Feb 2019)
 

Any organisation looking to truly optimise its big data stack knows it has enemies stacked against it, throwing blockages in the way of DataOps and DevOps teams. It can feel like climbing a mountain; the fabled golden city of uptime and peak efficiency always lies beyond another crag, around another boulder.

It’s a tough area, particularly when the IT team and the wider business have conflicting ideas about how to spend that time. Yet it’s a fantastic challenge for those who like nothing better than overcoming whatever block business, technology, or chance throws into their path.

For the DevOps and DataOps teams operationalising big data and keeping models and services in production, certain blocks loom larger than the rest. Here’s how teams can face these blockages with determination rather than fear.

Let’s get it out of the way first: when the business talks analytics and big data, the number one, front-of-mind issue is volume. It’s one of the Vs of big data, and it’s part of why we call it ‘big’ data.

High volumes

The volume of data can be a challenge. Expanding giga-, tera-, and petabytes need some ‘tin’ to store them in. Everyone knows that the digital society is producing great quantities of data, and the volume is rising day on day from digital transactions and records, the internet of things (IoT), and the ever-greater digitalisation and ‘chips-with-everything’ mentality that connects so many new categories of device each year. Even that sentence was high in word volume.

And so, with rising data volumes available, enterprises continue to leverage this resource to create new business value. The trend fuels a plethora of new applications spanning an alphabet soup – ETL, AI, IoT, ML – aimed at many business drivers.

As they are deployed on new data platforms, these applications need an application performance management solution to meet robust enterprise production requirements.

Then there are the systems that begin to creak or fall over when data volumes push at their technical boundaries. These are often solutions like relational database management systems, or statistics and visualisation software, many of which struggle to manage really big data.

Data applications (data consumers) don’t exist in isolation from the underlying big data stack. Endlessly looking for more storage server space, reconfiguring clusters, and ensuring that databases are optimised is no mean feat for the DevOps and DataOps teams.

Applications are threaded together from many different systems – ETL, Spark, MapReduce, Kafka, Hive, and so on – and how the stack performs has a direct impact on downstream consumers. Managing these applications is highly complex and requires an end-to-end solution, especially to meet SLAs.
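To make ‘end-to-end’ concrete, one simple pattern is to record when each stage in the chain last completed and check the whole run against the SLA deadline. A sketch with hypothetical stage names and timestamps (in practice these would come from a metrics store):

```python
from datetime import datetime, timezone

# Last successful completion per stage; names mirror a typical chain,
# and the timestamps are purely illustrative.
last_success = {
    "kafka_ingest":    datetime(2021, 6, 15, 4, 10, tzinfo=timezone.utc),
    "spark_transform": datetime(2021, 6, 15, 4, 40, tzinfo=timezone.utc),
    "hive_load":       datetime(2021, 6, 15, 5, 5, tzinfo=timezone.utc),
}

SLA_DEADLINE = datetime(2021, 6, 15, 6, 0, tzinfo=timezone.utc)  # data ready by 06:00

def check_sla(stages, deadline):
    """The pipeline meets its SLA only if every stage finished before the deadline."""
    late = {name: ts for name, ts in stages.items() if ts > deadline}
    return (not late), late

ok, late = check_sla(last_success, SLA_DEADLINE)
print("SLA met" if ok else f"SLA breached by stages: {late}")
```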

Stitch those silos together

The barriers and blocks that get in the way of a strong big data stack and a good analytics process go deeper, almost inevitably encompassing the silos most organisations have built up as data pools in departments and teams.

Each guards its information and narrowly optimised processes, so the business grows a complex data estate of co-mingled assets. Each silo is a barrier to a single version of the truth and to a strong analytic process that encompasses the whole organisation. As is said in the context of customer service, ‘it’s easier when there’s one throat to choke’.

In fact, merging data sources and cataloguing data (best-practice aspects of busting those silos) really help combat another block.

The data pipeline is only as good as its weakest link. Unexplained run-time problems in your applications often occur because one part of the analytics pipeline has changed, moved, been reconfigured, or been starved of compute resources.

Troubleshooting is time-intensive and complex, and configuration variables are a pain to parse through. When aiming for perfection, the wise DataOps person knows that ‘perfect is the enemy of good’.
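Because many ‘unexplained’ failures trace back to a quiet configuration change somewhere in the stack, one low-tech aid is to snapshot the effective configuration of every run and diff it against the last good run. A sketch – the Spark keys below are real configuration properties, but the values and scenario are illustrative:

```python
def config_diff(good, bad):
    """Return keys that were added, removed, or changed between two config snapshots."""
    added   = {k: bad[k] for k in bad.keys() - good.keys()}
    removed = {k: good[k] for k in good.keys() - bad.keys()}
    changed = {k: (good[k], bad[k])
               for k in good.keys() & bad.keys() if good[k] != bad[k]}
    return added, removed, changed

# Illustrative snapshots: a Spark job's last good run vs. the failing run.
good = {"spark.executor.memory": "8g", "spark.sql.shuffle.partitions": "200"}
bad  = {"spark.executor.memory": "4g", "spark.sql.shuffle.partitions": "200",
        "spark.speculation": "true"}

print(config_diff(good, bad))
# -> ({'spark.speculation': 'true'}, {}, {'spark.executor.memory': ('8g', '4g')})
```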

The automata, your friend

Automation is one ally in busting big data blocks. The DataOps team has some big fish to fry and only so many pairs of hands to work with. In fact, there’s a real skills shortage for good big data architects and cloud specialists.

Morgan McKinley, the recruiters, published an IT salary guide showing soaring demand and a massive shortage of available talent in emerging technologies and specialist fields. The dearth of talent has really driven up industry salaries.

A cloud architect could expect £100,000 to £130,000 with the right industry experience on top of their technology acumen.

Given the lack of freely available talent, it makes sense to safeguard the teams you have and let them use their skills to best effect, automating the routine parts of their roles so they can focus on higher-skill work.

In case you were wondering, the hottest data technologies are still SQL, R, Python, and Hadoop within data science, and Kafka, Scala, and Spark underpinned by Java within data engineering – according to Morgan McKinley.

Giving the existing DataOps and analytical troubleshooting talent a hand with strong application performance management solutions will help them bust through blockages, more easily troubleshooting and optimising every aspect of the big data stack.

Not only does it make sense to optimise the process for better, faster business results – but given the strength of the DataOps team’s negotiating power amid the current skills shortage, it’s wise to keep them happy and let them enjoy their jobs, rather than spending all day debugging and chasing their own tails.

By Kunal Agarwal, CEO, Unravel Data

MapR Technologies optimises big data infrastructure
https://devopsnews.online/mapr-technologies-optimises-big-data-infrastructure/ (Tue, 17 Oct 2017)
MapR Technologies today announced new managed services for its MapR converged data platform.

The pioneer in delivering one platform for all data, across every cloud, will manage the daily operations of its customers’ big data infrastructure.

MapR managed services is designed to enable customers to focus on accelerating greater value from the MapR Platform by turning over day-to-day operations to MapR.

‘Building a wide range of applications’

Customers are able to focus on building a wide range of applications, from analytics to machine learning and artificial intelligence.

The service includes MapR taking on operational responsibilities such as patching, monitoring, performance optimisation, SLA management, regular customer reviews, and security updates, and is designed to help companies support workflows based on DataOps.

Thomas Weber, vice president for PaaS and big data at T-Systems, said: “The MapR converged data platform brings a modern data system to enterprises looking to harness the value of all their data.

‘Customised platform and application support’

“We look forward to collaborating with MapR on this new managed service to help take the complexity out of global big data infrastructure operations.

“With the MapR managed services our joint customers will have the agility and uptime to focus resources on the areas that make the most impact on their business.”

Dhiraj Gadia, director for ClearCARE at Emtec, added: “As an Elite SI MapR partner, Emtec is excited to be an integral part of the new managed services offering.

“Our comprehensive suite of services runs the gamut from setting up the infrastructure needed to get MapR up and running, to customised platform and application support, development and training that enables clients to leverage the power of big data.”

Written from press release by Leah Alger
