Data Management is as old as civilization; it was born the day the first humans decided to keep records of important information, many thousands of years ago. As the value of data became better understood and more of it was gathered on more topics, the need for established approaches to managing, warehousing, and extracting value from it grew.
Marking clay tablets, scratching wax boards, writing with charcoal, using ink on papyrus, and finally printing on paper laid out the slow and painful evolutionary path of data management, taking millennia to arrive at the age of computers in the 1950s. It would still take a few more decades before computer storage became cheap and spacious enough for businesses to use as part of their data management systems.
The early 1980s saw the start of a decades-long rapid rise in the capacity of computer storage and a continued drop in its cost. Fortunately for all of us, that trend is stronger today than ever before and shows no signs of stopping in the near future.
As computerized data management became the evident next step, many new approaches were put through trial and error, and as the architecture of software solutions evolved, data management went through several major changes of its own.
Enter Data Fabric
A Data Fabric is an enterprise-wide data integration architecture, together with a wide range of data services, that democratizes data engineering, analytics, and other data services across a choice of endpoints spanning hybrid multi-cloud environments. Data Fabric standardizes data management practices and usage across cloud, on-premises, and edge devices.
Data Fabric is an Augmented Data Management architecture that can optimize access to distributed data and intelligently process and prepare it for self-service delivery to data users just in time, regardless of where the data is stored.
A Data Fabric architecture is designed to be agnostic to data environments, data processes, data use, and the geo-location of data repositories, while integrating them with core and supplementary data management capabilities. It provides a connective layer between data endpoints, enabling the complete range of data management capabilities (such as cross-silo integration, static and active Metadata discovery, governance, processing, and orchestration).
IBM predicts that using Data Fabric can lead to up to a 158% increase in ROI and can reduce ETL requests by up to 65%. Moreover, it will enable self-serve data consumption and collaboration. Data Fabric’s Active Metadata will help automate governance, data protection, and security, and it will augment data integration and automate data engineering tasks.
Data Fabric, which according to Gartner is one of the key technology trends of this decade, is essentially a design concept that serves as an organization-wide integration layer for data, along with its required connectivity and processing capabilities, and encompasses the Metadata of both existing and incoming data.
It virtually weaves a live and vibrant “Fabric” out of the threads of “Data”, enriched with the color of their Metadata, visualizing their patterns and symmetries.
Data Fabric is the most recent response to the challenges of data management, such as the high cost of maintenance alongside the low efficiency of data value extraction, the lack of continuous analytics or event-driven sharing, and connectivity issues with distributed storage. Data Fabric breaks down the inefficient siloed structure of data by identifying how existing use cases establish usage patterns that replicate the way the business is done. It saves the organization from the never-ending and ever-increasing cost of merging, expanding, and redeploying silos to accommodate new data.
Data Fabric taps into both human and machine capabilities, using data processes to read, capture, integrate, and deliver data based on the user, the context of usage, and prevalent and changing usage patterns. It continuously identifies and weaves data from disparate sources to discover unique and relevant relationships (in the form of Metadata) between the available data points.
Data Fabric establishes Augmented Data Management using a system based on Metadata recognition and analysis. One of its core competencies is reducing manual effort by raising the efficiency with which the organization consumes the Metadata that its lines of business continuously generate as it moves forward in the market.
It is estimated that by 2024, Data Fabric deployments will quadruple efficiency in data utilization while cutting human-driven data management tasks in half.
Metadata as the Bloodline for the Data Fabric
Data Fabric dynamically uses both the available (existing) Metadata and the Metadata gathered through the discovery of changes in the data, its usage, and even its usage patterns. The Metadata that explains these changes is never readily available and has to be discovered from logs, exports, query processes, the use of views, and even security access logs.
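As a rough illustration of what that discovery can look like in practice, the sketch below scans a plain-text query log and derives simple usage Metadata (which tables are read, by whom, and how often). The log format, field names, and regular expressions are hypothetical assumptions; a real Data Fabric would plug into the actual log formats of its platforms.

```python
import re
from collections import Counter, defaultdict

# Hypothetical log line format: "2024-03-01T10:22:05 user=alice SELECT ... FROM sales.orders ..."
LOG_LINE = re.compile(r"^(?P<ts>\S+)\s+user=(?P<user>\S+)\s+(?P<query>.*)$")
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

def discover_usage_metadata(log_lines):
    """Derive simple usage Metadata (access counts and users per table) from query logs."""
    access_counts = Counter()
    users_per_table = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like query-log entries
        user, query = match["user"], match["query"]
        for table in TABLE_REF.findall(query):
            access_counts[table.lower()] += 1
            users_per_table[table.lower()].add(user)
    return access_counts, users_per_table

if __name__ == "__main__":
    sample = [
        "2024-03-01T10:22:05 user=alice SELECT * FROM sales.orders o JOIN sales.customers c ON o.cid = c.id",
        "2024-03-01T10:25:41 user=bob SELECT count(*) FROM sales.orders",
    ]
    counts, users = discover_usage_metadata(sample)
    print(counts)        # Counter({'sales.orders': 2, 'sales.customers': 1})
    print(dict(users))   # which users touch which tables
```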
We cannot implement Metadata management by limiting the scope to collecting and analyzing it in a static manner. Data Fabric thrives through an Active Metadata practice of receiving and tracking live updates, which means it would be insufficient to gather Metadata only from existing data storage and platforms.
A Data Catalog or Data Marketplace can serve as an anchor point for gathering Metadata. Many organizations have already implemented Data Catalogs to establish Self-Service (aka Democratized) Data Discovery and Analytics.
Data Fabric can begin with mere observation and processing of Metadata without having to impact, change or deviate from the existing processes. The improvements and optimizations can be planned and implemented later as incremental and evolutionary changes.
Metadata Activation is done by analyzing the existing Metadata in combination with the discovered Metadata and running pattern analysis to identify the areas in processes and practices where improvements can be made.
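As a minimal sketch of that activation step (the thresholds, field names, and sample assets below are assumptions, not drawn from any product), the snippet joins static catalog Metadata with discovered usage Metadata and flags assets whose usage pattern suggests an improvement opportunity, such as heavily queried tables with no documented owner.

```python
# Hypothetical inputs: 'catalog' is static Metadata, 'usage' is discovered (active) Metadata.
catalog = {
    "sales.orders":    {"owner": "sales-eng", "documented": True},
    "sales.customers": {"owner": None,        "documented": False},
    "hr.legacy_dump":  {"owner": "hr-it",     "documented": False},
}
usage = {"sales.orders": 1200, "sales.customers": 900, "hr.legacy_dump": 0}

HEAVY_USE = 500  # assumed threshold for "heavily used"

def activation_findings(catalog, usage):
    """Combine static and discovered Metadata to flag candidate improvement areas."""
    findings = []
    for asset, meta in catalog.items():
        hits = usage.get(asset, 0)
        if hits >= HEAVY_USE and (meta["owner"] is None or not meta["documented"]):
            findings.append(f"{asset}: heavily used ({hits} queries) but poorly governed")
        if hits == 0:
            findings.append(f"{asset}: catalogued but never queried, candidate for review")
    return findings

for finding in activation_findings(catalog, usage):
    print(finding)
```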
“Smartifying” Data Fabric
Machine Learning (ML) and Graph Analysis (GA) form the smart core of Metadata discovery by running deep data analysis on existing and newly arriving data. They can profile the new data, extract value and structure, run comparisons with the available information to look for similarities, and then map users and their usage patterns to that data.
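A toy version of that profiling-and-matching step might look like the sketch below: it profiles incoming records and compares them against known assets using a simple Jaccard similarity over column names. Real implementations would use far richer profiles (value distributions, embeddings, lineage), so treat this purely as an illustration; all asset names are invented.

```python
def profile(rows):
    """Very small data profile: the set of column names and a row count."""
    columns = set(rows[0].keys()) if rows else set()
    return {"columns": columns, "row_count": len(rows)}

def jaccard(a, b):
    """Similarity between two column-name sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

known_assets = {
    "sales.orders": {"columns": {"order_id", "customer_id", "amount", "order_date"}},
    "crm.contacts": {"columns": {"contact_id", "email", "full_name"}},
}

new_rows = [
    {"order_id": 1, "customer_id": 42, "amount": 9.99, "order_date": "2024-03-01"},
]
new_profile = profile(new_rows)

# Rank known assets by similarity to the newly arrived data.
ranked = sorted(
    ((name, jaccard(new_profile["columns"], meta["columns"]))
     for name, meta in known_assets.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)  # [('sales.orders', 1.0), ('crm.contacts', 0.0)]
```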
ML enhances Data Fabric with learning models that train by observing the work of data engineers, enabling the system to prescribe the best next actions and to use the outcomes to calibrate subsequent decisions in a continuous improvement practice. This serves as the backbone of an Augmented Data Management solution that frees human data workers to focus their attention on innovation and enhancing value delivery to customers. The solution automates repetitive tasks such as discovering and profiling data, ETL and schema alignment, and data integration.
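As an intentionally tiny illustration of that idea (the action names and the frequency-based model are assumptions, far simpler than a production recommender), the snippet below learns which action data engineers most often take after a given one and uses that to suggest the next step.

```python
from collections import Counter, defaultdict

# Hypothetical observation log: sequences of actions taken by data engineers.
observed_sessions = [
    ["profile_data", "map_schema", "run_etl", "validate_output"],
    ["profile_data", "map_schema", "run_etl", "publish_dataset"],
    ["profile_data", "fix_nulls", "run_etl", "validate_output"],
]

def train_next_action(sessions):
    """Count which action most often follows each observed action."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions

def suggest_next(transitions, current_action):
    """Prescribe the most frequently observed next action, if any."""
    followers = transitions.get(current_action)
    return followers.most_common(1)[0][0] if followers else None

model = train_next_action(observed_sessions)
print(suggest_next(model, "profile_data"))  # 'map_schema'
print(suggest_next(model, "run_etl"))       # 'validate_output'
```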
Data Fabric’s use of ML to analyze the content and design alignment of the Metadata moves it from a static to an active state, which can then deliver pattern-based outcomes. This also supports an ongoing increase in the number of data assets and data service users across a wide spectrum (from the reuse of data assets to the rapid ingestion of data from public sources and business partners).
Data Fabric’s use of ML also helps with the transparency, traceability, and provability of Metadata analytics, which supports human data workers in their continuous exploratory work of creating more customer value. Its ability to mark the differences between expected data patterns and the actual patterns that are discovered (pattern-based deviation detection) allows its automatic data failure response protocols to step in and take care of most data problems without the need for human intervention.
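One simplified way to picture pattern-based deviation detection: compare the profile of a newly arrived batch against an expected profile and trigger an automated response when the deviation crosses a threshold. The field names, thresholds, and quarantine action below are illustrative assumptions only.

```python
EXPECTED_PROFILE = {"row_count": 10_000, "null_ratio": 0.01}
MAX_ROW_DRIFT = 0.30    # assumed: tolerate +/-30% change in row count
MAX_NULL_RATIO = 0.05   # assumed: tolerate up to 5% nulls

def detect_deviation(actual):
    """Compare an actual batch profile against the expected pattern."""
    issues = []
    drift = abs(actual["row_count"] - EXPECTED_PROFILE["row_count"]) / EXPECTED_PROFILE["row_count"]
    if drift > MAX_ROW_DRIFT:
        issues.append(f"row count drifted by {drift:.0%}")
    if actual["null_ratio"] > MAX_NULL_RATIO:
        issues.append(f"null ratio {actual['null_ratio']:.1%} exceeds limit")
    return issues

def automated_response(batch_id, issues):
    """Stand-in for a data-failure protocol: quarantine the batch and log the reason."""
    print(f"Quarantining batch {batch_id}: {'; '.join(issues)}")

actual = {"row_count": 4_100, "null_ratio": 0.09}
problems = detect_deviation(actual)
if problems:
    automated_response("orders-2024-03-01", problems)
```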
Data Fabric being “smart” means that it also gets better over time as it learns from humans how to fix the problems it could not tackle before. This makes the Data Fabric more efficient and resourceful as it grows its understanding of the data (and its Metadata) through continued exposure to the patterns through time. The direct outcome would be carving away the worries of human data workers and freeing up more of their time for innovative work in value delivery as Data Fabric continues to learn and practice more.
Human data workers will adjust the Data Fabric Augmented Data Management’s understanding of which areas are open for automation and which areas still need human supervision (for security, compliance, or other reasons).
Where to Begin?
As with any new solution design, a good approach is to build on a strong technology base, understand the Current Mode of Operation (CMO) of Data Management in and across the data silos, and identify the existing gaps, pain points, and expected target states within a phased, evolutionary change roadmap.
Creating a deep understanding of where we are and what we are trying to achieve will help us better envision what needs to be accomplished at the completion of each step toward a fully functional, enterprise-wide Data Fabric solution.
Since Data Fabric has a deep dependency on contextual information, we need an integrated pool of Metadata to help the newly established solution identify, connect, and analyze all kinds of Metadata, such as business, operational, and technical Metadata.
Creating Graph Models by continuously analyzing the Metadata allows for continuous data sharing and the democratization of data processing and provisioning functions. Graph Models provide the needed visualization and enrichment by adding Semantics for easier sharing and understanding of data across the enterprise.
The Semantic enrichment of the Metadata makes it more insightful and helps with its interpretation into practical knowledge. It enhances value delivery through data usage and helps the ML algorithms train on relevant, up-to-date, and rich data for greater relevance and accuracy.
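A small sketch of such a Metadata graph is shown below using the networkx library (assuming it is installed; any graph store would serve the same purpose). Nodes carry semantic annotations such as business terms and owners, and edges express relationships discovered between assets; all asset and term names are made up for illustration.

```python
import networkx as nx  # assumed available: pip install networkx

graph = nx.DiGraph()

# Nodes are data assets and glossary terms, enriched with semantic attributes.
graph.add_node("sales.orders", kind="table", business_term="Customer Order", owner="sales-eng")
graph.add_node("sales.customers", kind="table", business_term="Customer", owner=None)
graph.add_node("Customer", kind="glossary_term", definition="A party that purchases goods or services")

# Edges capture discovered relationships between Metadata elements.
graph.add_edge("sales.orders", "sales.customers", relation="foreign_key")
graph.add_edge("sales.customers", "Customer", relation="described_by")

# Traversal example: find assets tagged with the 'Customer' business term,
# then walk back to the assets that reference them.
related = [n for n, term in graph.nodes(data="business_term") if term == "Customer"]
print(related)                                       # ['sales.customers']
print(list(graph.predecessors("sales.customers")))   # ['sales.orders']
```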
Frictionless technology integration and seamless data portability are two other vital factors in the successful implementation of a Data Fabric solution. They ensure an easy flow of usable data across silos and in and out of the enterprise as needed. Technical compatibility with common data curation, processing, transformation, aggregation, streaming, replication, and virtualization tooling is extremely important and can make or break the entire transformation effort.
Using Metadata analytics to map out existing data usage patterns, measure their alignment (or misalignment) with the original intent of the lines of business that use them, and identify the rate at which new data assets are introduced to the organization can also help us plan the transformation roadmap and better select the overloaded areas that would benefit most from the pressure relief that Data Fabric can introduce.
From the bottom up, the Data Fabric solution stack has several layers, which would include, but not be limited to, the following (a minimal sketch of this layering appears after the list):
- Data Sources, the data repositories or streams of data.
- ML-Enhanced Metadata Management, Orchestration, and Data Cataloguing, where the Metadata about the data from the Data Sources is identified, enriched, and gathered.
- Master and Reference Data Management, which serves as the storage and distribution point of best-fit data for the variety of consumer uses.
- Data Integration, where data curation, processing, normalization, transformation, virtualization, and analytical services are performed.
- Data Governance and Standards, which enforce the policies and access control.
- Data Delivery, where several data connectivity and transport protocols and interfaces can work in parallel to deliver data to consumers (which in turn can be self-service analytics, business intelligence solutions, data science pipelines, and all types of dashboards).
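The sketch below expresses a few of those layers as minimal Python classes wired together from the bottom up; every class, method, and asset name is invented purely for illustration and is not drawn from any specific product or standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DataSource:                      # Data Sources layer: repositories or streams of data
    name: str
    records: List[dict] = field(default_factory=list)

@dataclass
class MetadataCatalog:                 # Metadata Management and Cataloguing layer
    entries: Dict[str, dict] = field(default_factory=dict)
    def register(self, source: DataSource) -> None:
        columns = sorted(source.records[0].keys()) if source.records else []
        self.entries[source.name] = {"columns": columns, "row_count": len(source.records)}

@dataclass
class GovernancePolicy:                # Governance and Standards layer: access control
    allowed_consumers: List[str]
    def permits(self, consumer: str) -> bool:
        return consumer in self.allowed_consumers

class DeliveryService:                 # Data Delivery layer: serves consumers
    def __init__(self, catalog: MetadataCatalog, policy: GovernancePolicy):
        self.catalog, self.policy = catalog, policy
    def describe(self, asset: str, consumer: str) -> dict:
        if not self.policy.permits(consumer):
            raise PermissionError(f"{consumer} may not access {asset}")
        return self.catalog.entries[asset]

# Wiring the layers together, bottom-up.
orders = DataSource("sales.orders", [{"order_id": 1, "amount": 9.99}])
catalog = MetadataCatalog()
catalog.register(orders)
delivery = DeliveryService(catalog, GovernancePolicy(allowed_consumers=["bi-dashboard"]))
print(delivery.describe("sales.orders", "bi-dashboard"))
```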
Starting with quick wins through Pilot/POC projects allows teams to gain experience with the transformation process, learn about the existing hurdles, and fine-tune their approach before scaling up to larger implementation work. Starting with one data asset or a domain-oriented prototype that tests our ability to integrate currently disparate data from a small number of sources (preferably across a few areas of business operations) would be a great learning experience for the transformation teams and lowers the risk of disrupting the existing data management system during the transformation.
As the Data Fabric’s Augmented Data Management comes into pragmatic use and starts its discovery work across the existing silos, leadership will run into many unused data assets that were buried under layers of integration issues and data format incompatibilities. They should take this opportunity to learn more about the hidden layers of their own business by digging through this data, seeking valuable insights about the existing flow of data and its value delivery impact across the pipelines.
Caveats
Since Data Fabric brings a fundamental change to an organization’s data management, to the methodology behind its orchestration, and to the integration of the tools that support it, care must be taken to implement it through an incremental, evolutionary shift.
While a Systems Thinking approach is required to ensure we are not creating segregated patches of Data Fabric islands that are unable to establish data flow and the democratization of data services across the enterprise, leadership should also avoid pushing for hasty large-scale changes without proper impact analysis, especially when the organization already has a patchwork of legacy and modern tools and repositories that are maintained through painful manual work.
Data Fabric implementation demands replacing tools and platforms that are unable to share their Metadata with the rest of the data management system. This is a delicate move for well-rooted organizations that have participated in the data management evolution of the past several decades and have ended up with a number of legacy data silos that need to be reverse-engineered for data migration before they can be placed on a phase-out roadmap and replaced with newer solutions.
One key success factor of a Data Fabric implementation is its ability to adapt to existing data management platforms, Master Data practices, and Data Governance practices without resorting to a one-shot, all-or-nothing flip-over.
Conclusion
According to Gartner, by 2024, 75% of organizations will have established a centralized data and analytics (D&A) center of excellence to support federated D&A initiatives and prevent enterprise failure. Meanwhile, through 2025, 80% of organizations seeking to scale digital business will fail because they do not take a modern approach to data and analytics governance.
Gartner also believes that by 2024, organizations that utilize active Metadata to enrich and deliver a dynamic Data Fabric will reduce time to integrated data delivery by 50% and improve the productivity of data teams by 20%.
Data Fabric’s Augmented Data Management, now enhanced and empowered by the expanding adoption of public cloud services, has been ramping up as the best answer to the looming data management trouble casting its shadow over every market sector, and it brings a definitive technological advantage to the organizations that succeed in its proper, well-structured implementation.
Data Fabric allows the organization to maximize the value it recognizes from the data it already has, raise the ROI on the data it receives from the market, and leverage that into stronger, differentiating insights in support of better business agility and customer value delivery.
Article written by Arman Kamran, CTO of Prima Recon, Professor of Transformative Technologies, Advisory Board Member to The Harvard Business Review, and Enterprise Transition Expert in Scaled Agile Digital Transformation