What is the Challenge?
Successful enterprise data processing systems tend to outgrow their well-managed walled gardens. This happens because they need additional resources and tools found outside the garden, e.g. those offered by a Cloud provider, or because they start to connect to other enterprise systems. Traditional metadata catalogs do not scale well to handle this organic growth, because they depend on static, predefined processes to manage and curate data within a well-defined perimeter. This makes enterprise data management an intractable problem, as the data is constantly being moved and processed between these heterogeneous systems by a large number of independently managed tools.
Significant value can be unlocked by gathering, linking and enriching metadata at enterprise scale. Costs can be reduced by, for example, identifying cold data, removing unnecessary data duplication or scheduling workflows better. New value-added services can be introduced with the guarantee that sensitive data is being used as intended and is properly managed and protected.
This is analogous to the ways in which enterprises have unlocked hidden value in their data through big data processing. The processing of heterogeneous metadata at scale is in fact a type of big data problem, with the same classic five Vs: velocity, value, variety, volume and veracity.
Pathfinder data-centric enterprise metadata management

Automated discovery and metadata collection
Extensive heterogeneous metadata sources
Metadata enriched and linked creating Data Map
Event-based notification of key changes
Supporting a wide range of data management applications
Pathfinder
Pathfinder is an event-based system that dynamically collects metadata from data catalogs, data stores (databases, file systems) and systems that transform and process data. The metadata is linked, enriched and stored in data maps tailored for data-management applications.
Introducing the Enterprise Data Map
The metadata is extracted from source systems into a location where it is gathered, linked and enriched.
This can be thought of as analogous to a data lake in a big data system.
Because most of the value in the metadata comes from understanding the complex relationships between entities, the metadata is stored as a graph that allows all the data of the enterprise to be mapped. Questions such as where data is stored, how those storage systems are protected and where data flows can then be answered at the level of the enterprise rather than of a single isolated processing platform. As in a classic data lake, the metadata stored in the graph can be enriched by multiple independent processes that add new relationships and entities that were not in the raw data, e.g. that data pipelines resemble each other and could possibly be merged to save cost, or that data at a certain classification is not being properly handled, exposing the company to possible litigation. The map is open and scalable, meaning that advanced machine learning techniques can be brought to bear to extract latent information hidden in the complexity.
We enable the creation of the Enterprise Data Map with a system called Pathfinder.
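To make the idea of a graph-shaped data map concrete, the following is a minimal in-memory sketch, not Pathfinder's actual data model. The class `DataMap`, the edge labels `stored_in` and `flows_to`, the attribute names and the sample entities are all assumptions invented for illustration. It shows how two of the questions above (where does data flow, and is sensitive data held in an unprotected store) reduce to simple graph traversals.

```python
from collections import deque


class DataMap:
    """Toy metadata graph: entities with attributes, joined by labelled edges.

    Hypothetical structure for illustration only.
    """

    def __init__(self):
        self.entities = {}  # entity id -> attribute dict
        self.edges = {}     # source id -> list of (label, target id)

    def add_entity(self, entity_id, **attrs):
        self.entities[entity_id] = attrs

    def relate(self, source, label, target):
        self.edges.setdefault(source, []).append((label, target))

    def neighbours(self, source, label):
        return [t for (l, t) in self.edges.get(source, []) if l == label]

    def downstream(self, source):
        """All entities reachable via 'flows_to' edges, in breadth-first order."""
        seen, queue = [], deque([source])
        while queue:
            for target in self.neighbours(queue.popleft(), "flows_to"):
                if target not in seen:
                    seen.append(target)
                    queue.append(target)
        return seen


def unprotected_sensitive(data_map):
    """Pairs (dataset, store) where sensitive data sits in an unencrypted store."""
    findings = []
    for entity_id, attrs in data_map.entities.items():
        if attrs.get("classification") != "sensitive":
            continue
        for store in data_map.neighbours(entity_id, "stored_in"):
            if not data_map.entities[store].get("encrypted", False):
                findings.append((entity_id, store))
    return findings


# Sample metadata (invented for this sketch).
dm = DataMap()
dm.add_entity("customers", kind="dataset", classification="sensitive")
dm.add_entity("report", kind="dataset", classification="sensitive")
dm.add_entity("archive", kind="dataset", classification="public")
dm.add_entity("hdfs-prod", kind="store", encrypted=True)
dm.add_entity("cloud-bucket", kind="store", encrypted=False)
dm.relate("customers", "stored_in", "hdfs-prod")
dm.relate("report", "stored_in", "cloud-bucket")
dm.relate("archive", "stored_in", "hdfs-prod")
dm.relate("customers", "flows_to", "report")
dm.relate("report", "flows_to", "archive")

flows = dm.downstream("customers")
risks = unprotected_sensitive(dm)
```

A production data map would live in a scalable graph store rather than in Python dictionaries; the point here is only that enterprise-level questions become traversals once the metadata is linked.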
How Does Pathfinder Work?
We use a new event-based approach to data management that discovers what is actually occurring on the systems. Metadata is collected from heterogeneous data processing and storage systems, distributed through streaming, and combined and analyzed by specific data management enrichment processes. The metadata is extracted from sources by collectors, serialized into a graph of entities and relationships, and stored as a sequence of events in a change log. The events are generated and propagated in real time. The enrichment processes consume these events and then write new metadata back to the change log. The compacted log contains the canonical set of metadata for the enterprise and its evolution over time. It can be materialized in different processing systems for specific uses.
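The flow described above (collectors append events, enrichment processes consume them and write back, and compaction yields the current state) can be sketched as follows. This is a simplified single-process illustration under stated assumptions, not Pathfinder's implementation: `Event`, `ChangeLog`, `classify_enricher`, the key format and the "name contains customer" rule are all hypothetical, and compaction is modelled here as keeping the latest value per key, with `None` marking a deletion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    key: str                # entity or relationship identifier (assumed format)
    value: Optional[dict]   # latest metadata for the key; None marks a deletion


class ChangeLog:
    """Append-only sequence of metadata events (hypothetical, in-memory)."""

    def __init__(self):
        self.events = []

    def append(self, key, value):
        self.events.append(Event(key, value))
        return len(self.events) - 1  # offset of the new event

    def read_from(self, offset):
        # Returns a copy, so consumers may append while iterating over it.
        return self.events[offset:]

    def compact(self):
        """Materialize current state: the last value per key, minus deletions."""
        latest = {}
        for event in self.events:
            if event.value is None:
                latest.pop(event.key, None)
            else:
                latest[event.key] = event.value
        return latest


def classify_enricher(log, offset):
    """Toy enrichment: mark unclassified 'customer' entities as sensitive.

    Consumes events from `offset`, writes enriched metadata back to the log,
    and returns the offset to resume from on the next run.
    """
    events = log.read_from(offset)
    for event in events:
        value = event.value
        if value and "customer" in event.key and "classification" not in value:
            log.append(event.key, {**value, "classification": "sensitive"})
    return offset + len(events)


# A collector (simulated here) reports two tables, then the removal of one.
log = ChangeLog()
log.append("table:customer_orders", {"store": "hdfs-prod"})
log.append("table:tmp_scratch", {"store": "hdfs-prod"})
log.append("table:tmp_scratch", None)

next_offset = classify_enricher(log, 0)
state = log.compact()
```

Because the enricher writes its result back as an ordinary event, other consumers see enriched metadata through the same log, and re-running the enricher from `next_offset` adds nothing new since the enriched event already carries a classification.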
Metadata Lake Architecture

Pathfinder Compliance Scenario
The screenshots, made with the Pathfinder GUI, and the accompanying explanations illustrate how a set of linked, enriched metadata can provide insight into a complex data compliance scenario.