Graph-based Data Governance & Data Lineage


Data Governance, Data Impact Analysis and Data Lineage requirements are becoming ever more complex thanks to growing volumes of data, new types of data (such as Open Data) and new regulations. Traditionally, relational technologies, ETL components and integration stacks have struggled to document full end-to-end lineage and dependencies in a timely and scalable manner.

 

This has led us to develop a new approach to understanding our customers’ data lineage and data governance landscapes, using leading graph technologies such as Neo4j together with tools from our partners Linkurious and Manta.

 

Whilst there are no magic bullets, we have made significant progress in modelling and understanding complex data lineage requirements, and in empirically identifying key components such as the most widely used Critical Data Elements (CDEs).
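To make the idea of empirically identifying widely used data elements concrete, here is a minimal sketch in plain Python. It is not our production approach and does not use Neo4j, Linkurious or Manta; the element names, the edge list and the helper functions are all hypothetical, chosen only to illustrate how a lineage graph can be traversed to rank elements by the number of downstream dependants.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, target) means "target is derived from source".
LINEAGE_EDGES = [
    ("crm.customer_id", "dwh.dim_customer.customer_id"),
    ("dwh.dim_customer.customer_id", "report.sales_by_customer"),
    ("dwh.dim_customer.customer_id", "report.credit_risk"),
    ("erp.invoice_amount", "dwh.fact_sales.amount"),
    ("dwh.fact_sales.amount", "report.sales_by_customer"),
]


def build_graph(edges):
    """Build an adjacency map from source element to its direct targets."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
    return graph


def downstream(graph, start):
    """Return every element reachable from `start` (a simple impact analysis)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def rank_by_downstream_usage(edges):
    """Rank elements by how many other elements depend on them, most used first."""
    graph = build_graph(edges)
    nodes = {node for edge in edges for node in edge}
    counts = {node: len(downstream(graph, node)) for node in nodes}
    # Sort by descending count, then by name so ties are ordered deterministically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(rank_by_downstream_usage(LINEAGE_EDGES)[0])  # ('crm.customer_id', 3)
```

In this toy dataset the CRM customer identifier feeds the most downstream elements, which would make it a candidate CDE. In a graph database the same question would typically be expressed as a path query rather than a hand-written traversal.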

 

If you are a complex organisation struggling to implement Data Governance and understand your Data Lineage, please contact us for more information.


Introduction to open data


The world generates vast quantities of data each day that can be used to enhance the quality of life of virtually anyone. Information is power, but it is also a tool for supporting development, knowledge sharing and social initiatives. Tracking natural disasters, crowdsourcing rainfall data and mapping the night sky are among a diverse range of open data initiatives.

 

Three terms are used to describe how available data is to people who wish to access it: closed, shared and open. Closed data is confidential and is not meant to be shared with the public. It can range from confidential company reports to government security data or any other data that is deemed classified. Open data is readily available to anyone who wishes to access it. Governments and companies have opened up various types of data to those looking for new solutions to problems, to the benefit of society. For data to be considered open, its owner must explicitly state that the data is free to use in any way the user sees fit. In the middle of this spectrum is shared data. Shared data can be accessed and used by specific groups of people who meet certain criteria, for clearly defined purposes. Examples might include medical data, consumer shopping habits or electoral data.

 

Open data has the potential to create tremendous value and has started to be used on a wider scale. It can also have positive economic and social effects. New products and business models are emerging off the back of the Open Data movement. App developers, for example, are using weather reports to warn people of pollution in specific areas. Traffic data is being used for real-time traffic reporting to ease congestion in urban areas. Government data is being used to track how tax income is spent. Repurposed open data is helping people improve their household energy efficiency and linking property owners with construction companies that can make it happen. There are many great examples of how open data is already saving lives and changing the way we live and work.

 

However, with this growth in access to data sources comes the challenge of managing the growth in volume and variety. Many organisations have historically struggled to manage and extract value from their own internal datasets. Fortunately, these trends have been accompanied by the rapid development of new tools and techniques, enabling firms with the right approach to truly leverage internal and external Open Data assets.

 

For more information about new and efficient ways of managing data, please contact us.


Taxonomies – the most under-rated yet critical component of a Data Strategy?


Originally posted on LinkedIn Pulse.

 

For many practitioners implementing data strategies, there is a long list of priorities to work through before reaching the backlog item named “Optimise taxonomies”. For many it’s not even on the list. Burning platforms tend to be things sponsors and stakeholders can more easily relate to: poor quality data, excessive time spent finding relevant data, an inability to gain insights from data and so on. Data Modelling and Semantics requirements in general often receive little attention, so it’s unsurprising that highly specific areas such as Taxonomy Management are often neglected.

 

Taxonomies tend to be associated more with academia and science than with profit-seeking organisations, and are often an easy target for those wishing to keep the ‘navel gazers’ quiet. This is somewhat unfair, however, as most knowledge workers encounter taxonomies surprisingly often in their day-to-day work, even if they are not always called taxonomies.

 

Whether it’s the Data Management team assigning industrial sectors to the company database or the MIS team generating performance reports using customer groupings and product ranges, taxonomies feature more than you think. They reflect a natural way in which the human brain organises complex information. More often than not, ineffectively managed taxonomies are also a key source of pain for senior managers and C-suite executives. How often have you heard your CXOs grumble that comparing sales, costs, margins and risk data across divisions is next to impossible? A large part of this is down to data quality and definition issues arising from poor taxonomy management.
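As a rough illustration of why a shared taxonomy makes cross-division comparison possible, the sketch below maps each division's local product labels onto one common hierarchy and rolls sales up through it. Everything here is hypothetical: the divisions, labels, figures and function names are invented for the example and do not represent any particular tool.

```python
from collections import defaultdict

# Hypothetical shared taxonomy: child concept -> parent concept.
TAXONOMY_PARENT = {
    "Laptops": "Hardware",
    "Servers": "Hardware",
    "Licences": "Software",
    "Hardware": "All Products",
    "Software": "All Products",
}

# Hypothetical mapping from (division, local label) to a shared concept.
LOCAL_TO_SHARED = {
    ("UK", "Notebook PCs"): "Laptops",
    ("DE", "Laptops"): "Laptops",
    ("UK", "Rack servers"): "Servers",
    ("DE", "Lizenzen"): "Licences",
}

# Hypothetical sales records: (division, local label, amount).
SALES = [
    ("UK", "Notebook PCs", 120),
    ("DE", "Laptops", 80),
    ("UK", "Rack servers", 200),
    ("DE", "Lizenzen", 50),
]


def ancestors(concept):
    """Return the concept followed by each of its parents up to the root."""
    chain = [concept]
    while chain[-1] in TAXONOMY_PARENT:
        chain.append(TAXONOMY_PARENT[chain[-1]])
    return chain


def roll_up(sales):
    """Total sales for every shared concept, including all ancestor levels."""
    totals = defaultdict(int)
    for division, label, amount in sales:
        concept = LOCAL_TO_SHARED[(division, label)]
        for node in ancestors(concept):
            totals[node] += amount
    return dict(totals)


totals = roll_up(SALES)
print(totals["Hardware"], totals["All Products"])  # 400 450
```

Once "Notebook PCs" and "Laptops" resolve to the same concept, the two divisions' figures become directly comparable at any level of the hierarchy. Without that mapping, each division's numbers can only be compared by hand.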

 

Fortunately, optimising your organisation’s taxonomies and leveraging them in rich analysis, search and reporting is easier than many would think. It doesn’t have to be a long-term, intensive upfront modelling endeavour that consumes lots of resources and involves woolly conversations about what a product or customer is. With the latest metadata discovery, profiling and taxonomy management tools, such as PoolParty from our partner, it’s surprising how rapidly your taxonomies can be turned from an inconvenience into an asset.
