Using Graph databases for data lineage

[Figure: Data To Value Manta flowchart]


Graph databases have come of age and are now being applied to many use cases beyond their traditional roots, such as infrastructure analysis and social networking. Optimised for storing relationships between things (or nodes, as they are often called), the technology can be applied to many day-to-day problems, yet many companies are unaware of its scope. Areas once viewed as very challenging to model using traditional tools, such as Data Lineage, Enterprise Architecture and dependency analysis, are now much easier to model and analyse. This enables rich reporting for business stakeholders, underpinned by robust and empirical analysis.

 

One area where we have had great success is analysing data lineage, particularly within complex Data Warehouse and Data Integration stacks, where the linkages between data sources and dashboard elements or report columns are often buried beneath transformations, mappings and flows. Our software partner Manta Tools (powered by Titan DB) enables users to rapidly understand the relationships hidden in their Oracle, Teradata and Informatica stacks. This saves a great deal of time and cost when maintaining your Data Warehouse and implementing changes or upgrades.
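To make the idea concrete, here is a minimal sketch of lineage as a graph traversal. It uses Python’s networkx library purely for illustration, not a production graph database such as Titan, and every table, staging area and report name in it is invented for the example; it is not how Manta actually models a stack.

```python
# Minimal illustration of data lineage as a directed graph.
# networkx stands in for a production graph database here.
import networkx as nx

# Each edge points from a data element to the element it feeds.
lineage = nx.DiGraph()
lineage.add_edges_from([
    ("crm.customers",    "stg.customers"),         # extract
    ("stg.customers",    "dwh.dim_customer"),      # transform / load
    ("erp.orders",       "stg.orders"),
    ("stg.orders",       "dwh.fact_sales"),
    ("dwh.dim_customer", "report.revenue_by_region"),
    ("dwh.fact_sales",   "report.revenue_by_region"),
])

# Upstream lineage: every source a report ultimately depends on.
print(sorted(nx.ancestors(lineage, "report.revenue_by_region")))

# Impact analysis: everything downstream of a table you plan to change.
print(sorted(nx.descendants(lineage, "stg.orders")))
```

Those two traversals, ancestors for lineage and descendants for impact analysis, answer exactly the questions that are so expensive to resolve when the same relationships are buried in SQL scripts and ETL mappings.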

 

Please do get in touch if you would like to know more about how this latest generation of graph tools can help your organisation.


Introducing the Data To Value Lean Data Process


Here at Data To Value we have pooled the many years of experience of our partners, consultants and affiliates, taking what works best to create an iterative and agile data development methodology. This work has evolved into the Lean Data Process, and we will be building out each of its components over the coming months as we continue to apply the principles, tools and techniques to the real problems our customers experience.

 

The overall approach is represented by the Lean Data Framework, which encapsulates the data development and management life-cycle. All data projects should be driven by business need, so the process always starts with the business requirement or the business problem to be addressed.

The top and bottom of the stack below are business-owned domains. These are supported by the two middle domains: specialist IT domains where the business requirements are satisfied and the business problems are solved using technology.

 

We also recognise that the world of data and information is changing rapidly, with new technologies for data management arriving in quick-fire succession. The Lean Data Framework covers all types of data, structured, semi-structured and unstructured, whether it is stored in databases, files, email, websites or content stores, and wherever it needs to be understood, used, tagged and accessed.

 

[Figure: Lean Data Framework]

 

The Lean Data Process uses a Build, Measure, Learn cycle to create a continuous development environment geared to delivering rapid business benefit. A very popular engagement is our Lean Data Quality service. Another typical application is the creation of an enterprise-wide information model. These and other components of the overall process will be described in more detail in future posts.

 

To achieve accelerated delivery we leverage partnerships with innovative software tool vendors. Each tool supports specific areas of the process. The benefit to our customers is that, following a consulting engagement, they are left with real collateral and not merely PowerPoint slides.

 

 

Lean Data Tool Accelerators

Each tool brings specific capabilities and components that support particular parts of the process. We have demonstrations of each tool and of how it fits within the Lean Data Process, which will be shared in future posts. If you want to find out more about the process and the tools, contact us here.

 

 

[Figure: Lean Data Framework Tools Stack, showing Semanta Encyclopaedia, Experian Pandora, Manta Tools, PoolParty Thesaurus Server, PoolParty Semantic Integrator, PoolParty Power Tagging and PoolParty Web Mining]


The future is Lean Data not Big Data


Big Data has dominated the headlines in technology and business media for some time. In recent years it has also become much more widely discussed in the general media, as innovations spread from their traditional roots in online retail and social media to other sectors such as Healthcare, Government, Manufacturing and Finance. Research by Deloitte suggested that during 2012, 90% of Fortune 500 companies would pursue Big Data projects. Undoubtedly there are numerous, indisputable and powerful use cases demonstrating the social, environmental and economic benefits of leveraging Big Data. But is following the trend of hoarding data in the hope of finding nuggets of insight the right approach?

 

Whilst many commentators refer to data as the new oil, it has been a key corporate asset for some time, decades in fact. The first Data Warehouse was conceived in the 1960s, and as relational databases grew in popularity in the 1980s so did corporate data volumes. This growth led to control challenges, and in response a number of organisations and frameworks formed to bring structure, organisation and planning to Data Management. A variety of sectors began to adopt the principles espoused by organisations such as DAMA and BCS and frameworks such as Zachman and TOGAF. Many of these approaches focused on waste and a belief that ‘less is more’: the objective was to minimise the storage, movement, cleansing and processing of data. Big Data, to an extent, therefore represents the antithesis of these traditional principles.

 

We are now in the era of feast, not famine. Falling processing and storage costs, together with a range of new technologies and approaches including NoSQL databases, Machine Learning and Natural Language Processing, have enabled many organisations’ data volumes to increase without causing meltdowns. If anything, firms such as Facebook, Google and eBay have demonstrated that accumulating huge volumes of data can be immensely valuable and yield previously undiscoverable insights. Clearly many facets of Big Data are quite distinct from traditional Data Management tools and techniques. In fact, the latest data analysis techniques are so different from historical techniques that many herald the triumph of data-driven decision making over gut instinct. Does this mean, however, that the Big Data blueprint should be embraced by all types of organisations and that greater data storage and processing is the answer? Not quite.

 

Whilst Data Management costs have fallen as computer processor and storage technologies have advanced, that progress was itself driven by historical requirements for data. One could argue that these falling costs have been not only a significant driver of Big Data innovation but also a result of it. Many studies using cost-per-gigabyte measures also show that storage cost decreases have slowed over the last six years, and some even believe storage costs will begin to rise again. In many ways Big Data’s growth thus shares characteristics with other societal trends such as car ownership and road congestion. Historically it was believed that building more roads would reduce congestion. In fact the reverse has happened: the effective cost to drivers of using the road network has fallen, leading to more drivers making more journeys. Big Data techniques are still developing and powerful innovations are still emerging, but one trend is clear: the genie is out of the bottle, and future infrastructure cost reductions are likely to be rapidly consumed by new data requirements.

 

Big Data isn’t solely about storage, however; it’s also about processing and interpreting data. One major constraint that is often overlooked here is the growing problem of Information Overload and our limitations as humans in absorbing and processing growing volumes of data. Many studies have found worrying trends to support this, such as shortening attention spans and falling IQ scores, particularly in highly developed economies. Within the Big Data domain, inference algorithms, Natural Language Processing and other semantic technologies in particular are reducing the requirement for detailed human decision making. How far organisations will delegate decision making to machines, however, depends on society’s ability to progress related issues such as increasing democracy in decision making, privacy, data security and control. Indeed, with eminent thought leaders such as Elon Musk and Stephen Hawking amongst those voicing concerns about Artificial Intelligence developments, it’s unlikely that the need for considerable human oversight will disappear any time soon.

 

So what’s the solution? It appears organisations may be damned if they fully embrace Big Data, but also damned if they don’t. One movement that has grown in popularity in recent years while retaining its traditional roots may be able to help. Lean as a philosophy has been around for some time, originating in manufacturing, and has recently been applied very successfully to areas as diverse as Change Management, start-ups and project delivery. It uses a number of techniques and principles to focus on reducing waste: any activity that doesn’t directly create value for customers, whether those customers are internal stakeholders or external clients. Applying this framework to Information Management has proven to yield some interesting results, which go some way to helping organisations decide what level of Big Data adoption is right for them. A central premise is the rule that every piece of data within an organisation’s information landscape should in some way be linked to creating value for the end customer and the organisation’s objectives, whether that is revenue maximisation, cost reduction or something else. Whilst this is easier said than done, there are a number of Lean Information Management techniques organisations can employ.

 

Using automated discovery techniques it’s now possible to classify, catalogue, model and define an organisation’s data assets more rapidly and thoroughly than ever before. Metadata discovery, Profiling and Semantic technologies in particular are becoming much more usable and cost-effective. Not only does this reduce the time spent finding data, which can be as much as 25% of an employee’s day, it also aids data security, archiving and deletion strategies. Modelling your Data Architecture and Data Management practices is also invaluable for understanding whether data travels between producers and consumers via the shortest path; for Lean, this modelling generates valuable metrics to help steer its hypothesis-driven approach to Data Strategies. Time is also a key form of waste, so a further principle is to collate data for decision making only when absolutely necessary. Not all decisions need to be made using hard-fought empirical evidence; sometimes common sense and trust are enough.
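As a rough sketch of what automated profiling involves, the example below computes a few core per-column metrics with pandas. The column names and data are invented for illustration; commercial tools such as Experian Pandora go much further, adding cross-column dependency discovery and semantic classification, but the underlying metrics are of this kind.

```python
# A small data-profiling sketch: capture, per column, the metrics a
# discovery tool would use to classify and catalogue a dataset.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row of profile metrics per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),              # inferred type
        "null_pct": (df.isna().mean() * 100).round(1),  # completeness
        "distinct": df.nunique(),                    # cardinality
        "sample": df.apply(                          # example value
            lambda col: col.dropna().iloc[0] if col.notna().any() else None
        ),
    })

# Invented example data, for illustration only.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@x.com", None, "c@x.com", "d@x.com"],
    "spend_gbp": [120.0, 80.5, None, 42.0],
})
print(profile(df))
```

Even a profile this simple is enough to start tagging columns as identifiers, contact details or measures, which in turn feeds the cataloguing, security and deletion strategies mentioned above.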

 

If your organisation is considering major investments in Big Data, it’s worth checking whether these key concerns and principles are addressed. If not, perhaps it’s time to consider Lean Data as an alternative.
