Author: jphare

Open Data in Government


Open data has several definitions, but our preferred one at Data To Value comes from the Open Data Institute: ‘Open data is data that anyone can access, use and share.’ Simple really, but there is a follow-on: ‘For data to be considered open, it must be published in an accessible format, with a license that permits anyone to access, use and share it.’

 

The growing open data movement is using these principles to make what is traditionally considered internal data more readily available to anyone who wishes to access it, use it and manipulate it in any way, shape or form.

 

At a national level, the UK and the other G8 countries signed up to the Open Data Charter in 2013 and committed to five key principles:

 

  • Open Data by Default
  • Quality and Quantity
  • Useable by All
  • Releasing Data for Improved Governance
  • Releasing Data for Innovation

 

Drivers at this level are helping government departments, agencies and local authorities become more transparent and accountable, while enabling tech entrepreneurs to create disruptive technology that benefits society. A 2013 study by Deloitte estimates that the economic benefit of public sector information is worth around £1.8 billion, with social benefits amounting to £5 billion. The study highlights that the use and re-use of public information helps organisations and individuals in the following ways:

 

  • Fuel innovation – develop new products and services.
  • Increase the accountability of public service providers, improve individuals’ engagement in the democratic process, increase transparency and support better policymaking.
  • Reduce barriers to entry for markets with information asymmetry.
  • Inform people about the social issues happening around them.

 

Some case studies include:

 

  • Publishing Open Data on cardiac surgery, which had a positive impact on mortality rates, with an economic value of £400 million p.a.
  • Providing Open Data streams – the benefits are evident in many mobile apps, for example those tracking congestion zones and helping users find alternative routes, or providing live transport information such as when the next bus is due.
  • The use of live weather data to identify whether users are in danger from storms, floods, snow or other hazards has an estimated economic value of between £15 million and £58 million p.a.

 

Open Data is beginning to affect us all, and it is important to use sound Data Management principles to earn public trust and deliver the maximum benefit.

For more information on how Data To Value could help develop your Open Data Strategy, click here.


Introduction to data quality


How many times have you heard managers and colleagues complain about the quality of the data in a particular report, system or database? People often describe poor-quality data as unreliable or untrustworthy. Defining exactly what high- or low-quality data is, why it is at a certain quality level, and how to manage and improve it is often a trickier task.

 

Within the Data Quality Management community there is a generally held view that the quality of a dataset depends on whether it meets defined requirements. Managers often define these requirements as outcomes such as higher sales, lower costs or fewer defects. Whilst this is important, it doesn’t help practitioners at the coalface to codify rules and other tests designed to measure the quality of a dataset. For this, a more specific definition of requirements, such as completeness or uniqueness levels, is required. An example requirement statement could be “All clients should have a name and address populated in our CRM system”.
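
As a minimal sketch of how such a statement can be turned into an executable rule (the CRM extract, column names and values below are invented purely for illustration), the check might look like this in Python with pandas:

```python
import pandas as pd

# Hypothetical CRM extract; the column names and values are invented for illustration.
clients = pd.DataFrame({
    "client_id": [1, 2, 3],
    "name": ["Acme Ltd", None, "Globex"],
    "address": ["1 High St", "2 Low Rd", ""],
})

# Rule: "All clients should have a name and address populated in our CRM system."
required = clients[["name", "address"]]
populated = required.notna() & (required != "")
rule_passed = populated.all(axis=1)

print(f"{rule_passed.mean():.0%} of clients meet the requirement")
print("Failing records:")
print(clients[~rule_passed])
```

The pass rate and the list of failing records are the kind of raw results that the dimensions and scoring described below build on.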

 

 

Measuring Data Quality

 

Data Quality dimensions are often used by practitioners to group, generically, the different types of tests that typically span different project requirements. Whilst there is some disagreement on the number of dimensions and the terms used for them, many practitioners use definitions such as those below (a short sketch after the list shows how a few of these dimensions can be expressed as executable checks):

 

  • Completeness – requires that a particular column, element or class of data is populated and does not feature null values or values in place of nulls (e.g. N/As).

 

  • Consistency – tests whether one fact is consistent with another, e.g. gender and title in a CRM database.

 

  • Uniqueness – are all the entities or attributes within a dataset unique?

 

  • Integrity – are all the relationships populated for a particular entity – for example its parent or child entities?

 

  • Conformity – does the data conform to the right conventions and standards? For example, a value may be correct but not follow the expected format or recognised standard.

 

  • Accuracy – the hardest dimension to test for, as this often requires some form of manual checking by a Subject Matter Expert (SME).
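
To make a few of these dimensions concrete, here is the sketch referred to above. The CRM sample, the title-to-gender mapping and the simplified postcode pattern are all invented for the example, not drawn from a real rule set:

```python
import pandas as pd

# Invented CRM sample purely for illustration.
crm = pd.DataFrame({
    "client_id": [101, 102, 102, 104],
    "title":     ["Mr", "Mrs", "Mr", "Ms"],
    "gender":    ["M", "M", "M", "F"],
    "postcode":  ["SW1A 1AA", "not known", "EC2V 7HH", None],
})

checks = {
    # Completeness: postcode populated and not a placeholder value.
    "completeness": crm["postcode"].notna() & (crm["postcode"] != "not known"),
    # Uniqueness: client_id appears only once.
    "uniqueness": ~crm["client_id"].duplicated(keep=False),
    # Consistency: title and gender agree (deliberately simplified mapping).
    "consistency": crm["title"].map({"Mr": "M", "Mrs": "F", "Ms": "F"}) == crm["gender"],
    # Conformity: postcode matches a rough UK postcode pattern (not a full validator).
    "conformity": crm["postcode"].fillna("").str.match(r"^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$"),
}

for dimension, passed in checks.items():
    print(f"{dimension:<13} {passed.mean():.0%} of records pass")
```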

 

Dimensions are used not only as a checklist to confirm that the best mix of rules has been implemented to test the quality of a dataset, but also for aggregating data quality scores for tracking trends and MIS reporting. Many more complex measurement methods also exist that help translate individual pass/fail results into more business-friendly cost, risk and revenue calculations.
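
As a small illustration of that roll-up (the rule results below are invented numbers, not real measurements), individual pass/fail counts can be aggregated into a percentage score per dimension:

```python
from collections import defaultdict

# Invented rule results: (dimension, rule name, records tested, records passed).
rule_results = [
    ("completeness", "client name populated", 1000, 970),
    ("completeness", "address populated",     1000, 910),
    ("uniqueness",   "client_id unique",      1000, 995),
    ("conformity",   "postcode format valid", 1000, 880),
]

totals = defaultdict(lambda: [0, 0])
for dimension, _rule, tested, passed in rule_results:
    totals[dimension][0] += tested
    totals[dimension][1] += passed

# One aggregated score per dimension, suitable for trend tracking over time.
for dimension, (tested, passed) in totals.items():
    print(f"{dimension:<13} {passed / tested:.1%}")
```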

 

 

Improving Data Quality

 

A different set of skills and tools is often used for improving data quality once it has been measured. A good Data Quality Analyst tends to exhibit a mix of skills typically found in Data Analysts, Data Scientists and Business Analysts, amongst others. At a strategic level, a good understanding of corporate culture, architecture, technology and other factors is often important. However, a number of essential technical skills are also required when dealing with the data itself. These include parsing, standardising, record linkage/matching, data scrubbing/cleansing, data profiling and data auditing/monitoring. These skills are used extensively when conducting projects such as data migrations, where data quality improvements need to be achieved in tight timescales.

 

Parsing

Parsing is the process of analysing data and determining whether a string conforms to one of a few main patterns. Parsing is fairly easy to automate if a dataset has a recognisable or predictable format.
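
As a rough sketch (UK postcodes are used purely as an illustrative format, and the pattern is a simplification rather than a complete validator), a parser might test a raw string against a small set of known patterns and split it into components:

```python
import re

# Illustrative pattern for a UK-style postcode split into outward and inward codes.
POSTCODE_PATTERNS = [
    re.compile(r"^(?P<outward>[A-Z]{1,2}\d[A-Z\d]?)\s*(?P<inward>\d[A-Z]{2})$"),
]

def parse_postcode(raw: str):
    """Return the parsed components if the string matches a known pattern, else None."""
    cleaned = raw.strip().upper()
    for pattern in POSTCODE_PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return match.groupdict()
    return None

print(parse_postcode(" sw1a1aa "))   # {'outward': 'SW1A', 'inward': '1AA'}
print(parse_postcode("not a code"))  # None
```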

 

Standardising

Once the main formats are recognised and parsing is complete, the next step is to standardise the dataset. This is done by correcting the data in a pre-defined way that is consistent and clear throughout the whole dataset.
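
A minimal sketch of that idea (the mapping table and country values are invented for the example) is a lookup that rewrites known variants into one agreed form:

```python
# Invented mapping of known variants to a single standard form.
COUNTRY_STANDARDS = {
    "uk": "United Kingdom",
    "u.k.": "United Kingdom",
    "great britain": "United Kingdom",
    "united kingdom": "United Kingdom",
    "usa": "United States",
    "u.s.a.": "United States",
    "united states": "United States",
}

def standardise_country(value: str) -> str:
    """Return the standard form if the variant is known, otherwise the trimmed original."""
    return COUNTRY_STANDARDS.get(value.strip().lower(), value.strip())

print(standardise_country(" U.K. "))         # United Kingdom
print(standardise_country("Great Britain"))  # United Kingdom
print(standardise_country("France"))         # France
```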

 

Record linkage/matching

Record linkage or matching describes the process of identifying and linking duplicate records that refer to the same real-world entity but may not be completely identical across datasets. For instance, the same product might be entered as “Leather chair – black” and “Chair, Blk. – Leather”.
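
As an illustrative sketch (the abbreviation table and the 0.9 threshold are assumptions chosen for this example, not a recommended configuration), the two product descriptions above can be normalised and then compared with a string-similarity measure from Python’s standard library:

```python
import re
from difflib import SequenceMatcher

# Illustrative abbreviation expansions; a real matching engine would use richer reference data.
ABBREVIATIONS = {"blk": "black", "lthr": "leather"}

def normalise(description: str) -> str:
    """Lower-case, strip punctuation, expand known abbreviations and sort the tokens."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]
    return " ".join(sorted(tokens))

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

score = similarity("Leather chair – black", "Chair, Blk. – Leather")
print(f"similarity: {score:.2f}")  # 1.00 here, since both normalise to the same string

# A simple rule might treat scores above a chosen threshold as candidate duplicates.
if score > 0.9:
    print("candidate duplicate pair")
```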

 

Data scrubbing/cleansing

Data scrubbing, or cleansing, describes the process of amending or removing data that is incorrect, incomplete, improperly formatted and/or duplicated. Typically a software tool uses rules and algorithms to amend specific types of mistakes, saving the data quality professional a significant amount of time.
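
As a small rule-based sketch (the sample records and the rules themselves are invented; real cleansing tools apply far richer rule sets), a few common fixes might look like this:

```python
import pandas as pd

# Invented, deliberately messy sample data.
raw = pd.DataFrame({
    "client_id": [1, 1, 2, 3],
    "name":  ["  Acme Ltd ", "Acme Ltd", "GLOBEX", "n/a"],
    "email": ["sales@acme.example", "sales@acme.example", "INFO@GLOBEX.EXAMPLE", ""],
})

cleaned = raw.copy()
# Rule 1: trim stray whitespace from the name column.
cleaned["name"] = cleaned["name"].str.strip()
# Rule 2: treat placeholder strings and empty values as missing.
cleaned = cleaned.replace(["n/a", ""], pd.NA)
# Rule 3: standardise email case.
cleaned["email"] = cleaned["email"].str.lower()
# Rule 4: remove exact duplicate rows left after the fixes above.
cleaned = cleaned.drop_duplicates()

print(cleaned)
```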

 

Data profiling, auditing and monitoring

Data profiling is the process of analysing and gathering information about the data. This information can feed specific data quality metrics and helps determine whether the metadata accurately describes the source data. Data profiling is one of the main tools used for data auditing: it helps assess the fitness of data for a specific purpose, which in turn ties in with long-term data monitoring that helps prevent serious issues.
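
As a very small profiling sketch (the orders dataset below is invented), the per-column statistics that typically feed these metrics can be gathered in a few lines:

```python
import pandas as pd

# Invented sample dataset to profile.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 4],
    "country":  ["UK", "UK", "FR", None, None],
    "amount":   [120.0, 80.5, None, 42.0, 42.0],
})

# One row per column: type, population, distinctness.
profile = pd.DataFrame({
    "dtype": orders.dtypes.astype(str),
    "non_null": orders.notna().sum(),
    "null_pct": orders.isna().mean().round(3),
    "distinct": orders.nunique(),
})
# Basic numeric ranges help check that the data matches what the metadata claims.
profile["min"] = orders.min(numeric_only=True)
profile["max"] = orders.max(numeric_only=True)

print(profile)
```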

 

 

Hopefully the above article gives a flavour of some of the skills and techniques involved in Data Quality Management. For more in-depth coaching, please do consider our upcoming Data Quality Management fundamentals course in London this March.
