Introduction to open data

The world now generates vast quantities of data every day that can be used to enhance the quality of life of virtually anyone on the planet. Information is power, but it is also a tool for supporting development, knowledge sharing and social initiatives. Tracking natural disasters, crowdsourcing rainfall data and mapping the night sky are among a diverse range of open data initiatives.

Three key terms are used to describe how available data is to the people who wish to access it: closed, shared and open. Closed data is confidential and not meant to be shared with the public; it ranges from confidential company reports to government security data and anything else deemed classified. Open data, by contrast, is readily available to anyone who wishes to access it. Governments and companies have opened up various types of data for those looking for new solutions to problems that can benefit society. For data to be considered open, the owner must explicitly state that it is free to use in any way, shape or form that the user sees fit. In the middle of this spectrum sits shared data, which can be accessed and used by specific groups of people, who meet certain criteria, for clearly defined purposes. That might include medical data, consumer shopping habits or electoral data.

Open data has the potential to create tremendous value and is starting to be used on a wider scale, with positive economic and social effects. New products and business models are emerging off the back of the Open Data movement. App developers, for example, are using weather reports to warn people of pollution in specific areas. Traffic data is being used for real-time reporting to ease congestion in urban areas. Government data is being used to track how tax income is spent. Repurposed open data is helping people improve their household energy efficiency and linking property owners with construction companies that can make it happen. There are many great examples of how open data is already saving lives and changing the way we live and work.

However, with this growth in access to data sources comes the challenge of managing the growth in volume and variety. Many organisations have historically struggled to manage and extract value from their own internal datasets. Fortunately, alongside these trends there has been rapid development of new tools and techniques, enabling firms with the right approach to truly leverage internal and external Open Data assets.

For more information about new and efficient ways of managing data, please contact us.

Client regulatory data management challenges and opportunities

Stricter rules have been imposed on the financial sector in a variety of areas following the financial crisis. Regulators are more involved than ever in scrutinising Know Your Customer (KYC) processes and are imposing large fines on non-compliant firms. Anti-money laundering (AML) practices have become deeper and more advanced to counter the financing of terrorism, corruption and crime. EMIR and Dodd-Frank legislation was also put in place with the aim of improving transparency and reducing systemic risk in derivatives trading. To address tax avoidance, regulations such as the Foreign Account Tax Compliance Act (FATCA) have come into play, enforcing yearly reporting for US persons. All of these reforms have significantly increased the complexity of the procedures financial services firms must adhere to and the data requirements they have to fulfil.

With the increasing volume and scope of regulation, firms are struggling to comply in many areas for a variety of reasons. Legacy technology and siloed working practices are often highlighted as key barriers. Organisations need to address their data management issues not only to minimise the risk of regulatory sanctions but also to improve competitiveness in an environment where margins are tightening due to new forms of competition. The smartest firms are using this regulatory impetus to develop a new competitive edge. Many banks, for example, are using KYC to understand their customers and their behaviours as intimately as retailers have long known theirs, employing demographic profiling and other techniques.

Level of regulatory impact on firms and clients

Regulations are often amended and revisited, meaning that firms need flexible tools and an adaptable architecture that can scale to meet these challenges. Historically, companies adopted a siloed approach to the collection and delivery of data in client onboarding. Separate reporting and compliance teams, many touchpoints between divisions, disparate data sources and unintegrated technologies all make it harder to ensure regulatory compliance.

Fortunately, a new generation of technologies and approaches designed to deal with these barriers is making compliance more achievable and demonstrable. Many have a proven track record in other industries, such as online retail and social media, and work well with a prototyping-led implementation approach.

For more information about innovative and lean data management techniques, book a meeting with us.

An introduction to Apache Drill and why it is useful

With the rapid growth of data and the shift towards rapid development solutions, much data is now being stored in NoSQL and big data stores such as Hadoop and MongoDB. The infrastructure built on relational databases, which has served for decades, cannot keep up with the volume and scope of data being captured. At the same time, SQL remains an excellent and very widely used method for extracting and analysing data; in short, it will not be replaced by hierarchical query techniques such as XPath any time soon.

In the well-known case of Google, the data the company was capturing in the early 2000s was simply too large for traditional database structures to handle. Google developed an innovative approach that divided the data into smaller, more manageable sets spread across multiple machines, processed each piece independently, and then brought the results back together. They called this algorithm MapReduce, and it went on to underpin the open-source platform Hadoop.
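
To make the divide-and-recombine idea concrete, here is a minimal single-machine sketch of the MapReduce pattern in Python, using the classic word-count example. It is purely illustrative, not Google's or Hadoop's actual implementation: records are mapped to key/value pairs, shuffled into groups by key, and each group is reduced to a single result.

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit a (key, value) pair for every word in the record.
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Reduce: combine every value emitted for a single key.
    return key, sum(values)

def map_reduce(records):
    # Shuffle: group the intermediate pairs by key, as the framework
    # would do across machines before the reduce step runs.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())

print(map_reduce(["open data is power", "data is shared"]))
# -> {'open': 1, 'data': 2, 'is': 2, 'power': 1, 'shared': 1}
```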

 

Hadoop is one of many frameworks developed to allow massively parallel computing for fast and cost-efficient results. With the massive increase in data being captured from new sources, businesses started using old and new frameworks together. The challenge then becomes how to link up all of this information, from different sources and in different formats, to extract the right data for an ad hoc business case quickly and usefully. Google solved this with an innovation they called Dremel, and the open source community then created Apache Drill in tribute. Drill addresses this relational and non-relational problem by enabling the user to query data across different frameworks and formats, delivering low-latency results that can be interpreted with familiar tools and language. A sketch of what such a query can look like appears below the diagram.

Visual representation of how Apache Drill works
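
As a rough sketch of such a query in practice, the snippet below assumes a local Drill instance with its default REST endpoint on port 8047, a hypothetical JSON file exposed through the dfs storage plugin and a hypothetical MongoDB storage plugin named mongo; the paths and table names are made up and would need adapting to your own setup. A single ANSI SQL statement can then join a raw file with a NoSQL collection.

```python
import requests

# Hypothetical query: join a raw JSON file (dfs storage plugin) with a
# MongoDB collection (mongo storage plugin) in a single ANSI SQL
# statement, with no schema definition or data-loading step beforehand.
sql = """
SELECT r.name AS region, COUNT(*) AS trips
FROM dfs.`/data/trips/2016.json` AS t
JOIN mongo.reference.regions AS r ON t.region_id = r.region_id
GROUP BY r.name
"""

# Drill's REST endpoint (default port 8047) accepts SQL and returns the
# result set as JSON.
response = requests.post(
    "http://localhost:8047/query.json",
    json={"queryType": "SQL", "query": sql},
)
response.raise_for_status()
for row in response.json().get("rows", []):
    print(row)
```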

Apache Drill has a few differentiating features that give it a competitive edge. It is schema-free: users can query raw data quickly without the time-consuming and costly task of schema creation and significant IT involvement. It is considered one of the fastest query engines on the market today. There is no need for data loading, and its specialised memory management reduces the memory footprint and minimises garbage collection. It also supports locality-aware execution, which reduces network traffic when Drill is co-located with the datastore. Lastly, Apache Drill has been developed with the user in mind. The software is easy to install and supports all major operating systems. It leverages users' existing SQL skills, so there is no need to learn a new coding language, and it integrates with popular business intelligence tools such as Tableau, QlikView, MicroStrategy and more.
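
To illustrate the schema-free point, here is a short, hedged sketch (assuming the same local REST endpoint as above and a made-up CSV path): Drill can expose an undescribed CSV file as a columns array that ordinary SQL can index into, with no table definition or load step.

```python
import requests

def drill_query(sql, host="http://localhost:8047"):
    """Send one SQL statement to a local Drill REST endpoint and return the rows."""
    response = requests.post(f"{host}/query.json",
                             json={"queryType": "SQL", "query": sql})
    response.raise_for_status()
    return response.json().get("rows", [])

# Hypothetical raw CSV with no header and no registered schema: Drill
# surfaces each line as a `columns` array that plain SQL can index into.
rows = drill_query("""
    SELECT columns[0] AS station, CAST(columns[2] AS DOUBLE) AS reading
    FROM dfs.`/data/sensors/readings.csv`
    WHERE CAST(columns[2] AS DOUBLE) > 100
""")
print(rows[:5])
```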

 

Overall, Apache Drill is a software solution that lets users leverage their traditional relational data assets alongside newer NoSQL sources in a quick and convenient way, while continuing to use familiar tools and language.

 

At Data to Value we have been using Drill to access GPS track data stored in its native format (GPX) in order to create derived data such as speed and heading. This is great because it lets us retain the data in a standard format for use in GPX-compliant tools while also generating the custom analysis we need in BI and other visualisation tools. A simplified sketch of that derivation appears below.
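
For illustration, here is a simplified sketch of how speed and heading can be derived from two consecutive GPX trackpoints using standard great-circle formulas. The coordinates are made up and this is not our production pipeline, just the underlying arithmetic.

```python
import math
from datetime import datetime

def speed_and_heading(p1, p2):
    """Derive speed (m/s) and heading (degrees) between two GPX trackpoints.

    Each point is (latitude, longitude, ISO 8601 timestamp), the fields
    found on a GPX <trkpt> element.
    """
    lat1, lon1, t1 = p1
    lat2, lon2, t2 = p2
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)

    # Haversine great-circle distance in metres (Earth radius ~6371 km).
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    distance = 2 * 6371000 * math.asin(math.sqrt(a))

    # Initial bearing from the first point towards the second, 0-360 degrees.
    y = math.sin(d_lambda) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(d_lambda)
    heading = (math.degrees(math.atan2(y, x)) + 360) % 360

    elapsed = (datetime.fromisoformat(t2) - datetime.fromisoformat(t1)).total_seconds()
    return distance / elapsed, heading

# Two made-up trackpoints five minutes apart.
print(speed_and_heading((51.5007, -0.1246, "2016-05-01T10:00:00"),
                        (51.5014, -0.1419, "2016-05-01T10:05:00")))
```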
