
analytics Tag


Introduction to Apache Spark


New technologies continue to emerge, enabling faster data processing and more advanced analytics. The Hadoop platform was a great breakthrough in this space: it solved many of the storage and retrieval challenges for very large and varied datasets by dividing the data and processing it across multiple machines. This was faster, more cost-effective and less prone to failure than a traditional RDBMS. Though Hadoop was a big step forward and made it easier to store, process and retrieve data in a schemaless environment, it is already 10 years old and is poorly suited to multi-pass computations. With Hadoop, the output of a job has to be written to storage after each step, and the replication and disk I/O involved slow things down. Apache Spark addresses this problem by supporting multi-step data pipelines and allowing jobs to run in memory.
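The difference between persisting every intermediate result and chaining steps in memory can be sketched in plain Python. This is only an illustration of the idea, not Spark or Hadoop code: the function names, the three-step word count and the JSON temp files are all assumptions made for the example.

```python
import json
import os
import tempfile


def tokenize(lines):
    """Step 1: split lines into lower-case words."""
    return [word.lower() for line in lines for word in line.split()]


def count_words(words):
    """Step 2: count how often each word occurs."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def keep_frequent(counts, minimum):
    """Step 3: keep only words seen at least `minimum` times."""
    return {word: n for word, n in counts.items() if n >= minimum}


def run_with_disk_between_steps(lines, minimum, workdir):
    """Write each step's output to a file and read it back before the next step."""
    def save_then_load(name, data):
        path = os.path.join(workdir, name)
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)

    words = save_then_load("step1.json", tokenize(lines))
    counts = save_then_load("step2.json", count_words(words))
    return keep_frequent(counts, minimum)


def run_in_memory(lines, minimum):
    """Chain the same steps without touching storage in between."""
    return keep_frequent(count_words(tokenize(lines)), minimum)


lines = [
    "spark keeps data in memory",
    "hadoop writes data to disk",
    "spark is fast",
]
with tempfile.TemporaryDirectory() as workdir:
    disk_result = run_with_disk_between_steps(lines, 2, workdir)
memory_result = run_in_memory(lines, 2)
print(disk_result == memory_result, memory_result)
```

Both versions produce the same answer; the first simply pays for a write and a read at every step, which is the overhead that grows with the number of passes over the data.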


Apache Spark is reported to run programs up to 100 times faster in memory, and 10 times faster on disk, than Hadoop MapReduce. As with many Apache projects, it prides itself on simplicity and compatibility. It lets developers write simpler code and offers APIs for Java, Scala and Python. Spark is also not limited to running on top of Hadoop; it can be deployed on other platforms such as Mesos and EC2, or run in standalone mode.


Apache Spark has some great features that complement its “Lightning-fast cluster computing”. Its high-level libraries currently include Spark SQL, Spark Streaming, MLlib and GraphX. Spark SQL lets users load and transform data from formats such as JSON or Parquet and query it via SQL or HiveQL. Spark Streaming builds on Spark’s speed to let users process data in near real time. It represents the incoming data as a stream of resilient distributed datasets (RDDs). MLlib is a machine learning library that provides a range of algorithms for processing the data in a meaningful way, and GraphX is a library for graph processing that can be used to analyse relationships within the data.


All in all, Apache Spark is one of the fastest big data analytics engines on the market. It is widely compatible, easy to use and packs a lot of features into one solution.


For more data management solutions and news please book a meeting with us.


An introduction to ElasticSearch


Search engines are now an integral part of people’s everyday lives. We are used to having access to information at the click of a button. However, we rarely think about how much work goes into this ability to search for information. Search engine software has become extremely advanced in recent years and now uses complex algorithms to return the most relevant information, with predictive search and search suggestion capabilities. Many engines can do this in real time, processing millions of pieces of information at once.


ElasticSearch diagram, how it works


One of the most advanced search engines on the market today is ElasticSearch. This product is a full-text search and analytics engine built on Apache Lucene, a high-performance text search engine library. Essentially, ElasticSearch uses Lucene as its backbone and builds on it to provide a quick and easy user interface. ElasticSearch also goes one step further: it offers the user not only the ability to search indexed data, but also the ability to visualise and analyse it using the companion components Kibana and Logstash. ElasticSearch also takes advantage of faceting. Faceted search is more advanced than plain text search, as it lets a user apply various filters and use a data classification system to better understand what data assets an organisation has and where they are. ElasticSearch is schemaless, enabling business users to gain insight and manipulate data in a much quicker and more convenient manner as they work. Like other new and innovative products in this space, ElasticSearch can be scaled to hundreds of servers and can handle petabytes of structured and unstructured data. Finally, ElasticSearch is released under the Apache 2 licence, making it fully open source: users can download it, share it and modify it as they see fit.
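The idea behind faceting can be shown with a small, self-contained Python sketch. It does not use ElasticSearch or its API; the sample documents, field names and the `search` and `facet_counts` helpers are hypothetical and exist only to illustrate how a text query, facet counts and a facet filter fit together.

```python
from collections import Counter

# Hypothetical documents; "type" and "department" act as facet fields.
documents = [
    {"id": 1, "title": "Quarterly risk report", "type": "pdf", "department": "risk"},
    {"id": 2, "title": "Server error log", "type": "log", "department": "it"},
    {"id": 3, "title": "Risk register", "type": "spreadsheet", "department": "risk"},
    {"id": 4, "title": "Compliance risk policy", "type": "pdf", "department": "compliance"},
]


def search(docs, text, filters=None):
    """Return documents whose title contains `text` and that match every filter."""
    filters = filters or {}
    hits = []
    for doc in docs:
        if text.lower() not in doc["title"].lower():
            continue
        if all(doc.get(field) == value for field, value in filters.items()):
            hits.append(doc)
    return hits


def facet_counts(hits, field):
    """Count how many hits fall under each value of a facet field."""
    return Counter(doc[field] for doc in hits)


# A plain text search, then the facet breakdown a user could drill into.
hits = search(documents, "risk")
by_type = facet_counts(hits, "type")

# Drilling down: the same query narrowed by one facet value.
pdf_hits = search(documents, "risk", {"type": "pdf"})

print(dict(by_type))
print([doc["id"] for doc in pdf_hits])
```

A real engine computes the same kind of counts over an inverted index rather than by scanning every document, which is what makes faceting practical at scale.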


There are many great use cases for ElasticSearch in organisations that are struggling to search, explore, govern and analyse large volumes of data in a variety of structures. A number of these can be tackled rapidly thanks to the simplicity of the product’s deployment options and architecture. Indeed, many of the core requirements for building a data lake can often be met using this relatively simple toolset. Some of our favourite use cases include analysing log data across an IT department’s application landscape in order to identify processing errors, detect data leakage and predict the risk of downtime. Similarly, within a business environment we have found the toolset extremely useful when tackling complex risk and compliance projects, where linkages and facts are often distributed and hidden within a variety of documents and databases.


For more innovative ways of turning your data into your organisation’s most powerful asset please book a meeting with us.


Data-driven actioncams – Garmin Virb XE review and roadtest


Data to Value’s Director James Phare takes Garmin’s latest data-driven actioncam for a test ride on his commute to the office.




Thanks to a rise in the use of smartphones, falling sensor prices and growing data literacy, the fitness tracking and wearables market continues to grow rapidly, with 70m units shipped in 2014 according to Gartner. New hardware such as smart glasses, watches and bracelets, new software apps and new sports use cases continue to emerge. As someone who enjoys extreme sports and data in equal quantities, I find this a genuinely exciting time and look forward to seeing what the Internet of Things (IoT) can bring to this sector.


One area where technology has matured quite a bit recently is the actioncam market, where GoPro’s traditional dominance is now being challenged by competitors. Many have chosen to go down the analytics and sensors route by loading their compact cameras with internal sensors and Bluetooth or Wi-Fi connectivity to external sensors. Garmin are one of the leaders in this space thanks to their successful Virb series of water-, dust- and snow-proof actioncams. These have proven popular in a wide range of sports including snowsports, sailing and cycling, to name a few. The latest model, the Virb XE, is bursting with features and recently hit the shelves, and we couldn’t wait to test it out.


Initial impressions are that the device will be very familiar to GoPro users in terms of dimensions and usability. It takes a micro SD card, has rechargeable batteries and shoots high-definition video at up to 1440p30 and 12-megapixel photos. It also has a number of other latest-generation actioncam features, such as Wi-Fi connectivity for lining up shots.


Virb XE actioncam


What’s different about the Virb XE, however, is its sensor and data logging capabilities, branded G-Metrix by Garmin. Included as standard in the Virb are GPS, G-force and orientation / gyro sensors. These capture a number of datapoints that can help sports enthusiasts understand what was going on in a particular photo or video. The Virb XE can also be connected to a range of compatible devices such as heart-rate sensors, cadence sensors for cycling, remotes and watches. This data can then be seamlessly overlaid onto video footage using the Virb Edit software (pictured below) and enriched with analytics, graphs, maps and other widgets. For someone who has worked with data visualisation and business intelligence tools for years, this ability to overlay useful data on my sports videos is fantastic. For those wishing to get the data into third-party software there are also some helpful GPX and FIT file export options.
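As a rough idea of what can be done with an exported GPX file outside Garmin’s own software, the sketch below parses track points with Python’s standard library and sums the distance between them. It assumes a standard GPX 1.1 track layout; the sample coordinates are made up, and a real export from the camera may carry extra fields and extensions that are not handled here.

```python
import math
import xml.etree.ElementTree as ET

# Namespace used by the GPX 1.1 schema.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up sample track; a real export would be read from a .gpx file instead.
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <trkseg>
      <trkpt lat="51.5000" lon="-0.1000"><ele>10.0</ele></trkpt>
      <trkpt lat="51.5010" lon="-0.1000"><ele>11.0</ele></trkpt>
      <trkpt lat="51.5020" lon="-0.1000"><ele>12.0</ele></trkpt>
    </trkseg>
  </trk>
</gpx>"""


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def track_points(gpx_text):
    """Extract (lat, lon) pairs from every track point in the document."""
    root = ET.fromstring(gpx_text)
    return [
        (float(pt.get("lat")), float(pt.get("lon")))
        for pt in root.findall(".//gpx:trkpt", GPX_NS)
    ]


def total_distance_m(points):
    """Sum the distance between consecutive track points."""
    return sum(
        haversine_m(lat1, lon1, lat2, lon2)
        for (lat1, lon1), (lat2, lon2) in zip(points, points[1:])
    )


points = track_points(SAMPLE_GPX)
distance = total_distance_m(points)
print(f"{len(points)} points, {distance:.0f} m")
```

The same list of points could be fed into a mapping or charting library; the haversine formula is a common approximation that ignores elevation changes.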


Virb Edit cycling video


To test out the cam I strapped it to the handlebars of my road bike for my morning commute into the office and connected my Garmin heart-rate sensor. I was impressed with the results: not only was the footage quality excellent, the sensor data also proved to be very complete and accurate. For my morning commute I’m not that bothered about finding insights to improve performance, but for improving general fitness awareness I can see how it could be useful. For me the Virb really comes into its own in sports such as sailing, where it has traditionally been difficult to capture metrics for meaningful performance improvements. I can see the cam being a really useful training aid for improving technique and setup over the autumn and winter months.


Virb Edit Studio software
