Watch our joint webinar with Linkurious that explores how to manage Data Governance and metadata requirements in the age of Big Data.
James Phare of Data to Value and Jean Villedieu of Linkurious demonstrate how you can answer burning questions using the latest graph and network tools and techniques. They provide an overview of how many of the largest bluechip organisations are now applying graph technology to intimately understand the hidden connections in their metadata and how you can apply these techniques to your landscape.
Big Data has dominated the headlines in technology and business media for some time. In recent years it has also become much more widely discussed in the general media as innovations spread from traditional roots in online retail and social media to other sectors such as Healthcare, Government, Manufacturing and Finance. Research by Deloitte suggested that during 2012, 90% of Fortune 500 companies would pursue Big Data projects. Undoubtedly there are numerous, indisputable and powerful use cases demonstrating the social, environmental and economic benefits of leveraging Big Data. But is following the trend of hoarding data in the hope of finding nuggets of insight the right approach?
Whilst many commentators refer to Data as the new oil, it has been a key corporate asset for some time, decades in fact. The first Data Warehouse was conceived in the 1960s, and as relational databases grew in popularity in the 1980s so did corporate data volumes. This growth led to control challenges, and in response a number of organisations and frameworks formed to bring structure, organisation and planning to Data Management. A variety of sectors began to adopt the principles espoused by organisations such as DAMA and BCS and frameworks such as Zachman and TOGAF. Many of these approaches focused on waste and a belief that 'less is more': the objective was to minimise the storage, movement, cleansing and processing of data. Big Data, to an extent, represents the antithesis of these traditional principles.
We are now in the era of feast, not famine. Falling processing and storage costs, together with a range of new technologies and approaches including NoSQL databases, Machine Learning and Natural Language Processing, have enabled many organisations' data volumes to increase without causing meltdowns. If anything, firms such as Facebook, Google and eBay have demonstrated that accumulating huge volumes of data can be immensely valuable and yield previously undiscoverable insights. Clearly many facets of Big Data are quite distinct from traditional Data Management tools and techniques. In fact the latest data analysis techniques are so different from historical ones that many herald the triumph of data-driven decision making over gut instinct. Does this mean, however, that the Big Data blueprint should be embraced by all types of organisations and that greater data storage and processing is the answer? Not quite.
Whilst Data Management costs have fallen as computer processor and storage technologies have advanced, this was driven by historical requirements for data. One could argue that these falling costs have been not only a significant driver of Big Data innovation but also a result of it. Many studies using cost-per-gigabyte measures also show that decreases in storage costs have slowed over the last six years; some even believe storage costs will begin to increase again. Thus in many ways Big Data's growth shares characteristics with other societal trends such as car ownership and road congestion. Historically it was believed that building more roads would decrease congestion. In fact the reverse has happened: the effective cost to drivers of using the road network has fallen, leading to more drivers making more journeys. We are still developing Big Data techniques and powerful innovations continue to emerge. One trend however is clear – the genie is out of the bottle, and it is likely that future infrastructure cost reductions will be rapidly consumed by new data requirements.
Big Data isn't solely about storage, however; it's also about processing and interpreting data. One major constraint that is often overlooked here is the growing problem of Information Overload and our limitations as humans in absorbing and processing growing volumes of data. Many studies have found worrying trends to support this, such as attention spans shortening and IQ scores falling, particularly in highly developed economies. Within the Big Data domain, inference algorithms, Natural Language Processing and other semantic technologies in particular are reducing the requirement for detailed human decision making. Whether organisations will increasingly delegate decision making to machines, however, depends on society's ability to make progress on related issues such as increasing democracy in decision making, privacy, data security and control. Indeed, with eminent thought leaders such as Elon Musk and Stephen Hawking amongst those voicing concerns about Artificial Intelligence developments, it's unlikely that the need for considerable human oversight will disappear any time soon.
So what's the solution? It appears organisations may be damned if they fully embrace Big Data, but also damned if they don't. One movement that has grown in popularity in recent years while retaining its traditional roots may be able to help. Lean as a philosophy has been around for some time, originating in manufacturing. Recently it's been very successfully applied to areas as diverse as Change Management, start-ups and project delivery. It uses a number of techniques and principles to focus on reducing waste: any activity that doesn't directly create value for customers, whether those customers are internal stakeholders or external clients. Applying this framework to Information Management has yielded some interesting results which go some way to helping organisations decide what level of Big Data adoption is right for them. A central premise is the rule that every piece of data within an organisation's information landscape should in some way be linked to creating value for the end customer and the organisation's objectives, whether that is revenue maximisation, cost reduction or something else. Whilst this is easier said than done, there are a number of Lean Information Management techniques organisations can employ.
Using automated discovery techniques it's now possible to classify, catalogue, model and define an organisation's data assets more rapidly and thoroughly than ever before. Metadata discovery, Profiling and Semantic technologies in particular are becoming much more usable and cost effective. Not only does this reduce time spent finding data, which can be as much as 25% of an employee's day, it also aids data security, archiving and deletion strategies. Modelling your Data Architecture and Data Management practices is also invaluable for understanding whether data travels between producers and consumers via the shortest path. For Lean, this modelling generates valuable metrics to help steer its hypothesis-driven approach to Data Strategies. Time is also a key form of waste, so another key principle is to collate data for decision making only when absolutely necessary. Not all decisions need to be made using hard-fought empirical evidence; sometimes common sense and trust are enough.
If your organisation is considering major investments in Big Data, it's worth considering whether these key concerns and principles are addressed. If not, perhaps it's time to consider Lean Data as an alternative.
Professional sport has taken a great interest in performance analysis and backroom management through Moneyball-style analytics. Given that I'm a keen sailor in the International Moth class and we specialise in helping organisations adopt the latest Information Management techniques, we thought we would try to apply some of our expertise to this domain. There are a number of challenges to overcome, however, including a lack of commercially available sensors, so we've decided to create a Data Logger project which we will be covering on our blog. The aim is to build a data logger from a variety of sensors using the Arduino open source electronics platform. This can then be used for repeatable analysis using the latest Big Data tools to gain performance insights. In this first post I'll talk about the motivations for the project, the challenges, and where we are with prototyping our sensor package and analysis toolset.
The principal objectives of the project are to analyse data in order to:
Achieve a repeatable way of assessing whether kit and settings changes impact performance.
Analyse technique and compare approaches across helms.
Collate data to identify patterns, insights and trends that we may not currently be aware of.
Moth racing is fast paced, with speeds exceeding 30 knots.
The Data to Value liveried Moth.
Using Big Data in Sailing presents its own set of challenges distinct from other sports. A large part of what makes it an exciting sport is the sheer number of variables that impact performance and how difficult these variables are to model, understand and predict. Pre-racing there are a wide range of variables that can impact performance, such as kit selection, boat design, rig setup and the like. Some of these variables are easier to measure than others: rig tension, for example, can be measured using a gauge, whereas measuring the shape of a sail when rigged can be somewhat more difficult. During racing a whole host of new variables emerges, from weather conditions, tide and wind to sailor characteristics (ability, weight, fitness etc.).
Winners of the America's Cup, Team Oracle, proved that leveraging data can help overcome the impossible by achieving one of the most impressive sporting comebacks of all time. They were able to do this by investing heavily not only in hardware and software but also in sensors and trained staff. Team Oracle used around 300 sensors generating 3,000 datapoints, ten times per second. Combining this with historical data, video and other datasets, it's unsurprising they needed Oracle Exadata kit. Fortunately for our project, the Moth is considerably smaller, single-handed and has fewer controls and settings to change, therefore data volumes, whilst still significant, should be smaller.
We always recommend prototyping projects before committing to large and complex builds. To clarify some of my thinking, last season I installed a waterproof housing on my moth to use my Android smartphone as a temporary data logger. This proved to be a relatively easy way of collecting initial accelerometer, gyroscope and GPS data. It helped to progress thinking about which analytics were worth generating, how to streamline calculations and how to use the data in practice to make tuning decisions. I set the device to capture GPS coordinates and gyroscope and accelerometer (x, y, z) data twice per second. From this I was able to calculate real-time speed, average speeds and bearing.
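As an illustration of that derivation step (a sketch only: the function name and units are my own, and real logger code would smooth over several fixes rather than trust a single pair), speed and bearing can be computed from two successive GPS fixes using the haversine and initial-bearing formulas:

```python
import math

def speed_and_bearing(lat1, lon1, lat2, lon2, dt_s):
    """Estimate speed (knots) and bearing (degrees) between two GPS fixes
    taken dt_s seconds apart."""
    R = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)

    # Haversine great-circle distance between the two fixes
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    dist_m = 2 * R * math.asin(math.sqrt(a))

    # Initial bearing from fix 1 towards fix 2, normalised to 0-360 degrees
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360

    speed_kn = (dist_m / dt_s) * 1.94384  # m/s to knots
    return speed_kn, bearing
```

At 2 Hz the distance between consecutive fixes is only a few metres, so in practice it helps to average over several samples before reporting a speed figure.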
The analytics below are screenshots from Tableau, a Data Visualisation tool, for the first day of last year's Parkstone Grand Prix, where fortunately the wind was reasonably consistent in terms of direction and speed. Using estimated windspeed and direction I calculated estimated polars (speed by sailing direction) and absolute Velocity Made Good (VMG), and coded some rules to determine lap numbers and times, port-starboard tacks and when a tack or a gybe had been completed. One thing that became apparent during prototyping was that the sensors do from time to time throw up errors that can be difficult to spot before they can be corrected or filtered out. An example of this was when the number of connected satellites for the GPS fell to a level that generated inaccurate fixes for location data. To get around these issues I used our Data Mining & Profiling partner's tool, X88 Pandora, to build a rule that compared average speed against leg, bearing and the number of connected satellites. This should prove useful in the future as the project progresses.
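As a rough sketch of two of the calculations described (the field names and the five-satellite threshold are illustrative assumptions, not the actual X88 Pandora rule, which also compared average speed against leg and bearing):

```python
import math

def vmg(boat_speed_kn, boat_bearing_deg, wind_dir_deg):
    """Velocity made good towards the wind (knots).

    wind_dir_deg is the direction the wind blows FROM, per sailing
    convention; positive VMG means progress upwind."""
    angle = math.radians(boat_bearing_deg - wind_dir_deg)
    return boat_speed_kn * math.cos(angle)

def filter_fixes(fixes, min_satellites=5):
    """Drop GPS fixes logged with too few connected satellites to trust."""
    return [f for f in fixes if f["sats"] >= min_satellites]
```

Dropping low-satellite fixes before calculating averages avoids the wildly inaccurate position jumps propagating into the speed and VMG figures.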
Scattergraph of downwind speed vs boat trim (sailing angle).
Downwind speed histogram by tack.
Average and actual 10 second speed for tacks.
Average and actual 10 second speed for gybes.
Upwind speed histogram by tack.
Heatmap of speeds in Poole Harbour.
So what did I learn from the data collected that day? One of the first insights was that my top upwind and downwind speeds do not appear to be evenly spread across port and starboard tack. I was considerably faster downwind on port tack and upwind on starboard tack. From the mapping data these differences didn't appear to be due to significantly overstanding lay-lines and thus gaining higher speeds at the expense of VMG. Given I was using estimated wind direction and speed numbers, this could in principle have been due to coincidental gusts whilst sailing on those tacks. I don't think that was the case either, however, as the wind was reasonably consistent and, particularly upwind, the difference is a hugely noticeable couple of knots. This got me thinking about other key settings that could be noticeably different from tack to tack, such as the wand on the bow of the moth, which is mounted to one side. This changes in relative length from tack to tack, especially upwind where the boat is sailed heeled over on top of the helm. At the time my wand controls were poorly set up, making it difficult to adjust length from tack to tack, so I decided to take this hypothesis forward by fitting a longer wand and logging data with an equally long wand on both tacks – sure enough, this reduced the difference in average speeds from tack to tack.
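The port/starboard comparison above can be reproduced with a simple grouping over the logged samples. This is a minimal sketch assuming a flat record schema with leg, tack and speed fields (the names are my own, not the logger's actual output):

```python
from statistics import mean

def tack_speed_summary(records):
    """Average speed per (leg, tack) combination from logged samples.

    records: iterable of dicts with 'leg' ('upwind'/'downwind'),
    'tack' ('port'/'starboard') and 'speed_kn' keys."""
    groups = {}
    for r in records:
        groups.setdefault((r["leg"], r["tack"]), []).append(r["speed_kn"])
    return {key: round(mean(speeds), 2) for key, speeds in groups.items()}
```

A persistent gap between the port and starboard averages on the same leg, as seen here, is what prompted the wand-length hypothesis.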
Another interesting analytic is comparing speed to boat trim – i.e. the angle of the hull and foils relative to the water. This can be adjusted by moving your body weight back and forward or by twisting the tiller to adjust the rudder angle in a moth. There’s a lot of debate in the moth fleet about what is the optimum setting for downwind sailing. Some suggest keeping the bow down so that the boat can be driven hard in the waves, others suggest keeping the bow up and reasonably level so that foil lift is generated more from the cross section of the mainfoil rather than through more flap down – which adds to drag and reduces speed. The scattergram shows that there is a cluster of faster speeds when the bow is slightly dropped, with faster speeds particularly on port tack. There is the odd high speed where the bow is slightly raised, however given this was during racing I wasn’t deliberately experimenting with different trim settings. I think this analytic will be useful in the future to try and shed some more light on this area and work out what the optimum level of trim is. I suspect the way forward will be to start doing tuning runs once I have accurate, real-time wind direction and strength figures and can produce similar analysis for a constant VMG.
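One way to make the trim debate more quantitative is to bucket the logged samples by trim angle and look at the best speed achieved in each bucket, rather than eyeballing the scattergram. A minimal sketch, assuming samples arrive as (trim_degrees, speed_knots) pairs with negative trim meaning bow down (an illustrative convention, not the logger's):

```python
def best_speed_by_trim(samples, bin_deg=1.0):
    """Bucket (trim_deg, speed_kn) samples into trim-angle bins and
    report the fastest speed seen in each bin, sorted by trim."""
    bins = {}
    for trim, speed in samples:
        b = round(trim / bin_deg) * bin_deg  # centre of nearest bin
        bins[b] = max(bins.get(b, 0.0), speed)
    return dict(sorted(bins.items()))
```

With accurate wind data, replacing raw speed with VMG at constant wind strength would make the comparison across trim settings fairer.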
Tacking and gybing tends to be an area where significant gains can be made by helms who are able to keep speeds up and stay on the foils whilst changing direction. Techniques for this vary slightly, particularly across moth hull designs, and what works for one person might not work for another. As expert Nathan Outteridge shows below, a lot happens during a tack. The average and actual 10 second gybe and tack breakdowns were useful for working out which were my best tacks and gybes and then looking into why. My worst tacks generally seem to be those where I don't have good speed going into the tack and don't have a slight amount of windward heel – I'm either flat or heeled to leeward. For gybes, one thing I'm looking forward to analysing is whether slow, arched gybes or fast gybes are better for average speed and VMG. Either way, looking at these analytics first makes measuring improvement and finding the relevant video footage to troubleshoot a lot easier. Going forward I suspect I'll try to automate some of this analysis by calculating how flat the curves are.
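The "how flat the curves are" idea could start as simply as measuring the deepest speed dip relative to entry speed over the 10 second window around each manoeuvre. This is a hypothetical first-cut metric, not a settled definition:

```python
def manoeuvre_flatness(speeds):
    """Score how 'flat' the speed curve stays through a tack or gybe.

    speeds: speed samples (knots) covering the window around the
    manoeuvre, starting at entry. Returns the minimum speed as a
    fraction of entry speed: 1.0 means no speed lost, lower values
    mean a deeper dip (e.g. dropping off the foils)."""
    entry = speeds[0]
    return min(speeds) / entry if entry else 0.0
```

Ranking tacks and gybes by this score would automate finding the best and worst manoeuvres, and hence the video footage worth reviewing.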
Scoping and next steps
The use of a smartphone was great for helping to form ideas around next steps and helped to find a number of insightful things buried in the data. To take the project forward, however, we really want to use a more advanced suite of sensors that can be fitted to multiple boats. To do this I've been brushing up on my electronics knowledge by learning about the Arduino platform and prototyping different sensor configurations. I've also got a wind speed and direction finder on order to get more accurate wind data. The eventual aim is to also incorporate other real-time sensor data such as wand movements and angle, mainsheet pulls, sail shape and angle, height above the water, foil angles, steering movement and other measurements. This will take time, however, given I don't have a great deal of electronics expertise.
In the next blog I'll spend some time going through the Data Architecture and IT components we are using, as well as some more detail on progress with the sensors. Stay tuned.