
The Panama Papers – how did they pull off history’s biggest data leak?



Find out how Neo4j and Linkurious, Data to Value’s graph data software partners, have been used in the Panama Papers investigation.


Recently there has been a lot of interest in the newly published Panama Papers. This giant trove of data is said to contain a whopping 11.5 million documents, or 2.6TB of data. This completely dwarfs previous leaks such as the 1.7GB WikiLeaks leak or the 30GB Ashley Madison leak. It took two years, more than 400 journalists and cutting-edge technology to process all of this information and extract valuable insight.


The data was leaked from Mossack Fonseca, one of the world’s leading firms specialising in incorporating offshore entities. It was gradually transferred via encrypted chat to a journalist at the German newspaper Süddeutsche Zeitung (SZ). The real work began shortly after the data started pouring in: the SZ could not make sense of data on that scale, so it contacted the International Consortium of Investigative Journalists (ICIJ) to find a way of handling these millions of documents. The ICIJ handled the data efficiently and prudently. The data and its copies were stored on drives encrypted with the open-source software VeraCrypt. Apache Solr was chosen as the main search server, coupled with Apache Tika, a toolkit that detects and extracts metadata and text from over a thousand different file types. This made it possible to search different file types, such as PDFs, Word documents and emails, seamlessly and in near real time. A user interface built with Blacklight, an open-source discovery front end for Solr, was put on top of the solution for ease of use. Once it was built, each of the more than 400 journalists needed only a link and a randomly generated password to start discovering interesting data.
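Solr is built on Apache Lucene, which answers queries using an inverted index: a map from each term to the documents that contain it. The short Python sketch below illustrates only that core idea, assuming the text has already been extracted from each file (the job Tika does in the setup described above). The file names and contents are made up for illustration, and a real Solr deployment adds text analysis, relevance ranking and much more.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical extracted text, standing in for an email, a PDF and a Word file.
docs = {
    "email_001.eml": "Please register the offshore company in Panama.",
    "contract.pdf": "Incorporation agreement for an offshore company.",
    "memo.docx": "Board meeting minutes, no offshore activity.",
}
index = build_index(docs)
print(sorted(search(index, "offshore company")))  # ['contract.pdf', 'email_001.eml']
```

Because each query only looks up a handful of terms instead of scanning every document, this structure is what makes near real-time search over millions of files practical.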


To make sense of the highly connected and complex data, the investigators turned to two of our software partners, Neo4j and Linkurious. Neo4j, the world’s leading graph database, made it easy to find and analyse complex connections, because graphs store data as nodes and edges (relationships), each of which can carry properties. Linkurious, a graph visualisation platform, helped the journalists navigate this ocean of data and uncover unique insights into the offshore banking world by showing the relationships between banks, clients, offshore companies and their lawyers.
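A typical question for a graph database is "how is this client connected to that bank?". In Neo4j such a question would normally be written in the Cypher query language; the Python sketch below instead shows the underlying idea with a small in-memory property graph and a breadth-first search for the shortest chain of relationships. All entity names and relationship types here are hypothetical, and this is not how Neo4j is implemented internally.

```python
from collections import deque

# Hypothetical nodes: each ID maps to a dictionary of properties.
nodes = {
    "client_a": {"type": "Client", "name": "Client A"},
    "lawyer_b": {"type": "Lawyer", "name": "Lawyer B"},
    "company_c": {"type": "OffshoreCompany", "name": "Company C"},
    "bank_d": {"type": "Bank", "name": "Bank D"},
}

# Hypothetical edges: (source, relationship type, target).
edges = [
    ("client_a", "REPRESENTED_BY", "lawyer_b"),
    ("lawyer_b", "REGISTERED", "company_c"),
    ("company_c", "HOLDS_ACCOUNT_AT", "bank_d"),
]


def build_adjacency(edges):
    """Index edges in both directions so connections can be followed either way."""
    adj = {}
    for src, rel, dst in edges:
        adj.setdefault(src, []).append((rel, dst))
        adj.setdefault(dst, []).append((rel, src))
    return adj


def shortest_connection(adj, start, goal):
    """Breadth-first search: return node IDs on a shortest path, or None if unconnected."""
    previous = {start: None}  # remembers how each visited node was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for _rel, neighbour in adj.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


adj = build_adjacency(edges)
path = shortest_connection(adj, "client_a", "bank_d")
print(" -> ".join(nodes[n]["name"] for n in path))
# Client A -> Lawyer B -> Company C -> Bank D
```

Storing each edge in both directions lets the search follow a relationship regardless of which way it was recorded, which suits exploratory questions like the ones the journalists were asking.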


The entire dataset of the Panama Papers is expected to be released in early May. For more articles about finding meaning in data, visit our website and follow us on LinkedIn or Twitter.


The General Data Protection Regulation in a nutshell


After more than three years of discussion, the EU General Data Protection Regulation (GDPR) has finally been agreed. In the UK it will replace the Data Protection Act 1998. As with most major legislative changes, it will not be enforced immediately; it is expected to become compulsory in the first half of 2018. The main intent of the GDPR is to give individuals more control over their personal data, impose stricter rules on companies handling it and make sure companies embrace new technology to process the influx of data being produced. Here are the major changes introduced by the new legislation:


  • Expanded territorial reach

Companies based outside the EU but targeting customers in the EU will be subject to the GDPR, which is not the case under the current rules.


  • Consent

Consent to the processing of personal data must be freely given, specific, informed and unambiguous. Consent is not freely given if a person is unable to refuse it without detriment.


  • Accountability and privacy by default

The GDPR places great emphasis on the accountability of data controllers, who must be able to demonstrate compliance. They will be required to maintain certain documentation, carry out impact assessments for higher-risk processing and apply data protection practices by default, such as data minimisation.


  • Notification of a data breach

Data controllers must notify the Data Protection Authorities without undue delay and, where feasible, within 72 hours of discovering the data breach.


  • Sanctions

This new legislation allows the Data Protection Authorities to impose higher fines: up to 4% of annual worldwide turnover or €20 million, whichever is higher. The maximum fines can be applied for infringements related to international data transfers or breaches of the processing principles, such as the conditions for consent. Other violations can be fined up to 2% of annual worldwide turnover or €10 million, whichever is higher.


  • Role of data processors

Data processors will now have direct obligations to implement technical and organisational measures to ensure data protection. This could include appointing a Data Protection Officer where needed.


  • One stop shop

This legislation will apply directly in all EU member states without the need for national implementing legislation. Having a single set of rules will benefit businesses, as they will not need to comply with multiple authorities, streamlining the process and saving an estimated €2.3 billion a year.


  • Removal of notification requirement

Some data controllers will be glad to hear that the requirement to notify, or seek approval from, a Data Protection Authority will be removed in many circumstances. This change is intended to save money and time. Instead of notification, the new regulation requires data controllers to put appropriate practices in place for large-scale processing, such as adopting new technology.


  • Right to be forgotten

This is one of the most useful changes for individuals managing their own data protection risks. A person will be able to request that their data be deleted when there is no legitimate reason for an organisation to retain it. Once this is requested, the organisation must also take appropriate steps to inform any third parties that may hold links to or copies of the data, and ask them to delete it.
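Two of the items above come down to simple arithmetic: the 72-hour breach notification window and the turnover-based caps on fines. The minimal Python sketch below illustrates both, assuming the figures in the final text of the regulation (Articles 33 and 83): notification within 72 hours of becoming aware of a breach, and fine caps of 4% of worldwide annual turnover or €20 million for the higher tier, and 2% or €10 million for the lower tier, whichever is higher in each case. The function names and the `higher_tier` flag are hypothetical, chosen only for this illustration.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Assumption: 72-hour window from Article 33; the "where feasible" wording is not modelled.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(became_aware_at: datetime) -> datetime:
    """Latest time to notify the authority, counted from when the breach was discovered."""
    return became_aware_at + NOTIFICATION_WINDOW


def maximum_fine(annual_turnover_eur: Decimal, higher_tier: bool = True) -> Decimal:
    """Upper limit of a fine for an undertaking (assumed Article 83 figures).

    Higher tier: 4% of worldwide annual turnover or EUR 20 million, whichever is higher.
    Lower tier: 2% of worldwide annual turnover or EUR 10 million, whichever is higher.
    """
    if higher_tier:
        rate, fixed_cap = Decimal("0.04"), Decimal("20000000")
    else:
        rate, fixed_cap = Decimal("0.02"), Decimal("10000000")
    # Decimal avoids binary floating-point rounding in monetary amounts.
    return max(annual_turnover_eur * rate, fixed_cap)


discovered = datetime(2018, 6, 1, 9, 30, tzinfo=timezone.utc)
print(notification_deadline(discovered))      # 2018-06-04 09:30:00+00:00
print(maximum_fine(Decimal("1000000000")))    # 40000000.00
print(maximum_fine(Decimal("100000000")))     # 20000000
```

The fixed amounts matter for smaller companies: with a turnover of €100 million, 4% is only €4 million, so the €20 million figure sets the cap instead.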


The new regulation clearly acknowledges that people now produce far more sensitive data than ever before. Managing data on a large scale can be risky for organisations if they do not plan an appropriate strategy and update their systems to handle the influx. This kind of negligence can lead to data breaches or leaks.


Data to Value are data specialists who can help you stay ahead of the curve when it comes to regulatory compliance. We use the latest NoSQL technologies to rapidly assess your data quality, identify and solve problem areas, and set up alerts that promptly notify you of any new issues. Engage Data to Value to help you ensure compliance with the law, mitigate the risk of regulatory fines and maintain a good reputation. For more information, contact us directly at info@datatovalue.co.uk
