How semantics is creating a better internet

The internet has improved drastically since its origins, when it was used by university researchers to collaborate on projects across shared computers. It has revolutionised human behaviour, changing the way we communicate, conduct business, and create and consume information. It has become very sophisticated and at the same time very convenient to use. Indeed, many modern apps and conveniences rely on the internet without us realising it.

The internet is a global computer network whose resources are interlinked through URLs (Uniform Resource Locators), each pointing to information stored on another computer. Often a user does not know the exact URL for the information they want. To find it, we use search engines such as Google or Bing, which query indexes of web pages to surface relevant results. These indexes rely on extracted keywords and other metrics, such as links to other sites and visitor numbers. Although this is a great breakthrough in technology, computers and search engines still struggle with the subtleties of meaning and context that often determine whether something is relevant or not. Machine learning and semantics are being developed to solve this problem of machines understanding context, meaning, relationships and other semantically challenging concepts.

A cornerstone of the semantic web is its use of newer graph-based approaches and technologies, such as the W3C's RDF and SPARQL initiatives. Given that the internet is a giant web of connected data, this model works well compared with traditional relational techniques, where data has to be structured in ways less suited to expressing complex relationships such as hierarchies.

RDF, or Resource Description Framework, is a way of describing data so that it can be queried using the SPARQL language. RDF is beneficial because, instead of relying on keywords alone, search engines and browsers can also read the RDF data and understand the concepts referenced and the context of a web page – for example, whether it is about a book, a product, a company or any of the many other things a user might be searching for. RDF works in a simple way, dividing information into triples: subject – predicate – object. For example, in "Robert was born in 1986", Robert is the subject, "was born in" is the predicate and 1986 is the object. These triples can then have relationships with each other, providing even more related information, or Linked Data as it is often known. This data can then be easily accessed by using SPARQL to traverse the knowledge graph – for example, Robert, who was born in 1986, lives in London and has three children; Robert knows John; John has one child; and so on. Flexibility is king with RDF: adding new relationships or concepts is much easier than with traditional techniques such as adding rows and columns to a table.
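
To make this concrete, here is a minimal sketch in Python using the rdflib library (our choice for illustration; the post itself prescribes no tooling). It stores a few of the triples above and traverses them with SPARQL:

    from rdflib import Graph, Literal, Namespace

    # A hypothetical namespace for our example resources.
    EX = Namespace("http://example.org/")

    g = Graph()
    # Each fact is a subject - predicate - object triple.
    g.add((EX.Robert, EX.bornIn, Literal(1986)))
    g.add((EX.Robert, EX.livesIn, Literal("London")))
    g.add((EX.Robert, EX.knows, EX.John))
    g.add((EX.John, EX.numberOfChildren, Literal(1)))

    # Traverse the linked data: who does anyone born in 1986 know?
    results = g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?person ?friend WHERE {
            ?person ex:bornIn 1986 .
            ?person ex:knows ?friend .
        }
    """)
    for person, friend in results:
        print(person, "knows", friend)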

Semantics is still in the early stages of its lifecycle. As the need for machine learning grows, it will change the way we organise, search and interact with data. It will drastically increase the scale, efficiency and capability of these systems, providing users with an intuitive, tailored and insight-rich snapshot of a dataset.

For innovative data consulting services, please book a meeting with us.

Graph-based Data Governance & Data Lineage

Data Governance, data impact analysis and Data Lineage requirements are becoming ever more complex thanks to growing volumes of data, new types of data (such as Open Data) and new regulations. Traditionally, relational technologies, ETL components and integration stacks have struggled to document full end-to-end lineage and dependencies in a timely and scalable manner.

This has led us to develop a new approach to understanding our customers’ data lineage and data governance landscapes, using leading graph technologies such as Neo4j alongside tools from our partners Linkurious and Manta.

Whilst there are no magic bullets, we have made significant progress in modelling and understanding complex data lineage requirements, and in empirically identifying key components such as the most widely used Critical Data Elements (CDEs).
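
To illustrate the idea, here is a hedged sketch using the official neo4j Python driver against a hypothetical lineage model: the DataElement label, the FEEDS relationship and the connection details are all invented for illustration, not our production schema. It loads a tiny lineage graph and ranks elements by how many downstream elements depend on them, a rough empirical proxy for the most widely used CDEs:

    from neo4j import GraphDatabase

    # Connection details are placeholders for a local Neo4j instance.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    with driver.session() as session:
        # Model lineage as a graph: data elements linked by FEEDS relationships.
        session.run("""
            MERGE (a:DataElement {name: 'trade_amount'})
            MERGE (b:DataElement {name: 'daily_pnl'})
            MERGE (c:DataElement {name: 'risk_report'})
            MERGE (a)-[:FEEDS]->(b)
            MERGE (b)-[:FEEDS]->(c)
            MERGE (a)-[:FEEDS]->(c)
        """)

        # Rank elements by how many others they feed, directly or indirectly.
        result = session.run("""
            MATCH (e:DataElement)-[:FEEDS*]->(downstream)
            RETURN e.name AS element, count(DISTINCT downstream) AS dependants
            ORDER BY dependants DESC
        """)
        for record in result:
            print(record["element"], "feeds", record["dependants"], "elements")

    driver.close()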

If you are a complex organisation struggling with implementing Data Governance and understanding your Data Lineage, please contact us for more information.

An introduction to ElasticSearch

Search engines are now an integral part of people’s everyday lives. We are used to having access to information at the click of a button, yet we rarely think about how much work goes into this ability to search for information. Search engine software has become extremely advanced in recent years, using complex algorithms to provide the most relevant information along with predictive search and search-suggestion capabilities. Many engines do this in real time, processing millions of pieces of information at once.

[Diagram: how ElasticSearch works]

One of the most advanced search engines on the market today is ElasticSearch, a full-text search and analytics engine built on Apache Lucene, a high-performance text search engine library. Essentially, ElasticSearch uses Lucene as its backbone and builds upon it to provide a quick and easy interface. Moreover, ElasticSearch goes a step further and offers the user not only the ability to search indexed data, but also the ability to visualise and analyse it using the companion components Kibana and Logstash. ElasticSearch also takes advantage of faceting. Faceted search is more advanced than plain text search, as it enables a user to apply various filters and use a data classification system, giving a better understanding of what data assets an organisation has and where. ElasticSearch is schemaless, enabling business users to gain insight and manipulate data much more quickly and conveniently as they work. Like other new and innovative products in this space, ElasticSearch can be scaled to hundreds of servers and handle petabytes of structured and unstructured data. Moreover, ElasticSearch operates under the Apache 2 licence, making it fully open source: users can download it, share it and modify it as they see fit.
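
As a flavour of how simple the basics are, here is a minimal sketch using the official Python client, elasticsearch-py (the 8.x-style API and a single node running locally are assumptions; the index name and document are invented):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Schemaless indexing: send a JSON document, no table definition required.
    es.index(index="books", id="1", document={
        "title": "Linked Data for Beginners",
        "author": "Jane Doe",
        "year": 2018,
    })

    # Full-text search over everything indexed so far.
    response = es.search(index="books", query={"match": {"title": "linked data"}})
    for hit in response["hits"]["hits"]:
        print(hit["_score"], hit["_source"]["title"])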

There are many great use cases for ElasticSearch in organisations that are struggling to search, explore, govern and analyse large volumes of data in a variety of structures. A number of these can be tackled rapidly thanks to the simplicity of the product’s deployment options and architecture; indeed, many of the core requirements for building a data lake can often be met using this relatively simple toolset. Some of our favourite use cases include analysing log data within an IT department’s application landscape in order to identify processing errors, spot data leakage and predict risks of downtime. Similarly, within a business environment we have found the toolset extremely useful when tackling complex risk and compliance projects, where linkages and facts are often distributed and hidden within a variety of documents and databases.
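
As a taste of the log analysis use case, the same client can aggregate as well as search. The sketch below assumes a hypothetical app-logs index with level and @timestamp fields, and counts error messages per hour with a date_histogram aggregation:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Hypothetical index of application logs with 'level' and '@timestamp' fields.
    response = es.search(
        index="app-logs",
        size=0,  # we only want the aggregate counts, not individual log lines
        query={"term": {"level": "ERROR"}},
        aggs={
            "errors_per_hour": {
                "date_histogram": {"field": "@timestamp", "calendar_interval": "hour"}
            }
        },
    )
    for bucket in response["aggregations"]["errors_per_hour"]["buckets"]:
        print(bucket["key_as_string"], bucket["doc_count"])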

For more innovative ways of turning your data into your organisation’s most powerful asset, please book a meeting with us.
