The Role Ontology plays in Big Data

Apr 11, 2014

Ontology

This document contains my views on the subject and I have used some source data found on the web (Wikipedia). Comments on the subject are very welcome.

An ontology formally represents knowledge as a hierarchy of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts.

Ontologies are the structural frameworks for organizing information and are used in artificial intelligence, the Semantic Web, systems engineering, software engineering, biomedical informatics, library science, enterprise bookmarking, and information architecture as a form of knowledge representation about the world or some part of it. The creation of domain ontologies is also fundamental to the definition and use of an enterprise architecture framework.

As it relates to the Big Data trend:

Ontology claims to be to applications what Google was to the web. Instead of integrating the many different enterprise applications within an organization to obtain, for example, a 360-degree view of customers, Ontology enables users to search a schematic model of all data within the applications. It extracts relevant data from source applications such as a CRM system, big data applications, files and warranty documents. The extracted semantics are linked into a search graph instead of a schema to give users the results they need.
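
To make the idea of a search graph more concrete, here is a minimal sketch in Python. It is not how the Ontology product works internally; the record layouts, the field names and the build_graph and search helpers are all assumptions made up for illustration. The point is only that records from two unintegrated sources can be linked as edges in one graph and then searched together.

```python
# Hypothetical records extracted from two separate source applications.
crm_records = [
    {"customer_id": "C1", "name": "Acme Ltd", "city": "Leeds"},
]
warranty_records = [
    {"warranty_id": "W7", "customer_id": "C1", "product": "Pump X200"},
]

def build_graph(crm, warranties):
    """Link records from both sources as (subject, predicate, object) edges."""
    edges = set()
    for r in crm:
        node = "customer:" + r["customer_id"]
        edges.add((node, "name", r["name"]))
        edges.add((node, "city", r["city"]))
    for w in warranties:
        node = "warranty:" + w["warranty_id"]
        edges.add((node, "product", w["product"]))
        # This edge is the link between the two source applications.
        edges.add((node, "held_by", "customer:" + w["customer_id"]))
    return edges

def search(edges, term):
    """Return nodes with a value containing the term, plus their direct neighbours."""
    term = term.lower()
    nodes = {s for s, _, _ in edges}
    hits = {s for s, _, o in edges if term in o.lower()}
    incoming = {s for s, _, o in edges if o in hits}
    outgoing = {o for s, _, o in edges if s in hits and o in nodes}
    return sorted(hits | incoming | outgoing)

graph = build_graph(crm_records, warranty_records)
print(search(graph, "acme"))  # the customer and the warranty linked to it
```

Searching for the customer name returns the warranty as well, even though the two records came from different applications and no shared schema was defined up front.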

Ontology gives users a different approach to using enterprise applications, removing the need to integrate them. It allows users to search and link applications, databases, files, spreadsheets, etc. wherever they reside. The product is interesting because in recent years organizations have developed and adopted a vast number of enterprise applications for various needs and with various requirements. Integrating these applications to obtain a company-wide view is difficult, expensive and often risky.

Why is it important?

It eliminates the need to integrate systems and applications when looking for critical data or trends.

How is it applied and what are the important elements that make it all work?

Ontology uses a unique combination of an inherently agile, graph-based semantic model and semantic search to reduce the timescale and cost of complex data integration challenges. Ontology is rethinking data acquisition, data correlation and data migration projects in a post-Google world.

Enables the Semantic Web

The Semantic Web

The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.

While its critics have questioned its feasibility, many others argue that applications in industry, biology and human sciences research have already proven the validity of the original concept.

The main purpose of the Semantic Web is to drive the evolution of the current Web by enabling users to find, share, and combine information more easily. Humans are capable of using the Web to carry out tasks such as finding the Estonian translation for “twelve months”, reserving a library book, and searching for the lowest price for a DVD. However, machines cannot accomplish all of these tasks without human direction, because web pages are designed to be read by people, not machines. The Semantic Web is a vision of information that can be readily interpreted by machines, so machines can perform more of the tedious work involved in finding, combining, and acting upon information on the web.

The Semantic Web, as originally envisioned, is a system that enables machines to “understand” and respond to complex human requests based on their meaning. Such an “understanding” requires that the relevant information sources be semantically structured.

The Semantic Web is regarded as an integrator across different content, information applications and systems. It has applications in publishing, blogging, and many other areas.

Often the terms “semantics”, “metadata”, “ontologies” and “Semantic Web” are used inconsistently. In particular, these terms are used as everyday terminology by researchers and practitioners, spanning a vast landscape of different fields, technologies, concepts and application areas. Furthermore, there is confusion with regard to the current status of the enabling technologies envisioned to realize the Semantic Web.

Semantic Web solutions

The Semantic Web goes further than document markup. It involves publishing in languages specifically designed for data: Resource Description Framework (RDF), Web Ontology Language (OWL), and Extensible Markup Language (XML). HTML describes documents and the links between them. RDF, OWL, and XML, by contrast, can describe arbitrary things such as people, meetings, or airplane parts.

These technologies are combined in order to provide descriptions that supplement or replace the content of Web documents. Thus, content may manifest itself as descriptive data stored in Web-accessible databases, or as markup within documents (particularly, in Extensible HTML (XHTML) interspersed with XML, or, more often, purely in XML, with layout or rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, i.e., to describe the structure of the knowledge we have about that content. In this way, a machine can process knowledge itself, instead of text, using processes similar to human deductive reasoning and inference, thereby obtaining more meaningful results and helping computers to perform automated information gathering and research.
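
As a rough illustration of what "processing knowledge instead of text" can mean, the sketch below forward-chains two simple rules over a handful of facts. The rules are modeled on the subclass and type entailments found in RDF Schema, but the short predicate names ("type", "subClassOf"), the example facts and the infer_types function are simplifications assumed for this example, not a real reasoner.

```python
def infer_types(triples):
    """Apply two RDFS-style rules repeatedly until no new facts appear.

    Rule 1: (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    Rule 2: (x type A)       and (A subClassOf B)  =>  (x type B)
    """
    facts = set(triples)
    while True:
        new = set()
        sub = [(s, o) for s, p, o in facts if p == "subClassOf"]
        # Rule 1: subclass relationships are transitive.
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, "subClassOf", c))
        # Rule 2: an instance of a class is an instance of its superclasses.
        for s, p, o in facts:
            if p == "type":
                for a, b in sub:
                    if o == a:
                        new.add((s, "type", b))
        if new <= facts:  # nothing new was derived, so we are done
            return facts
        facts |= new

facts = infer_types({
    ("Airbus_A380", "type", "Airliner"),
    ("Airliner", "subClassOf", "Airplane"),
    ("Airplane", "subClassOf", "Vehicle"),
})
print(("Airbus_A380", "type", "Vehicle") in facts)
```

Nobody stated that the A380 is a vehicle; the machine derived it from the class hierarchy. That is the kind of deduction the paragraph above has in mind.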

Components

The term “Semantic Web” is often used more specifically to refer to the formats and technologies that enable it. The collection, structuring and recovery of linked data are enabled by technologies that provide a formal description of concepts, terms, and relationships within a given knowledge domain.

Resource Description Framework (RDF), a general method for describing information
RDF Schema (RDFS)
Simple Knowledge Organization System (SKOS)
SPARQL, an RDF query language
Notation3 (N3), designed with human-readability in mind
N-Triples, a format for storing and transmitting data
Turtle (Terse RDF Triple Language)
Web Ontology Language (OWL), a family of knowledge representation languages
Rule Interchange Format (RIF), a framework of web rule language dialects supporting rule interchange on the Web
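
To give a feel for what one of these formats looks like on the wire, the following sketch writes a couple of statements in N-Triples, where each line holds one subject, predicate and object followed by a period. The Literal class and the to_ntriples function are hypothetical helpers written for this post, the example.org addresses are placeholders, and the sketch ignores language tags, datatypes and blank nodes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """A plain string value, as opposed to an IRI that names a resource."""
    text: str

def escape(text):
    """Escape the characters that may not appear raw inside an N-Triples string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            rendered = '"' + escape(obj.text) + '"'
        else:
            rendered = "<" + obj + ">"
        lines.append("<" + subject + "> <" + predicate + "> " + rendered + " .")
    return "\n".join(lines)

triples = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", Literal("Alice")),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
]
print(to_ntriples(triples))
```

Turtle and N3 express the same statements more compactly by allowing prefixes and shared subjects; N-Triples trades that brevity for a format that is trivial to parse line by line.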

The Semantic Web Stack illustrates the architecture of the Semantic Web. The functions and relationships of the components can be summarized as follows:

XML provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within. XML is not at present a necessary component of Semantic Web technologies in most cases, as alternative syntaxes exist, such as Turtle. Turtle was long a de facto standard and became a W3C Recommendation in February 2014.
XML Schema is a language for providing and restricting the structure and content of elements contained within XML documents.
RDF is a simple language for expressing data models, which refer to objects (“web resources”) and their relationships. An RDF-based model can be represented in a variety of syntaxes, e.g., RDF/XML, N3, Turtle, and RDFa. RDF is a fundamental standard of the Semantic Web.
RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalization hierarchies of such properties and classes.
OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. “exactly one”), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
SPARQL is a protocol and query language for semantic web data sources.
RIF is the W3C Rule Interchange Format. It’s an XML language for expressing Web rules which computers can execute. RIF provides multiple versions, called dialects, including the RIF Basic Logic Dialect (RIF-BLD) and the RIF Production Rule Dialect (RIF-PRD).
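
To show how the RDF data model and a SPARQL-style query fit together, here is a small sketch in plain Python. It is not a SPARQL engine: the match and query functions, the short names used instead of full IRIs, and the sample data are all assumptions for illustration. It evaluates a list of triple patterns in which terms starting with "?" are variables, which is roughly what the core of a SPARQL WHERE clause does.

```python
def match(pattern, triple, binding):
    """Unify one triple pattern with one triple; return the extended binding or None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def query(triples, patterns):
    """Find every variable binding that satisfies all patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

data = [
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
    ("alice", "attends", "meeting1"),
    ("meeting1", "type", "Meeting"),
]

# Roughly: SELECT ?who ?m WHERE { ?who a :Person . ?who :attends ?m }
results = query(data, [("?who", "type", "Person"), ("?who", "attends", "?m")])
print(results)
```

Because every statement is just a subject, predicate and object, the same query code works no matter which application the statements originally came from. RDFS and OWL then add vocabulary on top of these triples rather than changing their shape.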