Defining semantics is a matter of semantics, no less so in the Big Data space.
The Semantics conference is one of the biggest events for all things semantics. Key research and industry players gathered this week in Leipzig to showcase and discuss, and we were there to get that vibe.
Enterprise Knowledge Graphs
Graphs are everywhere: we have social graphs and knowledge graphs and office graphs, and in the minds of most these have been associated with Facebook and Google and Microsoft. But the concept of Knowledge Graphs is broader and vendor-agnostic.
All graphs can be considered as knowledge graphs, insofar as they represent information by means of nodes and (directional) edges. Nodes represent entities and edges represent relationships between them, such as Customer-buys-Product. Another way of stating this is by using the Subject-Predicate-Object metaphor borrowed from natural language.
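To make the Subject-Predicate-Object idea concrete, here is a minimal sketch in plain Python. It assumes nothing beyond the standard library. The entity names (Alice, Bob, Laptop, Acme) and the helper functions `build_index` and `subjects_of` are hypothetical, chosen only to illustrate how a Customer-buys-Product edge can be stored and queried.

```python
from collections import defaultdict

# Each directed edge is a (subject, predicate, object) triple.
# All names below are illustrative, not taken from any real dataset.
triples = [
    ("Alice", "buys", "Laptop"),
    ("Bob", "buys", "Laptop"),
    ("Alice", "worksFor", "Acme"),
]


def build_index(triples):
    """Index outgoing edges: subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        index[subject][predicate].add(obj)
    return index


def subjects_of(triples, predicate, obj):
    """Follow an edge backwards: which subjects point at obj via predicate?"""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


index = build_index(triples)
laptop_buyers = subjects_of(triples, "buys", "Laptop")
```

Here `index["Alice"]["buys"]` answers "what does Alice buy?", while `subjects_of` walks the same edge in the opposite direction.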
However, not all information is represented by means of graphs, for a number of reasons mostly having to do with complexity, cost, and performance. In the enterprise, the new imperative to deal with such issues is the data lake: a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data.
By adding a semantic layer to data lakes, what we get is Enterprise Knowledge Graphs. Even though there are a number of approaches and implementations to representing graphs, a set of standards under the Linked Data moniker combined with extensible, curated vocabularies for numerous domains offers a lightweight semantically enriched approach to enterprise data integration.
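As a rough illustration of what a semantic layer over raw data might look like, the sketch below maps the field names of a raw record onto terms from a shared vocabulary and emits triples. It is an assumption-laden toy, not any vendor's method: the record, the `customer:` identifier prefix, the `FIELD_TO_TERM` mapping, and the `to_triples` helper are all hypothetical, and the `schema:name` and `schema:email` terms are used only as examples of a curated vocabulary.

```python
# Hypothetical mapping from raw field names to vocabulary terms.
FIELD_TO_TERM = {
    "cust_name": "schema:name",
    "cust_email": "schema:email",
}


def to_triples(record, id_field, mapping):
    """Turn one raw record into (subject, predicate, object) triples.

    Fields without a vocabulary mapping are left out of the graph;
    they remain untouched in the raw record.
    """
    subject = f"customer:{record[id_field]}"
    return [
        (subject, mapping[field], value)
        for field, value in record.items()
        if field in mapping and value is not None
    ]


# Illustrative raw record, as it might sit in a data lake.
raw_record = {
    "cust_id": 42,
    "cust_name": "Alice",
    "cust_email": "alice@example.com",
    "legacy_flag": "Y",
}

customer_triples = to_triples(raw_record, "cust_id", FIELD_TO_TERM)
```

The raw data stays in its native format; only the mapping carries the semantics, so two sources that use different field names can end up sharing the same vocabulary terms.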
Linked Data has had a rough time finding its way to the enterprise, but a lot of water has flowed under the bridge connecting the 2009 PwC technology forecast to the 2015 Gartner Hype Cycle for Advanced Analytics and Data Science. The Hype Cycle shows Linked Data as currently being in the Trough of Disillusionment and expects it to reach the Plateau of Productivity in the next 5-10 years, which according to industry pundits is a good thing, as it means we're finally getting there.
Managing the Graph
Promising-sounding or not, enterprises need more than cool technology and hype cycles to move to adoption: they need solutions for managing their data and metadata. Data solutions are well-known, metadata solutions less so, but adopters like the BBC, Credit Suisse, and Roche show the way.
Managing data vocabularies and mappings is crucial for establishing Enterprise Knowledge Graphs, and the aptly named Semantic Web Company (SWC) presented its own solution in this space, called PoolParty. PoolParty is a semantic middleware that helps organizations develop knowledge graphs based on Semantic Web standards.
PoolParty is the outcome of intensive R&D since 2009 and provides features such as vocabulary management, text mining and entity extraction, concept tagging, semantic search, recommendations, analytics, and visualization. SWC reinvests 50 percent of its revenue in constantly improving its modular product to reach new audiences.
Social Engagement
The fact that IBM was at the conference speaks volumes about the importance semantics has for the company. IBM presented the research underpinnings of the Social Engagement Dashboard (SED), a solution for analytics in enterprise social networks. SED is built on top of IBM Connections, but IBM touts it as a generalized solution that can ingest any social network or collaboration-related data.
Research on SED (codenamed Project Breadcrumb) has been running within IBM for a couple of years, and during that time IBM has successfully integrated and analyzed data from CRMs, social networks, and other sources using property graphs. SED tries to capture context through metrics such as activity, influence, and broker potential to build recommendation systems for the enterprise.
SED derives behavior patterns for employees, with the end game being to provide management with an overview of connectivity, roles, and efficiency in the organization, to classify employees in behavior templates, and to motivate them to observe and adjust. Admittedly, this raises an array of questions and brings a Circle-like setting to mind, but IBM pledges transparency and privacy as a remedy.
Learning for Healthcare
Last but not least, Siemens presented some of its current research with semantic underpinnings and applications in the healthcare domain. Siemens has a long-established tradition in healthcare, and is now looking into ways to move forward utilizing semantic technology.
Siemens is engaged in various R&D projects with a multitude of research and industrial partners, and together they have identified some key dimensions in the digital transformation of healthcare: mobilizing data in a trusted network, integrating external data sources, putting the patient in charge, and providing always-on and personalized services.
Siemens is focused on addressing three key issues: generating a model for clinical processes and decisions, mapping decisions to outcomes, and drawing causal conclusions. Its approach, called Learning with Memory Embeddings, uses semantics to model healthcare knowledge graphs and a Machine Learning combinatorial model inspired by the human brain.