Tutorial: Mining Sensor Data in Cyber Physical Systems

Paul O'Leary

University of Leoben, Franz Josef-Straße 18, 8700 Leoben, Austria

Matthew Harker

University of Leoben, Franz Josef-Straße 18, 8700 Leoben, Austria

 

ABSTRACT

This tutorial addresses the issues involved in mining sensor data in cyber physical systems (CPS). The IEEE defines a CPS as follows: "A CPS is a system with a coupling of the cyber aspects of computing and communications with the physical aspects of dynamics and engineering that must abide by the laws of physics. This includes sensor networks, real-time and hybrid systems." Since the system must abide by the laws of physics, so should the results of the data mining. Consequently, solving the inverse problems associated with the systems being monitored by the sensors is a prerequisite if causality, and not mere correlation, is to be used as a measure of significance. Causality is also a prerequisite for physically meaningful semantics. Large systems with many sensors deliver parallel real-time streams of data. Mining such data has not been the focus of most research. The volume, rate and complexity of such data place very specific requirements on data management and analytics. This tutorial will discuss the data structures, file systems and database types that are suitable for storing such data. Quantitative results from testing different storage systems and strategies are also part of the tutorial. The importance of differentiating between the requirements for storing metadata and streaming data is also addressed. The central concept for the analytics is to differentiate clearly between the perceptive and conceptive tasks involved in extracting understanding from data streams. The perceptive task is to generate preprocessed and hierarchically summarized data, which simplifies the conceptive task of incrementally establishing improved models. This is modelled as a lexical analysis of parallel sensor data streams, enabling symbolic queries and comparisons in the data sets. The results of applying such techniques to large dynamic plant and machinery are presented, giving a feeling for the attainable performance.
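To give a rough idea of what a lexical analysis of a sensor stream with symbolic queries might look like, the following minimal Python sketch converts one channel into a string of symbols, groups the symbols into tokens, and searches the string with a regular expression. It is an illustration only and not the authors' implementation: the three-letter alphabet (`u`, `d`, `s`), the tolerance `tol`, the function names and the sample readings are all assumptions made for this example. Parallel streams could be handled by symbolizing each channel in the same way.

```python
import re
from itertools import groupby


def symbolize(samples, tol=0.5):
    """Map each step between consecutive samples to a symbol.

    'u' = rising, 'd' = falling, 's' = steady (|change| <= tol).
    The alphabet and the tolerance are assumptions for this sketch.
    """
    symbols = []
    for prev, curr in zip(samples, samples[1:]):
        delta = curr - prev
        if delta > tol:
            symbols.append("u")
        elif delta < -tol:
            symbols.append("d")
        else:
            symbols.append("s")
    return "".join(symbols)


def tokenize(symbols):
    """Run-length compress the symbol string into (symbol, length) tokens."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]


def find_pattern(symbols, pattern):
    """Return the (start, end) step indices of every regex match."""
    return [m.span() for m in re.finditer(pattern, symbols)]


# Hypothetical readings from a single sensor channel.
pressure = [0.0, 0.1, 2.0, 4.0, 4.1, 4.0, 2.0, 0.0, 0.1]

symbols = symbolize(pressure)            # "suussdds"
tokens = tokenize(symbols)               # [('s', 1), ('u', 2), ('s', 2), ('d', 2), ('s', 1)]
hits = find_pattern(symbols, r"u+s+d+")  # symbolic query: rise, hold, fall

print(symbols)
print(tokens)
print(hits)                              # [(1, 7)]
```

Here the query `u+s+d+` finds every "rise, hold, fall" episode without inspecting the raw samples again. The token list is a compact summary of the channel, which is the kind of preprocessed representation on which further model building could operate.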