Technical Report HW-MACS-TR-0050
|Title||Answering Queries over Incomplete Data Stream Histories|
|Authors||Alasdair J. G. Gray, Werner Nutt, M. Howard Williams|
|Abstract||Streams of data often originate from many distributed sources. A distributed stream processing system publishes such streams of data and enables queries over the streams. This allows users to retrieve and relate data from the distributed streams without needing to know where they are located.
Stream data is important not only for its current values but also for the past values produced. To support this, the history of the stream must be archived and stream processing systems must support history queries. However, data streams published by distributed sources may have missing data values, e.g. due to a network failure. Because the stream has missed some values, the stored history of the stream contains gaps.
This paper considers the effects of missing information on the answers generated for history queries. The assumptions about the data streams are analysed so that techniques for detecting missing values can be developed. A model for representing the incomplete information has been developed together with an approach to answering history queries where relevant data is missing.
Case studies have been drawn from the context of the R-GMA system, which integrates distributed data streams to provide information and monitoring data about resources on a Grid. However, the model and techniques considered are general and could be applied wherever there is a need to query the history of distributed data streams.|