Information from which knowledge can be discovered is frequently distributed, because it has been recorded at different times or has arisen from different sources. Such information is often subject to both imprecision and uncertainty. The Dempster-Shafer representation of evidence offers a way of representing uncertainty in the presence of imprecision, and may therefore be used to provide a mechanism for storing imprecise and uncertain information in databases. We consider an extended relational data model that allows the imprecision and uncertainty associated with attribute values to be quantified using a mass function distribution. When a query is executed, it may be necessary to combine imprecise and uncertain data from distributed sources in order to answer that query. A mechanism is therefore required both for combining the data and for generating measures of uncertainty to be attached to the (imprecise) combined data. In this paper we provide such a mechanism based on aggregation of evidence. We first show how this mechanism can be used to resolve inconsistencies, and hence how it provides the essential database capability of performing the operations needed to answer queries on imprecise and uncertain data. We then exploit the aggregation operator in an attribute-driven approach to provide information on properties of, and patterns in, the data. This is fundamental to rule discovery; such an aggregation operator is therefore central to enabling a distributed information system to perform the operations necessary for Knowledge Discovery.
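The abstract does not spell out the aggregation operator itself. As a rough illustration only, the sketch below assumes one common choice from evidence theory: a weighted average of mass functions, where each source is weighted (hypothetically) by the number of tuples it contributes. A mass function assigns masses summing to 1 to non-empty subsets (focal elements) of an attribute's domain; belief and plausibility of a set follow from it in the standard Dempster-Shafer way. The names `MassFunction`, `check_mass`, `belief`, `plausibility` and `aggregate` are hypothetical and are not taken from the paper.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List, Tuple

# A mass function maps focal elements (non-empty subsets of an attribute
# domain) to masses that sum to 1. Fractions keep the arithmetic exact.
MassFunction = Dict[FrozenSet[str], Fraction]


def check_mass(m: MassFunction) -> None:
    """Raise ValueError if m is not a valid mass function."""
    if any(len(a) == 0 for a in m):
        raise ValueError("the empty set may not be a focal element")
    if any(v < 0 for v in m.values()):
        raise ValueError("masses must be non-negative")
    if sum(m.values()) != 1:
        raise ValueError("masses must sum to 1")


def belief(m: MassFunction, a: FrozenSet[str]) -> Fraction:
    """Bel(A): total mass of focal elements contained in A."""
    return sum((v for b, v in m.items() if b <= a), Fraction(0))


def plausibility(m: MassFunction, a: FrozenSet[str]) -> Fraction:
    """Pl(A): total mass of focal elements that intersect A."""
    return sum((v for b, v in m.items() if b & a), Fraction(0))


def aggregate(sources: List[Tuple[MassFunction, int]]) -> MassFunction:
    """Weighted average of mass functions (assumed operator, for illustration).

    Each source is a (mass function, weight) pair; here the weight is
    hypothetically the number of tuples the source holds.
    """
    total_weight = sum(w for _, w in sources)
    result: MassFunction = {}
    for m, w in sources:
        check_mass(m)
        for a, v in m.items():
            result[a] = result.get(a, Fraction(0)) + w * v / total_weight
    return result


# Usage: two sources give imprecise values for the same attribute.
source1: MassFunction = {frozenset({"x"}): Fraction(1, 2),
                         frozenset({"x", "y"}): Fraction(1, 2)}
source2: MassFunction = {frozenset({"y"}): Fraction(1, 4),
                         frozenset({"x", "y", "z"}): Fraction(3, 4)}
combined = aggregate([(source1, 3), (source2, 1)])
print(belief(combined, frozenset({"x"})))        # 3/8
print(plausibility(combined, frozenset({"x"})))  # 15/16
```

One reason to consider an averaging operator here: unlike Dempster's rule of combination, it never needs renormalisation, so two sources with completely disjoint focal elements (one asserting `{"x"}`, the other `{"y"}`) still yield a valid mass function instead of an undefined result. Whether this matches the paper's way of resolving inconsistencies is an assumption.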
| Field | Value |
|---|---|
| Journal | Information Sciences (Special Issue on Knowledge Discovery from Distributed Information Sources) |
| Publication status | Published - 15 Oct 2003 |
- data mining
- evidence theory
- rule induction