We are concerned with sequences of heterogeneous symbolic data that share a similar underlying temporal pattern. The data are heterogeneous in that their classification schemes, and hence their class values, differ between sequences. However, because the sequences relate to the same underlying concept, the mappings between values, which are not known ab initio, may be learned. Such mappings relate local ontologies, in the form of classification schemes, to a global ontology (the underlying pattern). On the basis of these mappings we use maximum likelihood techniques to learn the probabilistic description of the local probabilistic concepts represented by individual temporal instances of the expression sequences. In a second stage we learn the temporal probabilistic concept that describes the underlying pattern. This approach has a number of advantages: (1) it provides an intuitive way of describing the underlying temporal pattern; (2) it provides a way of mapping heterogeneous sequences; (3) it takes account of natural variability in the process, via probabilistic semantics; (4) it allows us to characterise the sequences in terms of a temporal probabilistic concept model. This concept may then be matched with known genetic processes and pathways.
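The maximum likelihood step can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the mappings from local class values to sets of global classes have already been learned, and it uses a standard expectation-maximisation (EM) iteration for partially classified counts. The function name `em_global_distribution`, the data layout and the counts are all hypothetical.

```python
def em_global_distribution(observations, n_global, n_iter=200, tol=1e-10):
    """Estimate probabilities of global classes by EM.

    observations: list of (classes, count) pairs, where `classes` is the set of
    global class indices that a local class value maps to (assumed known here),
    and `count` is how often that local value was observed.
    """
    pi = [1.0 / n_global] * n_global          # start from a uniform distribution
    total = sum(count for _, count in observations)
    for _ in range(n_iter):
        expected = [0.0] * n_global
        # E-step: share each local count among its candidate global classes
        # in proportion to the current probability estimates.
        for classes, count in observations:
            mass = sum(pi[g] for g in classes)
            for g in classes:
                expected[g] += count * pi[g] / mass
        # M-step: renormalise the expected counts.
        new_pi = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Hypothetical data: one sequence uses a fine scheme (three classes), another a
# coarse scheme that merges global classes 0 and 1 into a single local value.
observations = [
    (frozenset({0}), 30), (frozenset({1}), 50), (frozenset({2}), 20),  # fine scheme
    (frozenset({0, 1}), 70), (frozenset({2}), 30),                     # coarse scheme
]
pi = em_global_distribution(observations, n_global=3)
print([round(p, 5) for p in pi])
```

In this toy case the coarse counts are split between global classes 0 and 1 in the 30:50 ratio implied by the fine scheme, which shows how sequences classified under different schemes can contribute to a single probabilistic description.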
Bibliographical note: This paper develops a principled mechanism for clustering temporal patterns expressed by data from heterogeneous classification schemes, thus enabling temporal concepts to be identified even when the data are from a “mixed bag”. The approach was developed in the EU-IST MISSION project to handle data heterogeneity issues between national statistical institutes. However, it has wider applicability. In this paper it is used to identify significant gene expression patterns, and the work is currently being used in a project (cell-phone video streaming in care support for Alzheimer's disease), funded by the ETAC consortium, to develop algorithms that identify behavioural patterns from sensor data.
- Sequence processing
- Schema mapping