|
Many discretization methods for converting numeric time series to symbolic time series ignore the temporal order of values. This can lead to symbols that do not correspond to states of the process generating the time series and cannot be interpreted meaningfully.
The Persist discretization algorithm optimizes the persistence of the resulting states. It achieves much higher accuracy than existing static methods and is robust against noise. It also outperforms Hidden Markov Models for all but very simple cases.
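The notion of persistence can be illustrated with a small Python sketch. This is not the Matlab code offered on this page. The function names (`discretize`, `symmetric_kl`, `persistence_score`) and the toy data are assumptions made for illustration. The score below compares each state's self-transition probability with its marginal probability using a symmetric Kullback-Leibler divergence, which is one plausible way to reward states that tend to persist over time.

```python
import numpy as np

def discretize(values, cuts):
    """Map each value to a bin index given a list of cut points."""
    return np.digitize(values, np.sort(cuts))

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    kl_pq = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    kl_qp = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    return 0.5 * (kl_pq + kl_qp)

def persistence_score(symbols, n_states):
    """Illustrative score: positive when states last longer than chance."""
    symbols = np.asarray(symbols)
    scores = []
    for s in range(n_states):
        marginal = np.mean(symbols == s)      # how often state s occurs
        at_s = symbols[:-1] == s              # time points currently in s
        if at_s.sum() == 0:
            scores.append(0.0)                # unused state contributes nothing
            continue
        # Fraction of visits to s that are followed by s again.
        self_transition = np.mean(symbols[1:][at_s] == s)
        sign = np.sign(self_transition - marginal)
        scores.append(sign * symmetric_kl(self_transition, marginal))
    return float(np.mean(scores))

# Toy data (assumed): same values, different temporal order.
blocky = np.array([0.1] * 20 + [0.9] * 20 + [0.1] * 20)
alternating = np.array([0.1, 0.9] * 30)
cuts = [0.5]

blocky_score = persistence_score(discretize(blocky, cuts), n_states=2)
alternating_score = persistence_score(discretize(alternating, cuts), n_states=2)
print(blocky_score, alternating_score)
```

Both toy series take the same two values, so a method that looks only at which values occur would choose the same cut for both. Only the temporal order separates the series with long-lasting states from the one that flips at every step.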
You can download the code for Matlab and try it on your own data.
This research was performed under the supervision of Prof. Dr. Alfred Ultsch in the Databionics Research Group, Philipps-University of Marburg, Germany. The muscle data was provided and analyzed by Dr. Olaf Hoos from the Institute for Sports Medicine, Philipps-University of Marburg, Germany.
| Mörchen, F., Ultsch, A.: Optimizing Time Series Discretization for Knowledge Discovery, In Grossman, R.L., Bayardo, R., Bennett, K., Vaidya, J. (Eds), Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, (2005), pp. 660-665, ACM Digital Library |
| Mörchen, F., Ultsch, A.: Finding persisting states for knowledge discovery in time series, In From Data and Information Analysis to Knowledge Engineering - Proceedings of the 29th Annual Conference of the German Classification Society (GfKl 2005), Magdeburg, Germany, Springer, Heidelberg, (2005), pp. 278-285, SpringerLink |
|
The time series describes the activation of a muscle during inline speed skating. It was discretized using Persist into the three non-obvious states high, on, and off.
The picture shows the marginal density estimate of the values with the bins chosen by Persist. One cut is placed to the right of the second density peak, resulting in a state for very high activation. This interesting temporal structure is not visible in the density plot and is not discovered by the other methods.
|