Geometric methods for the estimation of structured covariances

Introduction

Consider a zero-mean, real-valued, discrete-time, stationary random process $x(k)$, $k \in \mathbb{Z}$. Let $r_k := E\{x(i+k)\,x(i)\}$ denote the autocorrelation function, and $T$ the covariance of the finite observation vector $x := [x(0), x(1), \ldots, x(n-1)]'$, i.e. $T := E\{x x'\}$. Estimating $T$ from an observation record is often the first step in spectral analysis. However, a sample covariance $S$ may not have the Toeplitz structure that $T$ inherits from stationarity. We seek a Toeplitz structured estimate by solving

$$ \min_{T \in \mathcal{T}} \; d(S, T), \qquad (1) $$

where $\mathcal{T} := \{ T : T \text{ Toeplitz},\; T \succeq 0 \}$. In this, $d(\cdot,\cdot)$ represents a suitable notion of distance. In the following, we overview some notions of distance and the relations among them. If the distance is not symmetric in its arguments, we use $d(T, S)$ instead. This approach can be carried out verbatim in approximating state covariances.

Likelihood divergence

Assuming that $x$ is Gaussian, and letting $\{x_1, \ldots, x_N\}$ be a set of independent observation records, the joint density function is

$$ p(x_1, \ldots, x_N; T) = (2\pi)^{-nN/2} \det(T)^{-N/2} \exp\!\left( -\tfrac{N}{2} \operatorname{trace}(T^{-1} S) \right), $$

where $S := \frac{1}{N} \sum_{k=1}^N x_k x_k'$ is the sample covariance matrix. Given $S$, it is natural to seek a $T \in \mathcal{T}$ such that $p$, equivalently $\log p$, is maximal. Alternatively, one may consider the likelihood divergence

$$ d_L(S, T) := \log\det T - \log\det S + \operatorname{trace}(T^{-1} S) - n, $$

which is nonnegative and vanishes precisely when $T = S$. Using $d_L$ in (1), the relevant optimization problem in $T$ is

$$ \min_{T \in \mathcal{T}} \; \log\det T + \operatorname{trace}(T^{-1} S), $$

which is not convex. A necessary condition for a local minimum is given in [Burg et al.]:

$$ \operatorname{trace}\!\left( (T^{-1} - T^{-1} S T^{-1}) \Delta \right) = 0 \quad \text{for all Toeplitz } \Delta. $$

An iterative algorithm for solving the above equation and obtaining a (local) minimum is proposed in [Burg et al.]. This is implemented in CovEst_blw.m.

Kullback-Leibler divergence

The Kullback-Leibler (KL) divergence between two probability density functions $p$, $q$ on $\mathbb{R}^n$ is

$$ \operatorname{KL}(p \,\|\, q) := \int p(x) \log\frac{p(x)}{q(x)}\, dx. $$

In the case where $p$ and $q$ are normal with zero mean and covariances $T$ and $S$, respectively, their KL divergence becomes

$$ \operatorname{KL}(T \,\|\, S) = \tfrac{1}{2}\left( \log\det S - \log\det T + \operatorname{trace}(S^{-1} T) - n \right), $$

while switching the order of the arguments recovers (one half of) the likelihood divergence given earlier. If $\operatorname{KL}(T \,\|\, S)$ is used in (1), then the optimization problem becomes

$$ \min_{T \in \mathcal{T}} \; \operatorname{trace}(S^{-1} T) - \log\det T, $$

which is convex. This is implemented in CovEst_KL.m, which requires the installation of cvx.
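As an illustration, here is a minimal cvx sketch of this convex KL-based problem. The function name and variable names are placeholders, and the use of toeplitz to encode the structural constraint is an assumption; the actual CovEst_KL.m may be organized differently.

```matlab
% Minimal sketch (not necessarily identical to CovEst_KL.m):
% KL-based Toeplitz covariance estimation; requires cvx.
% S is an n-by-n sample covariance, assumed positive definite.
function T = covest_kl_sketch(S)
  n  = size(S, 1);
  Si = inv(S);                 % S^{-1} is fixed problem data
  cvx_begin sdp quiet
    variable c(n)              % defining column of the Toeplitz estimate
    T = toeplitz(c);           % symmetric Toeplitz matrix built from c
    % trace(S^{-1} T) is linear in T, -log det T is convex,
    % and log_det implicitly enforces T > 0.
    minimize( trace(Si * T) - log_det(T) )
  cvx_end
end
```

The entries of c correspond to the estimated correlation lags $r_0, \ldots, r_{n-1}$.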
Fisher metric and geodesic distance

The KL divergence induces the Fisher metric, which is the quadratic term of the expansion

$$ \operatorname{KL}(p \,\|\, p + \delta) = \tfrac{1}{2} \int \frac{\delta(x)^2}{p(x)}\, dx + O(\|\delta\|^3). $$

For probability distributions parameterized by a vector $\theta$, the corresponding metric is often referred to as the Fisher-Rao metric and is given by

$$ g_{ij}(\theta) = \int p(x;\theta)\, \frac{\partial \log p(x;\theta)}{\partial \theta_i}\, \frac{\partial \log p(x;\theta)}{\partial \theta_j}\, dx. $$

For zero-mean Gaussian distributions parametrized by the corresponding covariance matrices, it becomes the Rao metric

$$ g_T(\Delta) = \tfrac{1}{2} \operatorname{trace}\!\left( T^{-1} \Delta\, T^{-1} \Delta \right). $$

The geodesic distance between two points $T_1$ and $T_2$ is the log-deviation distance

$$ d_{\log}(T_1, T_2) := \left\| \log\!\left( T_1^{-1/2} T_2 T_1^{-1/2} \right) \right\|_F. $$

When $d_{\log}$ is used in (1), the corresponding optimization problem is not convex. Linearization of the objective function about $T = S$ may be used instead; this leads to

$$ \min_{T \in \mathcal{T}} \; \left\| S^{-1/2} T S^{-1/2} - I \right\|_F, $$

which is convex.

Bures metric and Hellinger distance

The Fisher information metric is the unique Riemannian metric for which stochastic maps are contractive [N. Cencov, 1982]. In quantum mechanics, a similar property has been sought for density matrices, which are positive semi-definite matrices with trace equal to one. There are several metrics for which stochastic maps are contractive. One example is the Bures metric

$$ g_\rho(\Delta) = \tfrac{1}{2} \operatorname{trace}(G \Delta), \quad \text{where } G \text{ solves } \rho G + G \rho = \Delta. $$

The Bures metric can also be written in terms of a factorization $\rho = W W'$. Since $\operatorname{trace}(W W') = 1$, we regard $W$ as a point on the unit sphere (in the Frobenius norm), and this point is not unique, as $W$ may be replaced by $W U$ for any unitary $U$. The geodesic distance between two density matrices is the smallest great-circle distance between corresponding points, which is called the Bures length

$$ d_{\mathrm{Bures}}(\rho_1, \rho_2) = \arccos \operatorname{trace}\!\left( \left( \rho_1^{1/2} \rho_2\, \rho_1^{1/2} \right)^{1/2} \right). $$

The smallest "straight line" (chordal) distance is the Hellinger distance

$$ d_H(\rho_1, \rho_2) = \min_U \, \| W_1 - W_2 U \|_F. $$

The distance has a closed form:

$$ d_H(\rho_1, \rho_2)^2 = \operatorname{trace}(\rho_1) + \operatorname{trace}(\rho_2) - 2 \operatorname{trace}\!\left( \left( \rho_1^{1/2} \rho_2\, \rho_1^{1/2} \right)^{1/2} \right). $$

When the above distance is used in (1), the corresponding optimization problem is in fact equivalent to a semidefinite program with a linear matrix inequality (LMI) constraint [1]. This will be shown in the following section.

Transportation distance

Let $x$ and $y$ be random variables in $\mathbb{R}^n$ having pdf's $p$ and $q$. A formulation of the Monge-Kantorovich transportation problem (with a quadratic cost) is as follows. Determine

$$ d_T(p, q)^2 := \min_\pi \int \|x - y\|^2 \, d\pi(x, y), $$

where $\pi$ ranges over joint distributions for $x$ and $y$ with marginals $p$ and $q$. The metric $d_T$ is known as the Wasserstein metric.

For $x$ and $y$ being zero-mean normal with covariances $S$ and $T$, and also jointly normal, we obtain

$$ E\{\|x - y\|^2\} = \operatorname{trace}(S + T - C - C'), $$

where $C := E\{x y'\}$. The distance has a closed form, given by

$$ d_T(S, T)^2 = \operatorname{trace}\!\left( S + T - 2 \left( S^{1/2} T S^{1/2} \right)^{1/2} \right). $$

So $d_T(S, T) = d_H(S, T)$. Thus, using either $d_T$ or $d_H$, the relevant optimization problem becomes

$$ \min_{T \in \mathcal{T},\, V} \; \operatorname{trace}(S + T - V - V') \quad \text{subject to} \quad \begin{bmatrix} S^{1/2} T S^{1/2} & V \\ V' & I \end{bmatrix} \succeq 0. $$

The constraint is an LMI, hence this is a convex optimization problem. This is implemented in CovEst_transp.m.

An alternative interpretation for the transportation distance is as follows. We postulate the stochastic model

$$ x = y + w, $$

where $w$ represents noise, and $S$ and $T$ are the covariances of $x$ and $y$, respectively. Seeking the least possible noise variance $E\{\|w\|^2\}$ while allowing possible coupling between $y$ and $w$ brings us the minimum transportation distance. Analogous rationale, albeit with different assumptions, has been used to justify different methods. For instance, assuming that $x = y + w$ while $y$ and $w$ are independent leads to

$$ \min_{T \in \mathcal{T},\, Q \succeq 0} \; \operatorname{trace}(Q) \quad \text{subject to} \quad S = T + Q. $$

This is implemented in CovEst_Q.m (SCovEst_Q.m for state-covariance estimation). Then, also, assuming a "symmetric" noise contribution as in

$$ x = z + w, \qquad y = z + \hat{w}, $$

where the noise vectors $w$ and $\hat{w}$ are independent of $z$ and of each other, we are called to solve

$$ \min_{T \in \mathcal{T},\, Q \succeq 0,\, \hat{Q} \succeq 0} \; \operatorname{trace}(Q + \hat{Q}) \quad \text{subject to} \quad S - Q = T - \hat{Q} \succeq 0, $$

where $Q$ and $\hat{Q}$ designate the covariances of $w$ and $\hat{w}$, respectively. This is implemented in CovEst_QQh.m (SCovEst_QQh.m for state-covariance estimation).

Mean-based covariance approximation

We investigate the concept of the mean as a way to fuse together sampled statistics from different sources into a structured one. In particular, given a set of positive semi-definite matrices $\{S_1, \ldots, S_m\}$, we consider the mean-based structured covariance

$$ \hat{T} := \arg\min_{T \in \mathcal{T}} \; \sum_{k=1}^m d(S_k, T)^p $$

for a metric or a divergence $d$ (and a suitable choice of power $p$). Below, we consider each of the distances mentioned earlier.

Mean based on KL divergence and likelihood

The optimization problem based on $\operatorname{KL}(T \,\|\, S_k)$ is

$$ \min_{T \in \mathcal{T}} \; \sum_{k=1}^m \operatorname{KL}(T \,\|\, S_k), $$

and is equivalent to

$$ \min_{T \in \mathcal{T}} \; \operatorname{trace}(S_h^{-1} T) - \log\det T, $$

where $S_h := \left( \frac{1}{m} \sum_{k=1}^m S_k^{-1} \right)^{-1}$ is the harmonic mean of the $S_k$'s. On the other hand, if the likelihood divergence is used, the optimization problem is equivalent to

$$ \min_{T \in \mathcal{T}} \; \log\det T + \operatorname{trace}(T^{-1} S_a), $$

where $S_a := \frac{1}{m} \sum_{k=1}^m S_k$ is the arithmetic mean of the $S_k$'s.

Mean based on log-deviation

In this case,

$$ \min_{T \in \mathcal{T}} \; \sum_{k=1}^m d_{\log}(S_k, T)^2 $$

is not convex in $T$. If the admissible set is relaxed to the set of positive definite matrices, the minimizer is precisely the geometric (Karcher) mean, and it is the unique positive definite solution to the following equation:

$$ \sum_{k=1}^m \log\!\left( T^{1/2} S_k^{-1} T^{1/2} \right) = 0. $$

When $m = 2$, the minimizer is the unique geometric mean of two positive definite matrices,

$$ S_1 \,\#\, S_2 = S_1^{1/2} \left( S_1^{-1/2} S_2 S_1^{-1/2} \right)^{1/2} S_1^{1/2}. $$

One may consider instead the linearized optimization problem

$$ \min_{T \in \mathcal{T}} \; \sum_{k=1}^m \left\| S_k^{-1/2} T S_k^{-1/2} - I \right\|_F^2, $$

which is convex in $T$.

Mean based on transportation/Bures/Hellinger distance

Using the transportation/Bures/Hellinger distance, the optimization problem becomes

$$ \min_{T \in \mathcal{T}} \; \sum_{k=1}^m d_T(S_k, T)^2. $$

When $T$ is only constrained to be positive semi-definite, it has been shown that the transportation mean is the unique solution of the following equation:

$$ \frac{1}{m} \sum_{k=1}^m \left( T^{1/2} S_k T^{1/2} \right)^{1/2} = T. $$

The routine Tmean.m provides a solution to the mean-based Toeplitz estimation problem and is based on cvx.
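For completeness, here is a minimal cvx sketch of the transportation/Hellinger computation through the LMI reformulation given earlier; with $m = 1$ it reduces to the estimation problem addressed by CovEst_transp.m, while several $S_k$'s yield the Toeplitz-constrained transportation mean in the spirit of Tmean.m. The function name, variable names, and code organization are illustrative assumptions.

```matlab
% Minimal sketch (not necessarily identical to CovEst_transp.m / Tmean.m):
% transportation-based Toeplitz estimation/mean via the LMI reformulation.
% Slist is a cell array of n-by-n sample covariances; requires cvx.
function T = tmean_sketch(Slist)
  m  = numel(Slist);
  n  = size(Slist{1}, 1);
  S2 = cell(m, 1);
  for k = 1:m
    Q     = sqrtm(Slist{k});
    S2{k} = (Q + Q') / 2;           % S_k^{1/2}, symmetrized for safety
  end
  cvx_begin sdp quiet
    variable c(n)                   % defining column of the Toeplitz estimate
    variable V(n, n, m)             % one coupling block per S_k
    T = toeplitz(c);
    obj = m * trace(T);
    for k = 1:m
      obj = obj + trace(Slist{k}) - trace(V(:,:,k) + V(:,:,k)');
    end
    minimize( obj )
    subject to
      for k = 1:m
        % Schur complement: V_k V_k' <= S_k^{1/2} T S_k^{1/2}, so that
        % trace(V_k + V_k') <= 2 trace((S_k^{1/2} T S_k^{1/2})^{1/2})
        [S2{k} * T * S2{k}, V(:,:,k); V(:,:,k)', eye(n)] >= 0;
      end
  cvx_end
end
```

At the optimum, each term $\operatorname{trace}(S_k) + \operatorname{trace}(T) - \operatorname{trace}(V_k + V_k')$ equals $d_T(S_k, T)^2$, so the objective is exactly $\sum_k d_T(S_k, T)^2$.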
Examples

Identifying a spectral line in noise

We compare the performance of the likelihood estimation and the transportation-based method in identifying a single spectral line in white noise. Consider a single "observation record" $x$ consisting of a sinusoid plus "noise" generated from a zero-mean Gaussian distribution. Hence, the sample covariance matrix is $S = x x'$ and of rank equal to one. We use CovEst_blw and CovEst_transp to approximate $S$ with a covariance having the admissible structure. We compare the corresponding maximum-entropy spectral estimates (subplots 2 and 3) with the spectrum obtained using pburg (MATLAB command), shown in subplot 1. The true spectral line is marked by an arrow in each plot.

Averaging multiple spectra

Consider three different runs $x_i$, $i = 1, 2, 3$, each containing a sinusoid at frequency $\omega_i$ with random phase, plus noise generated from a zero-mean Gaussian distribution with fixed variance. We consider the three rank-one sample covariances $S_i = x_i x_i'$. We estimate the likelihood and transportation means of these three sample covariances with the requirement that the mean also be Toeplitz (using CovEst_blw and Tmean), and we plot the corresponding maximum-entropy spectra. As a comparison, we also take the average of the three observation records and compute Burg's spectrum based on the averaged data. The three results are shown in the following figure. Both the likelihood-based and Burg's methods produce two peaks near the central frequency, whereas the transportation-based method has only one peak, at approximately the mean of the three instantiations of the underlying sinusoid.

Reference

[1] L. Ning, X. Jiang, and T. T. Georgiou, "Geometric methods for the estimation of structured covariances," submitted, http://arxiv.org/abs/1110.3695.