
## Square Root Graphical Models: Multivariate Generalizations of Univariate Exponential Families that Permit Positive Dependencies

### Citations

814 | Graphical models, exponential families, and variational inference
- Wainwright, Jordan
- 2008
Citation Context ...the vector direction given the length. While the model in (Inouye et al., 2015) allowed for both positive and negative dependencies, the joint distribution needed to be modified by an ad hoc scalar weighting function to avoid very low likelihood values for vectors of long length—i.e. documents with many words. Therefore, we develop a novel parametric generalization of univariate exponential family distributions with nonnegative sufficient statistics—e.g. Gaussian, Poisson and exponential—that allows for both positive and negative dependencies. We call this novel class of multivariate distributions Square Root Graphical Models (SQR) because the square root function is fundamentally important as will be described in future sections. [Footnote 1: See (Wainwright & Jordan, 2008) for an introduction to exponential families.] SQR models have a simple parametric form without needing to specify any hyperparameters and can be fit using ℓ1-regularized nodewise regressions similar to previous work (Yang et al., 2015). The independent model—e.g. independent Poisson or exponential—is merely a special case of this class unlike in (Yang et al., 2013). We show that the normalizability of the distribution puts little to no restriction o... |

591 | Sparse inverse covariance estimation with the graphical lasso
- Friedman, Hastie, et al.
- 2008
Citation Context ...n and likelihood approximation methods using sampling. Finally, we demonstrate our exponential generalization on a synthetic dataset and a real-world dataset of airport delay times. 1. Introduction Gaussian, binary and discrete undirected graphical models—or Markov Random Fields (MRF)—have become popular for compactly modeling and studying the structural dependencies between high-dimensional continuous, binary and categorical data respectively (Friedman et al., 2008; Hsieh et al., 2014; Banerjee et al., 2008; Ravikumar et al., 2010; Jalali et al., 2010). [Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48. Copyright 2016 by the author(s).] However, real-world data does not often fit the assumption that variables come from Gaussian or discrete distributions. For example, word counts in documents are nonnegative integers with many zero values and hence are more appropriately modeled by the Poisson distribution. Yet, an independent Poisson distribution would be insufficient because words are often either positively or negatively related to other words—e.g. the words “machine” and “learning” would often co-occur together in ICML papers (positi... |

331 | Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data
- Banerjee, Ghaoui, et al.
- 2008

262 | Annealed importance sampling
- Neal
- 2001
Citation Context ...gradient, only three 1D numerical integrations are needed. Another significant speedup that could be explored in future work would be to use a Newton-like method as in (Hsieh et al., 2014; Inouye et al., 2015), which optimize a quadratic approximation around the current iterate. Because these Newton-like methods only need a small number of Newton iterations to converge, the number of numerical integrations could be reduced significantly compared to gradient descent, which often requires thousands of iterations to converge. 3.4. Likelihood Approximation We use Annealed Importance Sampling (AIS) (Neal, 2001) similar to the sampling used in (Inouye et al., 2015) for likelihood approximation. In particular, we need to approximate the SQR log partition function A(θ,Φ) as in Eqn. 4. First, we derive a slice sampler for the node conditionals in which the bounds for the slice can be computed in closed form. Second, we use the slice sampler to develop a Gibbs sampler for SQR models. Finally, we derive an annealed importance sampler (Neal, 2001) using the Gibbs sampler as the intermediate sampler by linearly combining the off-diagonal part of the parameter matrix Φ_off with the diagonal part Φ_diag—i.e. Φ =... |
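
The annealing idea in the excerpt above can be illustrated with a minimal sketch. This is not the paper's sampler: it assumes a toy one-dimensional Gaussian target instead of an SQR model, and it swaps the slice-sampling Gibbs transition for a random-walk Metropolis step. All function names and default settings below are hypothetical and chosen only for illustration.

```python
import math
import random


def log_f0(x):
    """Unnormalized log-density of the base: standard normal, Z0 = sqrt(2*pi)."""
    return -0.5 * x * x


def log_f1(x, mu=1.0, sigma=0.5):
    """Unnormalized log-density of the toy target (assumed): Gaussian kernel."""
    return -0.5 * ((x - mu) / sigma) ** 2


def metropolis_step(x, log_f, rng, step=0.5):
    """One random-walk Metropolis update that leaves exp(log_f) invariant."""
    proposal = x + rng.gauss(0.0, step)
    log_accept = log_f(proposal) - log_f(x)
    if log_accept >= 0 or rng.random() < math.exp(log_accept):
        return proposal
    return x


def ais_log_partition(n_chains=500, n_temps=200, seed=0):
    """Estimate log Z1 of f1 by annealing from f0 along f0^(1-b) * f1^b."""
    rng = random.Random(seed)
    betas = [k / n_temps for k in range(n_temps + 1)]
    log_weights = []
    for _ in range(n_chains):
        x = rng.gauss(0.0, 1.0)  # exact draw from the base distribution
        log_w = 0.0
        for b_prev, b in zip(betas[:-1], betas[1:]):
            # Importance-weight increment for moving from b_prev to b.
            log_w += (b - b_prev) * (log_f1(x) - log_f0(x))
            # Transition that targets the intermediate density at b.
            x = metropolis_step(
                x, lambda y, b=b: (1.0 - b) * log_f0(y) + b * log_f1(y), rng
            )
        log_weights.append(log_w)
    # The log-mean-exp of the weights estimates log(Z1 / Z0).
    m = max(log_weights)
    log_ratio = m + math.log(sum(math.exp(w - m) for w in log_weights) / n_chains)
    return 0.5 * math.log(2.0 * math.pi) + log_ratio
```

For this toy target the exact value is log(σ√(2π)) with σ = 0.5, so the estimate returned by `ais_log_partition()` can be compared against it directly.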

261 | Modelling dependence with copulas and applications to risk management. Handbook of heavy tailed distributions in finance.
- Embrechts, Lindskog, et al.
- 2003
Citation Context ...ayed then it is likely that the return flight of the same airplane will also be delayed. Other examples of non-Gaussian and non-discrete data include high-throughput gene sequencing count data, crime statistics, website visits, survival times, call times and delay times. Though univariate distributions for these types of data have been studied quite extensively, multivariate generalizations have only been given limited attention. One basic approach to forming dependent multivariate distributions is to assume that the marginal distributions are exponentially distributed (Marshall & Olkin, 1967; Embrechts et al., 2003) or Poisson distributed (Karlis, 2003). This idea is related to copula-based models (Bickel et al., 2009) in which a probability distribution is decomposed into the univariate marginal distributions and a copula distribution on the unit hypercube that models the dependency between variables. However, the exponential model in (Marshall & Olkin, 1967; Embrechts et al., 2003) gives rise to a distribution that is composed of a continuous distribution and a singular distribution, which seems unusual and unlikely for general real-world situations. The multivariate Poisso... |

178 | A multivariate exponential distribution.
- Marshall, Olkin
- 1967
Citation Context ...ancisco, CA (SFO) is delayed then it is likely that the return flight of the same airplane will also be delayed. Other examples of non-Gaussian and non-discrete data include high-throughput gene sequencing count data, crime statistics, website visits, survival times, call times and delay times. Though univariate distributions for these types of data have been studied quite extensively, multivariate generalizations have only been given limited attention. One basic approach to forming dependent multivariate distributions is to assume that the marginal distributions are exponentially distributed (Marshall & Olkin, 1967; Embrechts et al., 2003) or Poisson distributed (Karlis, 2003). This idea is related to copula-based models (Bickel et al., 2009) in which a probability distribution is decomposed into the univariate marginal distributions and a copula distribution on the unit hypercube that models the dependency between variables. However, the exponential model in (Marshall & Olkin, 1967; Embrechts et al., 2003) gives rise to a distribution that is composed of a continuous distribution and a singular distribution, which seems unusual and unlikely for general real-world situations... |

92 | The nonparanormal: Semiparametric estimation of high dimensional undirected graphs.
- Liu, Lafferty, et al.
- 2009
Citation Context ... 2003) gives rise to a distribution that is composed of a continuous distribution and a singular distribution, which seems unusual and unlikely for general real-world situations. The multivariate Poisson distribution (Karlis, 2003) is based on the sum of independent Poisson variables and can only model positive dependencies. The copula versions of the multivariate Poisson distribution have significant issues related to non-identifiability because the Poisson distribution has a discrete domain (Genest & Neslehova, 2007). There has also been some recent work on semi-parametric graphical models (Liu et al., 2009) that use Gaussian copulas to relax the assumption of Gaussianity but these models are not parametric and only consider continuous real-valued data. Another line of work assumes that the node conditional distributions—i.e. one variable given the values of all the other variables—are univariate exponential families and determines under what conditions a joint distribution exists that is consistent with these node conditional distributions. Besag (1974) developed this multivariate distribution for pairwise dependencies, and Yang et al. (2015) extended this model to n-wise dependencies. Yang et ... |

62 | High-dimensional Ising model selection using L1-regularized logistic regression.
- Ravikumar, Wainwright, et al.
- 2010
Citation Context ... demonstrate our exponential generalization on a synthetic dataset and a real-world dataset of airport delay times. 1. Introduction Gaussian, binary and discrete undirected graphical models—or Markov Random Fields (MRF)—have become popular for compactly modeling and studying the structural dependencies between high-dimensional continuous, binary and categorical data respectively (Friedman et al., 2008; Hsieh et al., 2014; Banerjee et al., 2008; Ravikumar et al., 2010; Jalali et al., 2010). However, real-world data does not often fit the assumption that variables come from Gaussian or discrete distributions. For example, word counts in documents are nonnegative integers with many zero values and hence are more appropriately modeled by the Poisson distribution. Yet, an independent Poisson distribution would be insufficient because words are often either positively or negatively related to other words—e.g. the words “machine” and “learning” would often co-occur together in ICML papers (positive dependency) whereas the words “deep” and “kernel” would rarely c... |

41 | A primer on copulas for count data.
- Genest, Neslehova
- 2007
Citation Context ...ver, the exponential model in (Marshall & Olkin, 1967; Embrechts et al., 2003) gives rise to a distribution that is composed of a continuous distribution and a singular distribution, which seems unusual and unlikely for general real-world situations. The multivariate Poisson distribution (Karlis, 2003) is based on the sum of independent Poisson variables and can only model positive dependencies. The copula versions of the multivariate Poisson distribution have significant issues related to non-identifiability because the Poisson distribution has a discrete domain (Genest & Neslehova, 2007). There has also been some recent work on semi-parametric graphical models (Liu et al., 2009) that use Gaussian copulas to relax the assumption of Gaussianity but these models are not parametric and only consider continuous real-valued data. Another line of work assumes that the node conditional distributions—i.e. one variable given the values of all the other variables—are univariate exponential families and determines under what conditions a joint distribution exists that is consistent with these node conditional distributions. Besag (1974) developed this multivariate distribution for pairw... |

21 | On learning discrete graphical models using group-sparse regularization.
- Jalali, Ravikumar, et al.
- 2010
Citation Context ...tial generalization on a synthetic dataset and a real-world dataset of airport delay times. 1. Introduction Gaussian, binary and discrete undirected graphical models—or Markov Random Fields (MRF)—have become popular for compactly modeling and studying the structural dependencies between high-dimensional continuous, binary and categorical data respectively (Friedman et al., 2008; Hsieh et al., 2014; Banerjee et al., 2008; Ravikumar et al., 2010; Jalali et al., 2010). However, real-world data does not often fit the assumption that variables come from Gaussian or discrete distributions. For example, word counts in documents are nonnegative integers with many zero values and hence are more appropriately modeled by the Poisson distribution. Yet, an independent Poisson distribution would be insufficient because words are often either positively or negatively related to other words—e.g. the words “machine” and “learning” would often co-occur together in ICML papers (positive dependency) whereas the words “deep” and “kernel” would rarely co-occur since they usu... |

20 | An EM algorithm for multivariate Poisson distribution and related models.
- Karlis
- 2003
Citation Context ... the same airplane will also be delayed. Other examples of non-Gaussian and non-discrete data include high-throughput gene sequencing count data, crime statistics, website visits, survival times, call times and delay times. Though univariate distributions for these types of data have been studied quite extensively, multivariate generalizations have only been given limited attention. One basic approach to forming dependent multivariate distributions is to assume that the marginal distributions are exponentially distributed (Marshall & Olkin, 1967; Embrechts et al., 2003) or Poisson distributed (Karlis, 2003). This idea is related to copula-based models (Bickel et al., 2009) in which a probability distribution is decomposed into the univariate marginal distributions and a copula distribution on the unit hypercube that models the dependency between variables. However, the exponential model in (Marshall & Olkin, 1967; Embrechts et al., 2003) gives rise to a distribution that is composed of a continuous distribution and a singular distribution, which seems unusual and unlikely for general real-world situations. The multivariate Poisson distribution (Karlis, 2003) is based... |

10 | On graphical models via univariate exponential family distributions.
- Yang, Ravikumar, et al.
- 2015
Citation Context ...some recent work on semi-parametric graphical models (Liu et al., 2009) that use Gaussian copulas to relax the assumption of Gaussianity but these models are not parametric and only consider continuous real-valued data. Another line of work assumes that the node conditional distributions—i.e. one variable given the values of all the other variables—are univariate exponential families and determines under what conditions a joint distribution exists that is consistent with these node conditional distributions. Besag (1974) developed this multivariate distribution for pairwise dependencies, and Yang et al. (2015) extended this model to n-wise dependencies. Yang et al. (2015) also developed and analyzed an M-estimator based on ℓ1-regularized node-wise regressions to recover the graphical model structure with high probability. Unfortunately, these models only allowed negative dependencies in the case of the exponential and Poisson distributions. Yang et al. (2013) proposed three modifications to the original Poisson model to allow positive dependencies but these modifications alter the Poisson base distribution or require the specification of unintuitive hyperparameters. Allen & Liu (2013) allowed posit... |

6 | Admixture of Poisson MRFs: A Topic Model with Word Dependencies.
- Inouye, Ravikumar, et al.
- 2014
Citation Context ...se to Atlanta. These qualitative results suggest that the exponential SQR model is able to capture multiple interesting and intuitive dependencies. 6. Discussion As full probability models, SQR graphical models could be used in any situation where a multivariate distribution is required. For example, SQR models could be used in Bayesian classification by modeling the probability of each class distribution instead of the classical Naive Bayes assumption of independence. As another example, SQR models could be used as the base distribution in mixtures or admixture composite distributions as in (Inouye et al., 2014; Inouye et al.)—similar to multivariate Gaussian mixture models. Another extension would be to consider mixed SQR graphical models in which the joint distribution has variables using different exponential families as base distributions as explored for previous graphical models in (Yang et al., 2014; Tansey et al., 2015). 7. Conclusion We introduce a novel class of graphical models that creates multivariate generalizations for any univariate exponential family with nonnegative sufficient statistics—including Gaussian, discrete, exponential and Poisson distributions. We show that SQR graphical ... |

6 | On Poisson graphical models.
- Yang, Ravikumar, et al.
- 2013
Citation Context ...nivariate exponential families and determines under what conditions a joint distribution exists that is consistent with these node conditional distributions. Besag (1974) developed this multivariate distribution for pairwise dependencies, and Yang et al. (2015) extended this model to n-wise dependencies. Yang et al. (2015) also developed and analyzed an M-estimator based on ℓ1-regularized node-wise regressions to recover the graphical model structure with high probability. Unfortunately, these models only allowed negative dependencies in the case of the exponential and Poisson distributions. Yang et al. (2013) proposed three modifications to the original Poisson model to allow positive dependencies but these modifications alter the Poisson base distribution or require the specification of unintuitive hyperparameters. Allen & Liu (2013) allowed positive dependencies by only requiring the Local Markov property rather than a consistent joint distribution that would have Global Markov properties. In a different approach, Inouye et al. (2015) altered the Poisson generalization by assuming the length of the vector is fixed or known similar to the multinomial distribution in which the number of trials is ... |

4 | Mixed graphical models via exponential families.
- Yang, Baker, et al.
- 2014
Citation Context ...SQR models could be used in Bayesian classification by modeling the probability of each class distribution instead of the classical Naive Bayes assumption of independence. As another example, SQR models could be used as the base distribution in mixtures or admixture composite distributions as in (Inouye et al., 2014; Inouye et al.)—similar to multivariate Gaussian mixture models. Another extension would be to consider mixed SQR graphical models in which the joint distribution has variables using different exponential families as base distributions as explored for previous graphical models in (Yang et al., 2014; Tansey et al., 2015). 7. Conclusion We introduce a novel class of graphical models that creates multivariate generalizations for any univariate exponential family with nonnegative sufficient statistics—including Gaussian, discrete, exponential and Poisson distributions. We show that SQR graphical models generally have few restrictions on the parameters and thus can model both positive and negative dependencies unlike previous generalized graphical models as represented by (Yang et al., 2015). In particular, for the exponential SQR model, the parameter matrix Φ can have both positive and nega... |

2 | QUIC: Quadratic approximation for sparse inverse covariance estimation.
- Hsieh, Sustik, et al.
- 2014
Citation Context ...imation methods using sampling. Finally, we demonstrate our exponential generalization on a synthetic dataset and a real-world dataset of airport delay times. 1. Introduction Gaussian, binary and discrete undirected graphical models—or Markov Random Fields (MRF)—have become popular for compactly modeling and studying the structural dependencies between high-dimensional continuous, binary and categorical data respectively (Friedman et al., 2008; Hsieh et al., 2014; Banerjee et al., 2008; Ravikumar et al., 2010; Jalali et al., 2010). However, real-world data does not often fit the assumption that variables come from Gaussian or discrete distributions. For example, word counts in documents are nonnegative integers with many zero values and hence are more appropriately modeled by the Poisson distribution. Yet, an independent Poisson distribution would be insufficient because words are often either positively or negatively related to other words—e.g. the words “machine” and “learning” would often co-occur together in ICML papers (positive dependency) where... |

1 | Copula Theory and Its Applications.
- Bickel, Diggle, et al.
- 2009
Citation Context ...on-Gaussian and non-discrete data include high-throughput gene sequencing count data, crime statistics, website visits, survival times, call times and delay times. Though univariate distributions for these types of data have been studied quite extensively, multivariate generalizations have only been given limited attention. One basic approach to forming dependent multivariate distributions is to assume that the marginal distributions are exponentially distributed (Marshall & Olkin, 1967; Embrechts et al., 2003) or Poisson distributed (Karlis, 2003). This idea is related to copula-based models (Bickel et al., 2009) in which a probability distribution is decomposed into the univariate marginal distributions and a copula distribution on the unit hypercube that models the dependency between variables. However, the exponential model in (Marshall & Olkin, 1967; Embrechts et al., 2003) gives rise to a distribution that is composed of a continuous distribution and a singular distribution, which seems unusual and unlikely for general real-world situations. The multivariate Poisson distribution (Karlis, 2003) is based on the sum of independent Poisson variables and can only model pos... |

1 | Vector-space Markov random fields via exponential families.
- Tansey, Padilla, et al.
- 2015
Citation Context ... used in Bayesian classification by modeling the probability of each class distribution instead of the classical Naive Bayes assumption of independence. As another example, SQR models could be used as the base distribution in mixtures or admixture composite distributions as in (Inouye et al., 2014; Inouye et al.)—similar to multivariate Gaussian mixture models. Another extension would be to consider mixed SQR graphical models in which the joint distribution has variables using different exponential families as base distributions as explored for previous graphical models in (Yang et al., 2014; Tansey et al., 2015). 7. Conclusion We introduce a novel class of graphical models that creates multivariate generalizations for any univariate exponential family with nonnegative sufficient statistics—including Gaussian, discrete, exponential and Poisson distributions. We show that SQR graphical models generally have few restrictions on the parameters and thus can model both positive and negative dependencies unlike previous generalized graphical models as represented by (Yang et al., 2015). In particular, for the exponential SQR model, the parameter matrix Φ can have both positive and negative dependencies and ... |