1 Introduction
Graphical models provide an appealing framework to characterize dependence in multivariate data
in an intuitive way. This article focuses on undirected graphical models, or Markov random fields (MRFs). In this approach, each random variable is assigned to a node of a graph G, characterized by the pair (V, E). Here V and E denote the set of nodes and the set of edges of the graph G, with V = {1, ..., p} and E a subset of V x V. The graph encodes conditional independence relationships in the data. We say X_j and X_k are conditionally independent if they are independent given X_{-(j,k)}, with X_{-(j,k)} denoting all random variables excluding X_j and X_k. Conditional independence between two random variables is equivalent to the absence of an edge between the two corresponding nodes in the graph; thus, conditional independence of X_j and X_k implies that the edge (j, k) is not in E. Although there is a rich literature on graphical models, most of the focus has been on Gaussian graphical models. For bounded discrete data, Ising (Ravikumar et al., 2010; Kolar et al., 2010) and multinomial graphical models (Jalali et al., 2011) have been studied. However, for unbounded count-valued data, the existing literature is limited. Multivariate count data are routinely collected in genomics, sports, imaging analysis and text mining, among many other areas, but most of the focus has been on latent factor and covariance structure models (Wedel et al., 2003; Zhou et al., 2012). The goal of this article is to address this gap and provide a flexible framework for statistical inference in count graphical models.
Besag first introduced pairwise graphical models, deemed ‘automodels’, in his seminal paper on MRFs (Besag, 1974). To define a joint distribution on a spatial lattice, he started with an exponential family representation of the marginal distributions and then added first-order interaction terms. In the special case of count data, he introduced the Poisson automodel. In this approach, the random variable x_s at the s-th location follows a conditional Poisson distribution whose mean lambda_s depends on the neighboring sites, with log(lambda_s) given a linear form in the neighboring values with interaction coefficients beta_st. It can be shown that this conditional model admits a joint density if and only if beta_st <= 0 for all pairs (s, t). Hence, only nonpositive dependence can be accommodated. Gamma and exponential automodels have the same restriction due to the nonnegativity of the random variables. Yang et al. (2013) proposed to truncate the count support within the Poisson automodel to allow both positive and negative dependence, effectively treating the data as ordered categorical. Allen and Liu (2012) proposed to fit the Poisson graphical model locally in a manner that allows both positive and negative dependencies, but this approach does not address the problem of global inference on G. In the literature on spatial data analysis, many count-valued spatial processes have been proposed, but much of the focus has been on including spatial random effects instead of an explicit graphical structure. De Oliveira (2013)
considered a random field on the mean function of a Poisson model to incorporate spatial dependence. However, conditional independence or dependence structure in the mean does not necessarily represent that of the data. The Poisson-log normal distribution, introduced by Aitchison and Ho (1989), is popular for analyzing spatial count data (Chan and Ledolter, 1995; Diggle et al., 1998; Chib and Winkelmann, 2001; Hay and Pettitt, 2001). Here also, the graph structure of the mean does not necessarily represent that of the data. Hence, these models cannot be regarded as graphical models for count data. To study areal data, conditional autoregressive (CAR) models have been proposed
(Gelfand and Vounatsou, 2003; De Oliveira, 2012; Wang and Kockelman, 2013). Although these models have an MRF-type structure, they assume the graph is known based on the spatial adjacency structure, while our focus is on inferring an unknown G. Gaussian copula models are popular for characterizing multivariate non-normal data (Xue-Kun Song, 2000; Murray et al., 2013). Mohammadi et al. (2017) developed a computational algorithm to build graphical models based on Gaussian copulas using methods developed by Dobra et al. (2011). However, it is difficult to model multivariate counts with zero-inflated or multimodal marginals using a Gaussian copula.
Within a semiparametric framework, Liu et al. (2009) proposed a nonparanormal graphical model in which an unknown monotone function of the observed data follows a multivariate normal model with unknown mean and precision matrix, subject to identifiability restrictions. This model has been popular for continuous data, providing a type of Gaussian copula. For discrete data the model is not directly appropriate, as mapping discrete to continuous data is problematic. To the best of our knowledge, there has been no work on nonparanormal graphical models for counts. In general, conditional independence cannot be ensured if the function of the random variable is not continuous. For example, if the transformation f is not monotone and continuous, then conditional independence of f(X_j) and f(X_k) does not ensure conditional independence of X_j and X_k.
Apart from proposing a flexible graphical model for counts, we also aim to develop an efficient Bayesian computation scheme. Bayesian computation for Gaussian graphical models (GGMs) is somewhat well-developed (Dobra et al., 2011; Wang, 2012, 2015; Mohammadi et al., 2015). Unfortunately, outside of GGMs, likelihood-based inference is often problematic due to intractable normalizing constants in the likelihood functions. For example, the normalizing constant in the Ising model is too expensive to compute except for very small graphs. There are approaches related to surrogate likelihood (Kolar and Xing, 2008) obtained by relaxation of the log-partition function (Banerjee et al., 2008). Kolar et al. (2010) use conditional likelihood. Besag (1975)
chose a product of conditional likelihoods as a pseudolikelihood to estimate MRFs. For exponential family random graphs,
Van Duijn et al. (2009) compared maximum likelihood estimates with maximum pseudolikelihood estimates in terms of bias, standard errors, coverage rates and efficiency. The pseudolikelihood formulation is attractive since the global normalizing constant in the complete likelihood is replaced by local normalizing constants. Zhou and Schmidler (2009) numerically compared estimates from a pseudoposterior with exact likelihood-based estimates and found them very similar in small samples for Ising and Potts models. Asymptotic unbiasedness and consistency of pseudolikelihood-based methods have also been studied (Comets, 1992; Jensen and Künsch, 1994; Mase, 2000; Baddeley and Turner, 2000). Pensar et al. (2017) showed consistency of marginal pseudolikelihood for discrete-valued MRFs in a Bayesian framework. Recently, Dobra et al. (2018) used pseudolikelihood for estimation of their Gaussian copula graphical model. Although pseudolikelihood is popular in the frequentist domain for count data (Inouye et al., 2014; Ravikumar et al., 2010; Yang et al., 2013), its usage is still not common in Bayesian estimation for count-valued MRFs. This is mainly because calculating conditional densities is expensive for count data due to the unbounded support, making posterior computation hard to conduct. We implement an efficient Markov chain Monte Carlo (MCMC) sampler for our model using pseudolikelihood and pseudoposterior formulations. Our approach relies on a provably accurate approximation to the normalizing constant in the conditional likelihood, and we provide a bound on the approximation error incurred by evaluating the normalizing constant numerically.
In Section 2, we introduce our novel graphical model. In Section 3, some desirable theoretical results are presented. Then we discuss computational strategies in Section 4 and present simulation results in Section 5. We apply our method to neuron spike data in mice in Section 6. We end with some concluding remarks in Section 7.
2 Modeling
Before introducing the model, we define some of the Markov properties related to the conditional independence structure of an undirected graph. A clique of a graph is a set of nodes in which every two distinct nodes are adjacent, i.e., connected by an edge. For a node j, let N(j) denote its set of neighbors. For three disjoint subsets A, B and C of V, A is said to be separated from B by C if every path from A to B passes through C; a path is an ordered sequence of nodes in which each consecutive pair is connected by an edge. The joint distribution is locally Markov if, for any node j, X_j is conditionally independent of the non-neighboring variables given X_{N(j)}. If, for any three disjoint subsets A, B and C of V, X_A and X_B are independent given X_C whenever A and B are separated by C, the distribution is called globally Markov. The joint density is pairwise Markov if, for any pair (j, k) with (j, k) not in E, X_j and X_k are conditionally independent given the remaining variables.
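These three properties can be summarized formally; the notation below is standard, with N(j) denoting the neighbors of node j and cl(j) = {j} ∪ N(j):

```latex
\text{pairwise Markov: } (j,k) \notin E \;\Longrightarrow\; X_j \perp\!\!\!\perp X_k \mid X_{V \setminus \{j,k\}}, \qquad
\text{local Markov: } X_j \perp\!\!\!\perp X_{V \setminus \mathrm{cl}(j)} \mid X_{N(j)}, \qquad
\text{global Markov: } C \text{ separates } A,B \;\Longrightarrow\; X_A \perp\!\!\!\perp X_B \mid X_C.
```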
We consider here a pairwise MRF (Wainwright et al., 2007; Chen et al., 2014), which implies the following joint probability mass function (pmf) for the p-dimensional random variable X = (X_1, ..., X_p):

P(X = x) = Z^{-1} exp{ sum_{j in V} phi_j(x_j) + sum_{(j,k) in E} phi_jk(x_j, x_k) },   (2.1)
where phi_j is called a node potential function, phi_jk an edge potential function, and Z is the normalizing constant; phi_jk is identically zero if there is no edge (j, k). Thus this distribution is pairwise Markov by construction. Then (2.1) satisfies the Hammersley-Clifford theorem (Hammersley and Clifford, 1971), which states that a probability distribution having a strictly positive density satisfies a Markov property with respect to the undirected graph if and only if its density can be factorized over the cliques of the graph. Since our pairwise MRF is pairwise Markov, we can represent the joint probability mass function as a product of mass functions over the cliques of the graph G. The existence of such a factorization implies that this distribution has both global and local Markov properties. Completing a specification of the MRF in (2.1) requires an explicit choice of the potential functions phi_j and phi_jk. In the Gaussian case, one lets the node potential be quadratic in x_j and the edge potential proportional to x_j x_k, with coefficients corresponding to the diagonal and off-diagonal elements of the precision matrix, respectively. In general, the node potential functions can be chosen to target specific univariate marginal densities. If the marginal distribution is Poisson, the appropriate node potential function is phi_j(x_j) = alpha_j x_j - log(x_j!). One can then choose the edge potential functions to avoid overly restrictive constraints on the dependence structure, such as only allowing nonpositive dependence. Yang et al. (2013) identify edge potential functions with these properties for count data by truncating the support; for example, to the range observed in the sample. This reduces the ability to generalize results, and in practice estimates are sensitive to the truncation level. We propose an alternative construction of the edge potentials that avoids truncation.
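As a concrete illustration of these choices, the Gaussian and Poisson-type potentials can be written as follows (standard forms; the symbols omega and alpha are our notation for the precision-matrix entries and the Poisson log-rate):

```latex
\phi_j(x_j) = -\tfrac{1}{2}\,\omega_{jj}\,x_j^2, \qquad
\phi_{jk}(x_j,x_k) = -\,\omega_{jk}\,x_j x_k \qquad \text{(Gaussian case)},
\\[4pt]
\phi_j(x_j) = \alpha_j x_j - \log(x_j!) \qquad \text{(Poisson-type node potential)}.
```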
2.1 Model
We propose the following modified pmf for the p-dimensional count-valued random variable X:

P(X = x) ∝ exp{ sum_{j in V} phi_j(x_j) + sum_{(j,k) in E} theta_jk F(x_j) F(x_k) },

where F is a monotone increasing bounded function defined on the nonnegative reals, theta_jk is an edge coefficient, and phi_j is a node potential function, using the notation of (2.1).
Lemma 1.
Let F be uniformly bounded by M. Then the normalizing constant, say Z, is finite, with explicit upper and lower bounds obtained by replacing F with its extreme values.

By elementary calculations, these bounds can be obtained. The constant Z is the sum of the above pmf over the support of X. The sum reduces to a product of p-many exponential series after replacing the function F by its maximum.
Thus, by modifying the edge potential function in this way using a bounded function of x, we can allow unrestricted support for all the parameters, allowing one to estimate both positive and negative dependence. Under the monotonicity restrictions on F, inference on the conditional independence structure tends to be robust to the specific form chosen. We let F depend on a tuning parameter c > 0 to define a flexible class of monotone increasing bounded functions. The parameter c imposes additional flexibility and is calibrated to minimize the difference between the sample covariances of the data and of their F-transformed values. Figure 1 illustrates how c controls the range and shape of F. Figure 2 shows how the difference between the covariances varies with c in sparse and non-sparse data cases. In both cases, the difference function has a unique minimizer.
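To make the two required properties concrete, the following sketch checks monotonicity and boundedness for a tanh-based transform; this specific functional form is a hypothetical stand-in, since the construction only requires that F be monotone increasing and bounded, with c controlling how quickly F saturates.

```python
import math

def F(x, c):
    """A bounded, monotone increasing transform of a count x >= 0.

    A tanh-based stand-in (a hypothetical choice, not the paper's exact
    form) illustrating the two properties the construction needs:
    monotonicity and boundedness. The tuning parameter c controls how
    quickly F saturates toward its bound.
    """
    return math.tanh(x / c)

# Monotone: F increases with x for fixed c.
vals = [F(x, c=5.0) for x in range(0, 50)]
assert all(a < b for a, b in zip(vals, vals[1:]))

# Bounded: F never exceeds 1, so edge potentials theta * F(x_j) * F(x_k)
# stay bounded and the joint pmf remains normalizable for any theta.
assert all(0.0 <= v < 1.0 for v in vals)
```

Because F is bounded, the edge terms cannot overwhelm the factorial decay of the Poisson-type node potential, which is what allows both signs of dependence.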
Letting x_i denote the i-th independent realization of X, for i = 1, ..., n, the pmf is

P(X_i = x_i) ∝ exp{ sum_{j in V} (alpha_ij x_ij - log(x_ij!)) + sum_{(j,k) in E} theta_jk F(x_ij) F(x_ik) },   (2.2)

where the alpha_ij's are coefficients of the node potential functions and the theta_jk's are coefficients of the edge potential functions as before. We let alpha_ij vary with i to allow more flexibility in modeling the marginal densities. If theta_jk = 0, then X_ij and X_ik are conditionally independent for all i. We call our proposed method COunt Nonparametric Graphical Analysis (CONGA).
Now we reparametrize (2.2) using lambda_ij = exp(alpha_ij) and rewrite the model as

P(X_i = x_i) ∝ exp{ sum_{j in V} (x_ij log(lambda_ij) - log(x_ij!)) + sum_{(j,k) in E} theta_jk F(x_ij) F(x_ik) }.   (2.3)

This reparametrized model is more intuitive to understand. Due to the Poisson-type marginal in (2.3), this model is suitable for data whose marginals are overdispersed with respect to the Poisson at each node. Overdispersion is typical in broad applications. We consider this reparametrized model in the rest of the paper.
2.2 Prior structure
To proceed with Bayesian computation, we put priors on the parameters. We have two sets of parameters in (2.3): the edge coefficients theta_jk and the node parameters lambda_ij. For the theta parameters, we choose simple iid Gaussian priors. It is straightforward to consider more elaborate shrinkage or variable selection priors for the theta_jk's, but we find usual Gaussian priors have good performance in small to moderate-dimensional applications.
The parameters lambda_ij represent random effects; these parameters are not individually identifiable and are given random effects distributions. The random effects distribution at the j-th node controls overdispersion and the shape of the marginal count distribution for that node. To allow these marginals to be flexibly determined by the data, we take a Bayesian nonparametric approach using Dirichlet process priors with a Gamma base measure and a precision parameter that is itself assigned a Gamma hyperprior for increased data adaptivity.
3 Theoretical properties
We explore some of the theoretical properties of our proposed CONGA method.
Theorem 2.
If theta_jk = 0, then X_ij and X_ik are conditionally independent for all i under (2.3).
This result is easy to verify by simply calculating the conditional probabilities. The details of the proof are in the Appendix.
Theorem 3.
Any X that follows the probability mass function in (2.3) cannot be underdispersed with respect to the Poisson distribution.
We study posterior consistency under a fixed-p, increasing-n regime, assuming the prior of Section 2.2 with prespecified c. Let the parameter space for theta be Theta and that for lambda be Lambda, so that the complete parameter space is Theta x Lambda. We consider a prior on Theta and a prior on Lambda.
Let (theta_0, lambda_0) be the truth. We make the following assumptions.
Assumptions

1. For some large B_1, the truth satisfies the sup-norm bound ||theta_0||_inf <= B_1, and theta_0 is in the support of the prior on Theta.

2. For some large B_2, ||lambda_0||_inf <= B_2, where ||.||_inf stands for the infinity norm, and lambda_0 is in the support of the prior on Lambda.

3. A boundedness condition on the true edge parameters holds for all pairs (j, k).
Theorem 4.
Under Assumptions 1-3, the posterior for (theta, lambda) is consistent at (theta_0, lambda_0).
We show that the truth belongs to the Kullback-Leibler support of the prior. Thus the posterior probability of any weak neighbourhood around the true p.m.f. converges to one in probability as n goes to infinity, as a consequence of Schwartz (1965); here the relevant probability is computed under the distribution of a sample of n observations at the true parameter. Hence, the posterior is weakly consistent. The posterior is said to be strongly consistent if the posterior probability of any neighbourhood around the true p.m.f. converges to one almost surely. The support of the data is a countable space, and by Scheffe's theorem the weak and total variation topologies are equivalent on such spaces. In this discrete setting, therefore, weak consistency implies strong consistency, and the posterior for (theta, lambda) is also strongly consistent at (theta_0, lambda_0). A detailed proof is in the Appendix. Instead of assuming bounded support for the true distribution of the random effects, one can also assume it has sub-Gaussian tails; the posterior consistency result still holds with minor modifications to the current proof.
4 Computation
As motivated in Section 2.1, we estimate c to minimize the difference between the sample covariances of the data and of the F-transformed data. This is a simple one-dimensional optimization problem, which is easily solved numerically.
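A minimal sketch of this calibration, assuming the criterion compares the raw and transformed sample covariances over a grid of candidate values (the tanh-based F and all variable names are illustrative, not the paper's exact choices):

```python
import math
import random

def F(x, c):
    # Hypothetical bounded monotone transform standing in for the model's F.
    return math.tanh(x / c)

def cov(u, v):
    # Sample covariance (dividing by n for simplicity).
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

def calibrate_c(x1, x2, grid):
    """One-dimensional search for c minimizing the gap between the
    covariance of the raw counts and that of their F-transforms."""
    return min(grid, key=lambda c: abs(cov(x1, x2)
                                       - cov([F(v, c) for v in x1],
                                             [F(v, c) for v in x2])))

random.seed(1)
x1 = [random.randint(0, 10) for _ in range(200)]
x2 = [a + random.randint(0, 3) for a in x1]  # positively dependent counts
c_hat = calibrate_c(x1, x2, grid=[0.5 * k for k in range(1, 60)])
assert c_hat in [0.5 * k for k in range(1, 60)]
```

A grid search suffices here because the objective is one-dimensional; any scalar optimizer could be substituted.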
To update the other parameters, we use an MCMC algorithm, building on the approach of Roy et al. (2018). We generate proposals for Metropolis-Hastings (MH) using a Gibbs sampler derived under an approximated model. To avoid calculating the global normalizing constant in the complete likelihood, we consider a pseudolikelihood corresponding to a product of conditional likelihoods at each node. This requires calculation of local normalizing constants, which is computationally tractable.
The conditional likelihood at the j-th node is

P(x_ij | x_i,-j) = Z_j^{-1} exp{ x_ij log(lambda_ij) - log(x_ij!) + F(x_ij) sum_{k != j} theta_jk F(x_ik) },   (4.1)

where the normalizing constant is Z_j = sum_{x=0}^{infinity} exp{ x log(lambda_ij) - log(x!) + F(x) sum_{k != j} theta_jk F(x_ik) }.
We truncate this sum at a sufficiently large value K for the purpose of evaluating the conditional likelihood. The error of this approximation can be bounded by the tail probability beyond K of a Poisson distribution whose mean inflates lambda_ij by a factor determined by the bound on F; this bound can in turn be bounded by a similar expression with F replaced by its maximum. One can tune K based on the resulting bound on the approximation error. In our simulation settings, even a moderate truncation level makes the above bound numerically zero, and we use such a level as the default choice for all of our computations. We update the lambda_ij's using the MCMC sampling scheme described in Chapter 5 of Ghosal and Van der Vaart (2017) for the Dirichlet process mixture prior, based on the above conditional likelihood. For clarity, this algorithm is described below:

(i) For each i, calculate a probability vector over the current unique values of lambda_ij and a fresh draw from the base measure, with entries proportional to the corresponding conditional likelihood contributions and summing to one.

(ii) Sample an index with these probabilities.

(iii) If the sampled index corresponds to an existing value, set lambda_ij to that value; otherwise, sample a new value as described below.

(iv) The Dirichlet process precision parameter is sampled from a mixture of Gamma distributions that depends on the number of unique elements among the lambda_ij's and on an auxiliary variable sampled from a Beta distribution, with a Gamma prior on the precision parameter (the standard Escobar-West update).
When we have to generate a new value for lambda_ij in step (iii), we consider the following scheme.

(a) Generate a candidate from the Gamma base measure.

(b) Adjust the update by blending the candidate with the current value through a tuning parameter, chosen with respect to the acceptance rate of the resulting Metropolis-Hastings (MH) step.

(c) Use the pseudolikelihood based on the conditional likelihoods in (4.1) to calculate the MH acceptance probability.
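The truncated local normalizing constant that these conditional likelihoods require can be sketched as follows; the names lam_j, theta_row, and f_neighbors, and the tanh-based F, are illustrative assumptions rather than the paper's exact choices:

```python
import math

def local_norm_const(lam_j, theta_row, f_neighbors, c, K=100):
    """Truncated local normalizing constant at one node: the sum over
    x = 0..K of exp{x log(lam_j) - log(x!) + F(x) * s}, where s collects
    the weighted contributions of the neighboring transformed values."""
    F = lambda x: math.tanh(x / c)  # hypothetical bounded monotone transform
    s = sum(t * f for t, f in zip(theta_row, f_neighbors))
    return sum(math.exp(x * math.log(lam_j) - math.lgamma(x + 1) + F(x) * s)
               for x in range(K + 1))

# Because F is bounded, the truncation error is dominated by the tail of a
# Poisson distribution with an inflated mean, which decays super-exponentially;
# K = 100 is ample for moderate means.
Z100 = local_norm_const(lam_j=3.0, theta_row=[0.4, -0.2],
                        f_neighbors=[0.9, 0.5], c=5.0)
Z200 = local_norm_const(3.0, [0.4, -0.2], [0.9, 0.5], 5.0, K=200)
assert Z100 > 0.0 and abs(Z100 - Z200) < 1e-8
```

Doubling the truncation level leaves the constant numerically unchanged, mirroring the error bound discussed above.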
To generate theta, we consider a working likelihood in which the standardized data follow a multivariate Gaussian distribution with precision matrix Omega, whose off-diagonal entries are determined by the theta_jk's and whose diagonal entries are fixed, so that the diagonal does not change over iterations. We update the columns of Omega successively; the submatrix obtained by removing the j-th row and column is used in each update, together with the Gram matrix of the data standardized over columns.

(a) Adjust the update by blending the proposal with the current value through a tuning parameter, chosen with respect to the acceptance rate of the following MH step; the proposal is constrained so that the precision matrix remains valid.

(b) Use the pseudolikelihood based on the conditional likelihoods in (4.1), multiplied over the nodes, to calculate the MH acceptance probability.
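A compact sketch of this pseudolikelihood computation, with a hypothetical tanh-based F and illustrative parameter names (theta is a symmetric matrix of edge coefficients; K is the truncation level for the local normalizing constants):

```python
import math

def pseudo_log_lik(x_row, lam_row, theta, c, K=100):
    """Pseudo-log-likelihood for one observation: the sum over nodes of
    the log conditional pmf of x_j given the other coordinates, each
    with its own truncated local normalizing constant."""
    F = lambda x: math.tanh(x / c)  # hypothetical bounded monotone transform
    p = len(x_row)
    total = 0.0
    for j in range(p):
        s = sum(theta[j][k] * F(x_row[k]) for k in range(p) if k != j)
        log_num = (x_row[j] * math.log(lam_row[j])
                   - math.lgamma(x_row[j] + 1) + F(x_row[j]) * s)
        # Local normalizing constant, truncated at K (log-sum-exp for stability).
        log_terms = [x * math.log(lam_row[j]) - math.lgamma(x + 1) + F(x) * s
                     for x in range(K + 1)]
        m = max(log_terms)
        log_Z = m + math.log(sum(math.exp(t - m) for t in log_terms))
        total += log_num - log_Z
    return total

theta = [[0.0, 0.3], [0.3, 0.0]]
ll = pseudo_log_lik([2, 4], [2.5, 3.5], theta, c=5.0)
assert ll < 0.0  # each conditional pmf value lies strictly in (0, 1)
```

The MH acceptance ratio then compares this quantity, summed over observations, at the proposed and current parameter values.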
5 Simulation
We consider four different techniques for generating multivariate count data. One approach is based on a Gaussian copula type setting; the other three are based on competing methods. We compare the methods based on false positive and false negative proportions. We include an edge in the graph between the j-th and k-th nodes if the 95% credible interval for theta_jk does not include zero; a decision-theoretic justification for such an approach is given in Thulin (2014). We compare our method CONGA with TPGM, SPGM, LPGM, huge, BDgraph and ssgraph. The first three are available in the R package XMRF, and the last two are in the R packages BDgraph and ssgraph, respectively. The function huge is from the R package huge, which fits a nonparanormal graphical model. The last two methods fit graphical models using Gaussian copulas, and ssgraph uses spike and slab priors in estimating the edges. To simulate data under the first scheme, we follow the steps given below.

1. Generate n multivariate normal vectors of length p from MVN(0, Sigma), where 0 is the vector of zeros of length p. This produces a matrix of dimension n x p.

2. Calculate the matrix of uniforms by applying, elementwise, the cumulative distribution function of the standard normal.

3. The Poisson random variable is obtained by applying, for a given mean parameter, the quantile function of the Poisson distribution to these uniforms.
Let X_j denote the j-th column of the data matrix. In the above data generation setup, a zero off-diagonal entry in the precision matrix implies that X_j and X_k are conditionally independent, due to Lemma 3 of Liu et al. (2009). The marginals are allowed to be multimodal at some of the nodes, which is not possible under the other simulation schemes.
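The three steps above can be sketched in a bivariate version as follows (the bivariate setup, parameter names, and seed are illustrative):

```python
import math
import random

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_quantile(u, lam):
    """Smallest x with P(Poisson(lam) <= x) >= u, by accumulating the pmf."""
    pmf = math.exp(-lam)
    cum, x = pmf, 0
    while cum < u and pmf > 1e-16:  # guard against float roundoff in the tail
        x += 1
        pmf *= lam / x
        cum += pmf
    return x

def copula_counts(n, rho, lam, seed=7):
    """Correlated normals -> uniforms via the normal cdf -> counts via the
    Poisson quantile function, i.e., a Gaussian copula with Poisson margins."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        out.append((poisson_quantile(std_normal_cdf(z1), lam),
                    poisson_quantile(std_normal_cdf(z2), lam)))
    return out

data = copula_counts(n=500, rho=0.8, lam=3.0)
xs, ys = [a for a, _ in data], [b for _, b in data]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
sample_cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
assert sample_cov > 0.0  # positive dependence survives the transformation
```

The same construction with a full precision matrix in the normal step yields the p-dimensional scheme described above.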
Apart from the above approach, we also generate data using the R package XMRF from the Sub-Linear Poisson Graphical Model (SPGM), the Truncated Poisson Graphical Model (TPGM) (Yang et al., 2013), and the Local Poisson Graphical Model (LPGM) (Allen and Liu, 2012).
We choose a large prior variance for the normal prior on the theta_jk's, which makes the prior weakly informative. The tuning parameter is chosen to be 5, as given in Wang (2012). The Gamma base measure and the Dirichlet process precision are given fixed hyperparameter values. We consider a range of values of n and p. We collect 5000 post-burn-in MCMC samples after discarding the first 5000 samples as burn-in. We compare the methods based on two quantities, FP and FN, defined as FP = proportion of falsely connected edges in the estimated graph (false positives) and FN = proportion of falsely unconnected edges in the estimated graph (false negatives). We show the pair (FP, FN) in Tables 1 to 3 for numbers of nodes 10, 30 and 50. All of these results are based on 50 replications. To evaluate the performance of CONGA, we calculate the proportion of replications in which zero is included in the corresponding 95% credible region, constructed from the MCMC samples of each replication. For the other methods, the results are based on the default regularization as given in the R package XMRF. Our proposed method overwhelmingly outperforms the other methods when the data are generated using the Gaussian copula type setting rather than from TPGM, SPGM or LPGM. In the other cases, our method performs similarly to competing methods when the number of nodes is large; there, the competing methods TPGM, SPGM and LPGM are leveraging modeling assumptions that CONGA avoids. CONGA beats BDgraph and ssgraph in almost all scenarios in terms of false positive proportions, with comparable false negative proportions. The function huge performs similarly to CONGA when the data are generated using TPGM, SPGM and LPGM, but CONGA is better than all other methods when the data are generated using the Gaussian copula type setting. This is likely because the other cases correspond to simulating data from one of the competitors' own models.
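The FP and FN proportions can be computed from true and estimated edge sets; the sketch below uses one reasonable reading of the definitions, with denominators taken over the truly absent and truly present edges, respectively (an assumption, since the text does not spell out the denominators):

```python
def fp_fn(true_edges, est_edges, p):
    """False positive and false negative proportions for an estimated graph.

    true_edges / est_edges are sets of node pairs (j, k) with j < k.
    FP is taken over the truly absent edges, FN over the truly present ones.
    """
    all_pairs = {(j, k) for j in range(p) for k in range(j + 1, p)}
    absent = all_pairs - true_edges
    fp = len(est_edges - true_edges) / len(absent) if absent else 0.0
    fn = len(true_edges - est_edges) / len(true_edges) if true_edges else 0.0
    return fp, fn

truth = {(0, 1), (1, 2)}
estimate = {(0, 1), (0, 2)}   # one spurious edge, one missed edge
fp, fn = fp_fn(truth, estimate, p=4)
assert (fp, fn) == (0.25, 0.5)
```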
6 Neuron spike count application
The dataset records neuron spike counts in mice across 37 neurons in the brain under the influence of three different external stimuli: 2D sinusoids with a vertical gradient, a horizontal gradient, and their sum. These neurons are from the same depth of the visual cortex of a mouse. The data are collected over around 400 time points. In Figure 3, we plot the marginal densities of the spike counts of four neurons under the influence of stimulus 0. There is considerable variation in the marginal densities, and the densities are multimodal in some cases. Marginally at each node, the variance exceeds the corresponding mean for each of the three stimuli.
6.1 Estimation
We apply exactly the same computational approach as used in the simulation studies. To additionally obtain a summary of the weight of evidence of an edge between nodes j and k, we calculate an edge weight from the posterior probability estimated from the MCMC samples. We plot the estimated graph with edge thickness proportional to these weights; thicker edges thus indicate greater evidence of an edge in Figures 4 to 6. To test for similarity of the graph across stimuli, we estimate 95% credible regions for the difference in the edge weight parameters under each pair of stimuli. We flag those edges whose 95% credible intervals for the difference do not include zero as significantly different across stimuli.
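A hypothetical summary of this kind can be computed directly from the MCMC draws of one edge parameter; the specific weight below (the proportion of draws on the dominant side of zero) is an illustrative stand-in for the summary described in the text:

```python
def edge_evidence(theta_samples):
    """Summarize MCMC samples of one edge parameter: a weight-of-evidence
    score (proportion of draws on the dominant side of zero -- a
    hypothetical choice) and whether the equal-tailed 95% credible
    interval excludes zero."""
    n = len(theta_samples)
    pos = sum(1 for t in theta_samples if t > 0) / n
    weight = max(pos, 1 - pos)
    s = sorted(theta_samples)
    lo, hi = s[int(0.025 * n)], s[int(0.975 * n) - 1]
    excludes_zero = lo > 0 or hi < 0
    return weight, excludes_zero

# 200 draws concentrated away from zero -> strong evidence of an edge.
samples = [0.5 + 0.01 * k for k in range(200)]
w, sig = edge_evidence(samples)
assert w == 1.0 and sig
```

Differences of such draws across stimuli can be summarized the same way, which is how the cross-stimulus comparison above proceeds.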
6.2 Inference
We find 129, 199 and 110 connected edges for stimuli 0, 1 and 2, respectively. Among these edges, 38 are common to stimuli 0 and 1, 15 to stimuli 0 and 2, and 28 to stimuli 1 and 2. Six edges are common to all of the stimuli: (13,16), (8,27), (5,8), (33,35), (3,4) and (9,14). Each node has at least one edge with another node. We plot the estimated networks in Figures 4 to 6. We calculate the number of connected edges for each node and list the 5 most connected nodes in Table 4. We also list the 10 most significant edges for each stimulus in Table 5. We find that node 27 is present in all of the lists in Table 4; this node appears to have significant interconnections with other nodes for all of the stimuli. We also test the similarity of the estimated weighted networks across stimuli. We find 82.13% similarity between the estimated weighted networks under the influence of stimuli 0 and 1, 84.98% for the pair 0 and 2, and 79.43% for 1 and 2. Stimulus 0 is a combination of stimuli 1 and 2, which could be why the estimated graph under stimulus 0 has the highest similarity with the other estimated graphs.
Table 4. The five most connected nodes under each stimulus.

Stimulus 0                Stimulus 1                Stimulus 2
Node   Connected edges    Node   Connected edges    Node   Connected edges
37     12                 27     16                 32     11
6      11                 3      15                 23     10
9      11                 5      14                 3      9
25     11                 8      14                 18     9
27     11                 23     14                 27     9
Table 5. The ten most significant edges (neuron pairs) under each stimulus.

Stimulus 0            Stimulus 1            Stimulus 2
Neuron 1   Neuron 2   Neuron 1   Neuron 2   Neuron 1   Neuron 2
24         35         24         28         14         30
26         30         24         30         16         35
26         37         24         35         21         35
28         37         24         37         21         36
29         33         26         28         24         28
30         32         26         31         24         29
30         35         28         37         24         37
31         33         34         37         25         26
35         36         35         36         26         36
35         37         36         37         31         36
7 Discussion
Our count nonparametric graphical analysis (CONGA) method is useful for multivariate count data and represents a starting point for more elaborate models and other research directions. One important direction is an extension to time series data. In this respect, a simple extension is to define an autoregressive process for the baseline parameters, inducing correlation across time while leaving the graph fixed in time. A more elaborate extension would instead allow the graph to evolve dynamically by replacing the edge parameters with time-varying versions, again defining an appropriate autoregressive process.
An additional interesting direction is flexible graphical modeling of continuous positive-valued multivariate data. Such a modification is conceptually straightforward, changing the Poisson-type node potential term to the corresponding term for the gamma distribution.
Acknowledgements
This research was partially supported by grant R01ES02749801A1 from the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (NIH).
Appendix
Proof of Theorem 2
The conditional probability of X_ij given the remaining coordinates is proportional to the exponential of the terms in (2.3) involving x_ij. Since theta_jk = 0, we can break the exponentiated terms into two factors such that x_ij and x_ik are separated. That immediately gives the conditional independence of X_ij and X_ik.
Proof of Theorem 4
For the space of probability measures, let the Kullback-Leibler divergences be given in the usual way.
Let us denote the probability distribution of the data as below, where Z is the normalizing constant. It is easy to verify the two stated identities.
Then it is easy to verify the displayed bound, where the final term is controlled by the last assumption. From the dominated convergence theorem, the bounding expression converges to its limit, and thus the Kullback-Leibler divergences go to zero.
Thus the posterior is weakly consistent. The weak and strong topologies on countable spaces are equivalent by Scheffe's theorem. Thus the posterior for (theta, lambda) is also strongly consistent at (theta_0, lambda_0).
References
 Aitchison and Ho (1989) J. Aitchison and C. Ho. The multivariate Poisson-log normal distribution. Biometrika 76 (1989): 643–653.
 Allen and Liu (2012) G. I. Allen and Z. Liu. A log-linear graphical model for inferring genetic networks from high-throughput sequencing data. Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on. 2012, 1–6.
 Baddeley and Turner (2000) A. Baddeley and R. Turner. Practical maximum pseudolikelihood for spatial point patterns (with discussion). Australian & New Zealand Journal of Statistics 42 (2000): 283–322.

 Banerjee et al. (2008) O. Banerjee, L. E. Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research 9 (2008): 485–516.
 Besag (1974) J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B (Methodological) (1974): 192–236.
 Besag (1975) J. Besag. Statistical analysis of non-lattice data. The Statistician (1975): 179–195.
 Chan and Ledolter (1995) K. Chan and J. Ledolter. Monte Carlo EM estimation for time series models involving counts Journal of the American Statistical Association 90 (1995): 242–252.
 Chen et al. (2014) S. Chen, D. M. Witten, and A. Shojaie. Selection and estimation for mixed graphical models Biometrika 102 (2014): 47–64.
 Chib and Winkelmann (2001) S. Chib and R. Winkelmann. Markov chain Monte Carlo analysis of correlated count data Journal of Business & Economic Statistics 19 (2001): 428–435.
 Comets (1992) F. Comets. On consistency of a class of estimators for exponential families of Markov random fields on the lattice The Annals of Statistics (1992): 455–468.
 De Oliveira (2012) V. De Oliveira. Bayesian analysis of conditional autoregressive models Annals of the Institute of Statistical Mathematics 64 (2012): 107–133.

 De Oliveira (2013) V. De Oliveira. Hierarchical Poisson models for spatial count data. Journal of Multivariate Analysis 122 (2013): 393–408.
 Diggle et al. (1998) P. J. Diggle, J. Tawn, and R. Moyeed. Model-based geostatistics. Journal of the Royal Statistical Society: Series C (Applied Statistics) 47 (1998): 299–350.
 Dobra et al. (2011) A. Dobra, A. Lenkoski, et al. Copula Gaussian graphical models and their application to modeling functional disability data The Annals of Applied Statistics 5 (2011): 969–993.
 Dobra et al. (2011) A. Dobra, A. Lenkoski, and A. Rodriguez. Bayesian inference for general Gaussian graphical models with application to multivariate lattice data Journal of the American Statistical Association 106 (2011): 1418–1433.
 Dobra et al. (2018) A. Dobra, R. Mohammadi, et al. Loglinear model selection and human mobility The Annals of Applied Statistics 12 (2018): 815–845.
 Gelfand and Vounatsou (2003) A. E. Gelfand and P. Vounatsou. Proper multivariate conditional autoregressive models for spatial data analysis Biostatistics 4 (2003): 11–15.
 Ghosal and Van der Vaart (2017) S. Ghosal and A. Van der Vaart. Fundamentals of nonparametric Bayesian inference. Volume 44 . Cambridge University Press, 2017.
 Hammersley and Clifford (1971) J. M. Hammersley and P. Clifford. Markov fields on finite graphs and lattices (1971).
 Hay and Pettitt (2001) J. L. Hay and A. N. Pettitt. Bayesian analysis of a time series of counts with covariates: an application to the control of an infectious disease Biostatistics 2 (2001): 433–444.
 Inouye et al. (2014) D. Inouye, P. Ravikumar, and I. Dhillon. Admixture of Poisson MRFs: A topic model with word dependencies International Conference on Machine Learning. 2014, 683–691.

 Jalali et al. (2011) A. Jalali, P. Ravikumar, V. Vasuki, and S. Sanghavi. On learning discrete graphical models using group-sparse regularization. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 2011, 378–387.
 Jensen and Künsch (1994) J. L. Jensen and H. R. Künsch. On asymptotic normality of pseudo likelihood estimates for pairwise interaction processes. Annals of the Institute of Statistical Mathematics 46 (1994): 475–486.
 Kolar et al. (2010) M. Kolar, L. Song, A. Ahmed, E. P. Xing, et al. Estimating time-varying networks. The Annals of Applied Statistics 4 (2010): 94–123.
 Kolar and Xing (2008) M. Kolar and E. P. Xing. Improved estimation of high-dimensional Ising models. arXiv preprint arXiv:0811.1239 (2008).
 Liu et al. (2009) H. Liu, J. Lafferty, and L. Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs Journal of Machine Learning Research 10 (2009): 2295–2328.
 Mase (2000) S. Mase. Marked Gibbs processes and asymptotic normality of maximum pseudolikelihood estimators Mathematische Nachrichten 209 (2000): 151–169.
 Mohammadi et al. (2017) A. Mohammadi, F. Abegaz, E. van den Heuvel, and E. C. Wit. Bayesian modelling of Dupuytren disease by using Gaussian copula graphical models Journal of the Royal Statistical Society: Series C (Applied Statistics) 66 (2017): 629–645.
 Mohammadi et al. (2015) A. Mohammadi, E. C. Wit, et al. Bayesian structure learning in sparse Gaussian graphical models Bayesian Analysis 10 (2015): 109–138.
 Murray et al. (2013) J. S. Murray, D. B. Dunson, L. Carin, and J. E. Lucas. Bayesian Gaussian copula factor models for mixed data Journal of the American Statistical Association 108 (2013): 656–665.
 Pensar et al. (2017) J. Pensar, H. Nyman, J. Niiranen, J. Corander, et al. Marginal pseudolikelihood learning of discrete Markov network structures Bayesian analysis 12 (2017): 1195–1215.

 Ravikumar et al. (2010) P. Ravikumar, M. J. Wainwright, J. D. Lafferty, et al. High-dimensional Ising model selection using l1-regularized logistic regression. The Annals of Statistics 38 (2010): 1287–1319.
 Roy et al. (2018) A. Roy, B. J. Reich, J. Guinness, R. T. Shinohara, and A.-M. Staicu. Spatial shrinkage via the product independent Gaussian process prior. arXiv preprint arXiv:1805.03240 (2018).
 Schwartz (1965) L. Schwartz. On Bayes procedures. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 4 (1965): 10–26.
 Thulin (2014) M. Thulin. Decisiontheoretic justifications for Bayesian hypothesis testing using credible sets Journal of Statistical Planning and Inference 146 (2014): 133–138.
 Van Duijn et al. (2009) M. A. Van Duijn, K. J. Gile, and M. S. Handcock. A framework for the comparison of maximum pseudolikelihood and maximum likelihood estimation of exponential family random graph models Social Networks 31 (2009): 52–62.
 Wainwright et al. (2007) M. J. Wainwright, J. D. Lafferty, and P. K. Ravikumar. High-dimensional graphical model selection using l1-regularized logistic regression. Advances in Neural Information Processing Systems. 2007, 1465–1472.
 Wang (2012) H. Wang. Bayesian graphical lasso models and efficient posterior computation Bayesian Analysis 7 (2012): 867–886.
 Wang (2015) H. Wang. Scaling it up: Stochastic search structure learning in graphical models Bayesian Analysis 10 (2015): 351–377.
 Wang and Kockelman (2013) Y. Wang and K. M. Kockelman. A Poisson-lognormal conditional-autoregressive model for multivariate spatial analysis of pedestrian crash counts across neighborhoods. Accident Analysis & Prevention 60 (2013): 71–84.
 Wedel et al. (2003) M. Wedel, U. Böckenholt, and W. A. Kamakura. Factor models for multivariate count data Journal of Multivariate Analysis 87 (2003): 356–369.
 Xue-Kun Song (2000) P. Xue-Kun Song. Multivariate dispersion models generated from Gaussian copula. Scandinavian Journal of Statistics 27 (2000): 305–320.
 Yang et al. (2013) E. Yang, P. K. Ravikumar, G. I. Allen, and Z. Liu. On Poisson graphical models Advances in Neural Information Processing Systems. 2013, 1718–1726.
 Zhou et al. (2012) M. Zhou, L. A. Hannah, D. B. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. In AISTATS (2012): 1462–1471.
 Zhou and Schmidler (2009) X. Zhou and S. C. Schmidler. Bayesian parameter estimation in Ising and Potts models: A comparative study with applications to protein modeling. Technical report, Duke University, 2009.