Lunch and Posters
Stick-Breaking Neural Latent Variable Models
Latent Bayesian neural networks and neural processes define a class of neural latent variable models. We extend this class to an infinite-dimensional space by imposing a stick-breaking prior on the latent space. Using Stochastic Gradient Variational Bayes, we perform posterior inference for the weights of the stick-breaking process. Specifically, we develop the stick-breaking neural process (SB-NP). SB-NPs are able to learn the structure of the latent space and are shown to be highly expressive in modelling posterior uncertainty.
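As a minimal illustration of the stick-breaking construction the abstract refers to, the sketch below draws truncated stick-breaking weights. The Beta(1, alpha) break distribution, the truncation level, and the function name are assumptions made for illustration and are not taken from the poster; in the SB-NP the weights are inferred variationally rather than sampled from the prior like this.

```python
import random


def stick_breaking_weights(alpha, num_sticks, seed=0):
    """Draw truncated stick-breaking weights.

    Each break fraction v_k is drawn from Beta(1, alpha) (an assumed choice),
    and the k-th weight is v_k times the length of stick still remaining.
    Returns the list of weights and the leftover stick length.
    """
    rng = random.Random(seed)
    remaining = 1.0  # length of the stick not yet broken off
    weights = []
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


if __name__ == "__main__":
    w, leftover = stick_breaking_weights(alpha=2.0, num_sticks=10)
    print(w, leftover)
```

The weights plus the leftover length always sum to one, so a truncation level can be chosen by checking how small the leftover mass is.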
Alex Gao is a 2nd year PhD student in the Department of Statistical Sciences at the University of Toronto. Co-authors: Daniel Flam-Shepherd and Zhaoyu Guo.
A Polar Coordinate Re-Parameterization of the Matern Covariance Function
This poster develops an efficient MCMC algorithm for fitting anisotropic spatial models using a novel polar coordinate re-parameterization of the covariance function. The polar angle is half the anisotropic angle and the radius represents the (shifted) anisotropic ratio. Under this re-parameterization, the parameter values at the origin correspond to an isotropic Matern function. We demonstrate our proposed model on isotropic and anisotropic data, and show that it can accurately estimate model parameters in both cases.
Kamal is a 3rd year PhD student in the Department of Statistical Sciences at the University of Toronto. Co-author: Patrick Brown.
A multivariate Bayesian time series model for pollution data from Vancouver and Delhi
Exposure assessment plays a key role in epidemiological studies. However, pollution data are usually noisy, with systematic missing values and outliers. A solution is to model the pollution data and then use the results to predict exposure. In this study, we propose a multivariate time series model within a Bayesian framework for four well-known health-related pollutants, PM$_{2.5}$, PM$_{10}$, NO$_2$ and O$_3$, in Vancouver, Canada and Delhi, India from 2014-2016 and 2015-2017, respectively. The model is a mixture of gamma and half-Cauchy components, with the latter used to capture the heavy tail of the data distribution. Seasonally varying covariance among pollutants is allowed, and the model is implemented in Stan with parallel subthreads to speed up computation.
Guowen is a postdoc in the department of statistics at the University of Toronto. Co-author: Patrick Brown.
On set-based association tests: insights gained from a regression, based on summary statistics
Set-based tests that jointly analyze a group of variants have been applied in various settings, including rare variants analysis. In each case, myriad tests have been proposed with competing and sometimes conflicting claims in terms of method performance. We developed a unifying regression framework based on summary statistics. We show that existing tests such as burden, SKAT and SKAT-O can be derived as special cases of this new framework. Moreover, the regression framework provides a straightforward way to account for any inherent correlations between the variables in a set. Subsequently, we propose new test statistics that are SKAT-O-like, but without the apparent "optimal" weighting. Under mild conditions, we show that the new test statistics are asymptotically chi-square distributed, and we provide accurate p-value calculation for finite samples with respect to the number of variables analyzed in a set. Focusing on rare variants analysis and through extensive simulation studies, we show that the proposed method is often, but not always, more powerful than SKAT-O across a number of alternative scenarios. In addition, the summary statistics-based regression framework is easy to implement, and it provides a principled way to make additional improvements such as incorporating available genomic information for data integration.
Yanyan Zhao is a 2nd year PDF in the Department of Statistical Sciences at the University of Toronto. Co-author: Lei Sun.
SOS: are genotype-based association tests robust to departure from Hardy-Weinberg equilibrium?
Though the exact origin of the argument is debatable, it is agreed that genotype-based association tests are robust to departure from Hardy-Weinberg equilibrium (HWE), that is, not sensitive to Hardy-Weinberg disequilibrium (HWD). Indeed, empirical evidence appears to support this statement, particularly in studies using unrelated individuals. But what about related individuals? The linear mixed model (LMM) has become the most popular method for family-based association studies. The covariance structure of the phenotype is partitioned into a weighted sum of the correlation structure due to genetic relatedness (the kinship coefficient) and the shared environmental effect, where the weight is usually referred to as 'heritability'. We show theoretically that if the causal SNP(s) are in HWD in the founders, the kinship coefficient matrix no longer captures the true relatedness structure between the genotypes of the related individuals. Thus, the LMM may not be robust to departure from HWE. Consider the simplest sib-pair design and assume that the genotypes of the (unrelated) parents are in HWD but not available. The genotypes of the siblings can then be shown to be in HWE, a textbook example demonstrating that HWE can be achieved after a single generation of random mating. However, we can also show theoretically that when it comes to association analysis, HWD in the founder generation affects the result through the covariance structure between the sibling genotypes. When heritability is treated as known, based on the literature, our simulation studies show that the empirical type 1 error rate can be as high as 10% at a 5% significance level. When heritability is estimated internally by the LMM, type 1 error is well controlled but the estimated heritability is then biased. In practice, the problem occurs only if the true causal variants depart from HWE considerably and the sample consists mostly of related individuals. However, as we move beyond low-hanging fruit and continue to struggle with the missing heritability issue, our findings can be impactful. Furthermore, we propose an alternative approach to family-based association studies that is robust to departure from HWE and remains powerful in the absence of HWD.
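The textbook fact cited in the abstract, that one generation of random mating restores HWE, can be checked with a short calculation. The sketch below is only an illustration for a single biallelic locus. The parental genotype frequencies, the function names, and the use of D = P(AA) - p^2 as the disequilibrium measure are assumptions chosen for this example, and the code covers only the marginal genotype distribution of one child, not the sibling covariance structure analyzed in the poster.

```python
from itertools import product

# Probability that a parent with the given genotype transmits allele A.
TRANSMIT_A = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}


def offspring_genotype_freqs(parent_freqs):
    """Genotype frequencies of a child when two parents are drawn
    independently (random mating) from the given genotype distribution."""
    out = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for (g1, f1), (g2, f2) in product(parent_freqs.items(), repeat=2):
        a1, a2 = TRANSMIT_A[g1], TRANSMIT_A[g2]
        pair_prob = f1 * f2  # probability of this ordered mating pair
        out["AA"] += pair_prob * a1 * a2
        out["Aa"] += pair_prob * (a1 * (1.0 - a2) + (1.0 - a1) * a2)
        out["aa"] += pair_prob * (1.0 - a1) * (1.0 - a2)
    return out


def hwd_coefficient(freqs):
    """Disequilibrium coefficient D = P(AA) - p^2, which is zero under HWE."""
    p = freqs["AA"] + 0.5 * freqs["Aa"]  # frequency of allele A
    return freqs["AA"] - p * p


# Hypothetical founders with a strong deficit of heterozygotes (in HWD).
parents = {"AA": 0.4, "Aa": 0.2, "aa": 0.4}
children = offspring_genotype_freqs(parents)

if __name__ == "__main__":
    print("parent D:", hwd_coefficient(parents))
    print("child genotype frequencies:", children)
    print("child D:", hwd_coefficient(children))
```

With these assumed founder frequencies the parental D is nonzero while the child's D vanishes up to floating-point error, consistent with the one-generation result the abstract mentions.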
Lin is a 4th year PhD student in the Department of Statistical Sciences at the University of Toronto. Her thesis work focuses on statistical genetics under the supervision of Professor Lei Sun.
An Escape Time Analysis of SGD
Various stochastic differential equations (SDEs) have been proposed as models for the dynamics of stochastic gradient descent (SGD) in the nonconvex setting. A key challenge is understanding the role of the minibatch gradient, whose covariance matrix determines the diffusion matrix of the approximating SDEs. While the covariance is in practice far from isotropic, analyses of these models often make this simplifying approximation in order to yield analytic solutions. Instead, we analyze the actual nonisotropic covariance in the neighborhood of critical points, characterizing the time it takes for the corresponding SDE to escape the neighborhood. Our lower and upper bounds characterize the role of the minibatch covariance and Hessian in determining how quickly the SDE escapes from critical points. Applied to SGD, our theory predicts that SGD escapes more quickly from minima that are subject to higher levels of "minibatch" noise. Empirical studies of SGD on MNIST support these predictions.
Mufan is a second year PhD student in the Department of Statistical Sciences at the University of Toronto. Co-authors: Philippe Casgrain, Karolina Dziugaite, Daniel Roy.