(Advisor: Giulianna Davidoff)
Fall 2016 Math/Stat Club Talks
(Held in Clapp 416: 12:15 pm-1:00 pm, unless noted)
Friday, December 9, 2016
Speaker: Alicia Specht
iCC: a new method for estimating network relationships for non-Gaussian data
Gene co-expression networks (GCNs) are widely used to understand gene regulation and to infer gene function. This talk will discuss a novel method, iCC, for constructing GCNs from RNA-sequencing data. iCC defines a new correlation measure that can be applied to data following any known distribution to construct undirected GCNs, and it is robust to the effects of outliers. iCC's effectiveness is examined using simulations and an Escherichia coli dataset, and compared to commonly used transformation techniques.
Monday, December 5, 2016
Speaker: Qingcong Yuan
A New Class of Measures for Independence Test with Its Application in Big Data
Abstract: A new class of measures for testing independence between two random vectors, based on characteristic functions, is proposed. By choosing a particular weight function within the class, we obtain a new index for measuring independence and study its properties. Sample versions based on different estimation approaches are developed, along with their asymptotic properties. We demonstrate the advantage of our methods via simulations and real data analysis. In particular, we develop a two-stage sufficient variable selection procedure and a sufficient dimension reduction method to illustrate the effective use of our methods in big data analysis.
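The abstract does not spell out the weight function or the resulting index, but the best-known member of this characteristic-function family of independence measures is distance covariance (Székely, Rizzo, and Bakirov). A minimal NumPy sketch of its sample version, for illustration only:

```python
import numpy as np

# Sample distance covariance: double-center each pairwise-distance
# matrix, then average their elementwise product. The population
# version is zero exactly when the two random vectors are independent.
def distance_covariance(x, y):
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def centered(z):
        # Pairwise Euclidean distances, double-centered.
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    a, b = centered(x), centered(y)
    return float(np.sqrt(max((a * b).mean(), 0.0)))
```

For dependent samples (e.g. y = x squared) the statistic comes out noticeably larger than for independent samples of the same size, which is what makes it usable as a test statistic.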
Friday, December 2, 2016
Speaker: Evan Ray
Statistical Methods for Predicting Infectious Disease
Public health agencies like the Centers for Disease Control and Prevention (CDC) would like to have reliable predictions of future infectious disease dynamics to help plan resource allocation and interventions. In this talk, I will discuss ongoing research that has been motivated by participation in a series of disease prediction contests run by the CDC and other United States federal government agencies. In the first part of the talk, I will describe a semi-parametric prediction method that combines kernel conditional density estimation with copulas to obtain a joint predictive density for disease incidence in all future weeks of the disease season. In applications to predicting dengue fever and influenza, we demonstrate that this method outperforms commonly used approaches to disease prediction in many settings. I will then briefly outline some directions for future research in this area, including the use of ensemble methods that combine predictive distributions from many different models and the exploration of models that make predictions for multiple spatial units, such as each state in the country.
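The copula step is beyond a short sketch, but the kernel conditional density estimate at the core of the method can be illustrated in a few lines. The Gaussian kernel and the bandwidth h below are arbitrary choices for the sketch, not the talk's actual setup:

```python
import math

# Kernel conditional density estimation: estimate f(y | x) as a mixture
# of kernels centered at the observed y_i, weighted by how close each
# observed x_i lies to the query point x.
def kcde(x_query, y_query, xs, ys, h=1.0):
    def k(u):  # standard Gaussian kernel
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    weights = [k((x_query - xi) / h) for xi in xs]
    total = sum(weights)
    return sum(w * k((y_query - yi) / h) for w, yi in zip(weights, ys)) / (h * total)
```

With training pairs lying near the line y = x, the estimated density at (2, 2) exceeds the density at (2, -2), as one would hope: nearby training points dominate the estimate.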
Monday, November 29, 2016
Through genome-wide association studies, numerous genes have been shown to be associated with multiple phenotypes. To determine the overlap in the genetic susceptibility of correlated phenotypes, one can apply multivariate regression or dimension reduction techniques, such as principal components analysis, and test for association with the principal components of the phenotypes rather than with the individual phenotypes.
However, as these approaches test whether there is a genetic effect for at least one of the phenotypes, a significant test result does not necessarily imply pleiotropy. Recently, a method called Pleiotropy Estimation and Test Bootstrap (PET-B) has been proposed to specifically test for pleiotropy (i.e., that two normally distributed phenotypes are both associated with the single nucleotide polymorphism (SNP) of interest). While this method examines the genetic overlap between two quantitative phenotypes, the extension to binary phenotypes, to three or more phenotypes, and to rare variants is not straightforward. Two approaches to formally test this pleiotropic relationship in multiple scenarios will be presented. These approaches depend on permuting the genotype of interest and comparing the set of observed p-values to the set of permuted p-values in relation to the origin (e.g., a vector of zeros), using either the Hausdorff metric or a cut-off-based approach.
These approaches are appropriate for categorical and quantitative phenotypes, for more than two phenotypes, and for both common and rare variants. They are evaluated under various simulation scenarios and applied to the COPDGene study, a case-control study of Chronic Obstructive Pulmonary Disease (COPD) in current and former smokers.
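The set-valued comparison of p-values (Hausdorff or cut-off based) is the novel part of the work and is not reproduced here, but the permutation backbone the abstract describes can be sketched for a single phenotype, with plain correlation standing in for the actual association statistic:

```python
import random

# Permutation-test backbone: shuffle the genotype to break any
# genotype-phenotype association, then compare the observed statistic
# to its permutation distribution. Correlation is a stand-in here; the
# actual method compares whole sets of p-values across phenotypes.
def perm_pvalue(genotype, phenotype, n_perm=2000, seed=0):
    rng = random.Random(seed)

    def corr(g, p):
        n = len(g)
        mg, mp = sum(g) / n, sum(p) / n
        num = sum((a - mg) * (b - mp) for a, b in zip(g, p))
        den = (sum((a - mg) ** 2 for a in g) * sum((b - mp) ** 2 for b in p)) ** 0.5
        return num / den if den else 0.0

    observed = abs(corr(genotype, phenotype))
    g, hits = list(genotype), 0
    for _ in range(n_perm):
        rng.shuffle(g)
        if abs(corr(g, phenotype)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one avoids a p-value of zero
```

A strongly associated genotype-phenotype pair yields a small p-value; an unrelated pair yields a large one.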
Wednesday, November 2, 2016
Raji Balasubramanian, Associate Professor at UMass, talks about the Biostatistics graduate programs at UMass Amherst!
Wednesday, November 9th, 2016
George Cobb, Professor Emeritus of Statistics
What is Markov Chain Monte Carlo, What Can It Do, and How Does It Work? Three Stories of Important Applications
Background. In my earlier talk, I argued that Markov Chain Monte Carlo (MCMC) can be seen as one of three major developments in the 2500-year history of the integral, with “integral” defined informally as “how to add up a very large number of very small numbers.”
(1) In classical Greece, the integral was part of the attempt to understand motion, area, and the infinite.
(2) At the dawn of the Enlightenment, Newton used antiderivatives as a shortcut for computing integrals, and used integrals to explain the motion of the planets.
(3) Beginning around 1950, MCMC made it possible, for the first time, to compute integrals for applied high-dimensional problems that could not be solved using traditional methods.
What is MCMC? (There are two MCs: Markov Chains and Monte Carlo.)
(1) Markov chains: If you have ever played Monopoly or shuffled a deck of cards, you already know about Markov chains: random processes for which the probabilities for where you go next depend only on where you are now.
(2) Monte Carlo is a systematic way to use random numbers to approximate integrals that can’t be computed by traditional methods. MCMC is now widely used to compute sums, averages, probabilities, and total net change -- integrals that occur throughout the sciences but are too hard for calculus-based methods.
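The plain (non-Markov-chain) Monte Carlo idea fits in a few lines. A sketch estimating the integral of x squared over [0, 1], whose exact value is 1/3:

```python
import random

# Plain Monte Carlo integration: the integral of f over [0, 1] equals
# the expected value of f at a uniform random point, so averaging f at
# many random points -- "adding up a very large number of very small
# numbers" -- approximates the integral. (MCMC replaces the independent
# draws with a Markov chain when direct sampling is impossible.)
def mc_integral(f, n=100_000, seed=0):
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

print(mc_integral(lambda x: x * x))  # close to the exact value 1/3
```

The error shrinks like one over the square root of the number of draws, regardless of the dimension of the integral, which is exactly why the method survives in high-dimensional problems where calculus-based methods fail.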
Applications: I’ll compress the history and logic of MCMC into three sets of examples.
(1) 1953: MCMC was invented to solve a problem related to the nuclear physics of the hydrogen bomb. I'll also tell the story of what must surely be the nastiest academic food fight ever, over a big table of 0s and 1s arising from a problem in ecology: how species distribute themselves among the islands of the Galapagos.
(2) 1983: A second major development was a method called the Gibbs sampler, first applied to image processing. I'll illustrate the method using a problem from molecular biology.
(3) 1990: The third major development generalized MCMC to compute a very broad class of integrals. A study of lung cancer rates near an Ohio uranium plant illustrates what are called multilevel statistical models, one of the most important areas of applied data analysis of the last 20 years.
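As a toy stand-in for these applications, here is a minimal Gibbs sampler for a bivariate normal with correlation rho, a textbook example rather than anything from the talk. Each full conditional is itself a one-dimensional normal, so the chain just alternates two easy draws:

```python
import random

# Gibbs sampler for a bivariate normal with correlation rho: alternate
#   x | y ~ N(rho * y, 1 - rho^2)  and  y | x ~ N(rho * x, 1 - rho^2).
# The chain's draws settle into the joint distribution even though each
# step only requires a one-dimensional normal sample.
def gibbs_bivariate_normal(rho=0.8, n=50_000, seed=1):
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x = y = 0.0
    samples = []
    for _ in range(n):
        x = rng.gauss(rho * y, sd)  # draw x from its full conditional
        y = rng.gauss(rho * x, sd)  # draw y from its full conditional
        samples.append((x, y))
    return samples
```

The sample correlation of the draws approaches rho, so averages over the chain recover properties of the joint distribution that were never sampled directly.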
Wednesday, November 2nd, 2016
Spring courses! Come hear about our spring courses offered here in the math/stat department and around the 5 colleges!
Wednesday, October 26th, 2016: Clapp 407 (note location)
Joint CS/Math/Stat Club
Come hear about the summer experiences of your fellow students. Sophia Gu will talk about an opportunity with Google during J-term; Em Castner and Deepshikha Adhikari will present their experiences with Google and data science internships. Food will be served in 416.
Wednesday, October 19th, 2016
George Cobb, Ph.D.
Markov Chain Monte Carlo and the Integral: A 2500-year Drama in Three Acts
I plan to address my premise historically via the standard academic cliché of compare/contrast based on (a) chronology, (b) big questions, (c) available technology, and (d) mathematical theory. In my planned talk I oversimplify shamelessly, reducing the stories supporting my premise to three acts:
Act I: Classical Greece (around 350 BCE): motion and area; infinite sums and adding a geometric series.
Act II: Enlightenment (2nd half of the 1600s, after two millennia of dormancy): total net change and area; the Fundamental Theorem of Calculus tells how to convert instantaneous rate of change to total net change, allowing Newton to explain the universe.
Act III: Computer age (Two-and-a-half centuries later, starting around 1950): the Ergodic Theorem tells how to compute the integrals that defeat calculus. Applications include robotics, computer vision, molecular biology, and the ecology of species distribution. Don’t be put off by “Fundamental Theorem of Calculus” and “Ergodic Theorem”. If the phrase “add up a very large number of very small numbers” makes sense, you have all the mathematical background you need for my talk.
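In their simplest forms, the three results behind the three acts (stated here for reference, not taken from the talk):

```latex
% Act I: summing a geometric series (|r| < 1)
\sum_{k=0}^{\infty} r^k = \frac{1}{1-r},
\qquad \text{e.g. } 1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots = 2.

% Act II: the Fundamental Theorem of Calculus converts an
% instantaneous rate of change F' into a total net change
\int_a^b F'(t)\, dt = F(b) - F(a).

% Act III: the Ergodic Theorem says that averages along a suitable
% Markov chain X_1, X_2, ... converge to the integral under the
% chain's target distribution \pi
\frac{1}{N} \sum_{i=1}^{N} f(X_i)
  \;\xrightarrow[N \to \infty]{}\; \int f \, d\pi.
```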
Wednesday, October 12th, 2016
Come join Erin Mullin and Alex Wellnitz to learn about the Big O - Little O program and to play games. It will be a social meeting to get to know each other and develop a stronger sense of community.
Wednesday, October 5th, 2016
Joint CS/Math/Stat Club Lunch
Wednesday, September 21, 2016, Clapp 407: Turán's Problem (Annie Raymond)
What is the maximum number of edges in a graph on n vertices without triangles? Mantel's answer in 1907, that at most half of the edges can be present, started a new field: extremal combinatorics. More generally, what is the maximum number of edges in an n-vertex graph that does not contain any subgraph isomorphic to H? What if you consider hypergraphs instead of graphs? We will explore different strategies to attack such problems, calling upon combinatorics, integer programming, semidefinite programming, and flag algebras. We will conclude with some recent work where we embed the flag algebra techniques in more standard methods. This is joint work with James Saunderson, Mohit Singh, and Rekha Thomas.
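Mantel's theorem says a triangle-free graph on n vertices has at most floor(n^2/4) edges, and a brute-force search over all edge subsets confirms this for tiny n (the search is exponential in the number of edge slots, so it is illustrative only):

```python
from itertools import combinations

# Brute-force verification of Mantel's theorem for small n: enumerate
# every subset of possible edges, keep the triangle-free ones, and
# record the largest. The answer should equal floor(n^2 / 4).
def max_triangle_free_edges(n):
    edges = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(edges)):
        chosen = [e for i, e in enumerate(edges) if mask >> i & 1]
        if len(chosen) <= best:
            continue
        adj = {v: set() for v in range(n)}
        for a, b in chosen:
            adj[a].add(b)
            adj[b].add(a)
        # A triangle exists iff some edge's endpoints share a neighbor.
        if all(not (adj[a] & adj[b]) for a, b in chosen):
            best = len(chosen)
    return best

for n in range(2, 6):
    print(n, max_triangle_free_edges(n), n * n // 4)  # last two columns agree
```

The extremal graph achieving the bound is the complete bipartite graph with parts as equal as possible, e.g. a 4-cycle for n = 4.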
Wednesday, September 14, 2016: Welcome Meeting
This first meeting will be organizational. You will meet math/stat faculty, hear about the student groups working on various types of problems, find out what kinds of activities the club usually has and brainstorm about what it might have. We'll let you know what is already on the agenda, but bring lots of good new ideas with you.