{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Tutorial: The UseR Conference 2016\n", " \n", "\n", "### Genome-Wide Association Analysis and Post-Analytic Interrogation with R \n", "\n", " \n", "### Part 2\n", " \n", "\n", "#### Instructor: Professor Andrea S. Foulkes \n", "\n", "Department of Mathematics and Statistics, Mount Holyoke College" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Acknowledgements: \n", "\n", " Special thanks to Sara Nunez , Reserach Scientist, Mount Holyoke College, for work on the development of code snippets.\n", "\n", "Support for this tutorial was provided by:\n", "\n", "NIH/NHLBI R01 107196 (PI: Foulkes)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### A Jupyter notebook \n", "for this tutorial is available at:\n", "\n", " \n", "\n", " \n", "\n", "http://nbviewer.jupyter.org/urls/www.mtholyoke.edu/courses/afoulkes/UseR2016/UseR2016-Part2.ipynb" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Outline\n", " \n", "\n", "#### Part 1: Core analytic components of a standard GWAS\n", " \n", "\n", "#### Part 2: Interrogation of regional and higher-order associations.\n", "\n", "1. Motivation\n", "2. Sequence Kernel Association Test (SKAT)\n", "3. Class level analytic strategies - GenCAT, VEGAS and QT" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Download and unzip data needed for this tutorial:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "library(downloader)\n", "download(\"https://www.mtholyoke.edu/courses/afoulkes/UseR2016_Tutorial_Files.zip\",\n", " \"UseR2016_Tutorial_Files.zip\")\n", "unzip(\"UseR2016_Tutorial_Files.zip\", exdir = \"/Users/afoulkes/UseR2016/\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Part 2: Interrogation of regional and higher-order associations.\n", "1. Motivation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 1. Motivation\n", "\n", "- Large-scale genome-wide association (GWA) meta-analyses have become routine practice for\n", "discovery of the genetic underpinnings of complex traits, such as cardiometabolic disease\n", "(CMD). " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Several resulting meta-analysis resources, including summary level information on\n", "association between each of several million typed and imputed SNPs and a well-defined trait,\n", "are now publicly available. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 1. Motivation _continued_\n", "\n", "- At the same time, we see a growing number of classifications or taxonomies\n", "of the genome—for example, protein coding genes, epigenetic marks, enhancer elements\n", "and non-coding RNAs—herein referred to as classes. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Additionally, we have refined knowledge regarding the linkage-disequilibrium (LD) structure across the genome via the 1000 genomes project. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 1. Motivation _continued_\n", "\n", "- In turn, this presents new opportunity to further potentiate the extensive,\n", "existing data resources through application of theoretically sound methodological advancements\n", "that integrate these multiple knowledge components. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In this tutorial, we leverage these\n", "multiple existing knowledge resources to demonstrate the *added value of applying a\n", "class-level testing strategy to complement more routine analysis practice.*" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 1. Motivation _continued_\n", "\n", "- The minP approach to evaluating association between\n", "protein coding genes and a trait is to ascribe SNPs with p-values less than a Bonferroni corrected threshold ($p \\le 5 \\times 10^{−8}$) to genes and to declare these genes statistically meaningful. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- While the minP approach has led to a large number of novel gene discoveries, it is limited in that a gene must contain at least one SNP with a very large signal to be identified." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 1. Motivation _continued_\n", "\n", "- MinP may not capture genes with multiple SNPs that have moderate signals, which in combination are genetically, biologically, clinically, and statistically meaningful." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 1. Motivation _continued_\n", "\n", "Two representative and established analytic frameworks considered: \n", "\n", "- SKAT \n", "- GenCAT (strong theoretical relationships to VEGAS and QT)\n", "\n", "with relative advantages for alternative settings." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Part 2: Interrogation of regional and higher-order associations.\n", "\n", "1. Motivation $\\checkmark$\n", "2. Sequence Kernel Association Test (SKAT)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. Sequence Kernel Association Test (SKAT)$^*$\n", "\n", "Framework for testing for regional association of (rare or common) genetic variants with a quantitative or binary trait.\n", "\n", "$^*$_Wu et al. (2011) The American Journal of Human Genetics 89, 82–93._" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _method_\n", "\n", "Linear model (for a quantitative trait): \n", "\n", "\\begin{eqnarray*}\n", "y_i = \\alpha_0 + \\pmb{\\alpha}^T{\\bf X}_i +\\pmb{\\beta}^T{\\bf G}_i + \\epsilon_i\n", "\\end{eqnarray*}\n", "\n", "$\\alpha_0$ is the intercept, $\\pmb{\\alpha}^T = (\\alpha_1,...,\\alpha_m)^T$ is the vector\n", "of regression coefficients for m covariates, $\\pmb{\\beta}^T = (\\beta_1,...,\\beta_p)^T$ is the\n", "vector of regression coefficients for p genetic variants, and $\\epsilon_i \\sim N(0,\\sigma^2)$ is the error term." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _method_\n", "\n", "Interest is in testing: \n", "\n", "\\begin{eqnarray*}\n", "H_0: \\pmb{\\beta}^T={\\bf 0}\n", "\\end{eqnarray*}\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Additionally assume $\\beta_j$ follows an arbitrary mean $0$ distribution with variance $w_j\\tau$ for prespecified weights $w_j$. So the above null hypothesis can be rewritten:\n", "\n", "\\begin{eqnarray*}\n", "H_0: \\tau = 0\n", "\\end{eqnarray*}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$\\rightarrow$ Variance component score test for correspondig mixed model" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _data prep_\n", "\n", "In order to implement SKAT, we first call the necessary R packages and set our working directory:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading required package: survival\n", "Loading required package: Matrix\n", "Loading required package: dplyr\n", "\n", "Attaching package: ‘dplyr’\n", "\n", "The following objects are masked from ‘package:stats’:\n", "\n", " filter, lag\n", "\n", "The following objects are masked from ‘package:base’:\n", "\n", " intersect, setdiff, setequal, union\n", "\n", "Loading required package: doParallel\n", "Loading required package: foreach\n", "Loading required package: iterators\n", "Loading required package: parallel\n", "Loading required package: ggplot2\n" ] } ], "source": [ "library(snpStats)\n", "library(SKAT)\n", "library(GenCAT)\n", "setwd(\"/Users/afoulkes/UseR2016/\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _data prep_\n", "\n", "- The GWAStutorial_clinical.csv file contains clinical data. \n", "- We maintain observations with complete data on CAD, sex, age, TG, HDL-C and LDL-C:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
210004 1 2 50 55 23 75
310005 1 1 55 105 37 69
410007 1 1 52 314 54 108
510008 1 1 58 161 40 94
610009 1 1 59 171 46 92
710010 1 1 54 147 69 109
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllll}\n", " & FamID & CAD & sex & age & tg & hdl & ldl\\\\\n", "\\hline\n", "\t2 & 10004 & 1 & 2 & 50 & 55 & 23 & 75\\\\\n", "\t3 & 10005 & 1 & 1 & 55 & 105 & 37 & 69\\\\\n", "\t4 & 10007 & 1 & 1 & 52 & 314 & 54 & 108\\\\\n", "\t5 & 10008 & 1 & 1 & 58 & 161 & 40 & 94\\\\\n", "\t6 & 10009 & 1 & 1 & 59 & 171 & 46 & 92\\\\\n", "\t7 & 10010 & 1 & 1 & 54 & 147 & 69 & 109\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " FamID CAD sex age tg hdl ldl\n", "2 10004 1 2 50 55 23 75\n", "3 10005 1 1 55 105 37 69\n", "4 10007 1 1 52 314 54 108\n", "5 10008 1 1 58 161 40 94\n", "6 10009 1 1 59 171 46 92\n", "7 10010 1 1 54 147 69 109" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "clinical <- read.csv(\"UseR2016_Tutorial_Files/GWAStutorial_clinical.csv\",\n", " stringsAsFactors = F)\n", "clinical.c <- clinical[complete.cases(clinical),]\n", "head(clinical.c)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _data prep_\n", "\n", "The .bim, .bed and .fam files contain the genotype information and are read into R using the read.plink() function:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A SnpMatrix with 1401 rows and 861473 columns\n", "Row names: 10002 ... 11596 \n", "Col names: rs10458597 ... rs5970564 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
rs10458597rs12565286rs12082473rs3094315rs2286139rs11240776rs2980319
10002 2 2 2 2 2NA 2
100042021222
100052222222
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllll}\n", " & rs10458597 & rs12565286 & rs12082473 & rs3094315 & rs2286139 & rs11240776 & rs2980319\\\\\n", "\\hline\n", "\t10002 & 2 & 2 & 2 & 2 & 2 & NA & 2\\\\\n", "\t10004 & 2 & 0 & 2 & 1 & 2 & 2 & 2\\\\\n", "\t10005 & 2 & 2 & 2 & 2 & 2 & 2 & 2\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "1. 2\n", "2. 2\n", "3. 2\n", "4. 2\n", "5. 0\n", "6. 2\n", "7. 2\n", "8. 2\n", "9. 2\n", "10. 2\n", "11. 1\n", "12. 2\n", "13. 2\n", "14. 2\n", "15. 2\n", "16. NA\n", "17. 2\n", "18. 2\n", "19. 2\n", "20. 2\n", "21. 2\n", "\n", "\n" ], "text/plain": [ " rs10458597 rs12565286 rs12082473 rs3094315 rs2286139 rs11240776 rs2980319\n", "10002 2 2 2 2 2 NA 2\n", "10004 2 0 2 1 2 2 2\n", "10005 2 2 2 2 2 2 2" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "geno <- read.plink(\"UseR2016_Tutorial_Files/GWAStutorial\",\n", " na.strings=c(\"-9\"))\n", "genotype <- geno$genotypes\n", "print(genotype)\n", "genotypeNumeric <- as(genotype,\"numeric\")\n", "genotypeNumeric[1:3,1:7] " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _data prep_\n", "\n", "- The geno object also contains information on each SNP:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", " chromosomesnp.namecMpositionallele.1allele.2 rs104585971 rs104585970 564621 NA C rs125652861 rs125652860 721290 G C rs120824731 rs120824730 740857 T C rs30943151 rs30943150 752566 C T rs22861391 rs22861390 761732 C T rs112407761 rs112407760 765269 G A \n" ], "text/latex": [ "\\begin{tabular}{r|llllll}\n", " & chromosome & snp.name & cM & position & allele.1 & allele.2\\\\\n", "\\hline\n", "\trs10458597 & 1 & rs10458597 & 0 & 564621 & NA & C \\\\\n", "\trs12565286 & 1 & rs12565286 & 0 & 721290 & G & C \\\\\n", "\trs12082473 & 1 & rs12082473 & 0 & 740857 & T & C \\\\\n", "\trs3094315 & 1 & rs3094315 & 0 & 752566 & C & T \\\\\n", "\trs2286139 & 1 & rs2286139 & 0 & 761732 & C & T \\\\\n", "\trs11240776 & 1 & rs11240776 & 0 & 765269 & G & A \\\\\n", "\\end{tabular}\n" ], "text/plain": [ " chromosome snp.name cM position allele.1 allele.2\n", "rs10458597 1 rs10458597 0 564621 C\n", "rs12565286 1 rs12565286 0 721290 G C\n", "rs12082473 1 rs12082473 0 740857 T C\n", "rs3094315 1 rs3094315 0 752566 C T\n", "rs2286139 1 rs2286139 0 761732 C T\n", "rs11240776 1 rs11240776 0 765269 G A" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "snps <- geno$map\n", "head(snps)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _data prep_\n", "\n", "For illustration, we consider 5 known CAD associated loci. \n", "- First we read in a .csv file containing corresponding coordinates:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
classchrstartstopsnpsposition
1LPL 8 1931318020313180rs264 19813180
2PLG 6 160643608161643608rs4252120161143608
3ABO 9 135654168136654168rs579459 136154168
4SORT1 1 109321511110321511rs602633 109821511
5FLT1 13 28473621 29473621 rs931942828973621
\n" ], "text/latex": [ "\\begin{tabular}{r|llllll}\n", " & class & chr & start & stop & snps & position\\\\\n", "\\hline\n", "\t1 & LPL & 8 & 19313180 & 20313180 & rs264 & 19813180\\\\\n", "\t2 & PLG & 6 & 160643608 & 161643608 & rs4252120 & 161143608\\\\\n", "\t3 & ABO & 9 & 135654168 & 136654168 & rs579459 & 136154168\\\\\n", "\t4 & SORT1 & 1 & 109321511 & 110321511 & rs602633 & 109821511\\\\\n", "\t5 & FLT1 & 13 & 28473621 & 29473621 & rs9319428 & 28973621 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ " class chr start stop snps position\n", "1 LPL 8 19313180 20313180 rs264 19813180\n", "2 PLG 6 160643608 161643608 rs4252120 161143608\n", "3 ABO 9 135654168 136654168 rs579459 136154168\n", "4 SORT1 1 109321511 110321511 rs602633 109821511\n", "5 FLT1 13 28473621 29473621 rs9319428 28973621" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "coords <- read.csv(\"UseR2016_Tutorial_Files/CAD_known_loci_coords.csv\",\n", " stringsAsFactors = F)\n", "colnames(coords)[1] <- \"class\"\n", "coords " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _data prep_\n", "\n", "- Using the map2class() function within the GenCAT package, we assign SNPs to each locus:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Mapping SNPs to classes on chromosome 8.\"\n", "[1] \"Mapping SNPs to classes on chromosome 6.\"\n", "[1] \"Mapping SNPs to classes on chromosome 9.\"\n", "[1] \"Mapping SNPs to classes on chromosome 1.\"\n", "[1] \"Mapping SNPs to classes on chromosome 13.\"\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
chrSNPcMpositionA1A2class
18 rs132618240 19313912 C G LPL
28 rs44271900 19321385 T C LPL
38 rs40721310 19321634 C T LPL
48 rs49220300 19322771 C T LPL
58 rs49220310 19323062 T A LPL
68 rs69831750 19323403 C T LPL
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllll}\n", " & chr & SNP & cM & position & A1 & A2 & class\\\\\n", "\\hline\n", "\t1 & 8 & rs13261824 & 0 & 19313912 & C & G & LPL \\\\\n", "\t2 & 8 & rs4427190 & 0 & 19321385 & T & C & LPL \\\\\n", "\t3 & 8 & rs4072131 & 0 & 19321634 & C & T & LPL \\\\\n", "\t4 & 8 & rs4922030 & 0 & 19322771 & C & T & LPL \\\\\n", "\t5 & 8 & rs4922031 & 0 & 19323062 & T & A & LPL \\\\\n", "\t6 & 8 & rs6983175 & 0 & 19323403 & C & T & LPL \\\\\n", "\\end{tabular}\n" ], "text/plain": [ " chr SNP cM position A1 A2 class\n", "1 8 rs13261824 0 19313912 C G LPL\n", "2 8 rs4427190 0 19321385 T C LPL\n", "3 8 rs4072131 0 19321634 C T LPL\n", "4 8 rs4922030 0 19322771 C T LPL\n", "5 8 rs4922031 0 19323062 T A LPL\n", "6 8 rs6983175 0 19323403 C T LPL" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "colnames(snps) <- c(\"chr\",\"SNP\",\"cM\",\"position\",\"A1\",\"A2\")\n", "mapped <- map2class(coords,snps)\n", "head(mapped)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _data prep_\n", "\n", "Next we: \n", "- subset columns of the genotype object for SNPs in the mapped object and subjects with complete clinical data; and\n", "- filter the genotype data on call rate ($=1$) and minor allele frequency (MAF >0.01)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A SnpMatrix with 1281 rows and 267 columns\n", "Row names: 10004 ... 11596 \n", "Col names: rs4072131 ... rs1161456 \n" ] } ], "source": [ "geno2 <- genotype[,unique(mapped$SNP)]\n", "geno3 <- geno2[as.character(clinical.c$FamID),]\n", "summary <- col.summary(geno3)\n", "keep <- row.names(summary[summary$Call.rate == 1 & summary$MAF > 0.01,])\n", "geno4 <- geno3[,keep]\n", "print(geno4)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _in R_\n", "\n", "- Using LDL as the outcome, and sex and age as covariates, we first fit the null model using the SKAT_Null_Model() function to obtain parameter estimates needed to run SKAT:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\t
$names \n", "\t\t \n", "\t 1. 'res' 2. \n", "\t 3. 'X1' 4. \n", "\t 5. 'res.out' 6. \n", "\t 7. 'out_type' 8. \n", "\t 9. 'n.Resampling' 10. \n", "\t 11. 'type.Resampling' 12. \n", "\t 13. 'id_include' 14. \n", "\t 15. 's2' 16. \n", "\t 17. 'n.all' 18. \n", " \n", " \n", "\t$class
\n", "\t\t
'SKAT_NULL_Model'
\n", "
\n" ], "text/latex": [ "\\begin{description}\n", "\\item[\\$names] \\begin{enumerate*}\n", "\\item 'res'\n", "\\item 'X1'\n", "\\item 'res.out'\n", "\\item 'out\\_type'\n", "\\item 'n.Resampling'\n", "\\item 'type.Resampling'\n", "\\item 'id\\_include'\n", "\\item 's2'\n", "\\item 'n.all'\n", "\\end{enumerate*}\n", "\n", "\\item[\\$class] 'SKAT\\_NULL\\_Model'\n", "\\end{description}\n" ], "text/markdown": [ "$names\n", ": 1. 'res'\n", "2. 'X1'\n", "3. 'res.out'\n", "4. 'out_type'\n", "5. 'n.Resampling'\n", "6. 'type.Resampling'\n", "7. 'id_include'\n", "8. 's2'\n", "9. 'n.all'\n", "\n", "\n", "\n", "$class\n", ": 'SKAT_NULL_Model'\n", "\n", "\n" ], "text/plain": [ "$names\n", "[1] \"res\" \"X1\" \"res.out\" \"out_type\" \n", "[5] \"n.Resampling\" \"type.Resampling\" \"id_include\" \"s2\" \n", "[9] \"n.all\" \n", "\n", "$class\n", "[1] \"SKAT_NULL_Model\"\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "out <- SKAT_Null_Model(clinical.c[,\"ldl\"] ~ \n", " as.matrix(clinical.c[,c(\"sex\",\"age\")]), \n", " out_type=\"C\")\n", "attributes(out)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _in R_\n", "\n", "- Prior to applying SKAT, we reformort the snpMatrix object by using the flip.matrix() function to ensure that the allele counts refer to the number of minor alleles present:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "genoNum <- as(geno4,\"numeric\")\n", "\n", "flip.matrix<-function(x) {\n", " zero2 <- which(x==0)\n", " two0 <- which(x==2)\n", " x[zero2] <- 2\n", " x[two0] <- 0\n", " return(x)\n", "}" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
rs4072131rs6998060rs7002507rs4413788rs11204047rs4483172rs4921654rs7017776rs9644631rs17480050rs1617234rs9578057rs7339280rs4272874rs952648rs9314921rs17561728rs1161473rs1161468rs1161456
1000401100000001111220000
1000501000010110002110001
1000710000000000001011111
1000800000000012220111000
1000910011111111110010111
1001000000000111112010111
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll}\n", " & rs4072131 & rs6998060 & rs7002507 & rs4413788 & rs11204047 & rs4483172 & rs4921654 & rs7017776 & rs9644631 & rs17480050 & ⋯ & rs1617234 & rs9578057 & rs7339280 & rs4272874 & rs952648 & rs9314921 & rs17561728 & rs1161473 & rs1161468 & rs1161456\\\\\n", "\\hline\n", "\t10004 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 1 & 1 & 1 & 1 & 2 & 2 & 0 & 0 & 0 & 0\\\\\n", "\t10005 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & ⋯ & 0 & 0 & 0 & 2 & 1 & 1 & 0 & 0 & 0 & 1\\\\\n", "\t10007 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 1\\\\\n", "\t10008 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & ⋯ & 2 & 2 & 2 & 0 & 1 & 1 & 1 & 0 & 0 & 0\\\\\n", "\t10009 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & ⋯ & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 & 1 & 1\\\\\n", "\t10010 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & ⋯ & 1 & 1 & 1 & 2 & 0 & 1 & 0 & 1 & 1 & 1\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "1. 0\n", "2. 0\n", "3. 1\n", "4. 0\n", "5. 1\n", "6. 0\n", "7. 1\n", "8. 1\n", "9. 0\n", "10. 0\n", "11. 0\n", "12. 0\n", "13. 1\n", "14. 0\n", "15. 0\n", "16. 0\n", "17. 0\n", "18. 0\n", "19. 0\n", "20. 0\n", "21. 0\n", "22. 0\n", "23. 1\n", "24. 0\n", "25. 0\n", "26. 0\n", "27. 0\n", "28. 0\n", "29. 1\n", "30. 0\n", "31. 0\n", "32. 0\n", "33. 0\n", "34. 0\n", "35. 1\n", "36. 0\n", "37. 0\n", "38. 1\n", "39. 0\n", "40. 0\n", "41. 1\n", "42. 0\n", "43. 0\n", "44. 0\n", "45. 0\n", "46. 0\n", "47. 1\n", "48. 0\n", "49. 0\n", "50. 1\n", "51. 0\n", "52. 0\n", "53. 1\n", "54. 1\n", "55. 0\n", "56. 1\n", "57. 0\n", "58. 1\n", "59. 1\n", "60. 1\n", "61. 0\n", "62. 0\n", "63. 1\n", "64. 0\n", "65. 0\n", "66. 0\n", "67. 0\n", "68. 0\n", "69. 1\n", "70. 0\n", "71. 0\n", "72. 0\n", "73. 0\n", "74. 0\n", "75. 1\n", "76. 0\n", "77. 0\n", "78. 0\n", "79. 0\n", "80. 0\n", "81. 1\n", "82. 0\n", "83. 0\n", "84. 0\n", "85. 1\n", "86. 0\n", "87. 1\n", "88. 0\n", "89. 1\n", "90. 0\n", "91. 0\n", "92. 1\n", "93. 0\n", "94. 1\n", "95. 0\n", "96. 1\n", "97. 2\n", "98. 0\n", "99. 1\n", "100. 0\n", "101. 2\n", "102. 0\n", "103. 1\n", "104. 1\n", "105. 1\n", "106. 1\n", "107. 1\n", "108. 1\n", "109. 1\n", "110. 0\n", "111. 0\n", "112. 0\n", "113. 0\n", "114. 0\n", "115. 1\n", "116. 0\n", "117. 0\n", "118. 0\n", "119. 0\n", "120. 0\n", "121. 0\n", "122. 0\n", "123. 0\n", "124. 0\n", "125. 0\n", "126. 0\n", "127. 0\n", "128. 0\n", "129. 0\n", "130. 0\n", "131. 0\n", "132. 0\n", "133. 0\n", "134. 0\n", "135. 0\n", "136. 0\n", "137. 0\n", "138. 0\n", "139. 0\n", "140. 1\n", "141. 0\n", "142. 1\n", "143. 0\n", "144. 0\n", "145. 0\n", "146. 1\n", "147. 0\n", "148. 1\n", "149. 1\n", "150. 0\n", "151. 0\n", "152. 0\n", "153. 0\n", "154. 0\n", "155. 0\n", "156. 0\n", "157. 0\n", "158. 0\n", "159. 0\n", "160. 1\n", "161. 0\n", "162. 0\n", "163. 1\n", "164. 0\n", "165. 0\n", "166. 0\n", "167. 0\n", "168. 2\n", "169. 1\n", "170. 0\n", "171. 1\n", "172. 0\n", "173. 1\n", "174. 0\n", "175. 0\n", "176. 1\n", "177. 1\n", "178. 0\n", "179. 0\n", "180. 0\n", "181. 1\n", "182. 0\n", "183. 0\n", "184. 0\n", "185. 0\n", "186. 2\n", "187. 0\n", "188. 0\n", "189. 0\n", "190. 1\n", "191. 0\n", "192. 0\n", "193. 0\n", "194. 0\n", "195. 0\n", "196. 1\n", "197. 0\n", "198. 0\n", "199. 0\n", "200. 0\n", "201. 0\n", "202. 0\n", "203. 1\n", "204. 0\n", "205. 0\n", "206. 0\n", "207. 0\n", "208. 0\n", "209. 1\n", "210. 0\n", "211. 0\n", "212. 0\n", "213. 0\n", "214. 1\n", "215. 0\n", "216. 0\n", "217. 1\n", "218. 0\n", "219. 0\n", "220. 0\n", "221. 0\n", "222. 0\n", "223. 0\n", "224. 0\n", "225. 0\n", "226. 1\n", "227. 0\n", "228. 0\n", "229. 0\n", "230. 0\n", "231. 0\n", "232. 1\n", "233. 0\n", "234. 2\n", "235. 2\n", "236. 0\n", "237. 2\n", "238. 0\n", "239. 1\n", "240. 0\n", "241. 1\n", "242. 0\n", "243. 0\n", "244. 0\n", "245. 1\n", "246. 0\n", "247. 0\n", "248. 0\n", "249. 0\n", "250. 0\n", "251. 1\n", "252. 0\n", "253. 0\n", "254. 0\n", "255. 0\n", "256. 0\n", "257. 1\n", "258. 0\n", "259. 1\n", "260. 0\n", "261. 1\n", "262. 0\n", "263. 0\n", "264. 0\n", "265. 0\n", "266. 0\n", "267. 0\n", "268. 0\n", "269. 1\n", "270. 0\n", "271. 0\n", "272. 0\n", "273. 0\n", "274. 0\n", "275. 1\n", "276. 0\n", "277. 1\n", "278. 0\n", "279. 1\n", "280. 0\n", "281. 0\n", "282. 0\n", "283. 1\n", "284. 0\n", "285. 2\n", "286. 0\n", "287. 0\n", "288. 0\n", "289. 1\n", "290. 0\n", "291. 2\n", "292. 0\n", "293. 0\n", "294. 0\n", "295. 0\n", "296. 0\n", "297. 0\n", "298. 0\n", "299. 0\n", "300. 0\n", "301. 0\n", "302. 0\n", "303. 0\n", "304. 0\n", "305. 0\n", "306. 0\n", "307. 0\n", "308. 0\n", "309. 0\n", "310. 0\n", "311. 0\n", "312. 0\n", "313. 1\n", "314. 0\n", "315. 1\n", "316. 0\n", "317. 0\n", "318. 0\n", "319. 0\n", "320. 0\n", "321. 0\n", "322. 0\n", "323. 2\n", "324. 0\n", "325. 0\n", "326. 0\n", "327. 0\n", "328. 0\n", "329. 2\n", "330. 0\n", "331. 0\n", "332. 0\n", "333. 0\n", "334. 0\n", "335. 0\n", "336. 0\n", "337. 0\n", "338. 0\n", "339. 0\n", "340. 0\n", "341. 0\n", "342. 0\n", "343. 0\n", "344. 0\n", "345. 0\n", "346. 0\n", "347. 2\n", "348. 0\n", "349. 0\n", "350. 0\n", "351. 0\n", "352. 0\n", "353. 0\n", "354. 0\n", "355. 2\n", "356. 2\n", "357. 0\n", "358. 1\n", "359. 1\n", "360. 1\n", "361. 0\n", "362. 0\n", "363. 1\n", "364. 1\n", "365. 1\n", "366. 0\n", "367. 0\n", "368. 0\n", "369. 1\n", "370. 0\n", "371. 0\n", "372. 1\n", "373. 0\n", "374. 0\n", "375. 1\n", "376. 1\n", "377. 0\n", "378. 2\n", "379. 0\n", "380. 0\n", "381. 0\n", "382. 0\n", "383. 2\n", "384. 0\n", "385. 0\n", "386. 0\n", "387. 0\n", "388. 0\n", "389. 2\n", "390. 0\n", "391. 0\n", "392. 1\n", "393. 0\n", "394. 1\n", "395. 0\n", "396. 1\n", "397. 0\n", "398. 0\n", "399. 1\n", "400. 0\n", "401. 1\n", "402. 0\n", "403. 0\n", "404. 1\n", "405. 0\n", "406. 0\n", "407. 1\n", "408. 1\n", "409. 0\n", "410. 1\n", "411. 1\n", "412. 0\n", "413. 2\n", "414. 1\n", "415. 0\n", "416. 1\n", "417. 1\n", "418. 0\n", "419. 2\n", "420. 1\n", "421. 0\n", "422. 1\n", "423. 0\n", "424. 0\n", "425. 1\n", "426. 1\n", "427. 0\n", "428. 0\n", "429. 1\n", "430. 2\n", "431. 1\n", "432. 0\n", "433. 2\n", "434. 2\n", "435. 0\n", "436. 0\n", "437. 0\n", "438. 1\n", "439. 0\n", "440. 0\n", "441. 1\n", "442. 0\n", "443. 1\n", "444. 0\n", "445. 0\n", "446. 1\n", "447. 0\n", "448. 2\n", "449. 0\n", "450. 1\n", "451. 0\n", "452. 1\n", "453. 1\n", "454. 1\n", "455. 1\n", "456. 1\n", "457. 0\n", "458. 0\n", "459. 1\n", "460. 0\n", "461. 0\n", "462. 0\n", "463. 0\n", "464. 0\n", "465. 1\n", "466. 0\n", "467. 0\n", "468. 0\n", "469. 1\n", "470. 2\n", "471. 1\n", "472. 2\n", "473. 2\n", "474. 0\n", "475. 0\n", "476. 0\n", "477. 1\n", "478. 0\n", "479. 0\n", "480. 1\n", "481. 0\n", "482. 0\n", "483. 0\n", "484. 0\n", "485. 0\n", "486. 1\n", "487. 2\n", "488. 0\n", "489. 1\n", "490. 1\n", "491. 1\n", "492. 1\n", "493. 0\n", "494. 0\n", "495. 1\n", "496. 1\n", "497. 1\n", "498. 0\n", "499. 0\n", "500. 0\n", "501. 0\n", "502. 0\n", "503. 0\n", "504. 0\n", "505. 1\n", "506. 0\n", "507. 0\n", "508. 0\n", "509. 0\n", "510. 0\n", "511. 0\n", "512. 0\n", "513. 1\n", "514. 1\n", "515. 1\n", "516. 0\n", "517. 0\n", "518. 0\n", "519. 1\n", "520. 1\n", "521. 1\n", "522. 0\n", "523. 0\n", "524. 0\n", "525. 0\n", "526. 1\n", "527. 0\n", "528. 0\n", "529. 0\n", "530. 0\n", "531. 1\n", "532. 2\n", "533. 1\n", "534. 1\n", "535. 0\n", "536. 0\n", "537. 1\n", "538. 2\n", "539. 1\n", "540. 1\n", "541. 0\n", "542. 0\n", "543. 1\n", "544. 2\n", "545. 1\n", "546. 1\n", "547. 0\n", "548. 0\n", "549. 1\n", "550. 1\n", "551. 1\n", "552. 1\n", "553. 0\n", "554. 1\n", "555. 0\n", "556. 0\n", "557. 1\n", "558. 0\n", "559. 1\n", "560. 1\n", "561. 0\n", "562. 0\n", "563. 1\n", "564. 1\n", "565. 1\n", "566. 1\n", "567. 0\n", "568. 2\n", "569. 1\n", "570. 0\n", "571. 0\n", "572. 0\n", "573. 0\n", "574. 0\n", "575. 0\n", "576. 0\n", "577. 0\n", "578. 0\n", "579. 0\n", "580. 1\n", "581. 0\n", "582. 0\n", "583. 0\n", "584. 0\n", "585. 0\n", "586. 0\n", "587. 1\n", "588. 0\n", "589. 0\n", "590. 0\n", "591. 0\n", "592. 0\n", "593. 0\n", "594. 0\n", "595. 1\n", "596. 0\n", "597. 1\n", "598. 0\n", "599. 0\n", "600. 1\n", "601. 1\n", "602. 0\n", "603. 1\n", "604. 0\n", "605. 0\n", "606. 1\n", "607. 1\n", "608. 1\n", "609. 0\n", "610. 2\n", "611. 1\n", "612. 0\n", "613. 1\n", "614. 1\n", "615. 1\n", "616. 2\n", "617. 1\n", "618. 0\n", "619. 1\n", "620. 1\n", "621. 1\n", "622. 0\n", "623. 1\n", "624. 2\n", "625. 0\n", "626. 0\n", "627. 0\n", "628. 0\n", "629. 0\n", "630. 1\n", "631. 0\n", "632. 0\n", "633. 1\n", "634. 0\n", "635. 0\n", "636. 0\n", "637. 0\n", "638. 0\n", "639. 0\n", "640. 0\n", "641. 0\n", "642. 0\n", "643. 1\n", "644. 1\n", "645. 1\n", "646. 0\n", "647. 1\n", "648. 1\n", "649. 0\n", "650. 1\n", "651. 0\n", "652. 0\n", "653. 1\n", "654. 0\n", "655. 1\n", "656. 0\n", "657. 2\n", "658. 0\n", "659. 0\n", "660. 2\n", "661. 1\n", "662. 1\n", "663. 0\n", "664. 2\n", "665. 0\n", "666. 0\n", "667. 1\n", "668. 1\n", "669. 0\n", "670. 2\n", "671. 1\n", "672. 0\n", "673. 1\n", "674. 1\n", "675. 1\n", "676. 0\n", "677. 1\n", "678. 2\n", "679. 1\n", "680. 1\n", "681. 2\n", "682. 2\n", "683. 0\n", "684. 2\n", "685. 0\n", "686. 1\n", "687. 0\n", "688. 0\n", "689. 1\n", "690. 0\n", "691. 1\n", "692. 0\n", "693. 0\n", "694. 0\n", "695. 1\n", "696. 0\n", "697. 0\n", "698. 1\n", "699. 0\n", "700. 0\n", "701. 1\n", "702. 0\n", "703. 1\n", "704. 1\n", "705. 0\n", "706. 0\n", "707. 2\n", "708. 0\n", "709. 0\n", "710. 0\n", "711. 0\n", "712. 0\n", "713. 0\n", "714. 0\n", "715. 1\n", "716. 0\n", "717. 1\n", "718. 0\n", "719. 1\n", "720. 2\n", "721. 1\n", "722. 1\n", "723. 1\n", "724. 2\n", "725. 0\n", "726. 0\n", "727. 1\n", "728. 1\n", "729. 1\n", "730. 2\n", "731. 0\n", "732. 0\n", "733. 0\n", "734. 0\n", "735. 0\n", "736. 0\n", "737. 0\n", "738. 0\n", "739. 1\n", "740. 1\n", "741. 1\n", "742. 2\n", "743. 0\n", "744. 0\n", "745. 2\n", "746. 1\n", "747. 2\n", "748. 1\n", "749. 0\n", "750. 1\n", "751. 2\n", "752. 1\n", "753. 2\n", "754. 1\n", "755. 0\n", "756. 1\n", "757. 0\n", "758. 0\n", "759. 1\n", "760. 0\n", "761. 0\n", "762. 0\n", "763. 0\n", "764. 0\n", "765. 0\n", "766. 1\n", "767. 0\n", "768. 0\n", "769. 1\n", "770. 0\n", "771. 1\n", "772. 0\n", "773. 0\n", "774. 0\n", "775. 1\n", "776. 1\n", "777. 1\n", "778. 0\n", "779. 0\n", "780. 1\n", "781. 1\n", "782. 1\n", "783. 1\n", "784. 0\n", "785. 0\n", "786. 1\n", "787. 0\n", "788. 0\n", "789. 0\n", "790. 0\n", "791. 0\n", "792. 0\n", "793. 0\n", "794. 0\n", "795. 0\n", "796. 0\n", "797. 1\n", "798. 1\n", "799. 0\n", "800. 0\n", "801. 0\n", "802. 0\n", "803. 1\n", "804. 1\n", "805. 0\n", "806. 0\n", "807. 0\n", "808. 0\n", "809. 1\n", "810. 1\n", "811. 0\n", "812. 0\n", "813. 0\n", "814. 0\n", "815. 0\n", "816. 0\n", "817. 0\n", "818. 0\n", "819. 0\n", "820. 0\n", "821. 1\n", "822. 1\n", "823. 0\n", "824. 0\n", "825. 0\n", "826. 0\n", "827. 1\n", "828. 1\n", "829. 0\n", "830. 0\n", "831. 0\n", "832. 0\n", "833. 1\n", "834. 1\n", "835. 0\n", "836. 0\n", "837. 0\n", "838. 0\n", "839. 1\n", "840. 1\n", "841. 0\n", "842. 0\n", "843. 0\n", "844. 0\n", "845. 1\n", "846. 1\n", "847. 0\n", "848. 0\n", "849. 0\n", "850. 0\n", "851. 1\n", "852. 1\n", "853. 0\n", "854. 0\n", "855. 0\n", "856. 0\n", "857. 0\n", "858. 0\n", "859. 0\n", "860. 0\n", "861. 0\n", "862. 0\n", "863. 0\n", "864. 0\n", "865. 0\n", "866. 1\n", "867. 0\n", "868. 0\n", "869. 0\n", "870. 0\n", "871. 0\n", "872. 0\n", "873. 0\n", "874. 0\n", "875. 1\n", "876. 1\n", "877. 0\n", "878. 1\n", "879. 0\n", "880. 0\n", "881. 0\n", "882. 0\n", "883. 0\n", "884. 0\n", "885. 0\n", "886. 0\n", "887. 0\n", "888. 0\n", "889. 0\n", "890. 0\n", "891. 0\n", "892. 0\n", "893. 0\n", "894. 1\n", "895. 1\n", "896. 0\n", "897. 0\n", "898. 0\n", "899. 2\n", "900. 0\n", "901. 1\n", "902. 1\n", "903. 1\n", "904. 1\n", "905. 0\n", "906. 2\n", "907. 0\n", "908. 0\n", "909. 0\n", "910. 0\n", "911. 0\n", "912. 0\n", "913. 0\n", "914. 0\n", "915. 0\n", "916. 0\n", "917. 0\n", "918. 0\n", "919. 1\n", "920. 1\n", "921. 1\n", "922. 0\n", "923. 0\n", "924. 2\n", "925. 1\n", "926. 1\n", "927. 1\n", "928. 0\n", "929. 0\n", "930. 2\n", "931. 2\n", "932. 1\n", "933. 1\n", "934. 0\n", "935. 0\n", "936. 1\n", "937. 2\n", "938. 1\n", "939. 1\n", "940. 0\n", "941. 0\n", "942. 1\n", "943. 0\n", "944. 1\n", "945. 0\n", "946. 0\n", "947. 0\n", "948. 0\n", "949. 0\n", "950. 1\n", "951. 0\n", "952. 0\n", "953. 0\n", "954. 0\n", "955. 0\n", "956. 0\n", "957. 0\n", "958. 0\n", "959. 0\n", "960. 0\n", "961. 0\n", "962. 1\n", "963. 1\n", "964. 2\n", "965. 2\n", "966. 1\n", "967. 0\n", "968. 1\n", "969. 0\n", "970. 0\n", "971. 0\n", "972. 0\n", "973. 0\n", "974. 1\n", "975. 0\n", "976. 0\n", "977. 0\n", "978. 0\n", "979. 0\n", "980. 0\n", "981. 0\n", "982. 0\n", "983. 0\n", "984. 0\n", "985. 0\n", "986. 0\n", "987. 0\n", "988. 0\n", "989. 0\n", "990. 0\n", "991. 1\n", "992. 0\n", "993. 0\n", "994. 2\n", "995. 1\n", "996. 0\n", "997. 1\n", "998. 1\n", "999. 0\n", "1000. 0\n", "1001. 0\n", "1002. 0\n", "1003. 0\n", "1004. 1\n", "1005. 0\n", "1006. 0\n", "1007. 0\n", "1008. 0\n", "1009. 1\n", "1010. 1\n", "1011. 0\n", "1012. 0\n", "1013. 0\n", "1014. 0\n", "1015. 1\n", "1016. 1\n", "1017. 0\n", "1018. 0\n", "1019. 0\n", "1020. 0\n", "1021. 0\n", "1022. 0\n", "1023. 0\n", "1024. 0\n", "1025. 0\n", "1026. 0\n", "1027. 2\n", "1028. 1\n", "1029. 1\n", "1030. 1\n", "1031. 1\n", "1032. 0\n", "1033. 0\n", "1034. 1\n", "1035. 1\n", "1036. 0\n", "1037. 1\n", "1038. 1\n", "1039. 0\n", "1040. 1\n", "1041. 0\n", "1042. 1\n", "1043. 0\n", "1044. 0\n", "1045. 1\n", "1046. 2\n", "1047. 0\n", "1048. 1\n", "1049. 1\n", "1050. 0\n", "1051. 1\n", "1052. 1\n", "1053. 0\n", "1054. 1\n", "1055. 0\n", "1056. 0\n", "1057. 0\n", "1058. 0\n", "1059. 0\n", "1060. 0\n", "1061. 0\n", "1062. 0\n", "1063. 0\n", "1064. 0\n", "1065. 0\n", "1066. 0\n", "1067. 0\n", "1068. 0\n", "1069. 2\n", "1070. 1\n", "1071. 1\n", "1072. 1\n", "1073. 0\n", "1074. 1\n", "1075. 0\n", "1076. 0\n", "1077. 0\n", "1078. 0\n", "1079. 0\n", "1080. 0\n", "1081. 0\n", "1082. 0\n", "1083. 0\n", "1084. 0\n", "1085. 0\n", "1086. 0\n", "1087. 0\n", "1088. 0\n", "1089. 0\n", "1090. 0\n", "1091. 0\n", "1092. 0\n", "1093. 0\n", "1094. 1\n", "1095. 0\n", "1096. 0\n", "1097. 1\n", "1098. 0\n", "1099. 0\n", "1100. 0\n", "1101. 0\n", "1102. 1\n", "1103. 0\n", "1104. 0\n", "1105. 0\n", "1106. 2\n", "1107. 0\n", "1108. 0\n", "1109. 1\n", "1110. 1\n", "1111. 0\n", "1112. 0\n", "1113. 0\n", "1114. 1\n", "1115. 0\n", "1116. 0\n", "1117. 0\n", "1118. 0\n", "1119. 0\n", "1120. 0\n", "1121. 0\n", "1122. 1\n", "1123. 0\n", "1124. 0\n", "1125. 0\n", "1126. 1\n", "1127. 0\n", "1128. 1\n", "1129. 1\n", "1130. 2\n", "1131. 0\n", "1132. 0\n", "1133. 1\n", "1134. 1\n", "1135. 1\n", "1136. 1\n", "1137. 0\n", "1138. 1\n", "1139. 1\n", "1140. 1\n", "1141. 1\n", "1142. 0\n", "1143. 1\n", "1144. 0\n", "1145. 1\n", "1146. 0\n", "1147. 0\n", "1148. 0\n", "1149. 0\n", "1150. 0\n", "1151. 0\n", "1152. 0\n", "1153. 0\n", "1154. 0\n", "1155. 1\n", "1156. 2\n", "1157. 0\n", "1158. 1\n", "1159. 1\n", "1160. 0\n", "1161. 0\n", "1162. 0\n", "1163. 0\n", "1164. 0\n", "1165. 1\n", "1166. 0\n", "1167. 0\n", "1168. 0\n", "1169. 0\n", "1170. 0\n", "1171. 1\n", "1172. 1\n", "1173. 0\n", "1174. 0\n", "1175. 1\n", "1176. 0\n", "1177. 0\n", "1178. 0\n", "1179. 0\n", "1180. 0\n", "1181. 0\n", "1182. 0\n", "1183. 1\n", "1184. 1\n", "1185. 1\n", "1186. 0\n", "1187. 0\n", "1188. 1\n", "1189. 0\n", "1190. 1\n", "1191. 0\n", "1192. 0\n", "1193. 0\n", "1194. 0\n", "1195. 0\n", "1196. 0\n", "1197. 0\n", "1198. 1\n", "1199. 0\n", "1200. 0\n", "1201. 0\n", "1202. 1\n", "1203. 0\n", "1204. 1\n", "1205. 0\n", "1206. 0\n", "1207. 0\n", "1208. 1\n", "1209. 0\n", "1210. 0\n", "1211. 0\n", "1212. 0\n", "1213. 0\n", "1214. 0\n", "1215. 0\n", "1216. 0\n", "1217. 0\n", "1218. 0\n", "1219. 0\n", "1220. 0\n", "1221. 0\n", "1222. 0\n", "1223. 0\n", "1224. 0\n", "1225. 0\n", "1226. 0\n", "1227. 0\n", "1228. 0\n", "1229. 0\n", "1230. 0\n", "1231. 0\n", "1232. 0\n", "1233. 0\n", "1234. 1\n", "1235. 0\n", "1236. 1\n", "1237. 0\n", "1238. 0\n", "1239. 0\n", "1240. 1\n", "1241. 0\n", "1242. 1\n", "1243. 0\n", "1244. 0\n", "1245. 0\n", "1246. 0\n", "1247. 0\n", "1248. 0\n", "1249. 1\n", "1250. 0\n", "1251. 0\n", "1252. 0\n", "1253. 0\n", "1254. 0\n", "1255. 0\n", "1256. 0\n", "1257. 0\n", "1258. 0\n", "1259. 0\n", "1260. 0\n", "1261. 0\n", "1262. 0\n", "1263. 0\n", "1264. 1\n", "1265. 1\n", "1266. 0\n", "1267. 2\n", "1268. 1\n", "1269. 1\n", "1270. 0\n", "1271. 1\n", "1272. 2\n", "1273. 0\n", "1274. 1\n", "1275. 1\n", "1276. 0\n", "1277. 0\n", "1278. 0\n", "1279. 0\n", "1280. 0\n", "1281. 0\n", "1282. 0\n", "1283. 0\n", "1284. 0\n", "1285. 0\n", "1286. 1\n", "1287. 0\n", "1288. 0\n", "1289. 0\n", "1290. 1\n", "1291. 0\n", "1292. 0\n", "1293. 0\n", "1294. 1\n", "1295. 0\n", "1296. 1\n", "1297. 0\n", "1298. 0\n", "1299. 0\n", "1300. 0\n", "1301. 0\n", "1302. 0\n", "1303. 0\n", "1304. 0\n", "1305. 0\n", "1306. 0\n", "1307. 0\n", "1308. 0\n", "1309. 1\n", "1310. 0\n", "1311. 0\n", "1312. 1\n", "1313. 1\n", "1314. 1\n", "1315. 0\n", "1316. 1\n", "1317. 0\n", "1318. 1\n", "1319. 0\n", "1320. 0\n", "1321. 0\n", "1322. 0\n", "1323. 1\n", "1324. 0\n", "1325. 1\n", "1326. 0\n", "1327. 0\n", "1328. 1\n", "1329. 0\n", "1330. 1\n", "1331. 0\n", "1332. 0\n", "1333. 0\n", "1334. 1\n", "1335. 0\n", "1336. 2\n", "1337. 1\n", "1338. 0\n", "1339. 0\n", "1340. 1\n", "1341. 0\n", "1342. 1\n", "1343. 0\n", "1344. 1\n", "1345. 1\n", "1346. 1\n", "1347. 1\n", "1348. 0\n", "1349. 0\n", "1350. 1\n", "1351. 0\n", "1352. 1\n", "1353. 0\n", "1354. 0\n", "1355. 0\n", "1356. 0\n", "1357. 0\n", "1358. 1\n", "1359. 0\n", "1360. 0\n", "1361. 0\n", "1362. 0\n", "1363. 0\n", "1364. 1\n", "1365. 0\n", "1366. 0\n", "1367. 0\n", "1368. 0\n", "1369. 0\n", "1370. 1\n", "1371. 0\n", "1372. 0\n", "1373. 0\n", "1374. 0\n", "1375. 1\n", "1376. 0\n", "1377. 1\n", "1378. 1\n", "1379. 0\n", "1380. 1\n", "1381. 0\n", "1382. 1\n", "1383. 0\n", "1384. 0\n", "1385. 0\n", "1386. 0\n", "1387. 1\n", "1388. 0\n", "1389. 1\n", "1390. 1\n", "1391. 1\n", "1392. 0\n", "1393. 0\n", "1394. 0\n", "1395. 1\n", "1396. 1\n", "1397. 1\n", "1398. 1\n", "1399. 0\n", "1400. 0\n", "1401. 0\n", "1402. 0\n", "1403. 0\n", "1404. 0\n", "1405. 0\n", "1406. 1\n", "1407. 0\n", "1408. 0\n", "1409. 1\n", "1410. 0\n", "1411. 0\n", "1412. 1\n", "1413. 0\n", "1414. 0\n", "1415. 1\n", "1416. 0\n", "1417. 1\n", "1418. 1\n", "1419. 1\n", "1420. 1\n", "1421. 0\n", "1422. 0\n", "1423. 0\n", "1424. 0\n", "1425. 0\n", "1426. 0\n", "1427. 0\n", "1428. 0\n", "1429. 0\n", "1430. 1\n", "1431. 0\n", "1432. 0\n", "1433. 1\n", "1434. 0\n", "1435. 0\n", "1436. 0\n", "1437. 0\n", "1438. 0\n", "1439. 1\n", "1440. 0\n", "1441. 0\n", "1442. 1\n", "1443. 2\n", "1444. 1\n", "1445. 1\n", "1446. 1\n", "1447. 0\n", "1448. 1\n", "1449. 2\n", "1450. 1\n", "1451. 1\n", "1452. 1\n", "1453. 0\n", "1454. 1\n", "1455. 0\n", "1456. 0\n", "1457. 0\n", "1458. 0\n", "1459. 0\n", "1460. 0\n", "1461. 1\n", "1462. 0\n", "1463. 0\n", "1464. 0\n", "1465. 0\n", "1466. 0\n", "1467. 1\n", "1468. 1\n", "1469. 0\n", "1470. 1\n", "1471. 0\n", "1472. 1\n", "1473. 2\n", "1474. 2\n", "1475. 1\n", "1476. 1\n", "1477. 1\n", "1478. 1\n", "1479. 0\n", "1480. 0\n", "1481. 1\n", "1482. 1\n", "1483. 1\n", "1484. 1\n", "1485. 0\n", "1486. 0\n", "1487. 1\n", "1488. 1\n", "1489. 0\n", "1490. 0\n", "1491. 2\n", "1492. 2\n", "1493. 1\n", "1494. 0\n", "1495. 0\n", "1496. 0\n", "1497. 2\n", "1498. 2\n", "1499. 1\n", "1500. 0\n", "1501. 0\n", "1502. 0\n", "1503. 0\n", "1504. 0\n", "1505. 0\n", "1506. 0\n", "1507. 0\n", "1508. 0\n", "1509. 0\n", "1510. 0\n", "1511. 0\n", "1512. 0\n", "1513. 0\n", "1514. 0\n", "1515. 0\n", "1516. 0\n", "1517. 0\n", "1518. 0\n", "1519. 0\n", "1520. 0\n", "1521. 2\n", "1522. 1\n", "1523. 1\n", "1524. 1\n", "1525. 0\n", "1526. 0\n", "1527. 2\n", "1528. 1\n", "1529. 1\n", "1530. 0\n", "1531. 1\n", "1532. 1\n", "1533. 2\n", "1534. 1\n", "1535. 1\n", "1536. 0\n", "1537. 1\n", "1538. 2\n", "1539. 1\n", "1540. 0\n", "1541. 0\n", "1542. 1\n", "1543. 1\n", "1544. 0\n", "1545. 0\n", "1546. 2\n", "1547. 1\n", "1548. 1\n", "1549. 1\n", "1550. 0\n", "1551. 0\n", "1552. 2\n", "1553. 1\n", "1554. 1\n", "1555. 1\n", "1556. 0\n", "1557. 0\n", "1558. 2\n", "1559. 1\n", "1560. 1\n", "1561. 1\n", "1562. 2\n", "1563. 1\n", "1564. 0\n", "1565. 0\n", "1566. 2\n", "1567. 2\n", "1568. 1\n", "1569. 0\n", "1570. 1\n", "1571. 0\n", "1572. 0\n", "1573. 2\n", "1574. 1\n", "1575. 1\n", "1576. 1\n", "1577. 1\n", "1578. 1\n", "1579. 0\n", "1580. 0\n", "1581. 1\n", "1582. 1\n", "1583. 0\n", "1584. 0\n", "1585. 0\n", "1586. 0\n", "1587. 1\n", "1588. 0\n", "1589. 1\n", "1590. 1\n", "1591. 0\n", "1592. 0\n", "1593. 1\n", "1594. 0\n", "1595. 1\n", "1596. 1\n", "1597. 0\n", "1598. 1\n", "1599. 1\n", "1600. 0\n", "1601. 1\n", "1602. 1\n", "\n", "\n" ], "text/plain": [ " rs4072131 rs6998060 rs7002507 rs4413788 rs11204047 rs4483172 rs4921654\n", "10004 \"0\" \"1\" \"1\" \"0\" \"0\" \"0\" \"0\" \n", "10005 \"0\" \"1\" \"0\" \"0\" \"0\" \"0\" \"1\" \n", "10007 \"1\" \"0\" \"0\" \"0\" \"0\" \"0\" \"0\" \n", "10008 \"0\" \"0\" \"0\" \"0\" \"0\" \"0\" \"0\" \n", "10009 \"1\" \"0\" \"0\" \"1\" \"1\" \"1\" \"1\" \n", "10010 \"0\" \"0\" \"0\" \"0\" \"0\" \"0\" \"0\" \n", " rs7017776 rs9644631 rs17480050 ⋯ rs1617234 rs9578057 rs7339280\n", "10004 \"0\" \"0\" \"0\" \"⋯\" \"1\" \"1\" \"1\" \n", "10005 \"0\" \"1\" \"1\" \"⋯\" \"0\" \"0\" \"0\" \n", "10007 \"0\" \"0\" \"0\" \"⋯\" \"0\" \"0\" \"0\" \n", "10008 \"0\" \"0\" \"1\" \"⋯\" \"2\" \"2\" \"2\" \n", "10009 \"1\" \"1\" \"1\" \"⋯\" \"1\" \"1\" \"1\" \n", "10010 \"0\" \"1\" \"1\" \"⋯\" \"1\" \"1\" \"1\" \n", " rs4272874 rs952648 rs9314921 rs17561728 rs1161473 rs1161468 rs1161456\n", "10004 \"1\" \"2\" \"2\" \"0\" \"0\" \"0\" \"0\" \n", "10005 \"2\" \"1\" \"1\" \"0\" \"0\" \"0\" \"1\" \n", "10007 \"1\" \"0\" \"1\" \"1\" \"1\" \"1\" \"1\" \n", "10008 \"0\" \"1\" \"1\" \"1\" \"0\" \"0\" \"0\" \n", "10009 \"0\" \"0\" \"1\" \"0\" \"1\" \"1\" \"1\" \n", "10010 \"2\" \"0\" \"1\" \"0\" \"1\" \"1\" \"1\" " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\t
1. 1281
2. \n", "\t
3. 267
4. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1281\n", "\\item 267\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1281\n", "2. 267\n", "\n", "\n" ], "text/plain": [ "[1] 1281 267" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "Z <- flip.matrix(genoNum)\n", "head(Z)\n", "dim(Z)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _in R_\n", "\n", "- Finally we subset the SNPS within a given locus and apply the SKAT() function: " ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\t
$names \n", "\t\t \n", "\t 1. 'p.value' 2. \n", "\t 3. 'p.value.resampling' 4. \n", "\t 5. 'Test.Type' 6. \n", "\t 7. 'Q' 8. \n", "\t 9. 'param' 10. \n", "\t 11. 'pval.zero.msg' 12. \n", " \n", " \n", "\t$class
\n", "\t\t
'SKAT_OUT'
\n", "
\n" ], "text/latex": [ "\\begin{description}\n", "\\item[\\$names] \\begin{enumerate*}\n", "\\item 'p.value'\n", "\\item 'p.value.resampling'\n", "\\item 'Test.Type'\n", "\\item 'Q'\n", "\\item 'param'\n", "\\item 'pval.zero.msg'\n", "\\end{enumerate*}\n", "\n", "\\item[\\$class] 'SKAT\\_OUT'\n", "\\end{description}\n" ], "text/markdown": [ "$names\n", ": 1. 'p.value'\n", "2. 'p.value.resampling'\n", "3. 'Test.Type'\n", "4. 'Q'\n", "5. 'param'\n", "6. 'pval.zero.msg'\n", "\n", "\n", "\n", "$class\n", ": 'SKAT_OUT'\n", "\n", "\n" ], "text/plain": [ "$names\n", "[1] \"p.value\" \"p.value.resampling\" \"Test.Type\" \n", "[4] \"Q\" \"param\" \"pval.zero.msg\" \n", "\n", "$class\n", "[1] \"SKAT_OUT\"\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "0.0399313759930093" ], "text/latex": [ "0.0399313759930093" ], "text/markdown": [ "0.0399313759930093" ], "text/plain": [ "[1] 0.03993138" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gene_snps <- unique(mapped$SNP[mapped$class == \"SORT1\"])\n", "gene_snps <- gene_snps[gene_snps%in%colnames(geno4)]\n", "Z_sub <- as.matrix(Z[,gene_snps])\n", "attributes(SKAT(Z_sub,out))\n", "SKAT(Z_sub,out)$p.value " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. SKAT _notable features_\n", "\n", "- Computationally efficient - only requires fitting reduced model (under the null)\n", "- Accomodates both common and rare variants \n", "- Uses raw data as input" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Part 2: Interrogation of regional and higher-order associations.\n", "\n", "- Motivation$\\checkmark$\n", "- Sequence Kernel Association Test (SKAT)$\\checkmark$\n", "- Genetic Class Associatin Test (GenCAT)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. Genetic Class Association Test (GenCAT)$^*$\n", "\n", "- Efficient framework for testing for class-level (protein coding gene, regulatory element, etc) association.\n", "- Leverages summary level data from GWA meta-analysis resources.\n", "- Extensions of QT and VEGAS. \n", "\n", "$^*$_Qian et al. (2016) PLoS ONE DOI:10.1371/journal.pone.0148218._" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _method_\n", "\n", "- Let${\\bf z}=(z_1, z_2,..., z_n)^T$be a vector of$n$test statistics (z-scores) for association of each element in this class with the trait under study\n", "-${\\bf z} \\sim N_n(0, \\Sigma)$, under the null of no association between any of the$n$elements in this class.\n", "- Because$\\Sigma$is square and positive definite, we can decompose$\\Sigma$as:\n", "\n", "\\begin{eqnarray*}\n", "\\Sigma=Q\\Lambda Q^{T}\n", "\\end{eqnarray*}\n", "\n", "where$Q$is an orthogonal matrix (i.e.,$Q^{T}Q=QQ^{T}=I$) whose columns are the eigenvectors (normalized to unit length) of$\\Sigma$, and$\\Lambda$is a diagonal matrix with diagonal elements equal to the eigenvalues of$\\Sigma$. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _method_\n", "\n", "The class-level test statistic is defined as:\n", "\\begin{eqnarray*}\n", "\\mathcal{C}_{s} = U^{T}U\n", "\\end{eqnarray*}\n", "\n", "where$U=\\Lambda^{-1/2}Q^{T}{\\bf z}$. \n", "\n", "Note that$\\mathcal{C}_s$can be expressed as$\\mathcal{C}_{s} = {\\bf z}^{T}(\\Lambda^{-1/2}Q^{T})^{T}(\\Lambda^{-1/2}Q^{T}){\\bf z} = {\\bf z}^{T}\\Sigma^{-1}{\\bf z}$and: \n", "\n", "\\begin{eqnarray*}\n", "{\\bf z}^{T}\\Sigma^{-1}{\\bf z} \\sim \\chi^2(n)\n", "\\end{eqnarray*}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _method_\n", "\n", "- It can be shown that$\\Sigma$can be estimated by${\\bf P}$, Pearson's pairwise correlations between SNPs. \n", "- Further projecting the test statistics onto the minimum set of eigenvectors that caputure$(1-\\phi)\\%$of the variability in$\\Sigma$offers stability in high correlation setting." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _example_\n", "\n", "- Application to Global Lipids Gene Consortium (GLGC) GWA meta-analysis (http://csg.sph.umich.edu//abecasis/public/lipids2013/) statistics for TGs\n", "- Analysis of all protein coding genes on chromosomes 13, 14 and 15.\n", "- Use 1000 Genomes CEU data (http://www.1000genomes.org/data) for estimation of correlation matrix,${\\bf P}$. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _example_\n", "\n", "First we load the GLGC TG summary statistics:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", " MarkerNameAllele1Allele2WeightGC.ZscoreGC.PvalueOverallDirection 1rs964184 c g 96576 -33.075 6.71e-240 - ------------------------- 2rs1260326 t c 96590 24.539 5.68e-133 + +++++++++++++++++++++++++ 3rs780094 t c 96546 23.768 7.08e-125 + +++++++++++++++++++++++++ 4rs2160669 t c 96598 -23.735 1.59e-124 - ---------------------+--- 5rs9326246 c g 96598 23.688 4.79e-124 + +++++++++++++++++++++-+++ 6rs780093 t c 96590 23.681 5.68e-124 + +++++++++++++++++++++++++ \n" ], "text/latex": [ "\\begin{tabular}{r|llllllll}\n", " & MarkerName & Allele1 & Allele2 & Weight & GC.Zscore & GC.Pvalue & Overall & Direction\\\\\n", "\\hline\n", "\t1 & rs964184 & c & g & 96576 & -33.075 & 6.71e-240 & - & -------------------------\\\\\n", "\t2 & rs1260326 & t & c & 96590 & 24.539 & 5.68e-133 & + & +++++++++++++++++++++++++\\\\\n", "\t3 & rs780094 & t & c & 96546 & 23.768 & 7.08e-125 & + & +++++++++++++++++++++++++\\\\\n", "\t4 & rs2160669 & t & c & 96598 & -23.735 & 1.59e-124 & - & ---------------------+---\\\\\n", "\t5 & rs9326246 & c & g & 96598 & 23.688 & 4.79e-124 & + & +++++++++++++++++++++-+++\\\\\n", "\t6 & rs780093 & t & c & 96590 & 23.681 & 5.68e-124 & + & +++++++++++++++++++++++++\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " MarkerName Allele1 Allele2 Weight GC.Zscore GC.Pvalue Overall\n", "1 rs964184 c g 96576 -33.075 6.71e-240 -\n", "2 rs1260326 t c 96590 24.539 5.68e-133 +\n", "3 rs780094 t c 96546 23.768 7.08e-125 +\n", "4 rs2160669 t c 96598 -23.735 1.59e-124 -\n", "5 rs9326246 c g 96598 23.688 4.79e-124 +\n", "6 rs780093 t c 96590 23.681 5.68e-124 +\n", " Direction\n", "1 -------------------------\n", "2 +++++++++++++++++++++++++\n", "3 +++++++++++++++++++++++++\n", "4 ---------------------+---\n", "5 +++++++++++++++++++++-+++\n", "6 +++++++++++++++++++++++++" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " \n", "\t 1. 2692560 2. \n", "\t 3. 8 4. \n", " \n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 2692560\n", "\\item 8\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 2692560\n", "2. 8\n", "\n", "\n" ], "text/plain": [ "[1] 2692560 8" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "glgc <- read.delim(\"UseR2016_Tutorial_Files/GLGC_TG.tbl\",\n", " stringsAsFactors = F,header=T,sep=\"\")\n", "head(glgc)\n", "dim(glgc) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _example_\n", "\n", "- We then load the 1000 Genomes genotype data available in the GenCAT package. \n", "- These data include genotypes for SNPs on chromosomes 13, 14 and 15, for 99 individuals of northern and western european ancestry (CEU)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A SnpMatrix with 99 rows and 84195 columns\n", "Row names: CEU_1 ... CEU_99 \n", "Col names: rs624673 ... rs1399026 \n" ] }, { "data": { "text/html": [ " \n", "\t 1. 99 2. \n", "\t 3. 84195 4. \n", " \n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 99\n", "\\item 84195\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 99\n", "2. 84195\n", "\n", "\n" ], "text/plain": [ "[1] 99 84195" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "library(GenCAT)\n", "data(\"geno\")\n", "geno_dat <- geno$genotypes\n", "snps <- geno$map\n", "print(geno_dat)\n", "dim(geno_dat)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _example_\n", "\n", "- Next we subset the GLGC data to include only those SNPs with 1000 Genomes data:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ " \n", "\t 1. 84195 2. \n", "\t 3. 9 4. \n", " \n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 84195\n", "\\item 9\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 84195\n", "2. 9\n", "\n", "\n" ], "text/plain": [ "[1] 84195 9" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", " SNPchrcMpositionA1.xA2.xA1.yA2.ytestStat 1rs100002213 NA 100461219C T t c -0.221 2rs100016013 NA 100950452T C t c -0.885 3rs100028115 NA 72553422 C T t c -4.091 4rs100029015 NA 100949609T C t c 0.079 5rs100052114 NA 70522484 A G a g -1.059 6rs100058715 NA 89050628 A G a g 3.007 \n" ], "text/latex": [ "\\begin{tabular}{r|lllllllll}\n", " & SNP & chr & cM & position & A1.x & A2.x & A1.y & A2.y & testStat\\\\\n", "\\hline\n", "\t1 & rs1000022 & 13 & NA & 100461219 & C & T & t & c & -0.221 \\\\\n", "\t2 & rs1000160 & 13 & NA & 100950452 & T & C & t & c & -0.885 \\\\\n", "\t3 & rs1000281 & 15 & NA & 72553422 & C & T & t & c & -4.091 \\\\\n", "\t4 & rs1000290 & 15 & NA & 100949609 & T & C & t & c & 0.079 \\\\\n", "\t5 & rs1000521 & 14 & NA & 70522484 & A & G & a & g & -1.059 \\\\\n", "\t6 & rs1000587 & 15 & NA & 89050628 & A & G & a & g & 3.007 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ " SNP chr cM position A1.x A2.x A1.y A2.y testStat\n", "1 rs1000022 13 NA 100461219 C T t c -0.221\n", "2 rs1000160 13 NA 100950452 T C t c -0.885\n", "3 rs1000281 15 NA 72553422 C T t c -4.091\n", "4 rs1000290 15 NA 100949609 T C t c 0.079\n", "5 rs1000521 14 NA 70522484 A G a g -1.059\n", "6 rs1000587 15 NA 89050628 A G a g 3.007" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "glgc2 <- glgc[,c(\"MarkerName\",\"Allele1\",\"Allele2\",\"GC.Zscore\")]\n", "colnames(glgc2) <- c(\"SNP\",\"A1\",\"A2\",\"testStat\")\n", "colnames(snps) <- c(\"chr\",\"SNP\",\"cM\",\"position\",\"A1\",\"A2\")\n", "merg <- merge(snps,glgc2,by.x=\"SNP\",by.y=\"SNP\")\n", "dim(merg)\n", "head(merg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _example_\n", "\n", "- Some additional housekeeping is required to make sure the allele assignments (and in turn the direction of the correlation) is the same:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# make A1.x and A1.y upper case\n", "merg$A1.y <- toupper(merg$A1.y)\n", "merg$A2.y <- toupper(merg$A2.y)\n", "# check 'A1.x' matches one 'y' allele\n", "keep <- merg[merg$A1.x == merg$A1.y | merg$A1.x == merg$A2.y,] \n", "# check 'A2.x' matches one 'y' allele\n", "keep2 <- keep[keep$A2.x == keep$A1.y | keep$A2.x == keep$A2.y,] " ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ " \n", "\t 1. 84146 2. \n", "\t 3. 9 4. \n", " \n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 84146\n", "\\item 9\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 84146\n", "2. 9\n", "\n", "\n" ], "text/plain": [ "[1] 84146 9" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", " SNPchrcMpositionA1.xA2.xA1.yA2.ytestStat 1rs100002213 NA 100461219C T T C 0.221 2rs100016013 NA 100950452T C T C -0.885 3rs100028115 NA 72553422 C T T C 4.091 4rs100029015 NA 100949609T C T C 0.079 5rs100052114 NA 70522484 A G A G -1.059 6rs100058715 NA 89050628 A G A G 3.007 \n" ], "text/latex": [ "\\begin{tabular}{r|lllllllll}\n", " & SNP & chr & cM & position & A1.x & A2.x & A1.y & A2.y & testStat\\\\\n", "\\hline\n", "\t1 & rs1000022 & 13 & NA & 100461219 & C & T & T & C & 0.221 \\\\\n", "\t2 & rs1000160 & 13 & NA & 100950452 & T & C & T & C & -0.885 \\\\\n", "\t3 & rs1000281 & 15 & NA & 72553422 & C & T & T & C & 4.091 \\\\\n", "\t4 & rs1000290 & 15 & NA & 100949609 & T & C & T & C & 0.079 \\\\\n", "\t5 & rs1000521 & 14 & NA & 70522484 & A & G & A & G & -1.059 \\\\\n", "\t6 & rs1000587 & 15 & NA & 89050628 & A & G & A & G & 3.007 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ " SNP chr cM position A1.x A2.x A1.y A2.y testStat\n", "1 rs1000022 13 NA 100461219 C T T C 0.221\n", "2 rs1000160 13 NA 100950452 T C T C -0.885\n", "3 rs1000281 15 NA 72553422 C T T C 4.091\n", "4 rs1000290 15 NA 100949609 T C T C 0.079\n", "5 rs1000521 14 NA 70522484 A G A G -1.059\n", "6 rs1000587 15 NA 89050628 A G A G 3.007" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# change test stat sign if A1.x != A1.y\n", "keep2$testStat[keep2$A1.x != keep2$A1.y] <- keep2$testStat[keep2$A1.x != keep2$A1.y]*(-1) \n", "dim(keep2)\n", "head(keep2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _example_\n", "\n", "- Next we read in a file that includes all PCG coordinates:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", " chrstartstopclass 11 6909070008OR4F5 21 367658 368597 OR4F29-1 31 621095 622034 OR4F29-2 41 861120879961SAMD11 51 879582894679NOC2L 61 895966901099KLHL17 \n" ], "text/latex": [ "\\begin{tabular}{r|llll}\n", " & chr & start & stop & class\\\\\n", "\\hline\n", "\t1 & 1 & 69090 & 70008 & OR4F5\\\\\n", "\t2 & 1 & 367658 & 368597 & OR4F29-1\\\\\n", "\t3 & 1 & 621095 & 622034 & OR4F29-2\\\\\n", "\t4 & 1 & 861120 & 879961 & SAMD11\\\\\n", "\t5 & 1 & 879582 & 894679 & NOC2L \\\\\n", "\t6 & 1 & 895966 & 901099 & KLHL17\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " chr start stop class\n", "1 1 69090 70008 OR4F5\n", "2 1 367658 368597 OR4F29-1\n", "3 1 621095 622034 OR4F29-2\n", "4 1 861120 879961 SAMD11\n", "5 1 879582 894679 NOC2L\n", "6 1 895966 901099 KLHL17" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "coords <- read.csv(\"useR2016_Tutorial_Files/ProCodgene_coords.csv\",stringsAsFactors=F)\n", "colnames(coords)[4] <- \"class\"\n", "head(coords)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _example_\n", "\n", "- Each SNP is then mapped to a class using the map2class() function in the GenCAT package:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Mapping SNPs to classes on chromosome 13.\"\n", "[1] \"Mapping SNPs to classes on chromosome 14.\"\n", "[1] \"Mapping SNPs to classes on chromosome 15.\"\n" ] }, { "data": { "text/html": [ "77409" ], "text/latex": [ "77409" ], "text/markdown": [ "77409" ], "text/plain": [ "[1] 77409" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mapped <- map2class(coords,keep2)\n", "colnames(mapped) <- c(\"SNP\",\"chr\",\"cM\",\"position\",\"effect_allele\",\"other_allele\",\n", " \"A1\",\"A2\",\"testStat\",\"class\")\n", "length(unique(mapped$SNP))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
SNPchrcMpositioneffect_alleleother_alleleA1A2testStatclass
1rs1707688613 NA 19755612 C T T C -0.507 TUBA3C
2rs1707689113 NA 19755726 G C C G -0.562 TUBA3C
3rs1719471913 NA 19752733 C T T C 0.908 TUBA3C
4rs1719472613 NA 19753231 A G A G 0.868 TUBA3C
5rs1719473313 NA 19753310 G A A G 0.866 TUBA3C
6rs277449413 NA 19748709 T G T G 1.505 TUBA3C
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllll}\n", " & SNP & chr & cM & position & effect\\_allele & other\\_allele & A1 & A2 & testStat & class\\\\\n", "\\hline\n", "\t1 & rs17076886 & 13 & NA & 19755612 & C & T & T & C & -0.507 & TUBA3C \\\\\n", "\t2 & rs17076891 & 13 & NA & 19755726 & G & C & C & G & -0.562 & TUBA3C \\\\\n", "\t3 & rs17194719 & 13 & NA & 19752733 & C & T & T & C & 0.908 & TUBA3C \\\\\n", "\t4 & rs17194726 & 13 & NA & 19753231 & A & G & A & G & 0.868 & TUBA3C \\\\\n", "\t5 & rs17194733 & 13 & NA & 19753310 & G & A & A & G & 0.866 & TUBA3C \\\\\n", "\t6 & rs2774494 & 13 & NA & 19748709 & T & G & T & G & 1.505 & TUBA3C \\\\\n", "\\end{tabular}\n" ], "text/plain": [ " SNP chr cM position effect_allele other_allele A1 A2 testStat class\n", "1 rs17076886 13 NA 19755612 C T T C -0.507 TUBA3C\n", "2 rs17076891 13 NA 19755726 G C C G -0.562 TUBA3C\n", "3 rs17194719 13 NA 19752733 C T T C 0.908 TUBA3C\n", "4 rs17194726 13 NA 19753231 A G A G 0.868 TUBA3C\n", "5 rs17194733 13 NA 19753310 G A A G 0.866 TUBA3C\n", "6 rs2774494 13 NA 19748709 T G T G 1.505 TUBA3C" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head(mapped)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _example_\n", "\n", "- We are now ready to apply GenCAT using the GenCAT() function:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Running GenCAT on 298 classes on chromosome 13.\"\n", "[1] \"Running GenCAT on 529 classes on chromosome 14.\"\n", "[1] \"Running GenCAT on 500 classes on chromosome 15.\"\n" ] } ], "source": [ "geno_sub <- geno_dat[,unique(mapped$SNP)]\n", "out <- GenCAT(mapped, geno_sub, snps)\n", "ntests <- dim(out$GenCAT)[1]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
classchrn_SNPsn_ObsCsumTCsumP
774SLC25A29 14 5 3 25.68382 1.110751e-05
941TGM5 15 20 3 42.60935 2.97883e-09
949PPIP5K1-1 15 9 2 21.49987 2.144681e-05
950CKMT1B 15 1 1 21.11402 4.327493e-06
951STRC-1 15 2 1 23.38788 1.324103e-06
952CATSPER2 15 9 3 27.50306 4.617692e-06
953CKMT1A 15 2 2 21.27628 2.398359e-05
954PDIA3 15 12 2 22.83105 1.102304e-05
957SERINC4 15 3 2 20.80912 3.029403e-05
959WDR76 15 14 3 23.3486 3.415989e-05
960FRMD5 15 62 11 154.7388 1.600565e-27
1043LIPC 15 177 24 89.61096 1.671801e-09
1102TIPIN 15 4 3 23.8544 2.679076e-05
1138PKM 15 8 3 25.66676 1.119921e-05
1141HEXA 15 1 1 19.4481 1.033706e-05
\n" ], "text/latex": [ "\\begin{tabular}{r|llllll}\n", " & class & chr & n\\_SNPs & n\\_Obs & CsumT & CsumP\\\\\n", "\\hline\n", "\t774 & SLC25A29 & 14 & 5 & 3 & 25.68382 & 1.110751e-05\\\\\n", "\t941 & TGM5 & 15 & 20 & 3 & 42.60935 & 2.97883e-09\\\\\n", "\t949 & PPIP5K1-1 & 15 & 9 & 2 & 21.49987 & 2.144681e-05\\\\\n", "\t950 & CKMT1B & 15 & 1 & 1 & 21.11402 & 4.327493e-06\\\\\n", "\t951 & STRC-1 & 15 & 2 & 1 & 23.38788 & 1.324103e-06\\\\\n", "\t952 & CATSPER2 & 15 & 9 & 3 & 27.50306 & 4.617692e-06\\\\\n", "\t953 & CKMT1A & 15 & 2 & 2 & 21.27628 & 2.398359e-05\\\\\n", "\t954 & PDIA3 & 15 & 12 & 2 & 22.83105 & 1.102304e-05\\\\\n", "\t957 & SERINC4 & 15 & 3 & 2 & 20.80912 & 3.029403e-05\\\\\n", "\t959 & WDR76 & 15 & 14 & 3 & 23.3486 & 3.415989e-05\\\\\n", "\t960 & FRMD5 & 15 & 62 & 11 & 154.7388 & 1.600565e-27\\\\\n", "\t1043 & LIPC & 15 & 177 & 24 & 89.61096 & 1.671801e-09\\\\\n", "\t1102 & TIPIN & 15 & 4 & 3 & 23.8544 & 2.679076e-05\\\\\n", "\t1138 & PKM & 15 & 8 & 3 & 25.66676 & 1.119921e-05\\\\\n", "\t1141 & HEXA & 15 & 1 & 1 & 19.4481 & 1.033706e-05\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " class chr n_SNPs n_Obs CsumT CsumP\n", "774 SLC25A29 14 5 3 25.68382 1.110751e-05\n", "941 TGM5 15 20 3 42.60935 2.978830e-09\n", "949 PPIP5K1-1 15 9 2 21.49987 2.144681e-05\n", "950 CKMT1B 15 1 1 21.11402 4.327493e-06\n", "951 STRC-1 15 2 1 23.38788 1.324103e-06\n", "952 CATSPER2 15 9 3 27.50306 4.617692e-06\n", "953 CKMT1A 15 2 2 21.27628 2.398359e-05\n", "954 PDIA3 15 12 2 22.83105 1.102304e-05\n", "957 SERINC4 15 3 2 20.80912 3.029403e-05\n", "959 WDR76 15 14 3 23.34860 3.415989e-05\n", "960 FRMD5 15 62 11 154.73880 1.600565e-27\n", "1043 LIPC 15 177 24 89.61096 1.671801e-09\n", "1102 TIPIN 15 4 3 23.85440 2.679076e-05\n", "1138 PKM 15 8 3 25.66676 1.119921e-05\n", "1141 HEXA 15 1 1 19.44810 1.033706e-05" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "out$GenCAT[out$GenCAT$CsumP<=0.05/ntests,]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. GenCAT _summary_\n", "\n", "- Post-analytic approach to identify class-level association\n", "- VEGAS$^*$uses numerical techniques to evaluate statistical signficance of same statistic\n", "- Extends the QT$^{**}$by considering test statistics from generalized linear model and introducing a data reduction step for stability.\n", "\n", "$^*$Liu et al. (Am J Hum Genet. 2010) PMID: 20598278\n", "\n", "$^{**}$Luo L et al (Eur J Hum Genet. 2010) PMID: 20442747" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Part 2: Interrogation of regional and higher-order associations.\n", "\n", "- Motivation$\\checkmark$\n", "- Sequence Kernel Association Test (SKAT)$\\checkmark$\n", "- Genetic Class Associatin Test (GenCAT)$\\checkmark\$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Additional References:\n", "- Li et al (Am J Hum Genet. 2011)\n", "- Wu et al. (Am J Hum Genet. 2010) PMID: 20560208\n", "- Huang et al. (PLoS Genet. 2011) PMID: 21829371\n", "- Li et al. (Am J Hum Genet. 2008) PMID: 18691683\n", "- Hu et al. (Am J Hum Genet. 2013)\n", "- Liu et al. (Nat Genet. 2014) PMID: 24336170\n", "- Mootha et al. (Nat Genet. 2003) PMID: 12808457\n", "- Subramanian et. al (Proc Natl Acad Sci USA. 2005) PMID: 16199517\n", "- Peng et al. (Eur J Hum Genet. 2010) PMID: 19584899\n", "- Segre et al. (PLoS Genet. 2010)\n", "- Weng et al. (BMC Bioinformatics. 2011) PMID: 21496265\n", "- Foulkes et al. (PLoS ONE. 2013) PMID: 23405096\n", "- Lee et al. (Am J Hum Genet. 2013) PMID: 23768515" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.3.0" }, "latex_envs": { "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 0 } }, "nbformat": 4, "nbformat_minor": 0 }