Question: overlap GRangesList and vcf
1
gravatar for Georg Otto
7.0 years ago by
Georg Otto120
Georg Otto120 wrote:
Hi, I apologise if this is trivial, but I haven't found a straight forward way to do this so far. Given a GRangesList with exons ## exons by gene range.exon <- exonsBy(txdb.ensgene, "gene") and a vcf file with SNP data (positions, alleles, etc), how can I generate a list of exons that contain SNPs (along with SNP positions)? Any hint will be appreciated! Georg
snp • 1.3k views
ADD COMMENTlink modified 7.0 years ago by Valerie Obenchain6.7k • written 7.0 years ago by Georg Otto120
Answer: overlap GRangesList and vcf
0
gravatar for Valerie Obenchain
7.0 years ago by
United States
Valerie Obenchain6.7k wrote:
Hi Georg, The readVcf() function in the VariantAnnotation package reads data from a VCF file into a VCF-class object, library(VariantAnnotation) fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation") vcf <- readVcf(fl, "hg19") > vcf class: VCF dim: 10376 5 genome: hg19 exptData(1): header fixed(4): REF ALT QUAL FILTER info(22): LDAF AVGPOST ... VT SNPSOURCE geno(3): GT DS GL rownames(10376): rs7410291 rs147922003 ... rs144055359 rs114526001 rowData values names(1): paramRangeID colnames(5): HG00096 HG00097 HG00099 HG00100 HG00101 colData names(1): Samples For details of the class slots and data accessors, ?'VCF-class' Before counting, make sure the chromosome names are consistent between the vcf and annotation. Here the annotation precedes chromosome numbers with 'chr' and the vcf file does not. > intersect(seqlevels(txdb), seqlevels(vcf)) character(0) We modify the vcf seqlevels to match the txdb using renameSeqlevels(), old <- seqlevels(vcf) new <- paste("chr", old, sep="") names(new) <- old vcf <- renameSeqlevels(vcf, new) > intersect(seqlevels(txdb), seqlevels(vcf)) [1] "chr22" The rowData slot contains a GRanges of positions which can be overlapped with your exons by genes, rd <- rowData(vcf) You wanted a list of exons hit by variants so we will unlist the GRangesList. If instead you want the hits grouped by gene, don't unlist(exbygn). library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exbygn <- exonsBy(txdb, "gene") exons <- unlist(exbygn) Counting can be done with findOverlaps() or summarizeOverlaps(). See the man pages for details on how the counting is performed. summarizedOverlaps() is written with read gaps in mind and therefore requires the 'reads' argument to be a Bam file or GappedAlignements object. Since you are interested in SNPs for this example we'll use findOverlaps(). fo <- findOverlaps(exons, rd) Exons hit by variants are extracted from 'exons' with the 'queryHits' column, exonsHit <- exons[queryHits(fo)] and the corresponding variants are subset from the VCF-class object with the 'subjectHits' column. vcfHit <- vcf[subjectHits(fo)] The positions of the SNPs are in the GRanges in the rowData slot, rowData(vcfHit) If you wanted your hits grouped by gene, don't unlist(exbygn) before counting. Valerie On 08/10/12 09:55, Georg Otto wrote: > Hi, > > > I apologise if this is trivial, but I haven't found a straight forward > way to do this so far. > > Given a GRangesList with exons > > ## exons by gene > range.exon<- exonsBy(txdb.ensgene, "gene") > > and a vcf file with SNP data (positions, alleles, etc), how can I > generate a list of exons that contain SNPs (along with SNP positions)? > > Any hint will be appreciated! > > Georg > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENTlink written 7.0 years ago by Valerie Obenchain6.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 341 users visited in the last hour