Extracting overlapping gene names from a list of peaks
2
0
Entering edit mode
@patrick-schorderet-6081
Last seen 9.6 years ago
United States
Dear all, I have a list of peaks from ChIPseq experiments. Now I am trying to find to over which genes these peaks overlap (and extract the gene name). I'm sure this should be pretty easy, but I am just starting with bioconductor, so some concepts are still vague. Here's what I did so far: # Not run because it is installed (as well as other packages) # source("http://bioconductor.org/biocLite.R") # biocLite("TxDb.Dmelanogaster.UCSC.dm3.ensGene") # Load the Dmelanogaster genome library(TxDb.Dmelanogaster.UCSC.dm3.ensGene, quietly = TRUE) txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene ee <- exonsBy(txdb, "gene") # Load an subset of my peaks for the sake of the example finalPeaks <- rbind(c("chr3R", "2788500", "2842850", "2815675", "54350"), (c("chr3R", "12484350", "12661350", "12572850", "177000"))) rownames(finalPeaks) <- c("Peak 1", "Peak 2") colnames(finalPeaks) <- c("chr", "start", "end", "center", "length") # Create a GRanges object with the file I have # finalPeaks is a matrix with rows being individual peaks and columns <- c() GRfinalPeaks <- GRanges(finalPeaks[,1], IRanges(start = as.numeric(finalPeaks[,2]), end = as.numeric(finalPeaks[,3]))) I'm stuck from here. What I'd like is to get, for each peak, the overlapping genes. For example, an output that would be: Peak1: GeneA, GeneB, GeneC, etc Peak2: GeneD Also, I don't know how simple it is because (as specified in the output example) one peak can overlap several genes or none at all.. Thanks for any help, Patrick [[alternative HTML version deleted]]
ChIPSeq chipseq ChIPSeq chipseq • 1.6k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 6 hours ago
United States
Hi Patrick, On 8/6/2013 11:48 AM, Patrick Schorderet wrote: > Dear all, > > I have a list of peaks from ChIPseq experiments. Now I am trying to find to over which genes these peaks overlap (and extract the gene name). > I'm sure this should be pretty easy, but I am just starting with bioconductor, so some concepts are still vague. > Here's what I did so far: > > # Not run because it is installed (as well as other packages) > # source("http://bioconductor.org/biocLite.R") > # biocLite("TxDb.Dmelanogaster.UCSC.dm3.ensGene") > # Load the Dmelanogaster genome > library(TxDb.Dmelanogaster.UCSC.dm3.ensGene, quietly = TRUE) > txdb<- TxDb.Dmelanogaster.UCSC.dm3.ensGene > ee<- exonsBy(txdb, "gene") > > # Load an subset of my peaks for the sake of the example > finalPeaks<- rbind(c("chr3R", "2788500", "2842850", "2815675", "54350"), (c("chr3R", "12484350", "12661350", "12572850", "177000"))) > rownames(finalPeaks)<- c("Peak 1", "Peak 2") > colnames(finalPeaks)<- c("chr", "start", "end", "center", "length") > > # Create a GRanges object with the file I have > # finalPeaks is a matrix with rows being individual peaks and columns<- c() > GRfinalPeaks<- GRanges(finalPeaks[,1], IRanges(start = as.numeric(finalPeaks[,2]), end = as.numeric(finalPeaks[,3]))) > > I'm stuck from here. What I'd like is to get, for each peak, the overlapping genes. > For example, an output that would be: > > Peak1: GeneA, GeneB, GeneC, etc > Peak2: GeneD > > Also, I don't know how simple it is because (as specified in the output example) one peak can overlap several genes or none at all.. > Thanks for any help, > sapply(1:2, function(x) names(ee[ee %over% GRfinalPeaks[x,],])) [[1]] [1] "FBgn0260642" [[2]] [1] "FBgn0000014" "FBgn0003944" "FBgn0015230" "FBgn0020556" "FBgn0051498" [6] "FBgn0063261" "FBgn0084245" "FBgn0084688" "FBgn0085056" You can then coerce that to whatever form you like. Best, Jim > > Patrick > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
You can efficiently get a list like this: hits <- findOverlaps(GRfinalPeaks, ee) listOfGenesByPeak <- split(names(ee)[subjectHits(hits)], queryHits(hits)) But maybe what you want is a long-form table: DataFrame(peak = queryHits(hits), gene = names(ee)[subjectHits(hits)]) On Tue, Aug 6, 2013 at 8:48 AM, Patrick Schorderet < patrick.schorderet@molbio.mgh.harvard.edu> wrote: > Dear all, > > I have a list of peaks from ChIPseq experiments. Now I am trying to find > to over which genes these peaks overlap (and extract the gene name). > I'm sure this should be pretty easy, but I am just starting with > bioconductor, so some concepts are still vague. > Here's what I did so far: > > # Not run because it is installed (as well as other packages) > # source("http://bioconductor.org/biocLite.R") > # biocLite("TxDb.Dmelanogaster.UCSC.dm3.ensGene") > # Load the Dmelanogaster genome > library(TxDb.Dmelanogaster.UCSC.dm3.ensGene, quietly = TRUE) > txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene > ee <- exonsBy(txdb, "gene") > > # Load an subset of my peaks for the sake of the example > finalPeaks <- rbind(c("chr3R", "2788500", "2842850", "2815675", "54350"), > (c("chr3R", "12484350", "12661350", "12572850", "177000"))) > rownames(finalPeaks) <- c("Peak 1", "Peak 2") > colnames(finalPeaks) <- c("chr", "start", "end", "center", "length") > > # Create a GRanges object with the file I have > # finalPeaks is a matrix with rows being individual peaks and columns <- > c() > GRfinalPeaks <- GRanges(finalPeaks[,1], IRanges(start = > as.numeric(finalPeaks[,2]), end = as.numeric(finalPeaks[,3]))) > > I'm stuck from here. What I'd like is to get, for each peak, the > overlapping genes. > For example, an output that would be: > > Peak1: GeneA, GeneB, GeneC, etc > Peak2: GeneD > > Also, I don't know how simple it is because (as specified in the output > example) one peak can overlap several genes or none at all.. > Thanks for any help, > > Patrick > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 525 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6