Pathview+GAGE workflow on RNA-seq data pathway analysis and visualization (was: Pathview published in Bioinformatics)
0
0
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 19 months ago
United States
Hi Krishna, Yes, pathview is definitely useful for RNA-seq data analysis. Actually pathview can be used to visualize a variety of user data as long as they can be mapped to pathways. I am not sure about cufflinks output format. But below is an example workflow on RNA-seq data using tophat + Bioconductor facilities. The raw reads data were downloaded from http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1147/, and then mapped using tophat. This is the example data used in the RNA-seq section of the R/Bioconductor course at http://www.bioconductor.org/help/course- materials/2013/SeattleMay2013/. They worked with reads mapped to chromosome 14 only, here I worked with all reads/genes for a practical pathway analysis. Attached PDF file includes a few example pathview output graphs from this analysis. This is just a representative and concise workflow on pathway analysis and visualization with RNA-seq data. There are lots of issues like RNA-seq data quality assessment and pre-processing etc are not covered or detailed here. HTH. Weijun Luo ##package installation within R if you haven't done so, you will need to work with R 3.0 or newer version source("http://bioconductor.org/biocLite.R") biocLite(c("pathview", "gage", "Rsamtools", "TxDb.Hsapiens.UCSC.hg19.knownGene") ##extract exon regions by gene (i.e. Annotation of known gene models) library(TxDb.Hsapiens.UCSC.hg19.knownGene) exByGn <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene") ##count the reads mapped to genes/exons by tophat #used the files "accepted_hits.bam" in tophat output directory, but you need to rename them follow your sample names library(Rsamtools) fls <- list.files("/path/to/your/tophat_out/", pattern="bam$", full.names =T) bamfls <- BamFileList(fls) flag <- scanBamFlag(isNotPrimaryRead=FALSE, isProperPair=TRUE) param <- ScanBamParam(flag=flag) gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union", ignore.strand=TRUE, single.end=FALSE, param=param) ###extract, normalize and process the read counts into log2 expression matrix expdat=assay(gnCnt) libsizes=colSums(expdat) size.factor=libsizes/exp(mean(log(libsizes))) expdat.norm=t(t(expdat)/size.factor) dim(expdat.norm) sel.rn=rowSums(expdat.norm) != 0 expdat.norm=expdat.norm[sel.rn,] dim(expdat.norm) range(expdat.norm) #log2 transformation idx=expdat.norm==0 expdat.norm[expdat.norm==0]=0.1 range(expdat.norm) expdat.norm=log2(expdat.norm) range(expdat.norm) ###Pathway analysis using GAGE followed by visualization with Pathview library(gage) ref.idx=5:8 samp.idx=1:4 datakegg.gs) expdat.kegg.p <- gage(expdat.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="unpaired") #differential expression: log2 ratio or fold change expdat.d= expdat.norm[, samp.idx]-rowMeans(expdat.norm[, ref.idx]) #up-regulated pathways visualized by pathview sel <- expdat.kegg.p$greater[, "q.val"] < 0.1 & !is.na(expdat.kegg.p$greater[, "q.val"]) path.ids <- rownames(expdat.kegg.p$greater)[sel] path.ids2 <- substr(path.ids, 1, 8) library(pathview) pv.out.list <- sapply(path.ids2, function(pid) pathview(gene.data = expdat.d[, 1:2], pathway.id = pid, species = "hsa")) #down-regulated pathways visualized by pathview sel.l <- expdat.kegg.p$less[, "q.val"] < 0.1 & !is.na(expdat.kegg.p$less[, "q.val"]) path.ids.l <- rownames(expdat.kegg.p$less)[sel.l] path.ids.l2 <- substr(path.ids.l, 1, 8) pv.out.list.l <- sapply(path.ids.l2, function(pid) pathview(gene.data = expdat.d[, 1:2], pathway.id = pid, species = "hsa")) -------------------------------------------- On Wed, 7/24/13, km <srikrishnamohan at="" gmail.com=""> wrote: Subject: Re: [BioC] Pathview published in Bioinformatics Cc: "Ed" <edforum at="" gmail.com="">, "BioC Help" <bioconductor at="" r-project.org=""> Date: Wednesday, July 24, 2013, 12:39 AM Dear All, Is this package useful to do pathway analysis? with RNA-seq? based expression? data - say for eg. gene expression analysis results from tophat/cuflinks pipeline ? Regards, Krishna Mohan On Fri, Jun 28, 2013 wrote: A little more info. You may download the KGML data file for Propanoate metabolism pathway. Gene nodes correspond to node entries with type="gene", while white nodes correspond to entries with type="ortholog", ortholog genes which are not mapped for a particular KEGG species. These ortholog gene nodes are still show are in both KEGG and Graphviz view, except no gene data can be mapped when calling pathview function with species = "hsa" (or other KEGG species). However, gene data can still be mapped to these nodes when species = "ko". This is relevant for ortholog gene data or metagenomic data. HTH. ----- Original Message ----- To: Ed <edforum at="" gmail.com=""> Cc: BioC Help <bioconductor at="" r-project.org=""> Sent: Thursday, June 27, 2013 6:32 PM Subject: Re: [BioC] Pathview published in Bioinformatics Yes, you can replace get gene symbols instead of EC numbers by setting kegg.native = T, same.layer = F for KEGG view. However, only those enzyme nodes/genes which are present in the KGML data file. These nodes are marked in green in original KEGG graph for particular species, for example Propanoate metabolism - Homo sapiens: http://www.genome.jp/kegg-bin/show_pathway?map=hsa00640 In fact for Graphviz view (kegg.native = F), the pathway graph are also limited to genes which are presented in KGML data file. Hence you may see some inconsistence for metabolic pathway graphs between KEGG view and Graphviz view. Unfortunately, there is little we are limited by the KGML source data files. ________________________________ From: Ed <edforum at="" gmail.com=""> Sent: Thursday, June 27, 2013 12:04 AM Subject: Re: [BioC] Pathview published in Bioinformatics so if I understand and tried correctly, I cannot simply replace the EC in the graph (like Propanoate metabolism) with gene symbol. If I need to do so, I need to set kegg.native=F, which means it will change the graph structure of Propanoate Metoblism. Am I right? wrote: Hi Nick, >Yes, you can get gene symbol instead of EC (or original KEGG node labels). ?If you want a native KEGG view, set kegg.native = T, same.layer = F when you call pathview function. Otherwise, simply set set kegg.native = F for a Graphviz view. As example outputs, you may compare the gene symbols in Figure 1b and Figure 2 vs Figure 1a in the vignette. HTH.Weijun > > >________________________________ > From: Ed <edforum at="" gmail.com=""> >Sent: Tuesday, June 25, 2013 9:10 PM >Subject: Re: [BioC] Pathview published in Bioinformatics > > > >Hi Dr. Luo, > > [[elided Yahoo spam]] > > >I am just wondering if you can change the EC into gene symbol for the enzyme in KEGG pathway? > > >Thanks, > > >Nick > > > > > > > wrote: > >BioC List, >>The pathview package was recently published in >>Bioinformatics: >>http://bioinformatics.oxfordjournals.org/content/early/2013/06/11/b ioinformatics.btt285.full >>? >>Pathview is an R/Bioconductor package for pathway based data >>integration and visualization. It maps and renders a wide variety of biological >>data on relevant pathway graphs. >>? >>The package is available through Bioconductor and R-Forge: >>http://bioconductor.org/packages/release/bioc/html/pathview.html >>http://pathview.r-forge.r-project.org/ >>Please try it out and let me know if you have any [[elided Yahoo spam]] >>Weijun Luo >> >> >>_______________________________________________ >>Bioconductor mailing list >>Bioconductor at r-project.org >>https://stat.ethz.ch/mailman/listinfo/bioconductor >>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Annotation Pathways Visualization graph PROcess gage pathview Annotation Pathways graph • 3.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 903 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6