Question

An. Gambie RNA-Seq differential expression using gage and pathview

0

Entering edit mode

Luo Weijun ★ 1.6k

@luo-weijun-1783

Last seen 10 months ago

United States

You had some problem with your kegg gene set data. What you did: kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") kegg.gsets produces kegg gene sets with some meta-data. You need to process the right gene set data out before calling gage: kg.aga=kegg.gsets("aga") kegg.gs=kg. aga$kg.sets[kg. aga$sigmet.idx] cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") ? You can check the details on your gene set data: names(kg.aga) lapply(kg.aga, head, 3) lapplykegg.gs[1:3], head, 3) you may also check the function documentation: ?kegg.gsets Actually you will find everything in the pacakge tutorials and documentations for functions you work with like gage or pathview etc: http://bioconductor.org/packages/release/bioc/html/gage.html http://bioconductor.org/packages/release/bioc/html/pathview.html ---------------------------------------------- Saul Lozano wrote: ???Dear bioconductor members, ? I have completed the "RNA-Seq Data Pathway and Gene-set Analysis Workflows" tutorial and have obtained the output png files ??? for the human example datasets???. Having completed ???the tutorial, I started working with my mosquito data. ???I aligned my reads using gmap-gsnap using " https://www.vectorbase.org/download/anopheles-gambiae- pestchromosomesagamp4fagz" and created the Transcription Database object using its corresponding annotation files that can be find at " https://www.vectorbase.org/download/anopheles-gambiae- pestbasefeaturesagamp41gff3gz " ??? ??? ? I have completed the pre-processing and normalization, the list of normalized reads per gene looks like this ???>cnts.norm??? ... AGAP009385? ? ? ? 1.729138e+01? ? ? ? 2.222712e+03? ? ? ? 8.552872e+02 AGAP009386? ? ? ? 3.094247e+01? ? ? ? 1.681387e+02? ? ? ? 7.808745e+01 AGAP009387? ? ? ? 1.274102e+02? ? ? ? 6.561511e+01? ? ? ? 7.441275e+01 AGAP009388? ? ? ? 1.695465e+03? ? ? ? 3.033674e+03? ? ? ? 1.940243e+03 AGAP009389? ? ? ? 1.071155e+03? ? ? ? 1.202602e+03? ? ? ? 9.646097e+02 AGAP009390? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 1.837352e+00 AGAP009391? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AGAP009392? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AGAP009393? ? ? ? 1.820145e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AGAP009394? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 0.000000e+00 ... ???the structure of the object is: > str(cnts.norm) num [1:12489, 1:12] 1857 3076 1390 4953 5027 ... - attr(*, "dimnames")=List of 2 ? ..$ : chr [1:12489] "AGAP000002" "AGAP000005" "AGAP000007" "AGAP000008" ... ? ..$ : chr [1:12] "library10.sorted.bam" "library11.sorted.bam" "library12.sorted.bam" "library1.sorted.bam" ... I load the kegg pathway database using kegg.gsets like this >kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") but the accession number are different in the Transcription database and the kegg. > kegg.gs ... $kg.sets$`aga04745 Phototransduction - fly` [1] "AgaP_AGAP000028" "AgaP_AGAP000348" "AgaP_AGAP000651" "AgaP_AGAP000945" [5] "AgaP_AGAP001506" "AgaP_AGAP001936" "AgaP_AGAP002145" "AgaP_AGAP005079" [9] "AgaP_AGAP005095" "AgaP_AGAP006263" "AgaP_AGAP006475" "AgaP_AGAP008435" [13] "AgaP_AGAP009115" "AgaP_AGAP009730" "AgaP_AGAP009953" "AgaP_AGAP010630" [17] "AgaP_AGAP010957" "AgaP_AGAP011099" "AgaP_AGAP011414" "AgaP_AGAP011516" [21] "AgaP_AGAP012026" "AgaP_AGAP012252" ... The gage tutorial mentions that the names of the genes in the counts and the kegg object should be the same, so I changed the rownames of the counts by adding "Agap_" > x <- data.frame(rownames(cnts.norm), stringsAsFactors=FALSE) > x$new <- paste("AgaP_", x[,1],sep="") > row.names(cnts.norm) <- x$new so the list now looks like this ... AgaP_AGAP009385? ? ? ? 1.729138e+01? ? ? ? 2.222712e+03? ? ? ? 8.552872e+02 AgaP_AGAP009386? ? ? ? 3.094247e+01? ? ? ? 1.681387e+02? ? ? ? 7.808745e+01 AgaP_AGAP009387? ? ? ? 1.274102e+02? ? ? ? 6.561511e+01? ? ? ? 7.441275e+01 AgaP_AGAP009388? ? ? ? 1.695465e+03? ? ? ? 3.033674e+03? ? ? ? 1.940243e+03 AgaP_AGAP009389? ? ? ? 1.071155e+03? ? ? ? 1.202602e+03? ? ? ? 9.646097e+02 AgaP_AGAP009390? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 1.837352e+00 AgaP_AGAP009391? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AgaP_AGAP009392? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AgaP_AGAP009393? ? ? ? 1.820145e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AgaP_AGAP009394? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 0.000000e+00 ... In theory everything is set for the gage analysis like this > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") but I get > cnts.kegg.p $greater ? ? ? ? ???p.geomean stat.mean p.val q.val set.size library2.sorted.bam kg.sets? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 2? ? ? ? ? ? ? ? ? NA sigmet.idx? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA met.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA dise.idx? ? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA ? ? ? ? ???library5.sorted.bam kg.sets? ? ? ? ? ? ? ? ? ???NA sigmet.idx? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ? ? ? ? ? ???NA met.idx? ? ? ? ? ? ? ? ? ???NA dise.idx? ? ? ? ? ? ? ? ? ? NA $less ? ? ? ? ???p.geomean stat.mean p.val q.val set.size library2.sorted.bam kg.sets? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 2? ? ? ? ? ? ? ? ? NA sigmet.idx? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA met.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA dise.idx? ? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA ? ? ? ? ???library5.sorted.bam kg.sets? ? ? ? ? ? ? ? ? ???NA sigmet.idx? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ? ? ? ? ? ???NA met.idx? ? ? ? ? ? ? ? ? ???NA dise.idx? ? ? ? ? ? ? ? ? ? NA $stats ? ? ? ? ???stat.mean library2.sorted.bam library5.sorted.bam kg.sets? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA sigmet.idx? ? ???NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA met.idx? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA dise.idx? ? ? ???NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA My questions are, how do I debug this problem? is the gene name (accession number) change correct? Are "NaN"s the result of zeros in the datasets? any help will be greatly appreciated. Best regards, Saul ??? [[alternative HTML version deleted]]

Transcription Annotation Normalization gage pathview Transcription Annotation Normalization • 1.7k views

ADD COMMENT • link updated 9.8 years ago by Saul Lozano ▴ 20 • written 9.8 years ago by Luo Weijun ★ 1.6k

score 0 · Answer 1 · 2014-07-07

Dear Luo Weijun, Thank you, the data transformation fixed my problem. I been poring over manuals, example code and forums but the solution escaped me. Thank you again, On Thu, Jul 3, 2014 at 8:14 AM, Luo Weijun <luo_weijun@yahoo.com> wrote: > You had some problem with your kegg gene set data. What you did: > kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = > samp.idx, compare ="paired") > > kegg.gsets produces kegg gene sets with some meta-data. You need to > process the right gene set data out before calling gage: > kg.aga=kegg.gsets("aga") > kegg.gs=kg. aga$kg.sets[kg. aga$sigmet.idx] > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = > samp.idx, compare ="paired") > â¦ > > You can check the details on your gene set data: > names(kg.aga) > lapply(kg.aga, head, 3) > lapplykegg.gs[1:3], head, 3) > > you may also check the function documentation: > ?kegg.gsets > Actually you will find everything in the pacakge tutorials and > documentations for functions you work with like gage or pathview etc: > http://bioconductor.org/packages/release/bioc/html/gage.html > http://bioconductor.org/packages/release/bioc/html/pathview.html > > ---------------------------------------------- > Saul Lozano wrote: > > Ã¢â¬â¹Dear bioconductor members, > > I have completed the "RNA-Seq Data Pathway and Gene-set > Analysis > Workflows" tutorial and have obtained the output png files > Ã¢â¬â¹ for the human example datasetsÃ¢â¬â¹. > > Having completed > Ã¢â¬â¹the tutorial, I started working with my mosquito > data. > Ã¢â¬â¹I aligned my reads using gmap-gsnap using " > > https://www.vectorbase.org/download/anopheles-gambiae- pestchromosomesagamp4fagz > " > and created the Transcription Database object using its > corresponding > annotation files that can be find at " > > https://www.vectorbase.org/download/anopheles-gambiae- pestbasefeaturesagamp41gff3gz > " > Ã¢â¬â¹ Ã¢â¬â¹ > I have completed the pre-processing and normalization, > the list of > normalized reads per gene looks like this > > Ã¢â¬â¹>cnts.normÃ¢â¬â¹ > > ... > AGAP009385 1.729138e+01 2.222712e+03 > 8.552872e+02 > AGAP009386 3.094247e+01 1.681387e+02 > 7.808745e+01 > AGAP009387 1.274102e+02 6.561511e+01 > 7.441275e+01 > AGAP009388 1.695465e+03 3.033674e+03 > 1.940243e+03 > AGAP009389 1.071155e+03 1.202602e+03 > 9.646097e+02 > AGAP009390 0.000000e+00 2.050472e+00 > 1.837352e+00 > AGAP009391 0.000000e+00 0.000000e+00 > 0.000000e+00 > AGAP009392 0.000000e+00 0.000000e+00 > 0.000000e+00 > AGAP009393 1.820145e+00 0.000000e+00 > 0.000000e+00 > AGAP009394 0.000000e+00 2.050472e+00 > 0.000000e+00 > ... > > Ã¢â¬â¹the structure of the object is: > > > str(cnts.norm) > num [1:12489, 1:12] 1857 3076 1390 4953 5027 ... > - attr(*, "dimnames")=List of 2 > ..$ : chr [1:12489] "AGAP000002" "AGAP000005" > "AGAP000007" "AGAP000008" > ... > ..$ : chr [1:12] "library10.sorted.bam" > "library11.sorted.bam" > "library12.sorted.bam" "library1.sorted.bam" ... > > I load the kegg pathway database using kegg.gsets like this > > >kegg.gs <- kegg.gsets(species = "aga", id.type = > "kegg") > > but the accession number are different in the Transcription > database and > the kegg. > > > kegg.gs > ... > $kg.sets$`aga04745 Phototransduction - fly` > [1] "AgaP_AGAP000028" "AgaP_AGAP000348" "AgaP_AGAP000651" > "AgaP_AGAP000945" > [5] "AgaP_AGAP001506" "AgaP_AGAP001936" "AgaP_AGAP002145" > "AgaP_AGAP005079" > [9] "AgaP_AGAP005095" "AgaP_AGAP006263" "AgaP_AGAP006475" > "AgaP_AGAP008435" > [13] "AgaP_AGAP009115" "AgaP_AGAP009730" "AgaP_AGAP009953" > "AgaP_AGAP010630" > [17] "AgaP_AGAP010957" "AgaP_AGAP011099" "AgaP_AGAP011414" > "AgaP_AGAP011516" > [21] "AgaP_AGAP012026" "AgaP_AGAP012252" > ... > The gage tutorial mentions that the names of the genes in > the counts and > the kegg object should be the same, so I changed the > rownames of the counts > by adding "Agap_" > > > x <- data.frame(rownames(cnts.norm), > stringsAsFactors=FALSE) > > x$new <- paste("AgaP_", x[,1],sep="") > > row.names(cnts.norm) <- x$new > > so the list now looks like this > > ... > AgaP_AGAP009385 1.729138e+01 > 2.222712e+03 8.552872e+02 > AgaP_AGAP009386 3.094247e+01 > 1.681387e+02 7.808745e+01 > AgaP_AGAP009387 1.274102e+02 > 6.561511e+01 7.441275e+01 > AgaP_AGAP009388 1.695465e+03 > 3.033674e+03 1.940243e+03 > AgaP_AGAP009389 1.071155e+03 > 1.202602e+03 9.646097e+02 > AgaP_AGAP009390 0.000000e+00 > 2.050472e+00 1.837352e+00 > AgaP_AGAP009391 0.000000e+00 > 0.000000e+00 0.000000e+00 > AgaP_AGAP009392 0.000000e+00 > 0.000000e+00 0.000000e+00 > AgaP_AGAP009393 1.820145e+00 > 0.000000e+00 0.000000e+00 > AgaP_AGAP009394 0.000000e+00 > 2.050472e+00 0.000000e+00 > ... > > In theory everything is set for the gage analysis like this > > > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref > = ref.idx, samp = > samp.idx, compare ="paired") > > but I get > > > cnts.kegg.p > $greater > p.geomean stat.mean p.val q.val set.size > library2.sorted.bam > kg.sets NA NaN NA NA > 2 NA > sigmet.idx NA NaN NA NA > 0 NA > sig.idx NA NaN NA NA > 0 NA > met.idx NA NaN NA NA > 0 NA > dise.idx NA NaN NA NA > 0 NA > library5.sorted.bam > kg.sets NA > sigmet.idx NA > sig.idx NA > met.idx NA > dise.idx NA > > $less > p.geomean stat.mean p.val q.val set.size > library2.sorted.bam > kg.sets NA NaN NA NA > 2 NA > sigmet.idx NA NaN NA NA > 0 NA > sig.idx NA NaN NA NA > 0 NA > met.idx NA NaN NA NA > 0 NA > dise.idx NA NaN NA NA > 0 NA > library5.sorted.bam > kg.sets NA > sigmet.idx NA > sig.idx NA > met.idx NA > dise.idx NA > > $stats > stat.mean library2.sorted.bam > library5.sorted.bam > kg.sets NaN NA > NA > sigmet.idx NaN NA > NA > sig.idx NaN NA > NA > met.idx NaN NA > NA > dise.idx NaN NA > NA > > My questions are, how do I debug this problem? is the gene > name (accession > number) change correct? Are "NaN"s the result of zeros in > the datasets? any > help will be greatly appreciated. > > Best regards, Saul > > [[alternative HTML version deleted]] > > > [[alternative HTML version deleted]]