Search
Question: An. Gambie RNA-Seq differential expression using gage and pathview
0
gravatar for Luo Weijun
3.4 years ago by
Luo Weijun1.4k
United States
Luo Weijun1.4k wrote:
You had some problem with your kegg gene set data. What you did: kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") kegg.gsets produces kegg gene sets with some meta-data. You need to process the right gene set data out before calling gage: kg.aga=kegg.gsets("aga") kegg.gs=kg. aga$kg.sets[kg. aga$sigmet.idx] cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") ? You can check the details on your gene set data: names(kg.aga) lapply(kg.aga, head, 3) lapplykegg.gs[1:3], head, 3) you may also check the function documentation: ?kegg.gsets Actually you will find everything in the pacakge tutorials and documentations for functions you work with like gage or pathview etc: http://bioconductor.org/packages/release/bioc/html/gage.html http://bioconductor.org/packages/release/bioc/html/pathview.html ---------------------------------------------- Saul Lozano wrote: ???Dear bioconductor members, ? I have completed the "RNA-Seq Data Pathway and Gene-set Analysis Workflows" tutorial and have obtained the output png files ??? for the human example datasets???. Having completed ???the tutorial, I started working with my mosquito data. ???I aligned my reads using gmap-gsnap using " https://www.vectorbase.org/download/anopheles-gambiae- pestchromosomesagamp4fagz" and created the Transcription Database object using its corresponding annotation files that can be find at " https://www.vectorbase.org/download/anopheles-gambiae- pestbasefeaturesagamp41gff3gz " ??? ??? ? I have completed the pre-processing and normalization, the list of normalized reads per gene looks like this ???>cnts.norm??? ... AGAP009385? ? ? ? 1.729138e+01? ? ? ? 2.222712e+03? ? ? ? 8.552872e+02 AGAP009386? ? ? ? 3.094247e+01? ? ? ? 1.681387e+02? ? ? ? 7.808745e+01 AGAP009387? ? ? ? 1.274102e+02? ? ? ? 6.561511e+01? ? ? ? 7.441275e+01 AGAP009388? ? ? ? 1.695465e+03? ? ? ? 3.033674e+03? ? ? ? 1.940243e+03 AGAP009389? ? ? ? 1.071155e+03? ? ? ? 1.202602e+03? ? ? ? 9.646097e+02 AGAP009390? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 1.837352e+00 AGAP009391? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AGAP009392? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AGAP009393? ? ? ? 1.820145e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AGAP009394? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 0.000000e+00 ... ???the structure of the object is: > str(cnts.norm) num [1:12489, 1:12] 1857 3076 1390 4953 5027 ... - attr(*, "dimnames")=List of 2 ? ..$ : chr [1:12489] "AGAP000002" "AGAP000005" "AGAP000007" "AGAP000008" ... ? ..$ : chr [1:12] "library10.sorted.bam" "library11.sorted.bam" "library12.sorted.bam" "library1.sorted.bam" ... I load the kegg pathway database using kegg.gsets like this >kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") but the accession number are different in the Transcription database and the kegg. > kegg.gs ... $kg.sets$`aga04745 Phototransduction - fly` [1] "AgaP_AGAP000028" "AgaP_AGAP000348" "AgaP_AGAP000651" "AgaP_AGAP000945" [5] "AgaP_AGAP001506" "AgaP_AGAP001936" "AgaP_AGAP002145" "AgaP_AGAP005079" [9] "AgaP_AGAP005095" "AgaP_AGAP006263" "AgaP_AGAP006475" "AgaP_AGAP008435" [13] "AgaP_AGAP009115" "AgaP_AGAP009730" "AgaP_AGAP009953" "AgaP_AGAP010630" [17] "AgaP_AGAP010957" "AgaP_AGAP011099" "AgaP_AGAP011414" "AgaP_AGAP011516" [21] "AgaP_AGAP012026" "AgaP_AGAP012252" ... The gage tutorial mentions that the names of the genes in the counts and the kegg object should be the same, so I changed the rownames of the counts by adding "Agap_" > x <- data.frame(rownames(cnts.norm), stringsAsFactors=FALSE) > x$new <- paste("AgaP_", x[,1],sep="") > row.names(cnts.norm) <- x$new so the list now looks like this ... AgaP_AGAP009385? ? ? ? 1.729138e+01? ? ? ? 2.222712e+03? ? ? ? 8.552872e+02 AgaP_AGAP009386? ? ? ? 3.094247e+01? ? ? ? 1.681387e+02? ? ? ? 7.808745e+01 AgaP_AGAP009387? ? ? ? 1.274102e+02? ? ? ? 6.561511e+01? ? ? ? 7.441275e+01 AgaP_AGAP009388? ? ? ? 1.695465e+03? ? ? ? 3.033674e+03? ? ? ? 1.940243e+03 AgaP_AGAP009389? ? ? ? 1.071155e+03? ? ? ? 1.202602e+03? ? ? ? 9.646097e+02 AgaP_AGAP009390? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 1.837352e+00 AgaP_AGAP009391? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AgaP_AGAP009392? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AgaP_AGAP009393? ? ? ? 1.820145e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AgaP_AGAP009394? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 0.000000e+00 ... In theory everything is set for the gage analysis like this > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") but I get > cnts.kegg.p $greater ? ? ? ? ???p.geomean stat.mean p.val q.val set.size library2.sorted.bam kg.sets? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 2? ? ? ? ? ? ? ? ? NA sigmet.idx? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA met.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA dise.idx? ? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA ? ? ? ? ???library5.sorted.bam kg.sets? ? ? ? ? ? ? ? ? ???NA sigmet.idx? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ? ? ? ? ? ???NA met.idx? ? ? ? ? ? ? ? ? ???NA dise.idx? ? ? ? ? ? ? ? ? ? NA $less ? ? ? ? ???p.geomean stat.mean p.val q.val set.size library2.sorted.bam kg.sets? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 2? ? ? ? ? ? ? ? ? NA sigmet.idx? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA met.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA dise.idx? ? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA ? ? ? ? ???library5.sorted.bam kg.sets? ? ? ? ? ? ? ? ? ???NA sigmet.idx? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ? ? ? ? ? ???NA met.idx? ? ? ? ? ? ? ? ? ???NA dise.idx? ? ? ? ? ? ? ? ? ? NA $stats ? ? ? ? ???stat.mean library2.sorted.bam library5.sorted.bam kg.sets? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA sigmet.idx? ? ???NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA met.idx? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA dise.idx? ? ? ???NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA My questions are, how do I debug this problem? is the gene name (accession number) change correct? Are "NaN"s the result of zeros in the datasets? any help will be greatly appreciated. Best regards, Saul ??? [[alternative HTML version deleted]]
ADD COMMENTlink modified 3.4 years ago by Saul Lozano20 • written 3.4 years ago by Luo Weijun1.4k
0
gravatar for Saul Lozano
3.4 years ago by
Saul Lozano20
Saul Lozano20 wrote:
Dear Luo Weijun, Thank you, the data transformation fixed my problem. I been poring over manuals, example code and forums but the solution escaped me. Thank you again, On Thu, Jul 3, 2014 at 8:14 AM, Luo Weijun <luo_weijun@yahoo.com> wrote: > You had some problem with your kegg gene set data. What you did: > kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = > samp.idx, compare ="paired") > > kegg.gsets produces kegg gene sets with some meta-data. You need to > process the right gene set data out before calling gage: > kg.aga=kegg.gsets("aga") > kegg.gs=kg. aga$kg.sets[kg. aga$sigmet.idx] > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = > samp.idx, compare ="paired") > … > > You can check the details on your gene set data: > names(kg.aga) > lapply(kg.aga, head, 3) > lapplykegg.gs[1:3], head, 3) > > you may also check the function documentation: > ?kegg.gsets > Actually you will find everything in the pacakge tutorials and > documentations for functions you work with like gage or pathview etc: > http://bioconductor.org/packages/release/bioc/html/gage.html > http://bioconductor.org/packages/release/bioc/html/pathview.html > > ---------------------------------------------- > Saul Lozano wrote: > > ​Dear bioconductor members, > > I have completed the "RNA-Seq Data Pathway and Gene-set > Analysis > Workflows" tutorial and have obtained the output png files > ​ for the human example datasets​. > > Having completed > ​the tutorial, I started working with my mosquito > data. > ​I aligned my reads using gmap-gsnap using " > > https://www.vectorbase.org/download/anopheles-gambiae- pestchromosomesagamp4fagz > " > and created the Transcription Database object using its > corresponding > annotation files that can be find at " > > https://www.vectorbase.org/download/anopheles-gambiae- pestbasefeaturesagamp41gff3gz > " > ​ ​ > I have completed the pre-processing and normalization, > the list of > normalized reads per gene looks like this > > ​>cnts.norm​ > > ... > AGAP009385 1.729138e+01 2.222712e+03 > 8.552872e+02 > AGAP009386 3.094247e+01 1.681387e+02 > 7.808745e+01 > AGAP009387 1.274102e+02 6.561511e+01 > 7.441275e+01 > AGAP009388 1.695465e+03 3.033674e+03 > 1.940243e+03 > AGAP009389 1.071155e+03 1.202602e+03 > 9.646097e+02 > AGAP009390 0.000000e+00 2.050472e+00 > 1.837352e+00 > AGAP009391 0.000000e+00 0.000000e+00 > 0.000000e+00 > AGAP009392 0.000000e+00 0.000000e+00 > 0.000000e+00 > AGAP009393 1.820145e+00 0.000000e+00 > 0.000000e+00 > AGAP009394 0.000000e+00 2.050472e+00 > 0.000000e+00 > ... > > ​the structure of the object is: > > > str(cnts.norm) > num [1:12489, 1:12] 1857 3076 1390 4953 5027 ... > - attr(*, "dimnames")=List of 2 > ..$ : chr [1:12489] "AGAP000002" "AGAP000005" > "AGAP000007" "AGAP000008" > ... > ..$ : chr [1:12] "library10.sorted.bam" > "library11.sorted.bam" > "library12.sorted.bam" "library1.sorted.bam" ... > > I load the kegg pathway database using kegg.gsets like this > > >kegg.gs <- kegg.gsets(species = "aga", id.type = > "kegg") > > but the accession number are different in the Transcription > database and > the kegg. > > > kegg.gs > ... > $kg.sets$`aga04745 Phototransduction - fly` > [1] "AgaP_AGAP000028" "AgaP_AGAP000348" "AgaP_AGAP000651" > "AgaP_AGAP000945" > [5] "AgaP_AGAP001506" "AgaP_AGAP001936" "AgaP_AGAP002145" > "AgaP_AGAP005079" > [9] "AgaP_AGAP005095" "AgaP_AGAP006263" "AgaP_AGAP006475" > "AgaP_AGAP008435" > [13] "AgaP_AGAP009115" "AgaP_AGAP009730" "AgaP_AGAP009953" > "AgaP_AGAP010630" > [17] "AgaP_AGAP010957" "AgaP_AGAP011099" "AgaP_AGAP011414" > "AgaP_AGAP011516" > [21] "AgaP_AGAP012026" "AgaP_AGAP012252" > ... > The gage tutorial mentions that the names of the genes in > the counts and > the kegg object should be the same, so I changed the > rownames of the counts > by adding "Agap_" > > > x <- data.frame(rownames(cnts.norm), > stringsAsFactors=FALSE) > > x$new <- paste("AgaP_", x[,1],sep="") > > row.names(cnts.norm) <- x$new > > so the list now looks like this > > ... > AgaP_AGAP009385 1.729138e+01 > 2.222712e+03 8.552872e+02 > AgaP_AGAP009386 3.094247e+01 > 1.681387e+02 7.808745e+01 > AgaP_AGAP009387 1.274102e+02 > 6.561511e+01 7.441275e+01 > AgaP_AGAP009388 1.695465e+03 > 3.033674e+03 1.940243e+03 > AgaP_AGAP009389 1.071155e+03 > 1.202602e+03 9.646097e+02 > AgaP_AGAP009390 0.000000e+00 > 2.050472e+00 1.837352e+00 > AgaP_AGAP009391 0.000000e+00 > 0.000000e+00 0.000000e+00 > AgaP_AGAP009392 0.000000e+00 > 0.000000e+00 0.000000e+00 > AgaP_AGAP009393 1.820145e+00 > 0.000000e+00 0.000000e+00 > AgaP_AGAP009394 0.000000e+00 > 2.050472e+00 0.000000e+00 > ... > > In theory everything is set for the gage analysis like this > > > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref > = ref.idx, samp = > samp.idx, compare ="paired") > > but I get > > > cnts.kegg.p > $greater > p.geomean stat.mean p.val q.val set.size > library2.sorted.bam > kg.sets NA NaN NA NA > 2 NA > sigmet.idx NA NaN NA NA > 0 NA > sig.idx NA NaN NA NA > 0 NA > met.idx NA NaN NA NA > 0 NA > dise.idx NA NaN NA NA > 0 NA > library5.sorted.bam > kg.sets NA > sigmet.idx NA > sig.idx NA > met.idx NA > dise.idx NA > > $less > p.geomean stat.mean p.val q.val set.size > library2.sorted.bam > kg.sets NA NaN NA NA > 2 NA > sigmet.idx NA NaN NA NA > 0 NA > sig.idx NA NaN NA NA > 0 NA > met.idx NA NaN NA NA > 0 NA > dise.idx NA NaN NA NA > 0 NA > library5.sorted.bam > kg.sets NA > sigmet.idx NA > sig.idx NA > met.idx NA > dise.idx NA > > $stats > stat.mean library2.sorted.bam > library5.sorted.bam > kg.sets NaN NA > NA > sigmet.idx NaN NA > NA > sig.idx NaN NA > NA > met.idx NaN NA > NA > dise.idx NaN NA > NA > > My questions are, how do I debug this problem? is the gene > name (accession > number) change correct? Are "NaN"s the result of zeros in > the datasets? any > help will be greatly appreciated. > > Best regards, Saul > > [[alternative HTML version deleted]] > > > [[alternative HTML version deleted]]
ADD COMMENTlink written 3.4 years ago by Saul Lozano20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 143 users visited in the last hour