An. Gambie RNA-Seq differential expression using gage and pathview
1
0
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 10 months ago
United States
You had some problem with your kegg gene set data. What you did: kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") kegg.gsets produces kegg gene sets with some meta-data. You need to process the right gene set data out before calling gage: kg.aga=kegg.gsets("aga") kegg.gs=kg. aga$kg.sets[kg. aga$sigmet.idx] cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") ? You can check the details on your gene set data: names(kg.aga) lapply(kg.aga, head, 3) lapplykegg.gs[1:3], head, 3) you may also check the function documentation: ?kegg.gsets Actually you will find everything in the pacakge tutorials and documentations for functions you work with like gage or pathview etc: http://bioconductor.org/packages/release/bioc/html/gage.html http://bioconductor.org/packages/release/bioc/html/pathview.html ---------------------------------------------- Saul Lozano wrote: ???Dear bioconductor members, ? I have completed the "RNA-Seq Data Pathway and Gene-set Analysis Workflows" tutorial and have obtained the output png files ??? for the human example datasets???. Having completed ???the tutorial, I started working with my mosquito data. ???I aligned my reads using gmap-gsnap using " https://www.vectorbase.org/download/anopheles-gambiae- pestchromosomesagamp4fagz" and created the Transcription Database object using its corresponding annotation files that can be find at " https://www.vectorbase.org/download/anopheles-gambiae- pestbasefeaturesagamp41gff3gz " ??? ??? ? I have completed the pre-processing and normalization, the list of normalized reads per gene looks like this ???>cnts.norm??? ... AGAP009385? ? ? ? 1.729138e+01? ? ? ? 2.222712e+03? ? ? ? 8.552872e+02 AGAP009386? ? ? ? 3.094247e+01? ? ? ? 1.681387e+02? ? ? ? 7.808745e+01 AGAP009387? ? ? ? 1.274102e+02? ? ? ? 6.561511e+01? ? ? ? 7.441275e+01 AGAP009388? ? ? ? 1.695465e+03? ? ? ? 3.033674e+03? ? ? ? 1.940243e+03 AGAP009389? ? ? ? 1.071155e+03? ? ? ? 1.202602e+03? ? ? ? 9.646097e+02 AGAP009390? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 1.837352e+00 AGAP009391? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AGAP009392? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AGAP009393? ? ? ? 1.820145e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AGAP009394? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 0.000000e+00 ... ???the structure of the object is: > str(cnts.norm) num [1:12489, 1:12] 1857 3076 1390 4953 5027 ... - attr(*, "dimnames")=List of 2 ? ..$ : chr [1:12489] "AGAP000002" "AGAP000005" "AGAP000007" "AGAP000008" ... ? ..$ : chr [1:12] "library10.sorted.bam" "library11.sorted.bam" "library12.sorted.bam" "library1.sorted.bam" ... I load the kegg pathway database using kegg.gsets like this >kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") but the accession number are different in the Transcription database and the kegg. > kegg.gs ... $kg.sets$`aga04745 Phototransduction - fly` [1] "AgaP_AGAP000028" "AgaP_AGAP000348" "AgaP_AGAP000651" "AgaP_AGAP000945" [5] "AgaP_AGAP001506" "AgaP_AGAP001936" "AgaP_AGAP002145" "AgaP_AGAP005079" [9] "AgaP_AGAP005095" "AgaP_AGAP006263" "AgaP_AGAP006475" "AgaP_AGAP008435" [13] "AgaP_AGAP009115" "AgaP_AGAP009730" "AgaP_AGAP009953" "AgaP_AGAP010630" [17] "AgaP_AGAP010957" "AgaP_AGAP011099" "AgaP_AGAP011414" "AgaP_AGAP011516" [21] "AgaP_AGAP012026" "AgaP_AGAP012252" ... The gage tutorial mentions that the names of the genes in the counts and the kegg object should be the same, so I changed the rownames of the counts by adding "Agap_" > x <- data.frame(rownames(cnts.norm), stringsAsFactors=FALSE) > x$new <- paste("AgaP_", x[,1],sep="") > row.names(cnts.norm) <- x$new so the list now looks like this ... AgaP_AGAP009385? ? ? ? 1.729138e+01? ? ? ? 2.222712e+03? ? ? ? 8.552872e+02 AgaP_AGAP009386? ? ? ? 3.094247e+01? ? ? ? 1.681387e+02? ? ? ? 7.808745e+01 AgaP_AGAP009387? ? ? ? 1.274102e+02? ? ? ? 6.561511e+01? ? ? ? 7.441275e+01 AgaP_AGAP009388? ? ? ? 1.695465e+03? ? ? ? 3.033674e+03? ? ? ? 1.940243e+03 AgaP_AGAP009389? ? ? ? 1.071155e+03? ? ? ? 1.202602e+03? ? ? ? 9.646097e+02 AgaP_AGAP009390? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 1.837352e+00 AgaP_AGAP009391? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AgaP_AGAP009392? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AgaP_AGAP009393? ? ? ? 1.820145e+00? ? ? ? 0.000000e+00? ? ? ? 0.000000e+00 AgaP_AGAP009394? ? ? ? 0.000000e+00? ? ? ? 2.050472e+00? ? ? ? 0.000000e+00 ... In theory everything is set for the gage analysis like this > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") but I get > cnts.kegg.p $greater ? ? ? ? ???p.geomean stat.mean p.val q.val set.size library2.sorted.bam kg.sets? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 2? ? ? ? ? ? ? ? ? NA sigmet.idx? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA met.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA dise.idx? ? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA ? ? ? ? ???library5.sorted.bam kg.sets? ? ? ? ? ? ? ? ? ???NA sigmet.idx? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ? ? ? ? ? ???NA met.idx? ? ? ? ? ? ? ? ? ???NA dise.idx? ? ? ? ? ? ? ? ? ? NA $less ? ? ? ? ???p.geomean stat.mean p.val q.val set.size library2.sorted.bam kg.sets? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 2? ? ? ? ? ? ? ? ? NA sigmet.idx? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA met.idx? ? ? ? ???NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA dise.idx? ? ? ? ? NA? ? ???NaN? ? NA? ? NA? ? ? ? 0? ? ? ? ? ? ? ? ? NA ? ? ? ? ???library5.sorted.bam kg.sets? ? ? ? ? ? ? ? ? ???NA sigmet.idx? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ? ? ? ? ? ???NA met.idx? ? ? ? ? ? ? ? ? ???NA dise.idx? ? ? ? ? ? ? ? ? ? NA $stats ? ? ? ? ???stat.mean library2.sorted.bam library5.sorted.bam kg.sets? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA sigmet.idx? ? ???NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA sig.idx? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA met.idx? ? ? ? ? NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA dise.idx? ? ? ???NaN? ? ? ? ? ? ? ? ? NA? ? ? ? ? ? ? ? ? NA My questions are, how do I debug this problem? is the gene name (accession number) change correct? Are "NaN"s the result of zeros in the datasets? any help will be greatly appreciated. Best regards, Saul ??? [[alternative HTML version deleted]]
Transcription Annotation Normalization gage pathview Transcription Annotation Normalization • 1.7k views
ADD COMMENT
0
Entering edit mode
Saul Lozano ▴ 20
@saul-lozano-6631
Last seen 9.6 years ago
Dear Luo Weijun, Thank you, the data transformation fixed my problem. I been poring over manuals, example code and forums but the solution escaped me. Thank you again, On Thu, Jul 3, 2014 at 8:14 AM, Luo Weijun <luo_weijun@yahoo.com> wrote: > You had some problem with your kegg gene set data. What you did: > kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = > samp.idx, compare ="paired") > > kegg.gsets produces kegg gene sets with some meta-data. You need to > process the right gene set data out before calling gage: > kg.aga=kegg.gsets("aga") > kegg.gs=kg. aga$kg.sets[kg. aga$sigmet.idx] > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = > samp.idx, compare ="paired") > … > > You can check the details on your gene set data: > names(kg.aga) > lapply(kg.aga, head, 3) > lapplykegg.gs[1:3], head, 3) > > you may also check the function documentation: > ?kegg.gsets > Actually you will find everything in the pacakge tutorials and > documentations for functions you work with like gage or pathview etc: > http://bioconductor.org/packages/release/bioc/html/gage.html > http://bioconductor.org/packages/release/bioc/html/pathview.html > > ---------------------------------------------- > Saul Lozano wrote: > > ​Dear bioconductor members, > > I have completed the "RNA-Seq Data Pathway and Gene-set > Analysis > Workflows" tutorial and have obtained the output png files > ​ for the human example datasets​. > > Having completed > ​the tutorial, I started working with my mosquito > data. > ​I aligned my reads using gmap-gsnap using " > > https://www.vectorbase.org/download/anopheles-gambiae- pestchromosomesagamp4fagz > " > and created the Transcription Database object using its > corresponding > annotation files that can be find at " > > https://www.vectorbase.org/download/anopheles-gambiae- pestbasefeaturesagamp41gff3gz > " > ​ ​ > I have completed the pre-processing and normalization, > the list of > normalized reads per gene looks like this > > ​>cnts.norm​ > > ... > AGAP009385 1.729138e+01 2.222712e+03 > 8.552872e+02 > AGAP009386 3.094247e+01 1.681387e+02 > 7.808745e+01 > AGAP009387 1.274102e+02 6.561511e+01 > 7.441275e+01 > AGAP009388 1.695465e+03 3.033674e+03 > 1.940243e+03 > AGAP009389 1.071155e+03 1.202602e+03 > 9.646097e+02 > AGAP009390 0.000000e+00 2.050472e+00 > 1.837352e+00 > AGAP009391 0.000000e+00 0.000000e+00 > 0.000000e+00 > AGAP009392 0.000000e+00 0.000000e+00 > 0.000000e+00 > AGAP009393 1.820145e+00 0.000000e+00 > 0.000000e+00 > AGAP009394 0.000000e+00 2.050472e+00 > 0.000000e+00 > ... > > ​the structure of the object is: > > > str(cnts.norm) > num [1:12489, 1:12] 1857 3076 1390 4953 5027 ... > - attr(*, "dimnames")=List of 2 > ..$ : chr [1:12489] "AGAP000002" "AGAP000005" > "AGAP000007" "AGAP000008" > ... > ..$ : chr [1:12] "library10.sorted.bam" > "library11.sorted.bam" > "library12.sorted.bam" "library1.sorted.bam" ... > > I load the kegg pathway database using kegg.gsets like this > > >kegg.gs <- kegg.gsets(species = "aga", id.type = > "kegg") > > but the accession number are different in the Transcription > database and > the kegg. > > > kegg.gs > ... > $kg.sets$`aga04745 Phototransduction - fly` > [1] "AgaP_AGAP000028" "AgaP_AGAP000348" "AgaP_AGAP000651" > "AgaP_AGAP000945" > [5] "AgaP_AGAP001506" "AgaP_AGAP001936" "AgaP_AGAP002145" > "AgaP_AGAP005079" > [9] "AgaP_AGAP005095" "AgaP_AGAP006263" "AgaP_AGAP006475" > "AgaP_AGAP008435" > [13] "AgaP_AGAP009115" "AgaP_AGAP009730" "AgaP_AGAP009953" > "AgaP_AGAP010630" > [17] "AgaP_AGAP010957" "AgaP_AGAP011099" "AgaP_AGAP011414" > "AgaP_AGAP011516" > [21] "AgaP_AGAP012026" "AgaP_AGAP012252" > ... > The gage tutorial mentions that the names of the genes in > the counts and > the kegg object should be the same, so I changed the > rownames of the counts > by adding "Agap_" > > > x <- data.frame(rownames(cnts.norm), > stringsAsFactors=FALSE) > > x$new <- paste("AgaP_", x[,1],sep="") > > row.names(cnts.norm) <- x$new > > so the list now looks like this > > ... > AgaP_AGAP009385 1.729138e+01 > 2.222712e+03 8.552872e+02 > AgaP_AGAP009386 3.094247e+01 > 1.681387e+02 7.808745e+01 > AgaP_AGAP009387 1.274102e+02 > 6.561511e+01 7.441275e+01 > AgaP_AGAP009388 1.695465e+03 > 3.033674e+03 1.940243e+03 > AgaP_AGAP009389 1.071155e+03 > 1.202602e+03 9.646097e+02 > AgaP_AGAP009390 0.000000e+00 > 2.050472e+00 1.837352e+00 > AgaP_AGAP009391 0.000000e+00 > 0.000000e+00 0.000000e+00 > AgaP_AGAP009392 0.000000e+00 > 0.000000e+00 0.000000e+00 > AgaP_AGAP009393 1.820145e+00 > 0.000000e+00 0.000000e+00 > AgaP_AGAP009394 0.000000e+00 > 2.050472e+00 0.000000e+00 > ... > > In theory everything is set for the gage analysis like this > > > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref > = ref.idx, samp = > samp.idx, compare ="paired") > > but I get > > > cnts.kegg.p > $greater > p.geomean stat.mean p.val q.val set.size > library2.sorted.bam > kg.sets NA NaN NA NA > 2 NA > sigmet.idx NA NaN NA NA > 0 NA > sig.idx NA NaN NA NA > 0 NA > met.idx NA NaN NA NA > 0 NA > dise.idx NA NaN NA NA > 0 NA > library5.sorted.bam > kg.sets NA > sigmet.idx NA > sig.idx NA > met.idx NA > dise.idx NA > > $less > p.geomean stat.mean p.val q.val set.size > library2.sorted.bam > kg.sets NA NaN NA NA > 2 NA > sigmet.idx NA NaN NA NA > 0 NA > sig.idx NA NaN NA NA > 0 NA > met.idx NA NaN NA NA > 0 NA > dise.idx NA NaN NA NA > 0 NA > library5.sorted.bam > kg.sets NA > sigmet.idx NA > sig.idx NA > met.idx NA > dise.idx NA > > $stats > stat.mean library2.sorted.bam > library5.sorted.bam > kg.sets NaN NA > NA > sigmet.idx NaN NA > NA > sig.idx NaN NA > NA > met.idx NaN NA > NA > dise.idx NaN NA > NA > > My questions are, how do I debug this problem? is the gene > name (accession > number) change correct? Are "NaN"s the result of zeros in > the datasets? any > help will be greatly appreciated. > > Best regards, Saul > > [[alternative HTML version deleted]] > > > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 952 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6