An. Gambie RNA-Seq differential expression using gage and pathview
0
0
Entering edit mode
Saul Lozano ▴ 20
@saul-lozano-6631
Last seen 9.6 years ago
​Dear bioconductor members, I have completed the "RNA-Seq Data Pathway and Gene-set Analysis Workflows" tutorial and have obtained the output png files ​ for the human example datasets​. Having completed ​the tutorial, I started working with my mosquito data. ​I aligned my reads using gmap-gsnap using " https://www.vectorbase.org/download/anopheles-gambiae- pestchromosomesagamp4fagz" and created the Transcription Database object using its corresponding annotation files that can be find at " https://www.vectorbase.org/download/anopheles-gambiae- pestbasefeaturesagamp41gff3gz " ​ ​ I have completed the pre-processing and normalization, the list of normalized reads per gene looks like this ​>cnts.norm​ ... AGAP009385 1.729138e+01 2.222712e+03 8.552872e+02 AGAP009386 3.094247e+01 1.681387e+02 7.808745e+01 AGAP009387 1.274102e+02 6.561511e+01 7.441275e+01 AGAP009388 1.695465e+03 3.033674e+03 1.940243e+03 AGAP009389 1.071155e+03 1.202602e+03 9.646097e+02 AGAP009390 0.000000e+00 2.050472e+00 1.837352e+00 AGAP009391 0.000000e+00 0.000000e+00 0.000000e+00 AGAP009392 0.000000e+00 0.000000e+00 0.000000e+00 AGAP009393 1.820145e+00 0.000000e+00 0.000000e+00 AGAP009394 0.000000e+00 2.050472e+00 0.000000e+00 ... ​the structure of the object is: > str(cnts.norm) num [1:12489, 1:12] 1857 3076 1390 4953 5027 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:12489] "AGAP000002" "AGAP000005" "AGAP000007" "AGAP000008" ... ..$ : chr [1:12] "library10.sorted.bam" "library11.sorted.bam" "library12.sorted.bam" "library1.sorted.bam" ... I load the kegg pathway database using kegg.gsets like this >kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") but the accession number are different in the Transcription database and the kegg. > kegg.gs ... $kg.sets$`aga04745 Phototransduction - fly` [1] "AgaP_AGAP000028" "AgaP_AGAP000348" "AgaP_AGAP000651" "AgaP_AGAP000945" [5] "AgaP_AGAP001506" "AgaP_AGAP001936" "AgaP_AGAP002145" "AgaP_AGAP005079" [9] "AgaP_AGAP005095" "AgaP_AGAP006263" "AgaP_AGAP006475" "AgaP_AGAP008435" [13] "AgaP_AGAP009115" "AgaP_AGAP009730" "AgaP_AGAP009953" "AgaP_AGAP010630" [17] "AgaP_AGAP010957" "AgaP_AGAP011099" "AgaP_AGAP011414" "AgaP_AGAP011516" [21] "AgaP_AGAP012026" "AgaP_AGAP012252" ... The gage tutorial mentions that the names of the genes in the counts and the kegg object should be the same, so I changed the rownames of the counts by adding "Agap_" > x <- data.frame(rownames(cnts.norm), stringsAsFactors=FALSE) > x$new <- paste("AgaP_", x[,1],sep="") > row.names(cnts.norm) <- x$new so the list now looks like this ... AgaP_AGAP009385 1.729138e+01 2.222712e+03 8.552872e+02 AgaP_AGAP009386 3.094247e+01 1.681387e+02 7.808745e+01 AgaP_AGAP009387 1.274102e+02 6.561511e+01 7.441275e+01 AgaP_AGAP009388 1.695465e+03 3.033674e+03 1.940243e+03 AgaP_AGAP009389 1.071155e+03 1.202602e+03 9.646097e+02 AgaP_AGAP009390 0.000000e+00 2.050472e+00 1.837352e+00 AgaP_AGAP009391 0.000000e+00 0.000000e+00 0.000000e+00 AgaP_AGAP009392 0.000000e+00 0.000000e+00 0.000000e+00 AgaP_AGAP009393 1.820145e+00 0.000000e+00 0.000000e+00 AgaP_AGAP009394 0.000000e+00 2.050472e+00 0.000000e+00 ... In theory everything is set for the gage analysis like this > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") but I get > cnts.kegg.p $greater p.geomean stat.mean p.val q.val set.size library2.sorted.bam kg.sets NA NaN NA NA 2 NA sigmet.idx NA NaN NA NA 0 NA sig.idx NA NaN NA NA 0 NA met.idx NA NaN NA NA 0 NA dise.idx NA NaN NA NA 0 NA library5.sorted.bam kg.sets NA sigmet.idx NA sig.idx NA met.idx NA dise.idx NA $less p.geomean stat.mean p.val q.val set.size library2.sorted.bam kg.sets NA NaN NA NA 2 NA sigmet.idx NA NaN NA NA 0 NA sig.idx NA NaN NA NA 0 NA met.idx NA NaN NA NA 0 NA dise.idx NA NaN NA NA 0 NA library5.sorted.bam kg.sets NA sigmet.idx NA sig.idx NA met.idx NA dise.idx NA $stats stat.mean library2.sorted.bam library5.sorted.bam kg.sets NaN NA NA sigmet.idx NaN NA NA sig.idx NaN NA NA met.idx NaN NA NA dise.idx NaN NA NA My questions are, how do I debug this problem? is the gene name (accession number) change correct? Are "NaN"s the result of zeros in the datasets? any help will be greatly appreciated. Best regards, Saul [[alternative HTML version deleted]]
Transcription Normalization gage Transcription Normalization gage • 1.2k views
ADD COMMENT

Login before adding your answer.

Traffic: 955 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6