Question

An. Gambie RNA-Seq differential expression using gage and pathview

0

Entering edit mode

Saul Lozano ▴ 20

@saul-lozano-6631

Last seen 9.6 years ago

âDear bioconductor members, I have completed the "RNA-Seq Data Pathway and Gene-set Analysis Workflows" tutorial and have obtained the output png files â for the human example datasetsâ. Having completed âthe tutorial, I started working with my mosquito data. âI aligned my reads using gmap-gsnap using " https://www.vectorbase.org/download/anopheles-gambiae- pestchromosomesagamp4fagz" and created the Transcription Database object using its corresponding annotation files that can be find at " https://www.vectorbase.org/download/anopheles-gambiae- pestbasefeaturesagamp41gff3gz " â â I have completed the pre-processing and normalization, the list of normalized reads per gene looks like this â>cnts.normâ ... AGAP009385 1.729138e+01 2.222712e+03 8.552872e+02 AGAP009386 3.094247e+01 1.681387e+02 7.808745e+01 AGAP009387 1.274102e+02 6.561511e+01 7.441275e+01 AGAP009388 1.695465e+03 3.033674e+03 1.940243e+03 AGAP009389 1.071155e+03 1.202602e+03 9.646097e+02 AGAP009390 0.000000e+00 2.050472e+00 1.837352e+00 AGAP009391 0.000000e+00 0.000000e+00 0.000000e+00 AGAP009392 0.000000e+00 0.000000e+00 0.000000e+00 AGAP009393 1.820145e+00 0.000000e+00 0.000000e+00 AGAP009394 0.000000e+00 2.050472e+00 0.000000e+00 ... âthe structure of the object is: > str(cnts.norm) num [1:12489, 1:12] 1857 3076 1390 4953 5027 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:12489] "AGAP000002" "AGAP000005" "AGAP000007" "AGAP000008" ... ..$ : chr [1:12] "library10.sorted.bam" "library11.sorted.bam" "library12.sorted.bam" "library1.sorted.bam" ... I load the kegg pathway database using kegg.gsets like this >kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg") but the accession number are different in the Transcription database and the kegg. > kegg.gs ... $kg.sets$`aga04745 Phototransduction - fly` [1] "AgaP_AGAP000028" "AgaP_AGAP000348" "AgaP_AGAP000651" "AgaP_AGAP000945" [5] "AgaP_AGAP001506" "AgaP_AGAP001936" "AgaP_AGAP002145" "AgaP_AGAP005079" [9] "AgaP_AGAP005095" "AgaP_AGAP006263" "AgaP_AGAP006475" "AgaP_AGAP008435" [13] "AgaP_AGAP009115" "AgaP_AGAP009730" "AgaP_AGAP009953" "AgaP_AGAP010630" [17] "AgaP_AGAP010957" "AgaP_AGAP011099" "AgaP_AGAP011414" "AgaP_AGAP011516" [21] "AgaP_AGAP012026" "AgaP_AGAP012252" ... The gage tutorial mentions that the names of the genes in the counts and the kegg object should be the same, so I changed the rownames of the counts by adding "Agap_" > x <- data.frame(rownames(cnts.norm), stringsAsFactors=FALSE) > x$new <- paste("AgaP_", x[,1],sep="") > row.names(cnts.norm) <- x$new so the list now looks like this ... AgaP_AGAP009385 1.729138e+01 2.222712e+03 8.552872e+02 AgaP_AGAP009386 3.094247e+01 1.681387e+02 7.808745e+01 AgaP_AGAP009387 1.274102e+02 6.561511e+01 7.441275e+01 AgaP_AGAP009388 1.695465e+03 3.033674e+03 1.940243e+03 AgaP_AGAP009389 1.071155e+03 1.202602e+03 9.646097e+02 AgaP_AGAP009390 0.000000e+00 2.050472e+00 1.837352e+00 AgaP_AGAP009391 0.000000e+00 0.000000e+00 0.000000e+00 AgaP_AGAP009392 0.000000e+00 0.000000e+00 0.000000e+00 AgaP_AGAP009393 1.820145e+00 0.000000e+00 0.000000e+00 AgaP_AGAP009394 0.000000e+00 2.050472e+00 0.000000e+00 ... In theory everything is set for the gage analysis like this > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="paired") but I get > cnts.kegg.p $greater p.geomean stat.mean p.val q.val set.size library2.sorted.bam kg.sets NA NaN NA NA 2 NA sigmet.idx NA NaN NA NA 0 NA sig.idx NA NaN NA NA 0 NA met.idx NA NaN NA NA 0 NA dise.idx NA NaN NA NA 0 NA library5.sorted.bam kg.sets NA sigmet.idx NA sig.idx NA met.idx NA dise.idx NA $less p.geomean stat.mean p.val q.val set.size library2.sorted.bam kg.sets NA NaN NA NA 2 NA sigmet.idx NA NaN NA NA 0 NA sig.idx NA NaN NA NA 0 NA met.idx NA NaN NA NA 0 NA dise.idx NA NaN NA NA 0 NA library5.sorted.bam kg.sets NA sigmet.idx NA sig.idx NA met.idx NA dise.idx NA $stats stat.mean library2.sorted.bam library5.sorted.bam kg.sets NaN NA NA sigmet.idx NaN NA NA sig.idx NaN NA NA met.idx NaN NA NA dise.idx NaN NA NA My questions are, how do I debug this problem? is the gene name (accession number) change correct? Are "NaN"s the result of zeros in the datasets? any help will be greatly appreciated. Best regards, Saul [[alternative HTML version deleted]]

Transcription Normalization gage Transcription Normalization gage • 1.2k views

ADD COMMENT • link 9.8 years ago Saul Lozano ▴ 20