How to use goana function with zebrafish genome from org.Dr.eg.db
1
0
Entering edit mode
Eric • 0
@4f3a926f
Last seen 3 months ago
United States

I am very new to GO analysis in R and am attempting to use the goana function from the limma package with the zebrafish reference genome from org.Dr.eg.db. I've read through quite a bit of documentation, but have not been successful in implementing the code. I am using a DGELRT object for the analysis and initially received an error of "Error in goana.default(de = DEGenes, universe = universe, ...) : No annotated genes found in universe"

Is there a reference for implementing a species with a corresponding organism package in goana? I have been following this vignette for reference: https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Thank you!

library(limma)
library(org.Dr.eg.db)

fit <- glmQLFit(dgeObj1, design)
qlf <- glmQLFTest(fit, coef=2:3)

qlf <- glmQLFTest(fit, coef=2)
go <- goana(qlf, species="Dr") 

sessionInfo( )
R version 4.4.3 (2025-02-28 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Dr.eg.db_3.20.0  AnnotationDbi_1.68.0 IRanges_2.40.1       S4Vectors_0.44.0     Biobase_2.66.0      
[6] BiocGenerics_0.52.0  edgeR_4.4.2          limma_3.62.2
org.Dr.eg.db zebrafish limma • 1.6k views
ADD COMMENT
0
Entering edit mode

How did you create the object dgeObj1?

Please show the output from topTags(qlf) so that we can see what gene annotation your object contains.

ADD REPLY
0
Entering edit mode

Sorry about that, completely missed that on my end:

For dgeObj1 (some of this is irrelevant for this analysis, but wanted to include just in case):

#starting from fastq file list
fastq.files <- list.files(pattern = ".fastq.gz$", full.names = TRUE)
fastq.files

#build index
buildindex(basename="danRer11",reference="danRer11.fa.gz")

#align
align.stat <- align(index = "./danRer11", readfile1 = fastq.files)

bam.files <- list.files(pattern = ".BAM$", full.names = TRUE)
bam.files

fc <- featureCounts(bam.files)
counts <- as.data.frame(fc$counts)
myCPM <- cpm(counts)
thresh <- myCPM > 0.5
keep <- rowSums(thresh) >= 2
counts.keep <- counts[keep,]
dgeObj <- DGEList(counts.keep)
dgeObj <- calcNormFactors(dgeObj)
dgeObj1 <- estimateTagwiseDisp(dgeObj1)

group <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10))
design <- model.matrix(~group)
dgeObj1 <- estimateDisp(dgeObj1, design)

The output of topTags(qlf) is imbedded below: enter image description here

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 15 hours ago
WEHI, Melbourne, Australia

The problem isn't with goana() but with how you have computed gene counts in the first place. When you assign aligned reads to genes, the counting program (featureCounts in this case) needs to know where the genes are on the genome. But you haven't input any gene annotation for zebra fish, so featureCounts has used its default annotation, which is for mouse. So you have computed read counts for mouse genes, but from a reads aligned to the zebra fish genome, which of course doesn't make any sense at all.

All the featureCounts() examples in the edgeR User's Guide are explicit about which gene annotation is being used.

featureCounts would have reported to you a very low read assignment rate, which would alert you to a problem.

ADD COMMENT
0
Entering edit mode

Thank you, that fixed the issue. I must have missed that when going back and debugging.

ADD REPLY

Login before adding your answer.

Traffic: 552 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6