Question

How to use goana function with zebrafish genome from org.Dr.eg.db

0

Entering edit mode

Eric • 0

@4f3a926f

Last seen 6 months ago

United States

I am very new to GO analysis in R and am attempting to use the goana function from the limma package with the zebrafish reference genome from org.Dr.eg.db. I've read through quite a bit of documentation, but have not been successful in implementing the code. I am using a DGELRT object for the analysis and initially received an error of "Error in goana.default(de = DEGenes, universe = universe, ...) : No annotated genes found in universe"

Is there a reference for implementing a species with a corresponding organism package in goana? I have been following this vignette for reference: https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Thank you!

library(limma)
library(org.Dr.eg.db)

fit <- glmQLFit(dgeObj1, design)
qlf <- glmQLFTest(fit, coef=2:3)

qlf <- glmQLFTest(fit, coef=2)
go <- goana(qlf, species="Dr") 

sessionInfo( )
R version 4.4.3 (2025-02-28 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Dr.eg.db_3.20.0  AnnotationDbi_1.68.0 IRanges_2.40.1       S4Vectors_0.44.0     Biobase_2.66.0      
[6] BiocGenerics_0.52.0  edgeR_4.4.2          limma_3.62.2

org.Dr.eg.db zebrafish limma • 1.8k views

ADD COMMENT • link 6 months ago Eric • 0

0

Entering edit mode

How did you create the object dgeObj1?

Please show the output from topTags(qlf) so that we can see what gene annotation your object contains.

ADD REPLY • link 6 months ago Gordon Smyth 53k

0

Entering edit mode

Sorry about that, completely missed that on my end:

For dgeObj1 (some of this is irrelevant for this analysis, but wanted to include just in case):

#starting from fastq file list
fastq.files <- list.files(pattern = ".fastq.gz$", full.names = TRUE)
fastq.files

#build index
buildindex(basename="danRer11",reference="danRer11.fa.gz")

#align
align.stat <- align(index = "./danRer11", readfile1 = fastq.files)

bam.files <- list.files(pattern = ".BAM$", full.names = TRUE)
bam.files

fc <- featureCounts(bam.files)
counts <- as.data.frame(fc$counts)
myCPM <- cpm(counts)
thresh <- myCPM > 0.5
keep <- rowSums(thresh) >= 2
counts.keep <- counts[keep,]
dgeObj <- DGEList(counts.keep)
dgeObj <- calcNormFactors(dgeObj)
dgeObj1 <- estimateTagwiseDisp(dgeObj1)

group <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10))
design <- model.matrix(~group)
dgeObj1 <- estimateDisp(dgeObj1, design)

The output of topTags(qlf) is imbedded below: enter image description here

ADD REPLY • link 6 months ago Eric • 0

score 0 · Answer 1 · 2025-06-16

The problem isn't with goana() but with how you have computed gene counts in the first place. When you assign aligned reads to genes, the counting program (featureCounts in this case) needs to know where the genes are on the genome. But you haven't input any gene annotation for zebra fish, so featureCounts has used its default annotation, which is for mouse. So you have computed read counts for mouse genes, but from a reads aligned to the zebra fish genome, which of course doesn't make any sense at all.

All the featureCounts() examples in the edgeR User's Guide are explicit about which gene annotation is being used.

featureCounts would have reported to you a very low read assignment rate, which would alert you to a problem.