Question: goana and "No annotated genes found in universe" even with Entrez gene ids
0
25 days ago by
matrs0
matrs0 wrote:

I'm using edgeR to perform a a RNA-seq differential expression analysis with S. cerevisiae samples and then trying to use goana. My genes ids are from ensembl so I added a column with entrezgene ids to the DGELRT object, which is coming from the glmQLFTest function. I'm not sure where in the DGLRT object I have to add the column with entrezgene ids, in the manual says:

...or the name of the column of de$genes containing the Gene IDs. But I cannot find which element in the DGELRT object is de$genes, so I added to de$table and I'm passing that to goana. I also tried to add the entregene ids to dgelist_qltest$geneid but goana fails in the same way

> class(dgelist_qltest)
[1] "DGELRT"
attr(,"package")
[1] "edgeR"

> goana.DGELRT(de=dgelist_qltest,  geneid=dgelist_qltest$table$entrezgene_id)
Error in goana.default(de = DEGenes, universe = universe, ...) :
No annotated genes found in universe

> head(dgelist_qltest$table) logFC logCPM F PValue entrezgene_id YAL001C 0.2240038 6.943048 14.1162483 0.0022238429 851262 YAL002W 0.2994297 6.364025 18.9983099 0.0006987048 851261 YAL003W -0.3346512 10.272637 10.8792050 0.0054613104 851260 YAL004W 0.1465868 7.660092 0.8873887 0.3657633848 NA YAL005C 0.2767830 11.574323 4.7249463 0.0479133341 851259 YAL007C -0.2657464 7.172642 8.8780937 0.0102093078 851226 #When I pass this to goana, fails with the same error: > head(dgelist_qltest$geneid)
[1] 851262 851261 851260     NA 851259 851226



Related to this, when I use goana with (or without) the geneid parameter and species=... I get this error:

> goana(de=dgelist_qltest,  geneid=dgelist_qltest$table$entrezgene_id, species='Sc')
Error in goana.default(de = DEGenes, universe = universe, ...) :
org.Sc.eg.db package required but not not installed (or can't be loaded)


Where the name of the db isn't found because in the case of S. cerevisiae is org.Sc.sgd.db , which does have "sgd" instead of the "eg" in the name.

edger goana • 128 views
modified 25 days ago by Gordon Smyth38k • written 25 days ago by matrs0
Answer: goana and "No annotated genes found in universe" even with Entrez gene ids
1
25 days ago by
United States
James W. MacDonald50k wrote:

Your DGELRT object dgelist_qltest is a list, with a genes item that you can put things in. In general when you make the DGEList to begin with, you would put all the annotation in at that point. You could do something like

## here I am assuming that your counts were in an object called 'counts'
## with SGD IDs as the row names, which seems to be a reasonable assumption
annot <- select(org.Sc.sgd.db, row.names(counts), "ENTREZID")

dgelist <- DGEList(counts, genes = annot)


And then in your call to goana you would just use the argument geneid = "ENTREZID"

I should note that there appears to be a 1-1 correspondence between SGD and Gene IDs:

> z <- select(org.Sc.sgd.db, keys(org.Sc.sgd.db), "ENTREZID")
'select()' returned 1:1 mapping between keys and columns



But if you didn't get the message that there is a 1:1 mapping, you would need to account for that, because you would end up with more rows in your annotation data.frame than in your counts, and the annotation wouldn't match up. How one does that is perhaps beyond the scope of this thread,

And never mind all that. You won't be able to use goana; as the help page says, you can look at ?alias2Symbol to get all the organisms that you can use:

    'species' can be any character string XX for which an organism
package org.XX.eg.db exists and is installed. The only requirement
of the organism package is that it contains objects
'org.XX.egALIAS2EG' and 'org.XX.egSYMBOL' linking the aliases and
symbols to Entrez Gene Ids. At the time of writing, the following
organism packages are available from Bioconductor 3.6:

Package            Species
org.Ag.eg.db       Anopheles
org.Bt.eg.db       Bovine
org.Ce.eg.db       Worm
org.Cf.eg.db       Canine
org.Dm.eg.db       Fly
org.Dr.eg.db       Zebrafish
org.EcK12.eg.db    E coli strain K12
org.EcSakai.eg.db  E coli strain Sakai
org.Gg.eg.db       Chicken
org.Hs.eg.db       Human
org.Mm.eg.db       Mouse
org.Mmu.eg.db      Rhesus
org.Pt.eg.db       Chimp
org.Rn.eg.db       Rat
org.Ss.eg.db       Pig
org.Xl.eg.db       Xenopus


Which doesn't include org.Sc.sgd.db. You should switch to goseq, which does work with Saccharomyces.

Thank you for your answers, I did use goseq.

That's a point where some of my confusion comes, my DGLRT object doesn't have a genes item and from the manual I don't get that I have to create it. Is It supposed to have a genes item?

> names(dgelist_qltest)
[1] "coefficients"          "fitted.values"         "deviance"              "method"
[5] "unshrunk.coefficients" "df.residual"           "design"                "offset"
[9] "dispersion"            "prior.count"           "AveLogCPM"             "df.residual.zeros"
[13] "df.prior"              "var.post"              "var.prior"             "samples"
[17] "table"                 "comparison"            "df.test"               "df.total"
[21] "entrezgene_id"         "geneid"


The two last items were created by me.

Yes. See my first answer. As an example, cribbed from ?DGEList

>  y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
> library(org.Hs.eg.db)
## pretend like these are human genes
> row.names(y) <- keys(org.Hs.eg.db)[1:nrow(y)]

> annot <- select(org.Hs.eg.db, row.names(y), c("SYMBOL","GENENAME"))
'select()' returned 1:1 mapping between keys and columns
> dglst <- DGEList(y, genes = annot)
> dglst <- calcNormFactors(dglst)
> design <- model.matrix(~gl(2,2))
> dglst <- estimateDisp(dglst, design)
> fit <- glmQLFit(dglst, design)
> tst <- glmQLFTest(fit, 2)
> topTags(tst)
Coefficient:  gl(2, 2)2
ENTREZID  SYMBOL                                         GENENAME
2254     2254    FGF9                       fibroblast growth factor 9
94         94  ACVRL1                   activin A receptor like type 1
2086     2086  ERV3-1 endogenous retrovirus group 3 member 1, envelope
2856     2856   GPR33  G protein-coupled receptor 33 (gene/pseudogene)
772       772    CACD              central areolar choroidal dystrophy
2159     2159     F10                             coagulation factor X
861       861   RUNX1              runt related transcription factor 1
1368     1368     CPM                               carboxypeptidase M
1318     1318 SLC31A2                solute carrier family 31 member 2
1497     1497    CTNS        cystinosin, lysosomal cystine transporter
logFC   logCPM         F      PValue       FDR
2254  6.596329 9.324105 12.942278 0.004170561 0.9908227
94    6.478047 9.235095 12.012533 0.005257713 0.9908227
2086  6.109880 8.971395 11.097805 0.006671991 0.9908227
2856  6.024536 8.912995 10.454189 0.007942445 0.9908227
772  -5.732565 8.721117  9.549125 0.010252893 0.9908227
2159 -5.847485 8.794307  9.467513 0.010498265 0.9908227
861  -5.719758 8.713577  9.169355 0.011456287 0.9908227
1368 -5.617202 8.649418  9.123042 0.011614292 0.9908227
1318 -6.046482 8.926302  8.898173 0.012419446 0.9908227
1497  5.609782 8.645777  8.810166 0.012752585 0.9908227

> z <- goana(tst, "ENTREZID")
No DE genes



I see, I never use genes= in this step:

dglst <- DGEList(y, genes = annot)

What happens is that I'm coming from salmon transcript counts -->tximport gene level -->edgeR. So when I go from transcripts level to gene level, I use the direct translation from SGD (same as ensembl) transcripts to SGD genes. Of course in that step I also can map those to entrez. Anyways, I'll modify my workflow to include the genes parameter. Thanks for your help.

The genes component is documented as being optional, see ?"DGEList-class" or ?"DGELRT-class". You can include the genes data.frame when you create the DGEList object with DGEList() or you can add it later on at any time by:

dge$genes <- data.frame(GeneID=geneid)  or dge_qltest$genes <- data.frame(GeneID=geneid)


or similar. The only requirement is that genes has the same number of rows as counts. We didn't specially document the ability to add the genes component on the fly because it's just standard base R programming.

What you can't do it is to add undocumentated components to an edgeR object and expect edgeR to take any notice of them.

Answer: goana and "No annotated genes found in universe" even with Entrez gene ids
0
25 days ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

What the error messages mean

It is good that you are reading the goana help page. You need to read the help pages ?goana.DGELRT and ?goana in conjunction. I will make some comments to try to help you interpret R help pages like this one.

goana.DGELRT(de=dgelist_qltest,

You never need to use goana.DGELRT() explicitly. Just use goana(dgelist_qltest, and the goana.DGELRT() function will be called automatically because of the class of dgelist_qltest.

No annotated genes found in universe

Well, this is because you tried to do a GO analysis for home sapiens. Look at the usage line on the help page and you will see that the default setting for species is species = Hs. None of the row names in your object are human Entrez Gene Ids, hence the message.

I cannot find which element in the DGELRT object is de$genes The help page tells you that de is a name for the first argument entered to goana and you have written in your own code that de=dgelist_qltest. So de$genes is dgelist_qltest$genes. If dgelist_qltest$genes already exists, then you can add the Entrez Gene Ids by

dgelist_qltest$genes$GeneID <- geneid


or similar. If it doesn't exist, then you can add it by

dgelist_qltest$genes <- data.frame(GeneID=geneid)  This is just base R programming and follows from the fact that dge_qltest is a list. so I added to de$table

Well, that won't work. You have to do what the help pages says.

org.Sc.eg.db package required but not not installed

As James as explained, goana only works for species for which a Bioconductor Entrez Id based organism package exists. goana can't use org.Sc.sgd.db.

Work around

As James has advised, a good solution is to use the goseq package instead of goana for this species.

Alternatively, you can always used kegga instead of goana for non-standard species. In this case you can use:

library("org.Sc.sgd.db")
ORF.GO <- toTable(org.Sc.sgdGO2ALLORFS)
library(GO.db)
GO.Name <- toTable(GOTERM)
GO.Name <- GO.Name[, c("go_id","Term")]
k <- kegga(dgelist_qltest, gene.pathway=ORF.GO, pathway.names=GO.Name)
topKEGG(k)


and the results will be exactly as if goana supported this species. Note that you don't need to convert geneids in this approach.

Thank you very much for your answer, I will use kegga to get a go enrichment analysis too.