I'm using edgeR to perform an RNA-seq differential expression analysis with S. cerevisiae samples, and then trying to use goana. My gene IDs are from Ensembl, so I added a column with Entrez Gene IDs to the DGELRT object, which comes from the glmQLFTest function. I'm not sure where in the DGELRT object I have to add the column with Entrez Gene IDs. The manual says:
...or the name of the column of de$genes containing the Gene IDs.
But I cannot find which element in the DGELRT object is de$genes, so I added the column to de$table and I'm passing that to goana. I also tried adding the Entrez Gene IDs to dgelist_qltest$geneid, but goana fails in the same way:
> class(dgelist_qltest)
[1] "DGELRT"
attr(,"package")
[1] "edgeR"
> goana.DGELRT(de=dgelist_qltest, geneid=dgelist_qltest$table$entrezgene_id)
Error in goana.default(de = DEGenes, universe = universe, ...) :
No annotated genes found in universe
> head(dgelist_qltest$table)
logFC logCPM F PValue entrezgene_id
YAL001C 0.2240038 6.943048 14.1162483 0.0022238429 851262
YAL002W 0.2994297 6.364025 18.9983099 0.0006987048 851261
YAL003W -0.3346512 10.272637 10.8792050 0.0054613104 851260
YAL004W 0.1465868 7.660092 0.8873887 0.3657633848 NA
YAL005C 0.2767830 11.574323 4.7249463 0.0479133341 851259
YAL007C -0.2657464 7.172642 8.8780937 0.0102093078 851226
# When I pass this to goana, it fails with the same error:
> head(dgelist_qltest$geneid)
[1] 851262 851261 851260 NA 851259 851226
Related to this, when I use goana with (or without) the geneid parameter and species=... I get this error:
> goana(de=dgelist_qltest, geneid=dgelist_qltest$table$entrezgene_id, species='Sc')
Error in goana.default(de = DEGenes, universe = universe, ...) :
org.Sc.eg.db package required but not not installed (or can't be loaded)
The database isn't found because, in the case of S. cerevisiae, the package is org.Sc.sgd.db, which has "sgd" instead of "eg" in its name.
I should note that there appears to be a 1-1 correspondence between SGD and Gene IDs:
But if you didn't get the message that there is a 1:1 mapping, you would need to account for that, because you would end up with more rows in your annotation data.frame than in your counts, and the annotation wouldn't match up. How one does that is perhaps beyond the scope of this thread.
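For the case where the mapping is not one-to-one, one possible approach (a minimal sketch, not necessarily what was meant above) is to keep exactly one annotation row per gene by looking up each gene ID with match(), which returns only the first hit. The mapping table below is made up for illustration; the extra ID 999999 is a hypothetical duplicate, not a real Entrez Gene ID for YAL002W.

```r
# Gene IDs in the same order as the rows of the count matrix
orf_ids <- c("YAL001C", "YAL002W", "YAL003W", "YAL004W")

# Hypothetical mapping table with a one-to-many entry (YAL002W appears twice)
# and one gene with no entry at all (YAL004W is absent)
mapping <- data.frame(
  ORF      = c("YAL001C", "YAL002W", "YAL002W", "YAL003W"),
  ENTREZID = c("851262",  "851261",  "999999",  "851260"),
  stringsAsFactors = FALSE
)

# match() returns the position of the first hit only, or NA when there is
# none, so the result has exactly one row per gene, in the original order
annot <- data.frame(
  ORF      = orf_ids,
  ENTREZID = mapping$ENTREZID[match(orf_ids, mapping$ORF)],
  stringsAsFactors = FALSE
)

# The annotation now lines up row for row with the counts
stopifnot(nrow(annot) == length(orf_ids))
print(annot)
```

Taking the first hit is an arbitrary choice; whether it is the right one depends on why a gene has several IDs in the first place.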
And never mind all that. You won't be able to use goana; as the help page says, you can look at ?alias2Symbol to get all the organisms that you can use. That list doesn't include org.Sc.sgd.db. You should switch to goseq, which does work with Saccharomyces.

Thank you for your answers, I did use goseq.

That's the point where some of my confusion comes from: my DGELRT object doesn't have a genes item, and from the manual I don't get that I have to create it. Is it supposed to have a genes item? The two last items were created by me.
Yes. See my first answer. As an example, cribbed from ?DGEList:
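The help-page example is not reproduced in this thread. As a rough sketch of the same idea, assuming a standard edgeR quasi-likelihood pipeline: an annotation data.frame passed through the genes argument of DGEList() is carried along to the glmQLFTest() result. The counts, gene names, and Entrez Gene IDs below are all simulated or made up for illustration.

```r
library(edgeR)

set.seed(1)
n_genes <- 500

# Made-up gene names and made-up Entrez Gene IDs, for illustration only
gene_ids <- paste0("gene", seq_len(n_genes))
annot <- data.frame(
  entrezgene_id = as.character(100000 + seq_len(n_genes)),
  row.names = gene_ids,
  stringsAsFactors = FALSE
)

# Simulated negative binomial counts: 500 genes x 4 samples, two groups
counts <- matrix(
  rnbinom(n_genes * 4, mu = 100, size = 5),
  nrow = n_genes,
  dimnames = list(gene_ids, paste0("sample", 1:4))
)
group  <- factor(c("A", "A", "B", "B"))
design <- model.matrix(~ group)

# Supply the annotation when the DGEList is created
y <- DGEList(counts = counts, group = group, genes = annot)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef = 2)

# The DGELRT object now has a genes component, one row per tested gene
head(qlf$genes)
```

With the annotation stored this way, the geneid argument of goana can name the column (here "entrezgene_id") rather than take a vector, for species that goana supports.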
I see, I never used genes= in this step:
dglst <- DGEList(y, genes = annot)
What happens is that I'm coming from salmon transcript counts --> tximport gene level --> edgeR. So when I go from transcript level to gene level, I use the direct translation from SGD (same as Ensembl) transcripts to SGD genes. Of course, in that step I can also map those to Entrez. Anyway, I'll modify my workflow to include the genes parameter. Thanks for your help.

The genes component is documented as being optional, see ?"DGEList-class" or ?"DGELRT-class". You can include the genes data.frame when you create the DGEList object with DGEList(), or you can add it later on at any time by assigning it to the genes component of the object, or similar. The only requirement is that genes has the same number of rows as counts. We didn't specially document the ability to add the genes component on the fly because it's just standard base R programming.

What you can't do is add undocumented components to an edgeR object and expect edgeR to take any notice of them.
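As a minimal sketch of adding the genes component after the object has been created: the gene names and Entrez Gene IDs below are the six shown in the question, while the counts are simulated and the object names are illustrative.

```r
library(edgeR)

orf_ids <- c("YAL001C", "YAL002W", "YAL003W", "YAL004W", "YAL005C", "YAL007C")
entrez  <- c("851262", "851261", "851260", NA, "851259", "851226")

# Simulated counts (made-up data): 6 genes x 4 samples
set.seed(1)
counts <- matrix(
  rpois(24, lambda = 50),
  nrow = 6,
  dimnames = list(orf_ids, paste0("sample", 1:4))
)

# Created without annotation, so there is no genes component yet
y <- DGEList(counts = counts)

# Add the annotation later by plain assignment
y$genes <- data.frame(
  entrezgene_id = entrez,
  row.names = orf_ids,
  stringsAsFactors = FALSE
)

# An extra column can be added to the existing data.frame in the same way
y$genes$orf <- rownames(y$counts)

# The one requirement: as many rows in genes as in counts
stopifnot(nrow(y$genes) == nrow(y$counts))

# Subsetting rows of the object keeps counts and genes in step
y_annotated <- y[!is.na(y$genes$entrezgene_id), ]
dim(y_annotated$counts)
nrow(y_annotated$genes)
```

The same kind of assignment works on a fitted or tested object, since those are list-based as well; the point is that the annotation goes into the documented genes component rather than into table or a new component.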