Question: Using goseq on non-model organism - how to define genome?
Jon Bråte wrote (2.1 years ago):


I have a few questions about using goseq on a non-model organism:

I have a set of DE-genes identified by DESeq2 called genes,

a vector of gene lengths called lengthData created like this:

library(GenomicFeatures)
txdb = makeTxDbFromGFF("../merged_fixed.gtf", format = "gtf")

And I have a reference set of genes called backM which is a list of gene names

I am at this point of the goseq manual:


And my question is how do I refer to my non-model genome and what should I provide as id?

Thanks, Jon

modified 2.1 years ago by Gordon Smyth • written 2.1 years ago by Jon Bråte

Is your organism supported by AnnotationHub? Try:

library(AnnotationHub)
hub = AnnotationHub()
query(hub, "OrgDb")
modified 2.0 years ago • written 2.0 years ago by Gordon Smyth

No, it's not, but thanks for the AnnotationHub tip! I haven't used it before.

written 2.0 years ago by Jon Bråte

Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote (2.1 years ago):

Initially I said "Obviously you cannot do a GO analysis for a species for which no GO database exists".

However, Steve Lianoglou later pointed out that you can do a goseq analysis of any species if you provide your own GO mapping. I was not familiar with this part of the goseq package, so you should follow Steve's suggestions below.

modified 24 months ago • written 2.1 years ago by Gordon Smyth

Ok, I see. So it is not possible to use goseq on organisms not in supportedGenomes()? From the manual I got the impression that if I supplied gene length information and the GO mapping manually I could still run it?

written 2.1 years ago by Jon Bråte

You can. You will also have to provide a mapping of your gene IDs to GO (or whatever) IDs in a data.frame. It's doable.

written 2.1 years ago by Steve Lianoglou
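
Editor's illustration of the mapping described in the reply above: a minimal sketch in base R, assuming a long-format table with one row per gene-to-GO assignment. The gene IDs (geneA, geneB, geneC) and the object names go_map and go_list are hypothetical and do not come from the thread. As far as the goseq documentation describes it, the gene2cat argument accepts either a two-column data.frame like this or a named list, so both forms are shown.

```r
# Hypothetical long-format annotation: one row per gene-to-GO assignment.
# Gene IDs are made up; the GO IDs are real terms used only as placeholders.
go_map <- data.frame(
  gene     = c("geneA", "geneA", "geneB", "geneC"),
  category = c("GO:0008150", "GO:0003677", "GO:0005634", "GO:0008150"),
  stringsAsFactors = FALSE
)

# Equivalent list form: names are gene IDs, and each element is a
# character vector of the GO IDs assigned to that gene.
go_list <- split(go_map$category, go_map$gene)

str(go_list)
```

A table exported from an annotation tool would normally need to be reshaped into one of these two forms first; the exact column names in such an export are not specified in the thread.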

Thanks! I have created such a mapping with Blast2GO. But I guess this is used in the goseq function?

For nullp() I just skipped providing "genome" and "id" and passed lengthData as bias.data, and it seems to work. I did actually read the nullp documentation before posting the question, but it was not obvious to me that genome and id could be skipped.

written 2.1 years ago by Jon Bråte

Now that you have the output from nullp and the list of differentially expressed genes, you just need to create the gene2cat object for a call to goseq. The documentation in ?goseq describes the format reasonably well. It is the same format that the getgo function returns. You can look at the help in ?getgo (and run the example code there) to demystify that a bit further, if necessary.

written 2.1 years ago by Steve Lianoglou
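
Editor's illustration of the workflow discussed in this thread: a rough sketch on simulated data, not the poster's actual analysis. It assumes the goseq package (and its GO.db dependency) is installed. The gene IDs, gene lengths, DE calls and GO assignments are all randomly generated placeholders. The names genes and lengthData mirror those in the question, while go_map, pwf and res are hypothetical.

```r
library(goseq)

set.seed(1)
n_genes  <- 2000
gene_ids <- paste0("gene", seq_len(n_genes))  # hypothetical gene IDs

# Simulated gene lengths, named by gene ID (stand-in for lengthData).
lengthData <- setNames(round(runif(n_genes, 300, 6000)), gene_ids)

# Named 0/1 vector over the whole background set: 1 = DE, 0 = not DE.
genes <- setNames(rbinom(n_genes, 1, 0.1), gene_ids)

# Simulated gene-to-GO mapping as a two-column data.frame.
go_terms <- c("GO:0008150", "GO:0003674", "GO:0005575",
              "GO:0006412", "GO:0005634", "GO:0003677")
go_map <- unique(data.frame(
  gene     = sample(gene_ids, 3000, replace = TRUE),
  category = sample(go_terms, 3000, replace = TRUE),
  stringsAsFactors = FALSE
))

# Fit the length-bias function without genome/id by passing bias.data.
pwf <- nullp(genes, bias.data = lengthData, plot.fit = FALSE)

# Test for enrichment using the custom mapping instead of a built-in genome.
res <- goseq(pwf, gene2cat = go_map)
head(res)
```

With real data, genes and lengthData must cover the same genes in the same order, and the IDs in the mapping must match the names of genes.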