Search
Question: Using goseq on non-model organism - how to define genome?
0
gravatar for Jon Bråte
16 months ago by
Jon Bråte130
Norway
Jon Bråte130 wrote:

Hi,

I have a few questions about using goseq on a non-model organism:

I have a set of DE-genes identified by DESeq2 called genes,

a vector of gene lengths called lengthData created like this:

txdb = makeTxDbFromGFF("../merged_fixed.gtf", format = ("gtf"))
txsByGene=transcriptsBy(txdb,"gene")
lengthData=median(width(txsByGene))

And I have a reference set of genes called backM which is a list of gene names

I am at this point of the goseq manual:

pwf=nullp(genes,"hg19","ensGene")

And my question is how do I refer to my non-model genome and what should I provide as id?

Thanks, Jon

ADD COMMENTlink modified 16 months ago by Gordon Smyth30k • written 16 months ago by Jon Bråte130

Is your organism supported by Annotation Hub? Try:

library(AnnotationHub)
hub = AnnotationHub()
query(hub, c("OrgDb"))
ADD REPLYlink modified 15 months ago • written 15 months ago by Gordon Smyth30k

No it's not, but thanks for the Annotation Hub tip! I haven't used if before.

ADD REPLYlink written 15 months ago by Jon Bråte130
0
gravatar for Gordon Smyth
16 months ago by
Gordon Smyth30k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth30k wrote:

Initially I said "Obviously you cannot do a GO analysis for a species for which no GO database exists".

However later Steve Lianglou pointed out that you can do a goseq analysis of any species if you provide your own GO mapping. I was not familiar with this part of the goseq package, so you should follow Steve's suggestions below.

ADD COMMENTlink modified 15 months ago • written 16 months ago by Gordon Smyth30k

Ok, I see. So it is not possible to use goseq on organisms not in supporedGenomes()? From the manual I got the impression that if I supplied gene length information and the GO-mapping manually I could still run it?

ADD REPLYlink written 16 months ago by Jon Bråte130
1

You can. You will have to provide a mapping of your gene id's to go (or whatever) id's wired up in a data.frame, too. It's doable

ADD REPLYlink written 16 months ago by Steve Lianoglou12k

Thanks! I have created such a mapping with Blast2GO. But I guess this is used in the goseq function?

For nullp() I just skipped providing "genome" and "id" and entered the lengthData as bias.data and it seems to work. I did actually read the nullp documentation before posting the question, but it was not so obvious to me that genome and id could be skipped.

ADD REPLYlink written 16 months ago by Jon Bråte130
1

Now that you have the output from nullp and the list of differentially expressed genes, you just need to create the gene2cat object for a call to goseq. The documentation in ?goseq describes the format reasonably well. It is of the format that is returned from the getgo function as well. You can look at the help in ?getgo (and run the example code there) to demystify that a bit further, if necessary.

ADD REPLYlink written 16 months ago by Steve Lianoglou12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 170 users visited in the last hour