Question: GOSeq with unsupported organism (Arabidopsis) and retrieving gene IDs from enriched GO categories
5.2 years ago by
Dale Richardson wrote:
Hi all,

I'm currently working on a differential gene expression analysis and have used GOSeq to find enriched GO categories, as described here (https://stat.ethz.ch/pipermail/bioconductor/attachments/20110308/92b27df4/attachment.pl), except that I am using a non-supported organism (Arabidopsis). I've reached the same point in the analysis as Fernando in the above link: I would like to extract all gene IDs associated with the enriched GO terms in my DE analysis.

My question is, how can I do this with a non-supported organism? For a supported organism the process looks straightforward, but for an unsupported genome, and for a newbie in R, it isn't so easy.

This is some of the code that got me to where I am now:

    # calculate the probability weighting function (pwf)
    pwf <- nullp(genes, bias.data = overlapLengths)

    # read in the GO categories file
    tairgo <- read.table("ATH_GO_GOSLIM.txt", header = FALSE, sep = "\t", fill = TRUE)

    # pass only the gene ID and GO ID columns of tairgo as gene2cat
    GO.wall <- goseq(pwf, gene2cat = tairgo[, c(1, 6)])
    GO.samp <- goseq(pwf, gene2cat = tairgo[, c(1, 6)], method = "Sampling", repcnt = 1000)

    # categories with BH-adjusted over-representation p-value below 0.05
    enriched.GO <- GO.wall$category[p.adjust(GO.wall$over_represented_pvalue, method = "BH") < 0.05]
    # same filter, applied to the sampling-based results
    enriched.sampgo <- GO.samp$category[p.adjust(GO.samp$over_represented_pvalue, method = "BH") < 0.05]

What I've been thinking of doing is looping through my vector of enriched GO terms and finding all gene IDs in "tairgo" that have matching GO terms. However, is there a better way to do this using one of the functions built into GOSeq?

Thanks so much for your valuable input!

--
Dale Richardson, Ph.D.
Laboratory of Plant Molecular Biology
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
2780-156 Oeiras
Portugal
http://www.igc.gulbenkian.pt
Tel: +351 214 464 647
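The lookup described in the post (matching enriched GO terms back to gene IDs in "tairgo") can be sketched without an explicit loop, using base R only. This is a minimal illustration, not a GOSeq feature: the small "tairgo", "genes" and "enriched.GO" objects below are made-up stand-ins so the snippet runs on its own, and it assumes that column 1 of "tairgo" holds the gene ID, column 6 the GO ID, and that "genes" is the named 0/1 vector passed to nullp(). The names "gene2go", "hits", "genes.by.GO" and "de.genes.by.GO" are hypothetical.

```r
# Toy stand-in for the TAIR annotation table (hypothetical values);
# only column 1 (gene ID) and column 6 (GO ID) are used.
tairgo <- data.frame(
  V1 = c("AT1G01010", "AT1G01020", "AT1G01030", "AT1G01010", "AT1G01040"),
  V2 = NA, V3 = NA, V4 = NA, V5 = NA,
  V6 = c("GO:0003700", "GO:0016020", "GO:0003700", "GO:0005634", "GO:0005634"),
  stringsAsFactors = FALSE
)

# Toy stand-ins for the DE indicator vector and the enriched categories
genes <- c(AT1G01010 = 1, AT1G01020 = 0, AT1G01030 = 1, AT1G01040 = 0)
enriched.GO <- c("GO:0003700", "GO:0005634")

# Gene-to-category pairs, with duplicate annotation rows dropped
gene2go <- unique(tairgo[, c(1, 6)])
names(gene2go) <- c("gene", "category")

# Keep only the pairs whose category is enriched, then group genes by category
hits <- gene2go[gene2go$category %in% enriched.GO, ]
genes.by.GO <- split(hits$gene, hits$category)

# Optionally restrict each category to the differentially expressed genes
de.ids <- names(genes)[genes == 1]
de.genes.by.GO <- lapply(genes.by.GO, intersect, de.ids)

print(genes.by.GO)
print(de.genes.by.GO)
```

With the real data, only the three toy objects at the top would be replaced by the "tairgo", "genes" and "enriched.GO" objects from the analysis above. The result is a named list with one character vector of gene IDs per enriched GO term.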
