Hi,
I am trying to compare two result lists from GOstats. The reason for doing this is that I did a proteomics experiment in which we established a new sample preparation protocol and I now want to kind of show that with this new method we do not enrich/deplete for certain proteins/protein functions. The obvious way, comparing the protein lists and check the overlap isn`t very telling as there is redundancy in the proteome, i.e. protein X does not neccessarily have the same ID in both sets, and further, both sets contain different amount of entries. So I thought doing a GO enrichment analysis to check whether certain catergories are enriched in one list versus the other would be appropriate.
So now I have the lists of the two sets and I wanted to do a simple statistical test to compare them GO term by GO term. As this is a count based list I wanted to use a GTest (or Chi square) . But as the input lists have different sizes, I would naturally expect less Counts per GO term in the smaller data set (GO.list.1) compared to the larger list (GO.list.2), meaning that the simple GTest:
>GO.list<-merge(GO.list.1, GO.list.2, by=2, all=F) >colnames(GO.list) [1] "GOMFID" "Pvalue.x" "OddsRatio.x" "ExpCount.x" "Count.x" "Size.x" [7] "Term.x" "test.x" "ontology.x" "subset.x" "Pvalue.y" "OddsRatio.y" [13] "ExpCount.y" "Count.y" "Size.y" "Term.y" "test.y" "ontology.y" [19] "subset.y"
>GO.list$pvalue<-sapply(1:nrow(GO.list), function(i) GTest(x=as.numeric(unlist(GO.list[i,c(5,14)]))))$pvalue)
will bias my result as it assumes equal proportions.
As the GOstats result file give you all kind of information (based on the input data) I thought I`d use these informations to calculate my expected proportions. My first thought now was to normalise the counts per GO term on the relative proportions of the ExpCounts given in the GOstats result file, i.e. columns 4 and 13 of the merged result file.
>GO.list<-merge(GO.list.1, GO.list.2, by=2, all=F) >colnames(GO.list) [1] "GOMFID" "Pvalue.x" "OddsRatio.x" "ExpCount.x" "Count.x" "Size.x" [7] "Term.x" "test.x" "ontology.x" "subset.x" "Pvalue.y" "OddsRatio.y" [13] "ExpCount.y" "Count.y" "Size.y" "Term.y" "test.y" "ontology.y" [19] "subset.y"
>p1<-GO.list$ExpCount.x/(GO.list$ExpCount.y+GO.list$ExpCount.x) >p2<-GO.list$ExpCount.y/(GO.list$ExpCount.y+GO.list$ExpCount.x) >GO.list$pvalue<-sapply(1:nrow(GO.list), function(i) GTest(x=as.numeric(unlist(GO.list[i,c(5,14)]))), p=c(p1,p2))$pvalue)
But I don`t think this is the correct way to do it either, because if I run GOstats on a merged list (i.e. merging my two protein in put lists) then the ExpCounts are (somewhat expectedly) not the sum of the two input lists.
So I was wondering if I could use the Count columns as a proxy for calculating the expected proportions:
>GO.list<-merge(GO.list.1, GO.list.2, by=2, all=F) >colnames(GO.list) [1] "GOMFID" "Pvalue.x" "OddsRatio.x" "ExpCount.x" "Count.x" "Size.x" [7] "Term.x" "test.x" "ontology.x" "subset.x" "Pvalue.y" "OddsRatio.y" [13] "ExpCount.y" "Count.y" "Size.y" "Term.y" "test.y" "ontology.y" [19] "subset.y"
>p1<-((sum(GO.list$Count.x)/sum(GO.list$Count.y))*0.5)/(((sum(GO.list$Count.x)/sum(GO.list$Count.y))*0.5)+0.5) >p2<-((sum(GO.list$Count.y)/sum(GO.list$Count.x))*0.5)/(((sum(GO.list$Count.y)/sum(GO.list$Count.x))*0.5)+0.5)
#in my case: p1: 0.4176183 #p2: 0.5823817
>GO.list$pvalue<-sapply(1:nrow(GO.list), function(i) GTest(x=as.numeric(unlist(GO.list[i,c(5,14)]))), p=c(p1,p2))$pvalue)
Does anybody have any suggestion if this is any way correct, or should I do something completely different?
Cheers
Sven
Hi thanks for your answer.
Yes, I think
roast
would do exactly what I'd want, but for that I'd have to have replicates which I don't have. I have only the Counts from one enrichment analysis the Counts from the second one which I want to compare to each other.I then found that I could use
geneSetTest
:So now I get a pvalue, but I hoestly don`t know what it means, in the documentation it reads:
geneSetTest
performs a competitive test in the sense that genes in the test set are compared to other genes (Goeman and Buhlmann, 2007). If thestatistic
is a genewise test statistic for differential expression, thengeneSetTest
tests whether genes in the set are more differentially expressed than genes not in the set.But it doesn`t tell me what the null hypothesis is. Is the null that the genes are more differently, or is it they are not differently?