Hello,
I encounter a problem using goSlim function in the package GSEAbase.
I have a set of annotations (pasted below). When I use the goSlim function in order to have the counts of appearance of each Go Slim plant ontology, it seems to me that the counts are off.
I noticed it because I am interested in one perticular ontology which is "GO:0005618" (cell wall), appearing two times in my annotation file but which is counted only one. Is there something I am missing here?
My annotation file and code are below:
Here is my annotation file, named annot7:
--------------------------------------------------------------------------------------------
Contig2390 GO:0000902 expansin-A1 isoform X1 Contig2390 GO:0005618 Contig2390 GO:0071554 Contig2390 GO:0005576 Contig2390 GO:0040007 Contig7915 GO:0007010 villin-3-like isoform X1 Contig7915 GO:0022607 Contig7915 GO:0008092 Contig10367 GO:0005975 endoglucanase 2 Contig10367 GO:0009056 Contig10367 GO:0005575 Contig10367 GO:0016798 Contig10367 EC:3.2.1.4 Contig21476 GO:0016746 O-acyltransferase WSD1-like Contig21476 GO:0006629 Contig21476 GO:0009058 Contig21476 GO:0005886 Contig21476 EC:2.3.1.20 Contig21476 EC:2.3.1.75 Contig71769 GO:0005975 plant invertase pectin methylesterase inhibitor superfamily Contig71769 GO:0009056 Contig71769 GO:0005618 Contig71769 GO:0071554 Contig71769 GO:0030234 Contig71769 EC:3.1.1.11 Contig71976 GO:0005575 glucomannan 4-beta-mannosyltransferase 9-like Contig71976 GO:0003674 Contig74515 GO:0005975 probable rhamnogalacturonate lyase B Contig74515 GO:0003674
-----------------------------------------------------------------------------------------------------------------
Here is my code:
--------------------------------------------------------------------------------------------------------------------
colnames(annot7)=c("contig","GO","gene") library("GSEABase") #database fl <- system.file("extdata", "goslim_plant.obo", package="GSEABase") slim <- getOBOCollection(fl) setDEGOs7=as.character(annot7$GO) currentCollecSet7=GOCollection(setDEGOs7) # Here, length(which(currentCollecSet7=="GO:0005618")) is 2 freqGO_7_MF=goSlim(currentCollecSet7, slim, "MF")#or BP or CC freqGO_7_BP=goSlim(currentCollecSet7, slim, "BP") freqGO_7_CC=goSlim(currentCollecSet7, slim, "CC")
#and this is what I get: > freqGO_7_CC Count Percent Term GO:0005575 4 36.363636 cellular_component GO:0005576 1 9.090909 extracellular region GO:0005578 0 0.000000 proteinaceous extracellular matrix GO:0005615 0 0.000000 extracellular space GO:0005618 1 9.090909 cell wall GO:0005622 0 0.000000 intracellular GO:0005623 2 18.181818 cell GO:0005634 0 0.000000 nucleus GO:0005635 0 0.000000 nuclear envelope GO:0005654 0 0.000000 nucleoplasm GO:0005730 0 0.000000 nucleolus GO:0005737 0 0.000000 cytoplasm GO:0005739 0 0.000000 mitochondrion GO:0005764 0 0.000000 lysosome GO:0005768 0 0.000000 endosome GO:0005773 0 0.000000 vacuole GO:0005777 0 0.000000 peroxisome GO:0005783 0 0.000000 endoplasmic reticulum GO:0005794 0 0.000000 Golgi apparatus GO:0005829 0 0.000000 cytosol GO:0005840 0 0.000000 ribosome GO:0005856 0 0.000000 cytoskeleton GO:0005886 1 9.090909 plasma membrane GO:0009536 0 0.000000 plastid GO:0009579 0 0.000000 thylakoid GO:0016020 1 9.090909 membrane GO:0030312 1 9.090909 external encapsulating structure
----------------------------------------------------------------------------------------------------------------------------
Thanks for reading me!
Thanks a lot! It works now and the results seems more logical.