Search
Question: Resolved- GSEAbase: goSlim function, pb with counts
0
gravatar for maeva.mollion
8 days ago by
maeva.mollion0 wrote:

Hello, 

I encounter a problem using goSlim function in the package GSEAbase. 

I have a set of annotations (pasted below). When I use the goSlim function in order to have the counts of appearance of each Go Slim plant ontology, it seems to me that the counts are off. 

I noticed it because I am interested in one perticular ontology which is "GO:0005618" (cell wall), appearing two times in my annotation file but which is counted only one. Is there something I am missing here? 

My annotation file and code are below:

Here is my annotation file, named annot7:

--------------------------------------------------------------------------------------------

Contig2390 GO:0000902 expansin-A1 isoform X1
Contig2390 GO:0005618
Contig2390 GO:0071554
Contig2390 GO:0005576
Contig2390 GO:0040007
Contig7915 GO:0007010 villin-3-like isoform X1
Contig7915 GO:0022607
Contig7915 GO:0008092
Contig10367 GO:0005975 endoglucanase 2
Contig10367 GO:0009056
Contig10367 GO:0005575
Contig10367 GO:0016798
Contig10367 EC:3.2.1.4
Contig21476 GO:0016746 O-acyltransferase WSD1-like
Contig21476 GO:0006629
Contig21476 GO:0009058
Contig21476 GO:0005886
Contig21476 EC:2.3.1.20
Contig21476 EC:2.3.1.75
Contig71769 GO:0005975 plant invertase pectin methylesterase inhibitor superfamily
Contig71769 GO:0009056
Contig71769 GO:0005618
Contig71769 GO:0071554
Contig71769 GO:0030234
Contig71769 EC:3.1.1.11
Contig71976 GO:0005575 glucomannan 4-beta-mannosyltransferase 9-like
Contig71976 GO:0003674
Contig74515 GO:0005975 probable rhamnogalacturonate lyase B
Contig74515 GO:0003674

-----------------------------------------------------------------------------------------------------------------

Here is my code:

--------------------------------------------------------------------------------------------------------------------

colnames(annot7)=c("contig","GO","gene")

library("GSEABase")

#database
fl <- system.file("extdata", "goslim_plant.obo", package="GSEABase")
slim <- getOBOCollection(fl)

setDEGOs7=as.character(annot7$GO)
currentCollecSet7=GOCollection(setDEGOs7)

# Here, length(which(currentCollecSet7=="GO:0005618")) is 2

freqGO_7_MF=goSlim(currentCollecSet7, slim, "MF")#or BP or CC
freqGO_7_BP=goSlim(currentCollecSet7, slim, "BP")
freqGO_7_CC=goSlim(currentCollecSet7, slim, "CC")

 

#and this is what I get:

> freqGO_7_CC
           Count   Percent                               Term
GO:0005575     4 36.363636                 cellular_component
GO:0005576     1  9.090909               extracellular region
GO:0005578     0  0.000000 proteinaceous extracellular matrix
GO:0005615     0  0.000000                extracellular space
GO:0005618     1  9.090909                          cell wall
GO:0005622     0  0.000000                      intracellular
GO:0005623     2 18.181818                               cell
GO:0005634     0  0.000000                            nucleus
GO:0005635     0  0.000000                   nuclear envelope
GO:0005654     0  0.000000                        nucleoplasm
GO:0005730     0  0.000000                          nucleolus
GO:0005737     0  0.000000                          cytoplasm
GO:0005739     0  0.000000                      mitochondrion
GO:0005764     0  0.000000                           lysosome
GO:0005768     0  0.000000                           endosome
GO:0005773     0  0.000000                            vacuole
GO:0005777     0  0.000000                         peroxisome
GO:0005783     0  0.000000              endoplasmic reticulum
GO:0005794     0  0.000000                    Golgi apparatus
GO:0005829     0  0.000000                            cytosol
GO:0005840     0  0.000000                           ribosome
GO:0005856     0  0.000000                       cytoskeleton
GO:0005886     1  9.090909                    plasma membrane
GO:0009536     0  0.000000                            plastid
GO:0009579     0  0.000000                          thylakoid
GO:0016020     1  9.090909                           membrane
GO:0030312     1  9.090909   external encapsulating structure

----------------------------------------------------------------------------------------------------------------------------

 

Thanks for reading me!

ADD COMMENTlink modified 6 hours ago • written 8 days ago by maeva.mollion0
0
gravatar for Martin Morgan
1 day ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:

Sorry for my slow response, but yes, this seems to be a bug; thanks! It is fixed in version 1.38.2, which should become available via biocLite() on Thursday after about 2pm Eastern, all being well. The updated output is

> goSlim(currentCollecSet7, slim, "CC")
           Count Percent                               Term
GO:0005575     6   37.50                 cellular_component
GO:0005576     1    6.25               extracellular region
GO:0005578     0    0.00 proteinaceous extracellular matrix
GO:0005615     0    0.00                extracellular space
GO:0005618     2   12.50                          cell wall
GO:0005622     0    0.00                      intracellular
GO:0005623     3   18.75                               cell
GO:0005634     0    0.00                            nucleus
GO:0005635     0    0.00                   nuclear envelope
GO:0005654     0    0.00                        nucleoplasm
GO:0005730     0    0.00                          nucleolus
GO:0005737     0    0.00                          cytoplasm
GO:0005739     0    0.00                      mitochondrion
GO:0005764     0    0.00                           lysosome
GO:0005768     0    0.00                           endosome
GO:0005773     0    0.00                            vacuole
GO:0005777     0    0.00                         peroxisome
GO:0005783     0    0.00              endoplasmic reticulum
GO:0005794     0    0.00                    Golgi apparatus
GO:0005829     0    0.00                            cytosol
GO:0005840     0    0.00                           ribosome
GO:0005856     0    0.00                       cytoskeleton
GO:0005886     1    6.25                    plasma membrane
GO:0009536     0    0.00                            plastid
GO:0009579     0    0.00                          thylakoid
GO:0016020     1    6.25                           membrane
GO:0030312     2   12.50   external encapsulating structure

with increments occurring both in the primary id and also offspring. For instance the count for GO:0005575 changes from 4 to 6 because it is represented twice as primary id, twice as child of the duplicated id GO0005618, and once each as child of the unique ids GO:0005576 and 0005886.

ADD COMMENTlink written 1 day ago by Martin Morgan ♦♦ 20k

Thanks a lot! It works now and the results seems more logical. 

ADD REPLYlink written 6 hours ago by maeva.mollion0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 271 users visited in the last hour