Search
Question: Subset of genes can find a term but whole set cannot
0
gravatar for ankita.mandal28529
5 months ago by
ankita.mandal285290 wrote:

I have a set of genes and I want to find out disease ontology term from that set. I have used DOSE and biomaRt R package. If I put a subset of genes, I can get a significant term but if I put the whole set of genes, that term is missing. Is it possible? Or, I have made any mistake?

ADD COMMENTlink modified 5 months ago by James W. MacDonald45k • written 5 months ago by ankita.mandal285290

You need to show some self-contained code that others can run. Otherwise, it's not clear exactly what you are doing, nor what the problem is.

ADD REPLYlink written 5 months ago by James W. MacDonald45k

My code:

 

library(DOSE)
library(biomaRt)
ensembl=useEnsembl(biomart="ensembl",dataset="hsapiens_gene_ensembl",version=89)
geneId<-readLines("First_100_Selected_Gene_name.txt")   try(entrez_gene<-getBM(attributes=c('entrezgene'),filters='hgnc_symbol',values=geneId,mart=ensembl))
write.table(entrez_gene,file="entrezgene.txt",sep="\t")
entrez_gene1<-entrez_gene[,1]
try(x<-enrichDO(gene=entrez_gene1,ont="DO",pvalueCutoff=0.05,pAdjustMethod="BH",minGSSize=5,readable=FALSE))
y<-as.data.frame(x)
write.table(y,file="disease_ontology_output.txt",sep="\t"

 

If I use "First_50_Selected_Gene_name.txt", I can get a significant term but if I use "First_100_Selected_Gene_name.txt", that term is missing. My question is, "Is it possible that subset can have a term but whole set can not ?"

ADD REPLYlink written 5 months ago by ankita.mandal285290
1
gravatar for James W. MacDonald
5 months ago by
United States
James W. MacDonald45k wrote:

By 'self-contained' I actually meant 'code that somebody other than you could run'. Perhaps I wasn't explicit enough.

Anyway, note that enrichDO is just fitting a Fisher's exact test, which is based on a 2x2 table, where the cells of the table are

  Genes in term Genes not in term
Gene is significant    
Gene not significant    

And if you only use half of your genes, you may affect the count of genes in all four of those cells. Say you have 7 'influenza' genes in your set, and you select all 7 regardless of whether or not you use 50 or 100 genes.

  Genes in term Genes not in term
Gene is significant 7 43
Gene not significant Inf.genes - 7 Univ - 50

But then you add in the other 50 genes

  Genes in term Genes not in term
Gene is significant 7 93
Gene not significant Inf.genes - 7

Univ - 100

 In that situation, three of the cells used to compute the statistic have now changed, which, depending on the number of influenza genes in the universe, and the number in the universe, may affect the statistic sufficiently for it to no longer be significant.

We can make a stupid example to test this, using the example data for enrichDO:

> example(enrichDO)
<snip>
> z <- summary(yy)

fun <- function(gene, sigresult, B){
    rslt <- vector(length = B)
    for(i in seq(B)){
        gene2 <- gene[sample(1:525, 212)] # Randomly select half the genes
        tmp <- summary(enrichDO(gene2))
        rslt[i] <- sum(!row.names(tmp) %in% row.names(sigresult))
    }
    rslt
}

> y <- fun(gene, z, 100)

> y
 [1]   0  37   7 104   0   0  15   3 108  70  20 158  72   7  31  59   1   9
 [19]  72  88  47 156  71  34 109  90  74  91  47  87  56  29  38 110   3  48
 [37]  33 102  29   0  95   0   5  80  10  74  82   3  82  32  36 181  19   2
 [55]   0  51  69  50   6  54  75   6 112  64  13  77  30   6   1  32  24  17
 [73]  34  14 112  33  68  83  26  49  36  23 105  12  53  12  59  83  26   8
 [91]   7   0 142  74   1  17  16  93  19  17

So we get from 0 to edit - 181 terms that appear when we use a random subset of half the original 525 genes used in the example, as compared to the terms we got in the first place.

ADD COMMENTlink modified 5 months ago • written 5 months ago by James W. MacDonald45k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 109 users visited in the last hour