Question: GSVA: questions about the bootstrap p-value
dionnezaal wrote (6 months ago):

Hi all, I am looking into the GSVA package and would like to calculate GSVA scores. When using the gsva method from the package, you can choose to do bootstrapping, which in the end gives you a p-value for each combination of sample and gene set. I am having some trouble understanding this p-value and what is actually bootstrapped. I could not find information in the GSVA tutorial or paper, but I found some in the package code. My questions are the following:

- Am I correct that the samples (columns in the expression data) are bootstrapped?
- The part of the code where the p-value is calculated (see bottom of this post) states that a
  non-parametric test is done to test whether the median of the empirical distribution is 0. Why was 0
  chosen as the hypothesis in this case?
- How should I interpret the calculated p-value? To me it seems to be the proportion of bootstrap scores on
  the extreme side of the distribution.

Hope there is someone that can help me!
Kind regards, Dionne

 

## Code calculating p-value from bootstrapping
# es.obs = observed gsva score
# es.bootstraps = estimated gsva scores from the bootstrap
# no.bootstraps = number of bootstraps

for(i in 1:n.gset){
    for(j in 1:n.samples){
        # non-parametric test if median of empirical dist is 0
        if(es.obs[i,j] > 0){
            p.vals.sign[i,j] <- (1 + sum(es.bootstraps[i,j,] < 0)) / (1 + no.bootstraps)
        }else{
            p.vals.sign[i,j] <- (1 + sum(es.bootstraps[i,j,] > 0)) / (1 + no.bootstraps)
        }
    }
}
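
For readers who want to experiment with the calculation above, here is a minimal vectorized sketch of the same formula. It is only an illustration on simulated numbers: the toy dimensions and the simulated `es.obs` and `es.bootstraps` values are assumptions made up for this example, not output of the GSVA package.

```r
set.seed(1)

# Toy dimensions (hypothetical, for illustration only)
n.gset <- 3          # number of gene sets
n.samples <- 4       # number of samples
no.bootstraps <- 200 # number of bootstrap replicates

# Simulated scores standing in for real GSVA output (assumption)
es.obs <- matrix(rnorm(n.gset * n.samples), n.gset, n.samples)
es.bootstraps <- array(
  rep(es.obs, no.bootstraps) +
    rnorm(n.gset * n.samples * no.bootstraps, sd = 0.5),
  dim = c(n.gset, n.samples, no.bootstraps)
)

# Count, per gene set and sample, the bootstrap scores below and above zero
n.neg <- apply(es.bootstraps < 0, c(1, 2), sum)
n.pos <- apply(es.bootstraps > 0, c(1, 2), sum)

# Same rule as the loop: count the bootstrap scores whose sign is
# opposite to the sign of the observed score, then add 1 to the
# numerator and the denominator
p.vals.sign <- (1 + ifelse(es.obs > 0, n.neg, n.pos)) / (1 + no.bootstraps)
```

Because of the added 1 in the numerator and denominator, the smallest value this formula can return is 1 / (1 + no.bootstraps), so the number of bootstraps limits the resolution of the p-value.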

Robert Castelo (Spain/Barcelona/Universitat Pompeu Fabra) wrote (5 months ago):

hi,

sorry for the delay in getting back to you. I'm a contributor to GSVA, not the maintainer who added this feature, but I'll try to answer. Indeed, bootstrapping was added after the publication, which is the main reason why it is not well described. Let me warn you that this is still an experimental feature, and therefore the way it works may change in the future. Going to your specific questions:

Am I correct that the samples (columns in the expression data) are bootstrapped?

yes
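
As a rough illustration of what bootstrapping the samples means, the sketch below draws one bootstrap replicate by resampling column indices with replacement. The toy matrix `expr` and all names in it are hypothetical, and how GSVA uses the resampled columns internally is not covered in this thread, so this should not be read as the package's implementation.

```r
set.seed(42)

# Hypothetical toy expression matrix: 5 genes (rows) x 6 samples (columns)
expr <- matrix(
  rnorm(30), nrow = 5, ncol = 6,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6))
)
n.samples <- ncol(expr)

# One bootstrap replicate: draw n.samples column indices with replacement,
# so some samples appear more than once and others may be left out
boot.idx <- sample(n.samples, n.samples, replace = TRUE)

# Expression matrix restricted to the resampled columns
expr.boot <- expr[, boot.idx, drop = FALSE]
```

Repeating this draw no.bootstraps times gives one set of resampled columns per replicate.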

The part of the code where the p-value is calculated (see bottom of this post) states that a non-parametric test is done to test whether the median of the empirical distribution is 0. Why was 0 chosen as the hypothesis in this case?

I'm not the one who added this feature, but my interpretation would be that under the null hypothesis the two step-CDFs, of genes inside the gene set and of genes outside the gene set, are identical, and so the resulting K-S statistic should be zero.
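
To make the "identical step-CDFs give a zero statistic" argument concrete, here is a small generic sketch of a two-sample K-S distance. The helper `ks.distance` and the simulated vectors are hypothetical names made up for this example; GSVA computes its own K-S-like statistic, which this sketch does not reproduce.

```r
# Largest absolute gap between two empirical step-CDFs
# (generic two-sample K-S distance, not GSVA's own statistic)
ks.distance <- function(x, y) {
  grid <- sort(unique(c(x, y)))  # every point where either step-CDF jumps
  max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

set.seed(7)
inside <- rnorm(50)            # stand-in for values of genes inside the set
outside.same <- inside         # identical distribution: the null scenario
outside.shifted <- inside + 2  # shifted distribution: a non-null scenario

ks.distance(inside, outside.same)     # exactly 0, the two step-CDFs coincide
ks.distance(inside, outside.shifted)  # greater than 0
```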

How should I interpret the calculated p-value? To me it seems to be the proportion of bootstrap scores on the extreme side of the distribution.

Again, because I'm not the one who added this feature I cannot safely answer your question, but it looks like a non-parametric sign test, and then it would be interpreted as the probability that the difference between the two step-CDFs has zero median.

cheers,

robert.


Thanks for your help! I will try to contact the maintainer of the package with these questions too.

(reply written 5 months ago by dionnezaal)