GSVA: questions about the bootstrap p-value
dionnezaal

Hi all, I am looking into the GSVA package and would like to calculate GSVA scores. The gsva method from the package lets you enable bootstrapping, which in the end gives you a p-value for each sample and gene set combination. I am having some trouble understanding this p-value and what is actually bootstrapped. I could not find any information in the GSVA tutorial or paper, but I found some in the package code. My questions are the following:

- Am I correct that the samples (columns in the expression data) are bootstrapped?
- The part of the code where the p-value is calculated (see bottom of this post) states that a
  non-parametric test is done to test whether the median of the empirical distribution is 0. Why was 0
  chosen as the null hypothesis in this case?
- How should I interpret the calculated p-value? To me it seems to be the proportion of bootstrap scores on
  the extreme side of the distribution?

I hope someone can help me!
Kind regards, Dionne

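```r
## Code calculating p-value from bootstrapping
# es.obs        = observed GSVA scores (n.gset x n.samples matrix)
# es.bootstraps = GSVA scores estimated from the bootstrap (n.gset x n.samples x no.bootstraps array)
# no.bootstraps = number of bootstraps
# p.vals.sign   = resulting p-values (n.gset x n.samples matrix)

for (i in 1:n.gset) {
  for (j in 1:n.samples) {
    # non-parametric test if median of empirical dist is 0:
    # count bootstrap scores on the opposite side of zero from the observed score
    if (es.obs[i, j] > 0) {
      p.vals.sign[i, j] <- (1 + sum(es.bootstraps[i, j, ] < 0)) / (1 + no.bootstraps)
    } else {
      p.vals.sign[i, j] <- (1 + sum(es.bootstraps[i, j, ] > 0)) / (1 + no.bootstraps)
    }
  }
}
```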
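For reference, the double loop can be written without loops. This is a sketch of my own, not code from the GSVA package; `sign_pvals` is a hypothetical name and the simulated scores are made up only to exercise it. It assumes `es.bootstraps` is an n.gset x n.samples x no.bootstraps array, as the indexing in the loop implies.

```r
# Vectorized equivalent of the double loop above (a sketch, not GSVA code).
sign_pvals <- function(es.obs, es.bootstraps) {
  no.bootstraps <- dim(es.bootstraps)[3]
  # rowSums(..., dims = 2) sums over the 3rd (bootstrap) dimension,
  # giving n.gset x n.samples matrices of counts
  n.below <- rowSums(es.bootstraps < 0, dims = 2)
  n.above <- rowSums(es.bootstraps > 0, dims = 2)
  # Keep the count on the side of zero opposite to the observed score;
  # ifelse() keeps the matrix shape of es.obs > 0
  n.opposite <- ifelse(es.obs > 0, n.below, n.above)
  (1 + n.opposite) / (1 + no.bootstraps)
}

# Simulated scores: 3 gene sets x 4 samples, 100 bootstrap replicates
set.seed(42)
n.gset <- 3; n.samples <- 4; no.bootstraps <- 100
es.obs <- matrix(rnorm(n.gset * n.samples, sd = 0.3), n.gset, n.samples)
es.bootstraps <- array(
  rnorm(n.gset * n.samples * no.bootstraps, mean = as.vector(es.obs), sd = 0.2),
  dim = c(n.gset, n.samples, no.bootstraps)
)
p.vec <- sign_pvals(es.obs, es.bootstraps)
round(p.vec, 3)
```

Like the loop, a score of exactly 0 falls into the `else` branch and is compared against the replicates above zero.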

Robert Castelo
Barcelona/Universitat Pompeu Fabra

hi,

sorry for the delay in getting back to you. I'm a contributor to GSVA, not the package maintainer who added this feature, but I'll try to answer. Bootstrapping was added after the publication, which is the main reason why it is not well described. Let me warn you that this is still an experimental feature, so the way it works may change in the future. Going to your specific questions:

Am I correct that the samples (columns in the expression data) are bootstrapped?

yes
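To make "the samples are bootstrapped" concrete, here is a toy sketch under stated assumptions: column indices are drawn with replacement (as many as there are samples), the per-gene reference distribution is re-estimated from those columns, and every original sample is rescored. The score itself (`toy_score`, a mean z-score difference) and `bootstrap_scores` are hypothetical stand-ins I made up for illustration, NOT the GSVA statistic or its internal code.

```r
# Hypothetical toy score (NOT the GSVA statistic): each gene is standardized
# with the mean and sd of the resampled reference columns, then every original
# sample is scored as mean z of gene-set genes minus mean z of the other genes.
toy_score <- function(expr, ref_cols, gset_idx) {
  ref <- expr[, ref_cols, drop = FALSE]
  mu <- rowMeans(ref)
  s <- apply(ref, 1, sd)
  s[is.na(s) | s == 0] <- 1        # guard against constant rows in a resample
  z <- (expr - mu) / s             # row-wise centering/scaling of all samples
  colMeans(z[gset_idx, , drop = FALSE]) - colMeans(z[-gset_idx, , drop = FALSE])
}

# Bootstrap the samples: draw column indices with replacement, re-estimate the
# reference from those columns and rescore every original sample.
bootstrap_scores <- function(expr, gset_idx, no.bootstraps = 200) {
  n.samples <- ncol(expr)
  sapply(seq_len(no.bootstraps), function(b) {
    cols <- sample(n.samples, n.samples, replace = TRUE)
    toy_score(expr, cols, gset_idx)
  })
}

# Simulated data: 100 genes x 10 samples, gene set = genes 1-10,
# shifted upwards in samples 1-5.
set.seed(7)
expr <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
expr[1:10, 1:5] <- expr[1:10, 1:5] + 2
gset_idx <- 1:10

obs  <- toy_score(expr, seq_len(ncol(expr)), gset_idx)  # one score per sample
boot <- bootstrap_scores(expr, gset_idx)                # samples x replicates
dim(boot)
```

Row `boot[j, ]` then plays the role of `es.bootstraps[i, j, ]` in the loop above for this single gene set.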

The part of the code where the p-value is calculated (see bottom of this post) states that a non-parametric test is done to test whether the median of the empirical distribution is 0. Why was 0 chosen as the null hypothesis in this case?

I'm not the one who added this feature, but my interpretation is that under the null hypothesis the two step CDFs, one for the genes inside the gene set and one for the genes outside it, are identical, so the resulting K-S statistic should be centered at zero.
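A quick simulation can illustrate why zero is the natural null value. This sketch assumes a simplified, unweighted K-S-like random walk (step +1/k for genes in the set, -1/(n-k) otherwise) scored as the maximum plus the minimum of the walk. That is a stand-in for illustration, not GSVA's exact rank-weighted statistic, and `ks_like_score` is a hypothetical name. For random gene sets the scores scatter symmetrically around zero.

```r
# Simplified K-S-like random walk over genes already ordered by rank.
# in_set: logical vector marking gene-set members (hypothetical helper).
ks_like_score <- function(in_set) {
  k <- sum(in_set)
  n <- length(in_set)
  steps <- ifelse(in_set, 1 / k, -1 / (n - k))  # walk ends at 0
  walk <- cumsum(steps)
  max(walk) + min(walk)   # largest positive plus largest negative deviation
}

# Null: gene sets of 50 genes drawn at random from 1000 ranked genes
set.seed(1)
n_genes <- 1000
set_size <- 50
null_scores <- replicate(2000, {
  in_set <- seq_len(n_genes) %in% sample(n_genes, set_size)
  ks_like_score(in_set)
})
summary(null_scores)
```

Reversing the gene order flips the sign of this score, and a random set is as likely as its reverse, so the null distribution is symmetric about zero.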

How should I interpret the calculated p-value? To me it seems to be the proportion of bootstrap scores on the extreme side of the distribution?

Again, because I'm not the one who added this feature I cannot answer your question with certainty, but it looks like a non-parametric sign test, so the value would be interpreted as a p-value for the null hypothesis that the difference between the two step CDFs has zero median.
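Reading the quoted code literally (my reading, not a statement from the maintainer), the value for one gene set/sample pair is the fraction of bootstrap replicates on the opposite side of zero from the observed score, with a +1 correction in numerator and denominator. `bootstrap_sign_p` below is a hypothetical helper mirroring one iteration of the loop, with made-up numbers.

```r
# Bootstrap sign p-value for one gene set/sample pair, mirroring the loop.
bootstrap_sign_p <- function(obs, boots) {
  n.opposite <- if (obs > 0) sum(boots < 0) else sum(boots > 0)
  (1 + n.opposite) / (1 + length(boots))
}

# Observed 0.4; one of four replicates falls below zero
bootstrap_sign_p(0.4, c(0.3, 0.5, -0.1, 0.2))   # (1 + 1) / (1 + 4) = 0.4
# All replicates on the same side: smallest attainable value, 1 / (1 + B)
bootstrap_sign_p(0.4, rep(0.3, 999))             # 1 / 1000 = 0.001
```

So a small value means the sign of the score is stable when the samples are resampled, and the number of bootstraps limits how small the value can get.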

cheers,

robert.


Thanks for your help! I will try to contact the maintainer of the package with these questions too.
