Hi all, I am looking into the GSVA package and would like to calculate GSVA scores. When using the gsva method from the package, you can choose to do bootstrapping, which in the end gives you a p-value for each combination of sample and gene set. I am having trouble understanding this p-value and what exactly is bootstrapped. I could not find this information in the GSVA tutorial or paper, but I did find some in the package code. My questions are the following:
- Am I correct that the samples (columns in the expression data) are bootstrapped?
- The part of the code where the p-value is calculated (see bottom of this post) states that a
non-parametric test is done to test if the median of the empirical distribution is 0. Why was 0
chosen as the value under the null hypothesis in this case?
- How should I interpret the calculated p-value? To me it seems to be the proportion of bootstrap scores that
fall on the opposite side of 0 from the observed score?
Hope there is someone that can help me!
Kind regards, Dionne
## Code calculating p-value from bootstrapping
# es.obs = observed gsva score
# es.bootstraps = estimated gsva scores from the bootstrap
# no.bootstraps = number of bootstraps
for(i in 1:n.gset){
  for(j in 1:n.samples){
    # non-parametric test if median of empirical dist is 0
    if(es.obs[i,j] > 0){
      p.vals.sign[i,j] <- (1 + sum(es.bootstraps[i,j,] < 0)) / (1 + no.bootstraps)
    }else{
      p.vals.sign[i,j] <- (1 + sum(es.bootstraps[i,j,] > 0)) / (1 + no.bootstraps)
    }
  }
}
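To make my third question concrete, here is a small toy sketch of how I currently read the calculation above. It is only my interpretation, not the package's code: the scores are made-up numbers (not GSVA output), the "bootstrap" scores are simply the observed scores plus random noise, and the vectorised form is my own rewrite of the loop.

```r
# Toy illustration only: all numbers are invented, nothing here calls GSVA.
set.seed(1)
n.gset <- 2
n.samples <- 3
no.bootstraps <- 100

# hypothetical observed scores (rows = gene sets, columns = samples)
es.obs <- matrix(c(0.4, -0.3, 0.05, -0.6, 0.2, -0.01),
                 nrow = n.gset, ncol = n.samples)

# hypothetical bootstrap scores: observed score plus noise (an assumption,
# real bootstrap scores would come from re-running the method on resampled data)
es.bootstraps <- array(NA_real_, dim = c(n.gset, n.samples, no.bootstraps))
for (b in seq_len(no.bootstraps)) {
  es.bootstraps[, , b] <- es.obs + rnorm(n.gset * n.samples, sd = 0.2)
}

# count, per gene set and sample, the bootstrap scores below and above 0
n.neg <- apply(es.bootstraps < 0, c(1, 2), sum)
n.pos <- apply(es.bootstraps > 0, c(1, 2), sum)

# same rule as the loop: count the bootstrap scores whose sign is
# opposite to the observed score, with 1 added to numerator and denominator
p.vals.sign <- (1 + ifelse(es.obs > 0, n.neg, n.pos)) / (1 + no.bootstraps)

print(round(p.vals.sign, 3))
```

If I read it correctly, the smallest value this can ever give is 1 / (1 + no.bootstraps), and observed scores close to 0 get large values because many bootstrap scores end up on the other side of 0.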
Thanks for your help! I will try to contact the maintainer of the package with these questions too.