Question

DESeq2 -- Bootstrap subsampling to estimate distribution of expression

0

Entering edit mode

David Eccles (gringer) ▴ 30

@david-eccles-gringer-7488

Last seen 4.3 years ago

New Zealand

I'm working in collaboration with a clinical diagnostics / research team in New Zealand, trying to use RNA biomarkers to determine disease risk. What we would ideally like to do is to generate expression ranges (in log space) using the reference samples, and then use those ranges to generate individual likelihoods per biomarker that are combined to generate a disease risk score. The original expectation was that this would be using the negative binomial distribution, but I can't find any documentation about how to estimate Nbinom parameters from a set of data (i.e. in a similar way that mean and sd can be used for normally-distributed data). Does anyone know of any ways to do this?

In light of this difficulty, I've been playing round with the 'boot' package to see if I can do bootstrap sub-sampling to estimate the expression distribution. One of our concerns is whether or not 6 samples is enough to be able to do this (that's what we've got at the moment, but we will have more in the future). Another concern is whether or not the approach I'm using is correct -- it's not something I've had much experience with previously, so I'm flailing around a bit. I'm currently subsampling reference samples with replacement, then calculating probability thresholds based on the distribution produced from the subsampling:

## extract out size-normalised counts for reference set
## [dgd.full is a DESeq2 dataset]
normcounts.ref.mat <-
    counts(dgd.full, normalized=TRUE)[,dgd.full$reference == "Ref"];
## subsample individuals (with replacement) to generate per-marker stats
ref.boot <- boot(data=t(normcounts.ref.mat), statistic = function(data,indices){
    colMeans(log1p(data[indices,]));
}, R = 10000);

## extract out descriptive statistics
ref.stats <- data.frame(row.names=rownames(normcounts.ref.mat));
for(stat in c("median","mean","sd","min","max")){
    ref.stats[[sprintf("LogT.%s",stat)]] <-
        apply(ref.boot$t,2,match.fun(stat));
}
for(prob in c(0.005,0.025,0.15,0.5,0.85,0.975,0.995)){
    ref.stats[[sprintf("Q%s",prob)]] <-
        round(expm1(apply(ref.boot$t,2,quantile,probs=prob)),0);
}
### this can be done with boot.ci as well:
## a <- boot.ci(ref.boot,index=1, type="perc", hinv=expm1, conf=c(0.7,0.95,0.99))

Does this method make sense? We're doing both subsampling and a DESeq2 DE analysis so that we can compare methods, and hopefully get an idea of which approach works best for our disease.

Cheers,

--
David Eccles
Bioinformatics Research Analyst, Gringene Bioinformatics
Room 2.10 x857
Malaghan Institute of Medical Research

deseq2 bootstrap differential expression • 1.6k views

ADD COMMENT • link updated 9.1 years ago by Michael Love 42k • written 9.1 years ago by David Eccles (gringer) ▴ 30

score 0 · Answer 1 · 2015-03-20

0

Entering edit mode

Michael Love 42k

@mikelove

Last seen 9 hours ago

United States

I don't have advice for the alternate methods but to answer this: "how to estimate Nbinom parameters from a set of data"

You can plug in ~1 for the design and then run DESeq() to estimate means and dispersions. See the section "Access to all calculated values" in the DESeq2 vignette.

ADD COMMENT • link 9.1 years ago Michael Love 42k