DESeq2 -- Bootstrap subsampling to estimate distribution of expression
1
0
Entering edit mode
@david-eccles-gringer-7488
Last seen 4.3 years ago
New Zealand

I'm working in collaboration with a clinical diagnostics / research team in New Zealand, trying to use RNA biomarkers to determine disease risk. What we would ideally like to do is to generate expression ranges (in log space) using the reference samples, and then use those ranges to generate individual likelihoods per biomarker that are combined to generate a disease risk score. The original expectation was that this would be using the negative binomial distribution, but I can't find any documentation about how to estimate Nbinom parameters from a set of data (i.e. in a similar way that mean and sd can be used for normally-distributed data). Does anyone know of any ways to do this?

In light of this difficulty, I've been playing round with the 'boot' package to see if I can do bootstrap sub-sampling to estimate the expression distribution. One of our concerns is whether or not 6 samples is enough to be able to do this (that's what we've got at the moment, but we will have more in the future). Another concern is whether or not the approach I'm using is correct -- it's not something I've had much experience with previously, so I'm flailing around a bit. I'm currently subsampling reference samples with replacement, then calculating probability thresholds based on the distribution produced from the subsampling:

## extract out size-normalised counts for reference set
## [dgd.full is a DESeq2 dataset]
normcounts.ref.mat <-
    counts(dgd.full, normalized=TRUE)[,dgd.full$reference == "Ref"];
## subsample individuals (with replacement) to generate per-marker stats
ref.boot <- boot(data=t(normcounts.ref.mat), statistic = function(data,indices){
    colMeans(log1p(data[indices,]));
}, R = 10000);

## extract out descriptive statistics
ref.stats <- data.frame(row.names=rownames(normcounts.ref.mat));
for(stat in c("median","mean","sd","min","max")){
    ref.stats[[sprintf("LogT.%s",stat)]] <-
        apply(ref.boot$t,2,match.fun(stat));
}
for(prob in c(0.005,0.025,0.15,0.5,0.85,0.975,0.995)){
    ref.stats[[sprintf("Q%s",prob)]] <-
        round(expm1(apply(ref.boot$t,2,quantile,probs=prob)),0);
}
### this can be done with boot.ci as well:
## a <- boot.ci(ref.boot,index=1, type="perc", hinv=expm1, conf=c(0.7,0.95,0.99))

Does this method make sense? We're doing both subsampling and a DESeq2 DE analysis so that we can compare methods, and hopefully get an idea of which approach works best for our disease.

Cheers,

--
David Eccles
Bioinformatics Research Analyst, Gringene Bioinformatics
Room 2.10 x857
Malaghan Institute of Medical Research

deseq2 bootstrap differential expression • 1.6k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 9 hours ago
United States

I don't have advice for the alternate methods but to answer this: "how to estimate Nbinom parameters from a set of data"

You can plug in ~1 for the design and then run DESeq() to estimate means and dispersions. See the section "Access to all calculated values" in the DESeq2 vignette.

ADD COMMENT

Login before adding your answer.

Traffic: 660 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6