Defining subsets for FDR correction
1
0
Entering edit mode
sven.schenk ▴ 30
@svenschenk-15837
Last seen 5.6 years ago
Vienna

Hello,

I have rather general question concering FDR/adjusted pvalues.

I did an RNASeq experiment (treated vs untreated cells), and analysed the data with DESeq. There I found that I had only very few transcripts passing the FDR cut off at 0.1, which I assumed to be because I have rather low effect sizes (max log2FC ~0.8) and a rather high variation. So I thought to define a subset of "interesing" genes, DESeq e.g. allows one to define a certain log2fc cut off for defining subsets. I however decided to not go for a log2fc cut off, as I thought variation between replicates might be the main issue (and that given my low overall log2fc I potentially might lose too much information) and so I decided to only taking those transcripts into account in which the quotient between SD and log2FC is smaller than 1 (effectively this is a pvalue cut off).

interestingGenes<-subset(input, abs(lfcSE*sqrt(3)/log2FoldChange)<=1.0)

My question(s) are now the following:

How can I now estimate the FDR for this gene set.

a) should I calculate the FDR based on the original pvalues, i.e.:

p <- interestingGenes$pvalue
p.adj <- p.adjust(p, "BH")
interestingGenes$p.adj <- p.adj

b) could I use the interestingGenes set to subset the input table and re-run DESeq? Which would re-estimate all dispersions, and size factors, and effectively be a completely independent analysis.

c) could I use the interestingGenes set to subset the DESeq countdata set? I.e.

cds <- DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design = ~ pheno)
size <- estimateSizeFactors(cds)
disp.par <- estimateDispersions(size, fitType="parametric")
ddsFullCountTable <- subset(disp.par, rownames(size) %in% interestingGenes$X)
ddsFullCountTable <- DESeq(ddsFullCountTable, fitType="parametric", parallel=T)
 #using pre-existing size factors
 #estimating dispersions
 #gene-wise dispersion estimates: 4 workers

This would use the old size factors and re-estimate the dispersions for DESeq and would in contrast to b) not be an entirely different experiment.

Cheers

Sven

fdr deseq2 ihw • 1.7k views
ADD COMMENT
0
Entering edit mode

I added a DESeq2 tag, and also IHW, as the IHW folks may have some advice on increasing power at the FDR correction step.

ADD REPLY
0
Entering edit mode

Thanks, let`s see if it helps.

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 4 hours ago
United States

You defintiely don’t want to use any kind of filter or “co-data” which is correlated with the test statistic under the null.  This will lose type I error control in an extreme way.

The test statistic is the LFC/SE so the LFC should definitely not be used to filter. The mean of counts is independent (besides discreteness at very low counts) and we show this with histograms in the DESeq2 paper.

ADD COMMENT
0
Entering edit mode

ok, I guessed as much.

so that would mean I should generally not filter like this:

interestingGenes<-subset(input, abs(lfcSE*sqrt(3)/log2FoldChange)<=1)

but rather use use something involving baseMean,but how would I find an unbiased way to define a cut off for my subset? Could I use baseMean/SE?

 

ADD REPLY
0
Entering edit mode

You can’t use anything correlated with the test statistic under the null. So SE is also not good.

ADD REPLY
0
Entering edit mode

ok, that makes sense so nothing that comes out of DESeq (apart from baseMean) can be used. But I was thinking if baseMean is independent so should be SD(baseMean) this would also most directly account for my varioation issue.

 

ADD REPLY

Login before adding your answer.

Traffic: 716 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6