Question: Defining subsets for FDR correction
0
11 months ago by
sven.schenk10
Vienna
sven.schenk10 wrote:

Hello,

I have rather general question concering FDR/adjusted pvalues.

I did an RNASeq experiment (treated vs untreated cells), and analysed the data with DESeq. There I found that I had only very few transcripts passing the FDR cut off at 0.1, which I assumed to be because I have rather low effect sizes (max log2FC ~0.8) and a rather high variation. So I thought to define a subset of "interesing" genes, DESeq e.g. allows one to define a certain log2fc cut off for defining subsets. I however decided to not go for a log2fc cut off, as I thought variation between replicates might be the main issue (and that given my low overall log2fc I potentially might lose too much information) and so I decided to only taking those transcripts into account in which the quotient between SD and log2FC is smaller than 1 (effectively this is a pvalue cut off).

interestingGenes<-subset(input, abs(lfcSE*sqrt(3)/log2FoldChange)<=1.0)

My question(s) are now the following:

How can I now estimate the FDR for this gene set.

a) should I calculate the FDR based on the original pvalues, i.e.:

p <- interestingGenes$pvalue p.adj <- p.adjust(p, "BH") interestingGenes$p.adj <- p.adj

b) could I use the interestingGenes set to subset the input table and re-run DESeq? Which would re-estimate all dispersions, and size factors, and effectively be a completely independent analysis.

c) could I use the interestingGenes set to subset the DESeq countdata set? I.e.

cds <- DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design = ~ pheno)
size <- estimateSizeFactors(cds)
disp.par <- estimateDispersions(size, fitType="parametric")
ddsFullCountTable <- subset(disp.par, rownames(size) %in% interestingGenes\$X)
ddsFullCountTable <- DESeq(ddsFullCountTable, fitType="parametric", parallel=T)
#using pre-existing size factors
#estimating dispersions
#gene-wise dispersion estimates: 4 workers

This would use the old size factors and re-estimate the dispersions for DESeq and would in contrast to b) not be an entirely different experiment.

Cheers

Sven

deseq2 fdr ihw • 306 views
modified 11 months ago by Michael Love23k • written 11 months ago by sven.schenk10

I added a DESeq2 tag, and also IHW, as the IHW folks may have some advice on increasing power at the FDR correction step.

Thanks, lets see if it helps.

Answer: C: Defining subsets for FDR correction
0
11 months ago by
Michael Love23k
United States
Michael Love23k wrote:

You defintiely don’t want to use any kind of filter or “co-data” which is correlated with the test statistic under the null.  This will lose type I error control in an extreme way.

The test statistic is the LFC/SE so the LFC should definitely not be used to filter. The mean of counts is independent (besides discreteness at very low counts) and we show this with histograms in the DESeq2 paper.

ok, I guessed as much.

so that would mean I should generally not filter like this:

interestingGenes<-subset(input, abs(lfcSE*sqrt(3)/log2FoldChange)<=1)

but rather use use something involving baseMean,`but how would I find an unbiased way to define a cut off for my subset? Could I use baseMean/SE?

You can’t use anything correlated with the test statistic under the null. So SE is also not good.

ok, that makes sense so nothing that comes out of DESeq (apart from baseMean) can be used. But I was thinking if baseMean is independent so should be SD(baseMean) this would also most directly account for my varioation issue.