Search
Question: Defining subsets for FDR correction
0
gravatar for sven.schenk
4 months ago by
sven.schenk10
Vienna
sven.schenk10 wrote:

Hello,

I have rather general question concering FDR/adjusted pvalues.

I did an RNASeq experiment (treated vs untreated cells), and analysed the data with DESeq. There I found that I had only very few transcripts passing the FDR cut off at 0.1, which I assumed to be because I have rather low effect sizes (max log2FC ~0.8) and a rather high variation. So I thought to define a subset of "interesing" genes, DESeq e.g. allows one to define a certain log2fc cut off for defining subsets. I however decided to not go for a log2fc cut off, as I thought variation between replicates might be the main issue (and that given my low overall log2fc I potentially might lose too much information) and so I decided to only taking those transcripts into account in which the quotient between SD and log2FC is smaller than 1 (effectively this is a pvalue cut off).

interestingGenes<-subset(input, abs(lfcSE*sqrt(3)/log2FoldChange)<=1.0)

My question(s) are now the following:

How can I now estimate the FDR for this gene set.

a) should I calculate the FDR based on the original pvalues, i.e.:

p <- interestingGenes$pvalue
p.adj <- p.adjust(p, "BH")
interestingGenes$p.adj <- p.adj

b) could I use the interestingGenes set to subset the input table and re-run DESeq? Which would re-estimate all dispersions, and size factors, and effectively be a completely independent analysis.

c) could I use the interestingGenes set to subset the DESeq countdata set? I.e.

cds <- DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design = ~ pheno)
size <- estimateSizeFactors(cds)
disp.par <- estimateDispersions(size, fitType="parametric")
ddsFullCountTable <- subset(disp.par, rownames(size) %in% interestingGenes$X)
ddsFullCountTable <- DESeq(ddsFullCountTable, fitType="parametric", parallel=T)
 #using pre-existing size factors
 #estimating dispersions
 #gene-wise dispersion estimates: 4 workers

This would use the old size factors and re-estimate the dispersions for DESeq and would in contrast to b) not be an entirely different experiment.

Cheers

Sven

ADD COMMENTlink modified 4 months ago by Michael Love19k • written 4 months ago by sven.schenk10

I added a DESeq2 tag, and also IHW, as the IHW folks may have some advice on increasing power at the FDR correction step.

ADD REPLYlink written 4 months ago by Michael Love19k

Thanks, let`s see if it helps.

ADD REPLYlink written 4 months ago by sven.schenk10
0
gravatar for Michael Love
4 months ago by
Michael Love19k
United States
Michael Love19k wrote:

You defintiely don’t want to use any kind of filter or “co-data” which is correlated with the test statistic under the null.  This will lose type I error control in an extreme way.

The test statistic is the LFC/SE so the LFC should definitely not be used to filter. The mean of counts is independent (besides discreteness at very low counts) and we show this with histograms in the DESeq2 paper.

ADD COMMENTlink modified 4 months ago • written 4 months ago by Michael Love19k

ok, I guessed as much.

so that would mean I should generally not filter like this:

interestingGenes<-subset(input, abs(lfcSE*sqrt(3)/log2FoldChange)<=1)

but rather use use something involving baseMean,but how would I find an unbiased way to define a cut off for my subset? Could I use baseMean/SE?

 

ADD REPLYlink modified 4 months ago • written 4 months ago by sven.schenk10

You can’t use anything correlated with the test statistic under the null. So SE is also not good.

ADD REPLYlink modified 4 months ago • written 4 months ago by Michael Love19k

ok, that makes sense so nothing that comes out of DESeq (apart from baseMean) can be used. But I was thinking if baseMean is independent so should be SD(baseMean) this would also most directly account for my varioation issue.

 

ADD REPLYlink written 4 months ago by sven.schenk10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 325 users visited in the last hour