I have rather general question concering FDR/adjusted pvalues.
I did an RNASeq experiment (treated vs untreated cells), and analysed the data with DESeq. There I found that I had only very few transcripts passing the FDR cut off at 0.1, which I assumed to be because I have rather low effect sizes (max log2FC ~0.8) and a rather high variation. So I thought to define a subset of "interesing" genes, DESeq e.g. allows one to define a certain log2fc cut off for defining subsets. I however decided to not go for a log2fc cut off, as I thought variation between replicates might be the main issue (and that given my low overall log2fc I potentially might lose too much information) and so I decided to only taking those transcripts into account in which the quotient between SD and log2FC is smaller than 1 (effectively this is a pvalue cut off).
My question(s) are now the following:
How can I now estimate the FDR for this gene set.
a) should I calculate the FDR based on the original pvalues, i.e.:
p <- interestingGenes$pvalue p.adj <- p.adjust(p, "BH") interestingGenes$p.adj <- p.adj
b) could I use the interestingGenes set to subset the input table and re-run DESeq? Which would re-estimate all dispersions, and size factors, and effectively be a completely independent analysis.
c) could I use the interestingGenes set to subset the DESeq countdata set? I.e.
cds <- DESeqDataSetFromMatrix(countData=countdata,colData=coldata,design = ~ pheno) size <- estimateSizeFactors(cds) disp.par <- estimateDispersions(size, fitType="parametric") ddsFullCountTable <- subset(disp.par, rownames(size) %in% interestingGenes$X) ddsFullCountTable <- DESeq(ddsFullCountTable, fitType="parametric", parallel=T) #using pre-existing size factors #estimating dispersions #gene-wise dispersion estimates: 4 workers
This would use the old size factors and re-estimate the dispersions for DESeq and would in contrast to b) not be an entirely different experiment.