Very large FDR (padj) using DESeq2 without replicates
Rui Guo • 0
@rui-guo-8318
China

I am using DESeq2 on RNA-seq data with a treated and an untreated group, and I have no replicates in either group. Here are the results I get, ordered by p-value:

log2 fold change (MAP): condition treated vs untreated 

Wald test p-value: condition treated vs untreated 
DataFrame with 6 rows and 6 columns
         baseMean log2FoldChange     lfcSE      stat      pvalue
        <numeric>      <numeric> <numeric> <numeric>   <numeric>
Cxcl1   2931.3112       3.038196  1.052850  2.885687 0.003905598
Angpt1   883.8835      -3.010196  1.055311 -2.852427 0.004338684
Pcdh1   1482.8029       3.001023  1.058310  2.835676 0.004572883
C3       644.1743       2.992916  1.057344  2.830599 0.004646087
Postn    618.3649      -2.913716  1.049750 -2.775628 0.005509526
Zc3h12a 1049.7000       2.902225  1.046408  2.773510 0.005545509
             padj
        <numeric>
Cxcl1   0.9222841
Angpt1  0.9222841
Pcdh1   0.9222841
C3      0.9222841
Postn   0.9222841
Zc3h12a 0.9222841

 

Why is padj so large? I have 24062 genes in the count table. Can I just ignore padj and use the 20 or 50 genes with the smallest p-values? The count table:

https://drive.google.com/file/d/0B7lsqOFmCD1sWFVJUFFOLVpOS00/view?usp=sharing

 

 

@mikelove
United States

No, you should not use the genes with smallest p-value for publication, although you might look at these only for exploratory purposes. But for publication you need to control for multiple testing, by using the adjusted p-values to build sets with FDR less than a chosen threshold.
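As a rough illustration of why the adjusted p-values come out so large, the Benjamini-Hochberg rule can be checked by hand against the six smallest p-values in the question. This sketch assumes all 24062 genes enter the adjustment (DESeq2's independent filtering may reduce that number) and uses alpha = 0.1 purely as an example threshold. The variable names are illustrative only.

```r
# Six smallest p-values, copied from the results table in the question.
p_top <- c(Cxcl1 = 0.003905598, Angpt1 = 0.004338684, Pcdh1 = 0.004572883,
           C3 = 0.004646087, Postn = 0.005509526, Zc3h12a = 0.005545509)

# Assumption: every gene in the count table is tested.
m <- 24062
# Assumption: an example FDR threshold, not one stated in the thread.
alpha <- 0.1

# BH can reject the gene at rank j only if p_(j) <= alpha * j / m.
bh_cutoff <- alpha * seq_along(p_top) / m
passes <- p_top <= bh_cutoff
print(data.frame(pvalue = p_top, bh_cutoff = bh_cutoff, passes = passes))

# A p-value as large as the smallest one here could only pass at this rank
# or later, so at least this many genes would need comparably small p-values.
min_rank_needed <- ceiling(m * min(p_top) / alpha)
print(min_rank_needed)

# p.adjust() with n = m adjusts these six as if they were the smallest of
# m tests and all remaining p-values were no smaller.
padj_top <- p.adjust(p_top, method = "BH", n = m)
print(padj_top)
```

None of the six p-values comes close to its BH cutoff, which is consistent with the large padj values reported. A small unadjusted p-value on its own says little when tens of thousands of genes are tested.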

When you run DESeq() without replicates, it warns you:

Warning message:
In checkForExperimentalReplicates(object, modelMatrix) :
  same number of samples and coefficients to fit,
  estimating dispersion by treating samples as replicates.
  read the ?DESeq section on 'Experiments without replicates'

Go read that section in the help file (type ?DESeq into your R session and press return).

Large adjusted p-values are expected in this case. The DESeq() function treats the samples as if they were replicates in order to estimate dispersion, which leaves very little power to detect differential expression. This is why we tell users that DESeq() is only for exploratory purposes in this case. With so little power, you are unlikely to obtain genes with small p-values or adjusted p-values.
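The low power is visible in the table from the question itself. The sketch below re-enters the six posted rows by hand and checks that the reported stat is the log2 fold change divided by its standard error, and that the p-value is the two-sided normal tail of that statistic. It is only a consistency check on the posted numbers and does not call DESeq2; the column names stat_check and pvalue_check are made up for the illustration.

```r
# Values copied from the results table posted in the question.
res <- data.frame(
  row.names      = c("Cxcl1", "Angpt1", "Pcdh1", "C3", "Postn", "Zc3h12a"),
  log2FoldChange = c(3.038196, -3.010196, 3.001023, 2.992916, -2.913716, 2.902225),
  lfcSE          = c(1.052850, 1.055311, 1.058310, 1.057344, 1.049750, 1.046408),
  stat           = c(2.885687, -2.852427, 2.835676, 2.830599, -2.775628, 2.773510),
  pvalue         = c(0.003905598, 0.004338684, 0.004572883,
                     0.004646087, 0.005509526, 0.005545509)
)

# Wald statistic: estimated log2 fold change divided by its standard error.
res$stat_check <- res$log2FoldChange / res$lfcSE

# Two-sided p-value from the standard normal distribution.
res$pvalue_check <- 2 * pnorm(abs(res$stat_check), lower.tail = FALSE)

print(round(res[, c("stat", "stat_check", "pvalue", "pvalue_check")], 6))
```

Even fold changes of about 8x (log2 fold change near 3) yield statistics below 3 here, because the standard errors are all above 1. That is the practical effect of estimating dispersion from two unreplicated samples.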


Thank you Mike, nice to see you again after the ph525 series. I think I will have to try other packages to deal with this case.
