Recently I used both cuffdiff and Deseq2 to find the differential genes. The outcome is very different. Cuffdiff gives me about 1600 changed genes and Deseq2 gives me 12347 genes. The overlapping is only 25%, which is very low. I expected some number like 50%. Cuffdiff unique is about 5% and Deseq2 unique is about 70%. The huge difference actually comes out from the downregulated genes, which is 12347
(hard to believehttp://imgur.com/a/jq6Qw). I am not sure if there is something fundamentally wrong when I fed my data to the Deseq2. Please help me to find it out. Many thanks in advance!
There are 5 samples, 3 in con and 2 in exp. There is no warning message. The code I used
deseqSampleMatrix <- data.frame(
row.names=sampleNames,
condition = as.factor(conditions),
samplename = sampleNames)
dds <- DESeqDataSetFromMatrix(countData = as.matrix(exprMat), colData=deseqSampleMatrix, design=~condition)
dds
dds <- DESeq(dds)
res <- results(dds, contrast=c("condition","exp","con"))
res <- res[complete.cases(res),]
sum(res$padj < 0.05, na.rm=TRUE)
sum( res$padj < 0.1, na.rm=TRUE ) # 14514
sum(res$log2FoldChange > 2 & res$padj < 0.05, na.rm = TRUE) # 853
sum(res$log2FoldChange < 2 & res$padj < 0.05, na.rm = TRUE) # 12347
sum(res$log2FoldChange == 2 & res$padj < 0.05, na.rm = TRUE) # 0
as.data.frame(colData(dds))
condition samplename sizeFactor
con1 con con1 0.6391152
con2 con con2 1.1517159
con3 con con3 0.8991909
exp1 exp exp1 1.4485410
exp2 exp exp2 1.0539740
optional parameter: independentFiltering=FALSE, cooksCutoff=FALSE does not make big difference for the outcome
sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] pheatmap_1.0.8 gplots_3.0.1 RColorBrewer_1.1-2
[4] ggplot2_2.1.0 DESeq2_1.12.3 SummarizedExperiment_1.2.3
[7] Biobase_2.32.0 GenomicRanges_1.24.2 GenomeInfoDb_1.8.3
[10] IRanges_2.6.1 S4Vectors_0.10.2 BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.6 plyr_1.8.4 XVector_0.12.1 bitops_1.0-6
[5] tools_3.3.1 zlibbioc_1.18.0 digest_0.6.10 rpart_4.1-10
[9] RSQLite_1.0.0 annotate_1.50.0 gtable_0.2.0 lattice_0.20-33
[13] Matrix_1.2-6 DBI_0.4-1 gridExtra_2.2.1 genefilter_1.54.2
[17] cluster_2.0.4 caTools_1.17.1 gtools_3.5.0 locfit_1.5-9.1
[21] grid_3.3.1 nnet_7.3-12 data.table_1.9.6 AnnotationDbi_1.34.4
[25] XML_3.98-1.4 survival_2.39-5 BiocParallel_1.6.3 foreign_0.8-66
[29] gdata_2.17.0 latticeExtra_0.6-28 Formula_1.2-1 geneplotter_1.50.0
[33] Hmisc_3.17-4 scales_0.4.0 splines_3.3.1 colorspace_1.2-6
[37] xtable_1.8-2 labeling_0.3 KernSmooth_2.23-15 acepack_1.3-3.3
[41] munsell_0.4.3 chron_2.3-47
The dispersion plot is attached for the convenience,which is shown a lot of outliers here to me.
Can you post an MA plot? (Use the "plotMA" function.)