3 replicates vs 2 replicates biased outcome?
2
0
Entering edit mode
kan.jiang • 0
@kanjiang-11217
Last seen 7.8 years ago

Recently I used both cuffdiff and Deseq2 to find the differential genes. The outcome is very different. Cuffdiff gives me about 1600 changed genes and Deseq2 gives me  12347 genes. The overlapping is only 25%, which is very low. I expected some number like 50%. Cuffdiff unique is about 5% and Deseq2  unique is about 70%.  The huge difference actually comes out from the downregulated genes, which is 12347(hard to believehttp://imgur.com/a/jq6Qw).  I am not sure if there is something fundamentally wrong when I fed my data to the Deseq2. Please help me to find it out. Many thanks in advance!

There are 5 samples, 3 in con and 2 in exp. There is no warning message. The code I used

deseqSampleMatrix <- data.frame( 

    row.names=sampleNames,

    condition = as.factor(conditions),

    samplename = sampleNames)

  dds <- DESeqDataSetFromMatrix(countData = as.matrix(exprMat),  colData=deseqSampleMatrix, design=~condition)

  dds

 dds <- DESeq(dds)

  res <- results(dds, contrast=c("condition","exp","con"))

  res <- res[complete.cases(res),]

  sum(res$padj < 0.05, na.rm=TRUE)

  sum( res$padj < 0.1, na.rm=TRUE ) # 14514

  sum(res$log2FoldChange > 2 & res$padj < 0.05, na.rm = TRUE) #  853

  sum(res$log2FoldChange < 2 & res$padj < 0.05, na.rm = TRUE) # 12347

  sum(res$log2FoldChange == 2 & res$padj < 0.05, na.rm = TRUE)  # 0

as.data.frame(colData(dds))
     condition samplename sizeFactor
con1       con       con1  0.6391152
con2       con       con2  1.1517159
con3       con       con3  0.8991909
exp1       exp       exp1  1.4485410
exp2       exp       exp2  1.0539740

 optional parameter:  independentFiltering=FALSE, cooksCutoff=FALSE does not make big difference for the outcome

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.8             gplots_3.0.1               RColorBrewer_1.1-2        
 [4] ggplot2_2.1.0              DESeq2_1.12.3              SummarizedExperiment_1.2.3
 [7] Biobase_2.32.0             GenomicRanges_1.24.2       GenomeInfoDb_1.8.3        
[10] IRanges_2.6.1              S4Vectors_0.10.2           BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6          plyr_1.8.4           XVector_0.12.1       bitops_1.0-6        
 [5] tools_3.3.1          zlibbioc_1.18.0      digest_0.6.10        rpart_4.1-10        
 [9] RSQLite_1.0.0        annotate_1.50.0      gtable_0.2.0         lattice_0.20-33     
[13] Matrix_1.2-6         DBI_0.4-1            gridExtra_2.2.1      genefilter_1.54.2   
[17] cluster_2.0.4        caTools_1.17.1       gtools_3.5.0         locfit_1.5-9.1      
[21] grid_3.3.1           nnet_7.3-12          data.table_1.9.6     AnnotationDbi_1.34.4
[25] XML_3.98-1.4         survival_2.39-5      BiocParallel_1.6.3   foreign_0.8-66      
[29] gdata_2.17.0         latticeExtra_0.6-28  Formula_1.2-1        geneplotter_1.50.0  
[33] Hmisc_3.17-4         scales_0.4.0         splines_3.3.1        colorspace_1.2-6    
[37] xtable_1.8-2         labeling_0.3         KernSmooth_2.23-15   acepack_1.3-3.3     
[41] munsell_0.4.3        chron_2.3-47       

The dispersion plot is attached for the convenience,which is shown a lot of outliers here to me.

 

deseq2 • 615 views
ADD COMMENT
0
Entering edit mode
@ryan-c-thompson-5618
Last seen 8 months ago
Scripps Research, La Jolla, CA

Your criteria for counting genes that are significantly up and down are wrong. You're using a log2 fold change threshold of 2, which is equivalent to 4-fold upregulation. This means that if a gene is 3.9-fold up and has a significant FDR, you are counting it as a down-regulated gene. So it's pretty clear why you are getting so many more "down-regulated" genes. If you want to split the genes into upregulated and downregulated, you should simply look for positive and negative log2 fold change values (i.e. greater than or less than 0). Also, it's not clear what the last line, which yields a count of zero, is supposed to be measuring. You're never going to get a gene with a fold change of exactly 4.

ADD COMMENT
0
Entering edit mode
kan.jiang • 0
@kanjiang-11217
Last seen 7.8 years ago

Hi Thompson,

Thanks for the quick response.  Thanks point it out. Sorry! I made mistake for the post: my updated code:

sum( res$padj < 0.05, na.rm=TRUE ) # 13255
sum(res$log2FoldChange > 2 & res$padj < 0.05, na.rm = TRUE) #  854
sum(res$log2FoldChange < -2 & res$padj < 0.05, na.rm = TRUE) # 5426

  sum(res$log2FoldChange == 2 & res$padj < 0.05, na.rm = TRUE)  # 0

Still Deseq2 give me  5426 down genes and 854 up genes, which is 6.35 fold than that of the down genes. Cuffdiff give me 876 down and 661 up genes, which is not very different from that of the up genes. Is it artifact or possible fact?

 

 

 

 

 

 

ADD COMMENT
0
Entering edit mode

Can you post an MA plot? (Use the "plotMA" function.)

ADD REPLY

Login before adding your answer.

Traffic: 822 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6