Question: DESeq2: different number of significant genes between DESeqDataSetFromMatrix and DESeqDataSetFromHTSeqCount
0
gravatar for jillian.waters
22 months ago by
jillian.waters0 wrote:

Hello,

I've been running in to some troubles with my data analysis with DESeq2, and one thing I decided to confirm was whether I was checking the difference in my results when I used an input count table for DESeqDataSetFromMatrix or when I followed the guidelines for DESeqDataSetFromHTSeqCount. In the former example I have ~1000 genes that are significant, whereas in the latter I have only 1. I am surprised considering that in theory I would think this is virtually identical input, although I am guessing it is an error on my part. 

The same sample files were used as direct input into DESeqDataSetFromHTSeqCount that were used to generate the count table used in DESeqDataSetFromMatrix. I have also spot checked some of these files, and the numbers within the individual file match what is in the geneCounts data.frame.

I have entered the code below. Any help would be greatly appreciated. Thank you in advance!

directory <- "/ebio//abt3_projects//Christensenella_Project//CM12//CM12_RNA-seq//WAT_v1//HTSeq//20171006_ALL"
sampleFiles <- grep("htseq_count",list.files(directory),value=TRUE)
sampleTable <- read.table('/ebio//abt3_projects//Christensenella_Project//CM12//CM12_RNA-seq//WAT_v1//CM12_WAT_Design_v11.txt',header=TRUE,sep="\t")
sample_table2=data.frame(sampleName = sampleTable$SampleID,
                          fileName = sampleFiles,
                          condition = sampleTable$Condition)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sample_table2,
                                       directory = directory,
                                       design= ~ condition)

contrasts = c('condition', 'Blautia', 'Cminuta') 
dds <- DESeq(dds)
res <- results(dds, contrast=contrasts)
summary(res)

out of 37999 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0% 
LFC < 0 (down)   : 1, 0.0026% 
outliers [1]     : 804, 2.1% 
low counts [2]   : 2, 0.0053% 
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

The input to generate the count table for DEseqDataSetfromMatrix is below

Cmin1=read.table("/ebio/abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/HTSeq/20171006_ALL/S1_htseq_count_V1.txt",sep="\t",header=FALSE)
Cmin2=read.table("/ebio/abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/HTSeq/20171006_ALL/S5_htseq_count_V1.txt", sep="\t",header=FALSE)
Cmin3=read.table("/ebio/abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/HTSeq/20171006_ALL/S10_htseq_count_V1.txt",sep="\t",header=FALSE)
Cmin4=read.table("/ebio/abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/HTSeq/20171006_ALL/S15_htseq_count_V1.txt",sep="\t",header=FALSE)
Cmin5=read.table("/ebio/abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/HTSeq/20171006_ALL/S20_htseq_count_V1.txt",sep="\t",header=FALSE)
Bhyd1=read.table("/ebio/abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/HTSeq/20171006_ALL/S13_htseq_count_V1.txt",sep="\t",header=FALSE)
Bhyd2=read.table("/ebio/abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/HTSeq/20171006_ALL/S19_htseq_count_V1.txt",sep="\t",header=FALSE)
Bhyd3=read.table("/ebio/abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/HTSeq/20171006_ALL/S2_htseq_count_V1.txt",sep="\t",header=FALSE)
Bhyd4=read.table("/ebio/abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/HTSeq/20171006_ALL/S4_htseq_count_V1.txt",sep="\t",header=FALSE)
Bhyd5=read.table("/ebio/abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/HTSeq/20171006_ALL/S8_htseq_count_V1.txt",sep="\t",header=FALSE)


geneCounts=data.frame(Cmin1[,2],Cmin2[,2],Cmin3[,2],Cmin4[,2],Cmin5[,2],Bhyd1[,2],Bhyd2[,2],Bhyd3[,2],Bhyd4[,2],Bhyd5[,2],Bhyd6[,2],Bhyd7[,2],Bhyd8[,2],Bhyd9[,2],Bhyd10[,2],BiAssoc1[,2],BiAssoc2[,2],BiAssoc3[,2],BiAssoc4[,2],BiAssoc5[,2],BiAssoc6[,2],BiAssoc7[,2],BiAssoc8[,2],BiAssoc9[,2],GF1[,2],GF2[,2])
row.names(geneCounts)=Cmin1[,1]
sizeGeneCounts=dim(geneCounts)
geneCounts = geneCounts[1:(sizeGeneCounts[1]-5),]
condition=c(rep("C_minuta",5),rep("B_hydrogenotrophica",10),rep("BiAssociated",9),rep("Germfree",2))
sampleNames=c("Cmin1","Cmin2","Cmin3","Cmin4","Cmin5","Bhyd1","Bhyd2","Bhyd3","Bhyd4","Bhyd5","Bhyd6","Bhyd7","Bhyd8","Bhyd9","Bhyd10","BiAssoc1","BiAssoc2","BiAssoc3","BiAssoc4","BiAssoc5","BiAssoc6","BiAssoc7","BiAssoc8","BiAssoc9","GF1","GF2")
colnames(geneCounts) = sampleNames

write.table(geneCounts, "geneCounts_20171009_V1.txt", sep="\t",quote=FALSE)

Input for DESeqDataSetFromMatrix

count_file='/ebio//abt3_projects//Christensenella_Project/CM12/CM12_RNA-seq//WAT_v1//HTSeq//20171006_ALL/geneCounts_20171009_V1.txt'
meta_file='/ebio//abt3_projects/Christensenella_Project/CM12/CM12_RNA-seq/WAT_v1/CM12_WAT_Design_v11.txt'

meta = read.delim(meta_file, sep='\t')
counts = read.delim(count_file, sep='\t')
# ordering
rownames(meta) = meta$SampleID
meta = meta[colnames(counts),]

dds_old_2 = DESeqDataSetFromMatrix(countData=counts,
                             colData=meta,
                             design=~Condition)

contrasts = c('Condition', 'Blautia', 'Cminuta') 
dds_old_2 <- DESeq(dds_old_2)
res_old_2 <- results(dds_old_2, contrast=contrasts)
summary(res_old_2)

out of 37999 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 145, 0.38% 
LFC < 0 (down)   : 932, 2.5% 
outliers [1]     : 142, 0.37% 
low counts [2]   : 13714, 36% 
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

 

SessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /ebio/abt3_projects/Christensenella_Project/anaconda3/envs/R_DEseq2/lib/R/lib/libRblas.so
LAPACK: /ebio/abt3_projects/Christensenella_Project/anaconda3/envs/R_DEseq2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BiocParallel_1.12.0        DESeq2_1.18.1             
 [3] SummarizedExperiment_1.8.0 DelayedArray_0.4.1        
 [5] matrixStats_0.52.2         Biobase_2.38.0            
 [7] GenomicRanges_1.30.0       GenomeInfoDb_1.14.0       
 [9] IRanges_2.12.0             S4Vectors_0.16.0          
[11] BiocGenerics_0.24.0        ggplot2_2.2.1             
[13] tidyr_0.7.2                dplyr_0.7.4               

loaded via a namespace (and not attached):
 [1] bit64_0.9-7             jsonlite_1.5            splines_3.4.1          
 [4] Formula_1.2-2           assertthat_0.2.0        latticeExtra_0.6-28    
 [7] blob_1.1.0              GenomeInfoDbData_0.99.1 RSQLite_2.0            
[10] backports_1.1.1         lattice_0.20-35         glue_1.2.0             
[13] uuid_0.1-2              digest_0.6.12           RColorBrewer_1.1-2     
[16] XVector_0.18.0          checkmate_1.8.5         colorspace_1.3-2       
[19] htmltools_0.3.6         Matrix_1.2-12           plyr_1.8.4             
[22] XML_3.98-1.9            pkgconfig_2.0.1         genefilter_1.60.0      
[25] zlibbioc_1.24.0         purrr_0.2.4             xtable_1.8-2           
[28] scales_0.5.0            tibble_1.3.4            htmlTable_1.9          
[31] annotate_1.56.1         repr_0.12.0             nnet_7.3-12            
[34] lazyeval_0.2.1          survival_2.41-3         magrittr_1.5           
[37] crayon_1.3.4            memoise_1.1.0           evaluate_0.10.1        
[40] foreign_0.8-69          tools_3.4.1             data.table_1.10.4-3    
[43] stringr_1.2.0           locfit_1.5-9.1          munsell_0.4.3          
[46] cluster_2.0.6           AnnotationDbi_1.40.0    bindrcpp_0.2           
[49] compiler_3.4.1          rlang_0.1.4             grid_3.4.1             
[52] RCurl_1.95-4.8          pbdZMQ_0.2-6            IRkernel_0.7.1         
[55] htmlwidgets_0.9         bitops_1.0-6            base64enc_0.1-3        
[58] gtable_0.2.0            DBI_0.7                 R6_2.2.2               
[61] gridExtra_2.3           knitr_1.17              bit_1.1-12             
[64] bindr_0.1               Hmisc_4.0-3             stringi_1.1.6          
[67] IRdisplay_0.4.4         Rcpp_0.12.14            geneplotter_1.56.0     
[70] rpart_4.1-11            acepack_1.4.1          
ADD COMMENTlink modified 22 months ago by Michael Love25k • written 22 months ago by jillian.waters0
Answer: DESeq2: different number of significant genes between DESeqDataSetFromMatrix and
1
gravatar for Michael Love
22 months ago by
Michael Love25k
United States
Michael Love25k wrote:

Can you run a test to show that counts(dds) is identical?

ADD COMMENTlink written 22 months ago by Michael Love25k

Hi Michael,

I ran this, and they are clearly not identical to one another. At first glance the order of the columns is different, as well as some of the values. 

After a more thorough look at the input sample information, I can now see that the discrepancy is that in my input sample_table_2 the file name does not match the correct file name, and this is what caused the error. 

Apologies for having missed this before posting, and thank you so much for your response!

ADD REPLYlink written 22 months ago by jillian.waters0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 245 users visited in the last hour