https://filebin.net/o4opqajy9jhrofg8
- count_data.tsv
- meta_data.tsv
I have the above test files. The DEGs are obviously those *_de genes.
count_file='count_data.tsv'
sample_file='meta_data.tsv'
condA='A'
condB='B'
qthr=0.5
sample_table=read.table(sample_file, header = T, sep = '\t', quote = '', check.names = F, comment.char = '', stringsAsFactors = F)
count_table=read.table(count_file, header = T, sep = '\t', quote = '', check.names = F, comment.char = '', stringsAsFactors = F)[, sample_table[, 1], drop=F]
suppressPackageStartupMessages(library(DESeq))
scaled_count_table=counts(
estimateSizeFactors(
newCountDataSet(count_table, sample_table[, 2])
)
, normalized=T
)
rs=rowSums(scaled_count_table)
sub_count_table=count_table[rs > quantile(rs, qthr), , drop=F]
cds=estimateDispersions(
estimateSizeFactors(
newCountDataSet(sub_count_table, sample_table[, 2])
)
, method='per-condition'
, fitType='local'
)
res=nbinomTest(cds, condA, condB)
sort(res, ~ padj, decreasing = F)
But when I try the above code, I don't get those *_de genes as the DEG. Is there anything wrong with DESeq?
R> head(sort(res, ~ padj, decreasing = F))
id baseMean baseMeanA baseMeanB foldChange log2FoldChange
64 gene00072 21.04255 30.18614 11.898966 0.3941864 -1.3430499
68 gene00076 16.58478 24.51167 8.657886 0.3532148 -1.5013824
73 gene00085 16.37519 21.81525 10.935135 0.5012611 -0.9963659
75 gene00087 20.99714 29.26174 12.732543 0.4351259 -1.2004952
117 gene00180 16.87296 23.34788 10.398049 0.4453530 -1.1669788
131 gene00207 16.29031 22.26660 10.314017 0.4632058 -1.1102748
pval padj
64 0.002871331 0.6846296
68 0.006280698 0.6846296
73 0.038339258 0.6846296
75 0.008285847 0.6846296
117 0.014302842 0.6846296
131 0.025645128 0.6846296
The following DESeq2 code can call the *_de genes as DEG correctly, however.
count_file='/tmp/count_data.tsv'
sample_file='/tmp/meta_data.tsv'
condA='A'
condB='B'
qthr=0.5
sample_table=read.table(sample_file, header = T, sep = '\t', quote = '', check.names = F, comment.char = '', stringsAsFactors = F)
count_table=read.table(count_file, header = T, sep = '\t', quote = '', check.names = F, comment.char = '', stringsAsFactors = F)[, sample_table[, 1], drop=F]
design=as.formula(sprintf('~%s', names(sample_table)[[2]]))
suppressWarnings(suppressPackageStartupMessages(library(DESeq2)))
scaled_count_table=counts(
estimateSizeFactors(
suppressWarnings(
DESeqDataSetFromMatrix(
countData=count_table
, colData=sample_table[, 2, drop=F]
, design=design
)
)
)
, normalized=T
)
rs=rowSums(scaled_count_table)
sub_count_table=count_table[rs > quantile(rs, qthr), , drop=F]
cds=estimateSizeFactors(
suppressWarnings(
DESeqDataSetFromMatrix(
countData=sub_count_table
, colData=sample_table
, design=design
)
)
)
res=data.frame(
results(
DESeq(cds, quiet = T)
, contrast=c(names(sample_table)[[2]], condA, condB)
)
)
res = sort(res, ~ padj, decreasing = F)
R> head(res)
baseMean log2FoldChange lfcSE stat pvalue
gene00016_de 260.5795 -5.568279 0.3368658 -16.52966 2.243893e-61
gene00049_de 249.7879 -5.049353 0.3150445 -16.02743 8.221806e-58
gene00040_de 245.6060 -5.027730 0.3179104 -15.81493 2.454812e-56
gene00041_de 273.9157 -5.212352 0.3482061 -14.96915 1.167964e-50
gene00007_de 276.0693 -4.827268 0.3257409 -14.81935 1.098469e-49
gene00046_de 244.2483 -4.657491 0.3144487 -14.81161 1.232599e-49
padj
gene00016_de 1.121946e-58
gene00049_de 2.055451e-55
gene00040_de 4.091354e-54
gene00041_de 1.459956e-48
gene00007_de 1.027166e-47
gene00046_de 1.027166e-47
Is DESeq deprecated? If it is not supported anymore, why not add a message to the package info so that people will not use it?
Regarding the DESeq code that I used, can you take a look whether is anything wrong with it? Thanks.
No, I actually don’t have time for that.
Yes, DESeq is depreacated since at least five years, Maybe the message that we put there five years ago ("Welcome to 'DESeq'. For improved performance, usability and functionality, please consider migrating to 'DESeq2'.") is too mild.
But it is hard to understand why people still use DESeq (first released in 2010 and last updated in 2014), when DESeq2 has now been out for 6 years. Bioinformatics is a fast moving field, and 6 years is an eternity. You can't keep using 10-year old software, because the manpower to maintain superseded tools is simply not there.