Hi All,
I want to compare Deseq2 with sSeq and I am getting similar log2foldchange values but the direction is opposite. I am confident with the DEseq2 because I use the >relevel to make sure the control is the first level, but I am not sure how does sSeq does it.
Does anyone has an idea of what I am writing wrong or why this is happening?
DEseq2:
countdata=as.matrix(read.table("2014_04_06_6h_Co2.csv",header=T, sep=",", row.names=1))
coldata= read.table ("targets.csv", header = T, sep=",",row.names=1)
dds <- DESeqDataSetFromMatrix(
countData = countdata,
colData = coldata,
design = ~ Subject + Treatment)
dds
dds$Treatment <- factor(dds$Treatment,
levels=c("PBS", "LPS"))
dds$Treatment <- relevel (dds$Treatment,"PBS")
dds$Subject <- factor(dds$Subject)
colData (dds)
ddsLRT <- DESeq (dds, test ="LRT", reduced= ~Subject)
resLRT<- results(ddsLRT)
sSeq:
countdata=as.matrix(read.table("2014_04_06_6h_Co2.csv",header=T, sep=",", row.names=1))
conds.data=c("PBS","PBS","PBS","PBS","LPS","LPS","LPS","LPS")
coLevels=data.frame(subjects=c(1,2,4,5,1,2,4,5))
coLevels
## Filtering
nSamples <- ncol(countdata)
minSamples <- floor(nSamples / 2)
keep <- countdata > 4
keep <- rowSums(keep) > minSamples
countdata <- countdata[keep,]
res2 = nbTestSH(countdata, conds.data,"PBS", "LPS",
coLevels= coLevels, pairedDesign=TRUE, pairedDesign.dispMethod="pooled")
sSeq vs DESeq2 first two columns (after Row.names) are from sSeq and last three columns are from DESeq2
Row.names | rawLog2FoldChange | pval | log2FoldChange | pvalue | padj |
1.2.9712.m1 | -1.211 | 6.6E-13 | 1.225 | 2.6E-12 | 6E-09 |
1.2.21563.m1 | -1.183 | 9.6E-13 | 1.304 | 3.4E-02 | NA |
1.2.10726.m1 | -1.170 | 6.5E-20 | 1.280 | 4.3E-07 | 2E-04 |
1.2.3589.m1 | -1.160 | 3.6E-22 | 1.266 | 1.4E-07 | 6E-05 |
1.2.1063.m1 | -1.120 | 3.5E-04 | 1.191 | 5.4E-03 | NA |
1.2.4666.m1 | -1.100 | 5.6E-05 | 1.359 | 3.2E-02 | NA |
1.2.7266.m1 | -1.072 | 8.3E-06 | 0.645 | 2.4E-01 | NA |
1.2.7703.m1 | -1.040 | 1.6E-03 | 1.112 | 4.4E-02 | NA |
1.2.20078.m1 | -1.006 | 1.2E-16 | 0.568 | 3.5E-01 | 8E-01 |
1.2.22598.m1 | -1.006 | 6.1E-18 | 1.001 | 5.5E-06 | 1E-03 |
1.2.9108.m1 | -1.000 | 6.4E-04 | 0.971 | 4.9E-02 | NA |
1.2.11800.m1 | -1.000 | 2.7E-03 | 1.147 | 6.5E-02 | NA |
1.2.11765.m1 | -0.977 | 2.3E-23 | 0.872 | 3.6E-01 | NA |
1.2.7931.m1 | -0.965 | 1.7E-11 | 1.017 | 7.5E-02 | NA |
1.2.12788.m1 | -0.963 | 3.3E-03 | 1.001 | 4.1E-02 | NA |
1.2.8237.m1 | -0.959 | 1.6E-04 | 1.020 | 6.0E-03 | NA |
1.2.19252.m1 | -0.951 | 3.3E-05 | 1.210 | 3.1E-02 | NA |
1.2.8020.m1 | -0.949 | 2.4E-11 | 0.985 | 1.8E-03 | 7E-02 |
1.2.8023.m1 | -0.935 | 4.3E-09 | 0.824 | 2.6E-03 | 8E-02 |
Thanks for your help,
Catalina
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.4.5 RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3 GenomicRanges_1.16.4 GenomeInfoDb_1.0.2
[6] IRanges_1.22.10 BiocGenerics_0.10.0 sSeq_1.2.0 RColorBrewer_1.0-5 caTools_1.17.1
loaded via a namespace (and not attached):
[1] annotate_1.42.1 AnnotationDbi_1.26.1 Biobase_2.24.0 bitops_1.0-6 DBI_0.3.1 genefilter_1.46.1
[7] geneplotter_1.42.0 grid_3.1.0 lattice_0.20-29 locfit_1.5-9.1 RSQLite_0.11.4 splines_3.1.0
[13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 XML_3.98-1.1 xtable_1.7-4 XVector_0.4.0
I take it back, it's not so clear.
If you check the vignette you can see that the fold changes are A/B, by looking at the mean of A and the mean of B for the first gene.
I read both but couldn't understand how it does it. So in DESeq2 says that: "The column log2FoldChange is the effect size estimate. It tells us how much the gene's expression
seems to have changed due to treatment with DPN in comparison to control." Meaning I should interpret the sSeq in a different way? how does DESeq2 calculates them? Sorry I am confuse on how I can compare the two results.
From the DESeq2 vignette
1.2.5 Note on factor levels
...It is important to supply levels (otherwise the levels are chosen in
alphabetical order) and to put the “control” or “untreated” level as the first element (”base level”), so
that the log2 fold changes produced by default will be the expected comparison against the base level.
so in DESeq2, the default log fold change given is log(last-level / base-level). You made PBS the base level, so you know the log fold change will be log( LPS / PBS )
DESeq2 also tells you this information at the top when you print a results table
In the sSeq vignette, by looking at the first example of a results table you can see the log fold change given is log(condA / condB). You have condA="PBS" and condB="LPS" so you are getting log( PBS / LPS ).
Thanks for your explanation Michael. Now I understand that is being calculated in different way. So if I get one gene as an example (below) in DESeq (5th column, log2FoldChange) I will said that this gene is being up-regulated (relative to the control), but in sSeq (2nd column, rawLog2FoldChange) is being down-regulated in the control (relative to the treatment)?