Entering edit mode
hi Olga and Karen,
Thanks for your interest in DESeq2 and for using the development
branch, which allows us to clarify changes before they enter the
release branch.
This concerns a modeling change from v1.2 to v1.3. Changes like these
are documented in the NEWS file with pointers to the
functions/arguments that changed:
(e.g. http://www.bioconductor.org/packages/2.14/bioc/news/DESeq2/NEWS)
I've been working on including more documentation in the package, but
haven't gotten to adding the parts about interaction terms yet, so
this gives me a chance to explain a bit.
The new resultsNames(dds) are due to "expanded model matrices", which
include a column in the design matrix for each level of a factor, as
opposed to "standard model matrices" produced by model.matrix, which
have a column in the deisgn matrix for the levels other than the base
level. Expanded model matrices are described a bit in the man pages
and in the vignette, but not yet using an example with interaction
effects. This makes the DE analysis independent of the choice to base
level, whereas previously the log fold change (LFC) shrinkage was not
independent of base level choices. In addition, it simplifies certain
comparisons.
You now have the columns "metasmet_no" and "metasmet_yes", whereas
before you had "metas_met_yes_vs_met_no". Previously you would use
the 'name' argument to extract this comparison, whereas with version
>= 1.3 we want to encourage users to use the 'contrast' argument:
results(dds, contrast=c("metas","met_yes","met_no"))
If the analysis includes interaction terms, previously the above
contrast would give the metastasis effect only for the base level of
the other factors in the design, but for version >= 1.3, this gives
the overall metastasis effect over all levels of the time variable.
If you want the individual time-period-specific effects, this is the
overall effect plus the interaction effects for that time period.You
have the following columns of the model matrix:
> [1] "Intercept" "timetime_1" "timetime_2"
"timetime_3" "timetime_4" "timetime_5"
> [7] "timetime_6" "metasmet_no"
"metasmet_yes" "timetime_1.metasmet_no"
"timetime_2.metasmet_no" "timetime_3.metasmet_no"
> [13] "timetime_4.metasmet_no" "timetime_5.metasmet_no"
"timetime_6.metasmet_no" "timetime_1.metasmet_yes"
"timetime_2.metasmet_yes" "timetime_3.metasmet_yes"
> [19] "timetime_4.metasmet_yes" "timetime_5.metasmet_yes"
"timetime_6.metasmet_yes"
So the metastasis effect specific to time period 2 would be, the
contrast of "metasmet_yes" vs "metasmet_no" and additionally, the
contrast of "timetime_2.metasmet_yes" vs "timetime_2.metasmet_no". We
can pull these out using the numeric contrast vector, starting with
all 0's and then filing in the -1's and 1's:
ctrst <- numeric(21)
rn <- resultsNames(dds)
ctrst[ rn == "metasmet_no" ] <- -1
ctrst[ rn == "metasmet_yes" ] <- 1
ctrst[ rn == "timetime_2.metasmet_no" ] <- -1
ctrst[ rn == "timetime_2.metasmet_yes" ] <- 1
results(dds, contrast=ctrst)
Mike
On Mon, Mar 17, 2014 at 5:07 AM, solgakar at bi.technion.ac.il
<solgakar at="" bi.technion.ac.il=""> wrote:
>
>
>
> From: Karen Chait [mailto:kchait at tx.technion.ac.il]
> Sent: Monday, March 17, 2014 10:57 AM
> To: Olga Karinsky
> Subject: RE: using DESeq2 with multi factor data
>
> Hello all,
> I am trying to use the DESeq2 package to perform RNA-Seq analysis on
a data containing several factors.
> I have been closely following the emails between Ming Yi and Michael
Love, because I think that my problem is very similar to what they
have discussed. But even though I received a lot of useful
information from their discussion, I still have several questions
regarding my specific data.
> Just as an overall information regarding my data, I have 96 samples
and the two factors I am interested in exploring are "time" and
"metastasis".
> In order to build my data set I used the following commands:
>
> > countData = read.table("merged_counts.txt", header=TRUE,
row.names=1)
>
> > metasVector=c("met_no","met_no","met_no","met_no","met_no","met_n
o","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met
_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","m
et_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no",
"met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no
","met_no","met_no","met_no","met_no","met_no","met_no","met_no","met_
no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","me
t_no","met_no","met_no","met_no","met_no","met_no","met_no","met_no","
met_no","met_no","met_no","met_no","met_no","met_no","met_yes","met_ye
s","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_ye
s","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_ye
s","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_ye
s","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_ye
s"(
>
> > timePointsVector=c("6","4","6","6","3","6","3","5","6","6","1","5
","3","4","3","6","6","6","2","6","1","2","4","6","5","5","5","3","6",
"5","6","2","6","6","1","5","5","6","6","6","6","6","6","4","2","6","3
","1","2","5","6","1","1","3","6","3","6","4","4","5","6","6","3","5",
"4","6","1","4","3","1","1","1","4","2","1","1","3","6","1","1","2","1
","6","3","3","2","5","3","2","3","1","4","1","1","6","1")
>
> > colData=data.frame(row.names=colnames(countData),metas=metasVecto
r,gender=gendarVector)
>
> > colData$metas=factor(colData$metas, levels=c("met_no","met_yes"))
>
> > colData$time = factor(colData$time, levels = c("1", "2", "3",
"4", "5", "6"))
>
> > dds=DESeqDataSetFromMatrix(countData=tmpcountData,
colData=colData, design=~time + metas + metas:time)
>
> > dds=DESeq(dds)
> I have several questions:
>
> - first of all I have tried running those commands on
DESeq2 version 1.2.10 (R version 3.0.2) and DESeq2 version 1.3.47 (R
version 3.0.2) and what I have received from the resultsNames()
function I both cases is very different. Using the 1.2.10 version I
have received:
>
> > resultsNames(dds)
>
> [1] "Intercept" "time_2_vs_1"
"time_3_vs_1" "time_4_vs_1" "time_5_vs_1"
"time_6_vs_1" "metas_met_yes_vs_met_no"
"time2.metasmet_yes"
>
> [9] "time3.metasmet_yes" "time4.metasmet_yes"
"time5.metasmet_yes" "time6.metasmet_yes"
>
> > sessionInfo()
>
> R version 3.0.2 (2013-09-25)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=Hebrew_Israel.1255 LC_CTYPE=Hebrew_Israel.1255
LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C
LC_TIME=Hebrew_Israel.1255
>
>
>
> attached base packages:
>
> [1] parallel stats graphics grDevices utils datasets
methods base
>
>
>
> other attached packages:
>
> [1] DESeq2_1.2.10 RcppArmadillo_0.4.100.2.1 Rcpp_0.11.0
GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.7
BiocGenerics_0.8.0
>
>
>
> loaded via a namespace (and not attached):
>
> [1] annotate_1.40.1 AnnotationDbi_1.24.0 Biobase_2.22.0
DBI_0.2-7 genefilter_1.44.0 grid_3.0.2
lattice_0.20-27 locfit_1.5-9.1 RColorBrewer_1.0-5
RSQLite_0.11.4
>
> [11] splines_3.0.2 stats4_3.0.2 survival_2.37-7
tools_3.0.2 XML_3.98-1.1 xtable_1.7-3
>
>
>
> Using the 1.3.47 version I have received:
>
> > resultsNames(dds)
>
> [1] "Intercept" "timetime_1" "timetime_2"
"timetime_3" "timetime_4" "timetime_5"
>
> [7] "timetime_6" "metasmet_no"
"metasmet_yes" "timetime_1.metasmet_no"
"timetime_2.metasmet_no" "timetime_3.metasmet_no"
>
> [13] "timetime_4.metasmet_no" "timetime_5.metasmet_no"
"timetime_6.metasmet_no" "timetime_1.metasmet_yes"
"timetime_2.metasmet_yes" "timetime_3.metasmet_yes"
>
> [19] "timetime_4.metasmet_yes" "timetime_5.metasmet_yes"
"timetime_6.metasmet_yes"
>
> > sessionInfo()
>
> R version 3.0.2 (2013-09-25)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=Hebrew_Israel.1255 LC_CTYPE=Hebrew_Israel.1255
LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C
LC_TIME=Hebrew_Israel.1255
>
>
>
> attached base packages:
>
> [1] parallel stats graphics grDevices utils datasets
methods base
>
>
>
> other attached packages:
>
> [1] DESeq2_1.3.47 RcppArmadillo_0.4.100.0 Rcpp_0.11.0
GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.7
>
> [7] BiocGenerics_0.8.0
>
>
>
> loaded via a namespace (and not attached):
>
> [1] annotate_1.40.1 AnnotationDbi_1.24.0 Biobase_2.22.0
DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0
grid_3.0.2
>
> [8] lattice_0.20-27 locfit_1.5-9.1 RColorBrewer_1.0-5
RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2
survival_2.37-7
>
> [15] XML_3.98-1.1 xtable_1.7-3
>
> (I have ran the 1.3.47 version the same way besides a difference in
the names of the time levels, but I do not believe that this is the
reason for the differences)
> I don't fully understand the results I receive using
the 1.3.47 version and even more the difference between the versions.
>
> - From my understanding, the results I received using the
1.2.10 version are the more reasonable and they fit my settings of
base levels in the data. Now after receiving these results I would
love to understand how do I receive different contrast testing? For
each time period metas_yes vs. metas_no (for example
timetime_2.metasmet_yes vs. timetime_2.metasmet_no)
>
>
>
> Thank you in advance,
>
> Olga and Karen
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor