Hi,
I am using DESeq2 for my RNA-seq data analysis. My design has two
strains and two conditions. I did my analysis with DESeq2 v1.4.5 with
the design
dds1 <- DESeqDataSetFromMatrix(countData=counts, colData=design,
                               design=~ Colony + Treatment + Colony:Treatment)
dds1 <- DESeq(dds1, test="LRT", reduced=~ Colony + Treatment)
> resultsNames(dds1)
[1] "Intercept" "Colony_1_vs_2" "Treatment_1_vs_2" "Colony1.Treatment1"
"Colony1.Treatment1" is the interaction term. Am I right? I don't
understand what the log2 fold change represents. I read the vignette
(last updated May 2014) as well as several discussions online, but I
am not sure I understand the concept. For those genes with
FDR-corrected p values below 0.05, when I go back and check the VST
counts/raw reads I see trends. Could you please help me understand
what the log2 fold change means?
Thanks in advance.
Neetha
-- output of sessionInfo():
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] DESeq2_1.4.5            RcppArmadillo_0.4.320.0 Rcpp_0.11.2
[4] GenomicRanges_1.16.3    GenomeInfoDb_1.0.2      IRanges_1.22.9
[7] BiocGenerics_0.10.0     BiocInstaller_1.14.2
loaded via a namespace (and not attached):
 [1] annotate_1.42.0      AnnotationDbi_1.26.0 Biobase_2.24.0       DBI_0.2-7
 [5] genefilter_1.46.1    geneplotter_1.42.0   grid_3.1.1           lattice_0.20-29
 [9] locfit_1.5-9.1       RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.1.1
[13] stats4_3.1.1         survival_2.37-7      tools_3.1.1          XML_3.98-1.1
[17] xtable_1.7-3         XVector_0.4.0
--
Sent via the guest posting facility at bioconductor.org.
Hi Neetha,
You should look up a statistics reference for interaction terms in
linear models, or ask for an explanation from a statistician at your
institution. The interaction concept is the same here as in any linear
model, except that here we work with additive log fold changes, which
correspond to multiplication of fold changes.
Suppose we have a design ~ A + B + A:B, with A=0/1 and B=0/1. This means
we are modeling a log fold change due to variable A=1 over A=0, and a
log fold change due to variable B=1 over B=0. The interaction term is
an additional log fold change when A=1 and B=1 beyond the main effect
for A and the main effect for B. If the log fold change for the
interaction term is 0, then we know: the fold change when A=1 and B=1
is simply the two main effect fold changes multiplied (the log fold
changes added).
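To make the arithmetic concrete, here is a minimal base R sketch. All
numbers and variable names (intercept, lfc_A, lfc_B, lfc_AB, log2_expr)
are made up for illustration and do not come from DESeq2 or from any
data set. It only shows how the four group means are built from the
coefficients of ~ A + B + A:B on the log2 scale:

```r
# Hypothetical log2-scale coefficients for a single gene (made-up values)
intercept <- 5    # log2 expression when A=0 and B=0
lfc_A     <- 1    # log2 fold change of A=1 over A=0, at B=0
lfc_B     <- 2    # log2 fold change of B=1 over B=0, at A=0
lfc_AB    <- 0.5  # interaction: extra log2 fold change when A=1 and B=1

# Model-implied log2 expression for a group under ~ A + B + A:B
log2_expr <- function(A, B) {
  intercept + lfc_A * A + lfc_B * B + lfc_AB * A * B
}

# Fold change of the A=1,B=1 group over the A=0,B=0 group
fc_both <- 2^(log2_expr(1, 1) - log2_expr(0, 0))

# What that fold change would be with the main effects alone
fc_main_only <- 2^lfc_A * 2^lfc_B

# The ratio is exactly the interaction fold change, 2^lfc_AB
fc_both / fc_main_only
```

If lfc_AB were set to 0, the ratio would be 1, which is the case
described above where the two main effect fold changes simply multiply.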
Mike
On Wed, Jul 16, 2014 at 1:10 PM, Neetha [guest] <guest at bioconductor.org> wrote:
> [...]
Hi Neetha,
The interaction term is (algebraically) something like
    (colony_1_treatment_1 - colony_1_treatment_2)
      - (colony_2_treatment_1 - colony_2_treatment_2)
In other words, you are trying to see if the different treatments
affect the two colonies differently. And because of the way it is
constructed, the interaction term fold change isn't by itself that
easily interpreted.
As an example, if I just substitute fake numbers into the above
formula, all of these result in a logFC of 2:
(3 - 3) - (1 - 3)
(3 - 1) - (3 - 3)
(3 - 2) - (2 - 3)
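The same three cases can be checked in a few lines of base R. This is
only a sketch of the arithmetic: the column names (c1_t1 for colony 1,
treatment 1, and so on) are hypothetical labels for the four group
means on the log2 scale, and the values are the fake numbers above.

```r
# One row per fake example; columns are log2-scale group means
patterns <- data.frame(
  c1_t1 = c(3, 3, 3),  # colony 1, treatment 1
  c1_t2 = c(3, 1, 2),  # colony 1, treatment 2
  c2_t1 = c(1, 3, 2),  # colony 2, treatment 1
  c2_t2 = c(3, 3, 3)   # colony 2, treatment 2
)

# Interaction: treatment effect in colony 1 minus treatment effect in colony 2
patterns$interaction_lfc <- with(
  patterns,
  (c1_t1 - c1_t2) - (c2_t1 - c2_t2)
)

# Three different expression patterns, one identical interaction value
print(patterns)
```

In the first row only colony 2 responds to treatment, in the second
only colony 1 responds, and in the third both respond in opposite
directions, yet the interaction value is the same in all three.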
There are many other ways a logFC of 2 can arise, so you really want
to look at a plot of the logCPM values to see the underlying pattern.
This is where the ReportingTools package comes in. You can easily make
an HTML table that has little plots in each row that show the
directionality of expression for all four groups. See the vignette for
more information:
http://bioconductor.org/packages/release/bioc/vignettes/ReportingTools/inst/doc/rnaseqAnalysis.pdf
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099