Hi Mike and Simon,
Thanks for your help. You were right, I was lost and didn't understand
the
models properly.
Additionally I realized that when calling colData(), my Subject was
not
defined as a factor and hence was getting 0 DEGs.
Now I am getting 480 DEGs (FDR 0.1) when defining Subject as a factor
and
setting the reduced model properly:
dds <- DESeqDataSetFromMatrix(
countData = countdata,
colData = coldata,
design = ~ Subject + Treatment)
dds
dds$Treatment <- factor(dds$Treatment,
levels=c("PBS", "LPS"))
dds$Subject <- factor(dds$Subject)
colData (dds)
ddsLRT <- DESeq (dds, test ="LRT", reduced= ~Subject)
Thanks a lot for you help,
Catalina
On Wed, May 21, 2014 at 11:36 PM, Michael Love
<michaelisaiahlove@gmail.com>wrote:
> hi Catalina,
>
> Simon addressed the question about models, but I want to follow up
on
> another point.
>
> On Tue, May 20, 2014 at 9:03 PM, Catalina Aguilar Hurtado
> <catagui@gmail.com> wrote:
> >
> > Otherwise it would mean that my whole experiment didnât work and
can not
> > get any data out of it.
> >
>
> in the sense that it was underpowered to detect changes from the 5
> control to the 4 treated samples at an FDR cutoff of 0.05. Note that
> 1/20 is an arbitrary cutoff. You might also consider valuable to
> create lists of DE genes with 1/5 false discoveries for follow up
> study, although it might be the case that there are few genes at
this
> threshold as well. Furthermore, you can still use the experiment for
> exploratory analysis. For instance, examining the MA plot will give
> you insights into how many genes had very high or low fold changes.
> You can rank the results table by absolute value of log2FoldChange,
> and prioritize these genes for follow up study.
>
> Mike
>
>
> > Thanks again,
> >
> > Catalina
> >
> >
> > On Tue, May 20, 2014 at 10:12 PM, Michael Love
> > <michaelisaiahlove@gmail.com>wrote:
> >
> >> hi Catalina,
> >>
> >>
> >> On Mon, May 19, 2014 at 8:38 PM, Catalina Aguilar Hurtado
> >> <catagui@gmail.com> wrote:
> >> > Hi Mike,
> >> >
> >> > Not sure what you mean by the intersection.
> >>
> >> I meant those genes which had padj < cutoff for both DESeq and
DESeq2.
> >>
> >> You can calculate this with
> >>
> >> length(intersect(padj1rownames, padj2rownames))
> >>
> >> It's safest to work with the rownames, to make sure you are
referring
> >> to the same genes.
> >>
> >> You can get this vector, e.g. in DESeq2 with:
> >>
> >> padj2rownames <- rownames(results[which(results$padj < cutoff),])
> >>
> >> > I have run both without any
> >> > filter and reduced formula of ~Subject. As you can see below,
there is
> >> > almost no DEGs and the only way I could get was when ~1.
> >> >
> >>
> >> This means that, without independent filtering, for most genes,
the
> >> treatment effect was too small to observe a significant
difference for
> >> the experiment's sample size and/or sequencing depth. However,
you
> >> should use independent filtering to report as many genes as
possible,
> >> and I'd recommend using DESeq2 which will do this automatically
within
> >> the results() function.
> >>
> >> The reason I asked for you to not use independent filtering in
both is
> >> to see if in fact there were many genes called by either version
of
> >> DESeq but not by both, when the settings were made comparable.
> >>
> >> Note that the ~ 1 reduced design is not what you want:
> >>
> >> OK: "we tested for the treatment effect, controlling for the
subject
> >> effect"
> >> full: ~ subject + treament
> >> reduced: ~ subject
> >>
> >>
> >> not ok: "we tested for the treatment and subject effect"
> >> full: ~ subject + treament
> >> reduced: ~ 1
> >>
> >>
> >> Mike
> >>
> >>
> >> > DESeq2:
> >> >
> >> >> dds <- nbinomLRT(dds, full = design(dds), reduced = ~ Subject)
> >> >> resLRT <- results(dds, independentFiltering=FALSE)
> >> >
> >> >> sum( resLRT$padj < 0.5, na.rm=TRUE )
> >> > [1] 6
> >> >
> >> >
> >> > DESeq:
> >> >
> >> >
> >> >>fit0 <- fitNbinomGLMs (cds, count~Subject)
> >> >
> >> >>fit1 <- fitNbinomGLMs (cds, count~Treatment)
> >> >
> >> >>pvals <- nbinomGLMTest (fit1, fit0)
> >> >
> >> >> padj <- p.adjust( pvals, method="BH" )
> >> >> padj <- data.frame(padj)
> >> >
> >> >> row.names(padj)=row.names(cds)
> >> >> padj_fil <- subset (padj,padj <0.05 )
> >> >> dim (padj_fil)
> >> > [1] 0 1
> >> >
> >> > Thanks,
> >> >
> >> > Cata
> >> >
> >> > On Mon, May 19, 2014 at 10:31 PM, Michael Love
> >> > <michaelisaiahlove@gmail.com>wrote:
> >> >
> >> >> Hi Catalina,
> >> >>
> >> >> What are the numbers exactly and the intersection for using a
reduced
> >> >> design of ~ subject for both, and not filtering in either case
> >> >> (independentFiltering = FALSE)?
> >> >>
> >> >> Mike
> >> >> On May 18, 2014 10:34 PM, "Catalina Aguilar Hurtado" <
> catagui@gmail.com
> >> >
> >> >> wrote:
> >> >>
> >> >>> Hi Mike,
> >> >>>
> >> >>> Thanks for your reply.
> >> >>>
> >> >>> I had read the manual and did try the reduced design ~Subject
and
> got 5
> >> >>> DEGs. I tried with both DESeq and DESeq2.
> >> >>>
> >> >>> My data has a stronger effect of the subject compare to the
> treatment,
> >> if
> >> >>> I
> >> >>> do a heatmap the samples group by subject rather than by
treatment.
> >> >>>
> >> >>> The filtering theta value was choose based on a plot of
"Number of
> >> DEGs vs
> >> >>> theta" when using filtered_R in genefilter. Thanks for
pointing out
> >> that
> >> >>> results() was doing automatically. I have just run it again
without
> >> >>> pre-filter and got 812 DEGs.
> >> >>>
> >> >>> Still my concern is why the DEGs that I get in DESeq
disappear in
> the
> >> >>> DESeq2 results?
> >> >>>
> >> >>> Thanks again,
> >> >>>
> >> >>> Catalina
> >> >>>
> >> >>>
> >> >>> On Fri, May 16, 2014 at 10:35 PM, Michael Love
> >> >>> <michaelisaiahlove@gmail.com>wrote:
> >> >>>
> >> >>> > hi Catalina,
> >> >>> >
> >> >>> > If you want to test for the treatment effect and control
the
> subject
> >> >>> > effect you should use the reduced design of ~ subject.
> >> >>> >
> >> >>> > In the DESeq2 vignette we say: "the user provides the full
formula
> >> >>> > (the formula stored in design(dds)), and a reduced formula,
e.g.
> one
> >> >>> > which does not contain the variable of interest."
> >> >>> >
> >> >>> > For your filtering for the old version of DESeq, how did
you chose
> >> the
> >> >>> > value of theta=0.2? Note that such filtering in DESeq2 is
> >> accomplished
> >> >>> > automatically by results(), and the function optimizes for
the
> best
> >> >>> > value of theta (most adjusted p-values less than a
threshold). So
> you
> >> >>> > do not need to pre-filter.
> >> >>> >
> >> >>> > Mike
> >> >>> >
> >> >>> >
> >> >>> > On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado
> >> >>> > <catagui@gmail.com> wrote:
> >> >>> > > Hi I want to compare DESeq vs DESeq2 and I am getting
different
> >> >>> number of
> >> >>> > > DEGs which I will expect to be normal. But when I compare
the
> 149
> >> >>> genes
> >> >>> > iD
> >> >>> > > that I get with DESeq with the 869 from DESeq2 there are
only
> ~10
> >> >>> genes
> >> >>> > > that are in common which I don\'92t understand (using
FDR <0.05
> >> for
> >> >>> > both).
> >> >>> > > I want to block the Subject effect for which I am
including the
> >> >>> reduced
> >> >>> > > formula of ~1.
> >> >>> > >
> >> >>> > > Shouldn\'92t this two methods output similar results?
Because
> at
> >> the
> >> >>> > > moment I could interpret my results with different ways.\
> >> >>> > > \
> >> >>> > > Thanks for your help,\
> >> >>> > > \
> >> >>> > > Catalina\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > This the DESeq script that I am using:\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > DESeq\
> >> >>> > > \
> >> >>> > > library(DESeq)\
> >> >>> > > \
> >> >>> > > co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T,
> sep=",",
> >> >>> > > row.names=1))\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > Subject=c(1,2,3,4,5,1,2,4,5)\
> >> >>> > > \
> >> >>> > > Treatment=c(rep("co",5),rep("c2",4))\
> >> >>> > > a.con=cbind(Subject,Treatment)\
> >> >>> > > \
> >> >>> > > cds=newCountDataSet(co,a.con)\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > cds <- estimateSizeFactors( cds)\
> >> >>> > > \
> >> >>> > > cds <- estimateDispersions(cds,method="pooled-CR",
> >> >>> > > modelFormula=count~Subject+Treatment)\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > #filtering\
> >> >>> > > \
> >> >>> > > rs = rowSums ( counts ( cds ))\
> >> >>> > > theta = 0.2\
> >> >>> > > use = (rs > quantile(rs, probs=theta))\
> >> >>> > > table(use)\
> >> >>> > > cdsFilt= cds[ use, ]\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > \
> >> >>> > > fit0 <- fitNbinomGLMs (cdsFilt, count~1)\
> >> >>> > > \
> >> >>> > > fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\
> >> >>> > > \
> >> >>> > > pvals <- nbinomGLMTest (fit1, fit0)\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > padj <- p.adjust( pvals, method="BH" )\
> >> >>> > > \
> >> >>> > > padj <- data.frame(padj)\
> >> >>> > > \
> >> >>> > > row.names(padj)=row.names(cdsFilt)\
> >> >>> > > \
> >> >>> > > padj_fil <- subset (padj,padj <0.05 )\
> >> >>> > > \
> >> >>> > > dim (padj_fil)\
> >> >>> > > \
> >> >>> > > [1] 149 1\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > ------------
> >> >>> > > \
> >> >>> > > library ("DESeq2")\
> >> >>> > > \
> >> >>> > >
countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T,
> >> >>> sep=",",
> >> >>> > > row.names=1))\
> >> >>> > > \
> >> >>> > > coldata= read.table ("targets.csv", header = T,
> >> sep=",",row.names=1)\
> >> >>> > > \
> >> >>> > > coldata\
> >> >>> > > \
> >> >>> > >>Subject Treatment\
> >> >>> > >>F1 1 co\
> >> >>> > >>F2 2 co\
> >> >>> > >>F3 3 co\
> >> >>> > >>F4 4 co\
> >> >>> > >>F5 5 co\
> >> >>> > >>H1 1 c2\
> >> >>> > >>H2 2 c2\
> >> >>> > >>H4 4 c2\
> >> >>> > >>H5 5 c2\
> >> >>> > > \
> >> >>> > > dds <- DESeqDataSetFromMatrix(\
> >> >>> > > countData = countdata,\
> >> >>> > > colData = coldata,\
> >> >>> > > design = ~ Subject + Treatment)\
> >> >>> > > dds\
> >> >>> > > \
> >> >>> > > dds$Treatment <- relevel (dds$Treatment, "co")\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > dds <- estimateSizeFactors( dds)\
> >> >>> > > \
> >> >>> > > dds <- estimateDispersions(dds)\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > rs = rowSums ( counts ( dds ))\
> >> >>> > > theta = 0.2\
> >> >>> > > use = (rs > quantile(rs, probs=theta))\
> >> >>> > > table(use)\
> >> >>> > > ddsFilt= dds[ use, ]\
> >> >>> > > \
> >> >>> > > \
> >> >>> > > dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~
1)\
> >> >>> > > \
> >> >>> > > resLRT <- results(dds)\
> >> >>> > > \
> >> >>> > > sum( resLRT$padj < 0.05, na.rm=TRUE )\
> >> >>> > > \
> >> >>> > > #[1] 869}
> >> >>> > >
> >> >>> > > [[alternative HTML version deleted]]
> >> >>> > >
> >> >>> > > _______________________________________________
> >> >>> > > Bioconductor mailing list
> >> >>> > > Bioconductor@r-project.org
> >> >>> > >
https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> >>> > > Search the archives:
> >> >>> >
http://news.gmane.org/gmane.science.biology.informatics.conductor
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> *Catalina Aguilar Hurtado*
> >> >>>
> >> >>> PhD Candidate
> >> >>> ARC Centre of Excellence for Coral Reef Studies
> >> >>> Department of Biochemistry and Molecular Biology
> >> >>> AIMS@JCU
> >> >>> James Cook University
> >> >>> Townsville, QLD 4811, Australia
> >> >>> ph: +61 7 4781 6009
> >> >>>
> >> >>> [[alternative HTML version deleted]]
> >> >>>
> >> >>> _______________________________________________
> >> >>> Bioconductor mailing list
> >> >>> Bioconductor@r-project.org
> >> >>>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> >>> Search the archives:
> >> >>>
http://news.gmane.org/gmane.science.biology.informatics.conductor
> >> >>>
> >> >>
> >> >
> >> > [[alternative HTML version deleted]]
> >> >
> >> > _______________________________________________
> >> > Bioconductor mailing list
> >> > Bioconductor@r-project.org
> >> >
https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> > Search the archives:
> >>
http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
> > [[alternative HTML version deleted]]
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@r-project.org
> >
https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]