Hi,
I am trying to use DESeq2 for a set of counts obtained via
summerizedOverlaps and have the counts for 4 samples in one file.
Earlier I had used DESeq2 with counts from htseq-count, which reports
the counts for each samples in a separate file. Now that I have these
in one file, is there a method analogous to
"DESeqDataSetFromHTSeqCount" for constructing the dds object. With
htseq-count I had used the following set of commands:
"directory" contains the 4 count files.
> sampleFiles=list.files(directory)
> sampleFiles
> [1] "abc.htseq.count" "def.htseq.count"
> [3] "pqr.htseq.count" "xyz.htseq.count"
>
> sampleCondition=c("Tumor","Tumor","Normal","Normal")
> samplePatient=c("A","B","A","B")
> sampleTable=data.frame(sampleName=sampleFiles, fileName=sampleFiles,
condition=sampleCondition, patient=samplePatient)
> dds=DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
directory=directory, design= ~ patient + condition)
Thanks,
- Pankaj
________________________________
From: Michael Love [michaelisaiahlove@gmail.com]
Sent: Thursday, April 17, 2014 9:18 AM
To: Pankaj Agarwal
Subject: RE: [BioC] DESeq2: Paired Test
Hi Pankaj
For future questions to Bioconductor maintainers, always cc the
mailing list. This helps avoids duplicate questions, which we get a
lot of. Answer below,
On Apr 17, 2014 8:49 AM, "Pankaj Agarwal"
<p.agarwal@duke.edu<mailto:p.agarwal@duke.edu>> wrote:
>
> Thanks for your help and response.
>
> >> I did not get the normalization factors in the output, whereas it
does
> >> show up in the results in the vignette, so I am not sure if I
missed some step.
>
> >I'm confused, which section of the vignette are you referencing?
>
> I was referring to Section 1.5 which report "sizeFactor" as the
third column in the table, but when I execute the same function I
dont see that column:
You have to run DESeq or estimateSizeFactor first, see ?DESeq
>
> colData(dds)
> DataFrame with 4 rows and 2 columns
> condition patient
> <factor> <factor>
> abc.htseq.count Tumor A
> def.htseq.count Tumor B
> pqr.htseq.count Normal A
> xyz.htseq.countt Normal B
>
> Is the third column in Section 1.5 the output of
estimateSizeFactors( dds )?
Yes
>
> Thanks,
>
> - Pankaj
>
> -----Original Message-----
> From: Michael Love [mailto:michaelisaiahlove@gmail.com<mailto:michae lisaiahlove@gmail.com="">]
> Sent: Wednesday, April 16, 2014 4:37 PM
> To: Pankaj Agarwal
> Subject: Re: [BioC] DESeq2: Paired Test
>
> hi Pankaj,
>
> On Wed, Apr 16, 2014 at 3:59 PM, Pankaj Agarwal
<p.agarwal@duke.edu<mailto:p.agarwal@duke.edu>> wrote:
> > Hi Michael,
> >
> > I followed the vignette for the section on htseq-count and your
> > suggestion as best as I could and got some results.
> > But I am not sure if I did everything right, so I would really
> > appreciate if you could verify if the following steps is the
correct
> > way to analyze this data set:
> >
> > There are 4 samples for two patients:
> > Patient A: abc.htseq.count (Tumor), pqr.htseq.count (Normal)
Patient
> > B: def.htseq.count (Tumor), xyz.htseq.count (Normal)
> >
> >> sampleFiles=list.files(directory)
> >> sampleFiles
> > [1] "abc.htseq.count" "def.htseq.count"
> > [3] "pqr.htseq.count" "xyz.htseq.count"
> >>
> >> sampleCondition=c("Tumor","Tumor","Normal","Normal")
> >> samplePatient=c("A","B","A","B")
> >> sampleTable=data.frame(sampleName=sampleFiles,
fileName=sampleFiles,
> >> condition=sampleCondition, patient=samplePatient)
> >> dds=DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
> >> directory=directory, design= ~ patient + condition)
> >>
> >> colData(dds)
> > DataFrame with 4 rows and 2 columns
> > condition patient
> > <factor> <factor>
> > abc.htseq.count Tumor A
> > def.htseq.count Tumor B
> > pqr.htseq.count Normal A
> > xyz.htseq.countt Normal B
> >>
> >> dds$condition
> > [1] Tumor Tumor Normal Normal
> > Levels: Normal Tumor
> >>
> >> dds$patient
> > [1] A B A B
> > Levels: A B
> >>
> >> dds = DESeq(dds)
> > estimating size factors
> > estimating dispersions
> > gene-wise dispersion estimates
> > mean-dispersion relationship
> > final dispersion estimates
> > fitting model and testing
> >>
> >> result=results(dds)
> >>
> >> resOrdered2=result[order(result$padj),]
> >>
> >> head(resOrdered2,20)
> > DataFrame with 20 rows and 6 columns
> > baseMean log2FoldChange lfcSE stat pvalue
> > padj
> > <numeric> <numeric> <numeric> <numeric> <numeric>
> > <numeric>
> > GAL1 873531.57 -11.645190 0.7175710 -16.22863 3.164692e-59
> > 5.627771e-55
> > ...
> > ...
> >
>
> Your lines of code look correct.
>
> > I did not get the normalization factors in the output, whereas it
does
> > show up in the results in the vignette, so I am not sure if I
missed some step.
>
> I'm confused, which section of the vignette are you referencing?
>
> >
> > Also, would you expect to get meaningful results with just two
matched
> > pairs of samples with this analysis? I am wondering if it is
better
> > to just report the FC rather than the p-values.
>
> The analysis has 4 samples and 3 parameters to estimate (intercept,
patient, condition), so you have one residual degree of freedom. So
you can estimate the dispersion and the p-values should be meaningful.
>
> One drawback with having only two matched pairs is that there are
too few samples to look for outliers. DESeq2 looks for outliers when
there are at least 3 samples per condition (actually, per unique
combination of factor variables in the design). I would take a look at
the genes with the highest counts and see if these should be filtered
manually.
> For example, the first gene in your ordered results table looks to
me like it just contains an outliers count. However with no
replicates, you have to perform such filtering manually.
>
> Mike
>
> >
> > Thank you,
> >
> > - Pankaj
> >
> >
> > ________________________________
> > From: Michael Love
[michaelisaiahlove@gmail.com<mailto:michaelisaiahlove@gmail.com>]
> > Sent: Thursday, April 10, 2014 4:08 PM
> > To: Pankaj Agarwal
> > Cc: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
> > Subject: Re: [BioC] DESeq2: Paired Test
> >
> > hi Pankaj,
> >
> > You should follow the multifactor analysis section of the
vignette.
> > You would include the patient effect in the design like so:
> >
> > design(dds) <- ~ patient + condition
> >
> > Where 'patient' and 'condition' are factors which are columns in
> > colData(dds), and condition has levels: normal, tumor (where
normal is
> > the base level).
> >
> > Then results(dds) will build a result table for tumor vs normal,
> > controlling for the patient effect.
> >
> > Mike
> >
> >
> > On Thu, Apr 10, 2014 at 3:52 PM, Pankaj Agarwal
<p.agarwal@duke.edu<mailto:p.agarwal@duke.edu>> wrote:
> >>
> >> Hi,
> >>
> >> This query is in reference to the following post
> >>
https://stat.ethz.ch/pipermail/bioconductor/2010-April/032869.html
> >> where it was mentioned that "DESeq does not support paired tests
(yet)".
> >> I have a need for performing rna-seq data analysis with matched
> >> tumor/normal pairs of data and was wondering if this capability
is
> >> now available in the latest version of DESeq2. I searched the
list
> >> site and also the latest vignette, but could not find anything
> >> indicating this capability is now available in DEseq. I would be
> >> grateful if, in case this feature is not yet available, someone
could
> >> advice me how to do the analysis. I have samples from two
patients, and from each patient there is a "normal" and a "tumor"
> >> rna-seq sample data.
> >>
> >> Thanks,
> >>
> >> - Pankaj
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
> >>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >>
http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >
[[alternative HTML version deleted]]