Hello,
I have constructed the following dataset for analysis using DESeq2:
class: DESeqDataSet
dim: 57396 10
exptData(0):
assays(1): counts
rownames(57396): ENSG00000223972 ENSG00000227232 ... ENSG00000210195
ENSG00000210196
rowData metadata column names(0):
colnames(10): 1 2 ... 10 11
colData names(1): condition
> colData(ddsHTSeq)
DataFrame with 10 rows and 1 column
   condition
    <factor>
1         na
2         na
3  Resistant
4         na
5  Resistant
6  Resistant
7         na
8         na
10 Sensitive
11 Sensitive
I am interested in the differential expression between the drug
resistant and sensitive samples ('na' are control samples).
I've clustered the samples and plotted a PCA as described in the
vignette. However, in each of these plots the samples do not cluster
by their drug sensitivity but are distributed across the plot. I
don't have any more information about the samples with which to model
any potential covariates.
I was wondering if there are any pointers as to how I could extract
something useful from these data, please? As might be expected, when
I run DESeq on these data I get no significant p-values.
Thanks in advance,
Dave
-- output of sessionInfo():
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base
other attached packages:
[1] pasilla_0.4.0           matrixStats_0.8.14      gplots_2.13.0
[4] vsn_3.32.0              Biobase_2.24.0          DESeq2_1.4.5
[7] RcppArmadillo_0.4.300.0 Rcpp_0.11.1             GenomicRanges_1.16.3
[10] GenomeInfoDb_1.0.2     IRanges_1.22.7          BiocGenerics_0.10.0
loaded via a namespace (and not attached):
[1] affy_1.42.2 affyio_1.32.0 annotate_1.42.0
[4] AnnotationDbi_1.26.0 BiocInstaller_1.14.2 bitops_1.0-6
[7] caTools_1.17 DBI_0.2-7 DESeq_1.16.0
[10] gdata_2.13.3 genefilter_1.46.1 geneplotter_1.42.0
[13] grid_3.1.0 gtools_3.4.0 KernSmooth_2.23-12
[16] lattice_0.20-29 limma_3.20.4 locfit_1.5-9.1
[19] preprocessCore_1.26.1 RColorBrewer_1.0-5 R.methodsS3_1.6.1
[22] RSQLite_0.11.4 splines_3.1.0 stats4_3.1.0
[25] survival_2.37-7 tcltk_3.1.0 tools_3.1.0
[28] XML_3.98-1.1 xtable_1.7-3 XVector_0.4.0
[31] zlibbioc_1.10.0
--
Sent via the guest posting facility at bioconductor.org.
hi Dave,
If you don't find a set of genes with low FDR, then the experiment
could have been underpowered to find small differences, i.e. the
sample size was not large enough.
Did you compare sensitive vs resistant using the contrast argument to
results()? The default comparison is the last level over the first
level of the last variable in the design, but there are three possible
pairs of the three groups.
Mike
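For reference, here is a minimal sketch of the explicit contrast Mike
describes. It assumes a DESeqDataSet with the three-level `condition`
factor shown in the original post. The counts are simulated with no
real signal, and the object names (`counts`, `coldata`, `dds`, `res`)
are hypothetical rather than taken from Dave's session.

```r
library(DESeq2)

set.seed(1)

# Simulated counts (assumption): 500 genes x 10 samples, no true signal.
n_genes <- 500
sample_ids <- c("1", "2", "3", "4", "5", "6", "7", "8", "10", "11")
mu <- 2^runif(n_genes, 4, 10)            # one mean per gene (row)
counts <- matrix(rnbinom(n_genes * 10, mu = mu, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 sample_ids))
storage.mode(counts) <- "integer"

# Same group layout as the colData in the original post.
coldata <- data.frame(
  condition = factor(c("na", "na", "Resistant", "na", "Resistant",
                       "Resistant", "na", "na", "Sensitive", "Sensitive")),
  row.names = sample_ids)

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ condition)
dds <- DESeq(dds)

# Name the pair explicitly instead of relying on the default comparison:
# log2 fold change of Resistant over Sensitive.
res <- results(dds, contrast = c("condition", "Resistant", "Sensitive"))

# Genes ordered by adjusted p-value (NA values sort last).
head(res[order(res$padj), ])
```

The three-element form is c(variable, numerator level, denominator
level), so swapping the last two entries flips the sign of the
reported log2 fold changes.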
Thanks Mike - yes, I used the sensitive vs resistant contrast argument
to results().
Hi, Dave.
With an n of only 5, you might simply be underpowered to find
significant genes, so increasing your sample size might be warranted.
You could try using gene set analysis to look for coordinately
regulated sets of genes, each with small effects. Alternatively, you
could use the p-values for ranking the genes and try to validate a few
genes of interest on a larger set of samples using PCR or some other
technology.
Sean
Hi Dave,
If your samples do not cluster by treatment in your PCA, you likely
have some sort of unwanted variation or batch effect masking the effect
of the treatment in your data. I am not sure more samples will help.
Have you taken a look at the PC loadings past 1 and 2 to see if there
is any PC that captures your treatment? Do you have any positive
controls? Are you sure your treatment actually causes measurable
differences in gene expression?
The only thing I believe will help is RUVSeq:
http://www.bioconductor.org/packages/devel/bioc/html/RUVSeq.html
Lucia
--
Lucia Peixoto PhD
Postdoctoral Research Fellow
Laboratory of Dr. Ted Abel
Department of Biology
School of Arts and Sciences
University of Pennsylvania
"Think boldly, don't be afraid of making mistakes, don't miss small
details, keep your eyes open, and be modest in everything except your
aims."
Albert Szent-Gyorgyi
Hi,
On Fri, Jun 27, 2014 at 7:06 AM, Lucia Peixoto <luciap@iscb.org> wrote:
> The only thing I believe will help is RUVSeq:
>
> http://www.bioconductor.org/packages/devel/bioc/html/RUVSeq.html
Not the only thing ... this is slightly different, but also something
to keep an eye on "in this context" (i.e. removing nuisance effects):
svaseq: removing batch effects and other unwanted noise from sequencing data
http://biorxiv.org/content/early/2014/06/25/006585
Thank you for bringing my attention to RUVSeq, though, as I haven't
seen it before.
HTH,
-steve
--
Steve Lianoglou
Computational Biologist
Genentech
Thanks Lucia; I've checked that PCs 1 and 2 capture 55% and 15% of the
total variance, respectively. Could you explain how, if I did find that
the treatment effect was present in another PC, that would help me,
please?
I don't have any positive control because it's an experiment to
characterise a response to a drug treatment.
Thanks,
Dave
Hi Dave,
I assume you have only plotted PC1 vs PC2. You can make the same type
of plot for PC1 vs PC3, PC1 vs PC4 and so on, to see if any PC captures
the grouping by treatment.
This is regardless of how much variance each PC explains. I usually
don't use DESeq to do the PCA plots, so I am not sure how you would do
this within DESeq.
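One way to look past PC1 and PC2 is to run the PCA directly with base
R's prcomp() and plot several components against each other. The
sketch below is only illustrative: the matrix `mat` is simulated and
stands in for a log-scale expression matrix (for example the
transformed values used for the vignette's PCA plot), and all object
names are hypothetical.

```r
set.seed(42)

# Group labels in the same order as the colData in the original post.
condition <- factor(c("na", "na", "Resistant", "na", "Resistant",
                      "Resistant", "na", "na", "Sensitive", "Sensitive"))
sample_ids <- c("1", "2", "3", "4", "5", "6", "7", "8", "10", "11")

# Simulated stand-in (assumption) for a genes x samples matrix of
# transformed expression values.
n_genes <- 500
mat <- matrix(rnorm(n_genes * length(sample_ids)),
              nrow = n_genes,
              dimnames = list(NULL, sample_ids))

# prcomp() expects samples in rows, so transpose the matrix.
pca <- prcomp(t(mat))

# Percentage of total variance captured by each component.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Scatterplot matrix of PC1..PC4, one colour per condition.
pairs(pca$x[, 1:4],
      labels = sprintf("PC%d (%.0f%%)", 1:4, pct_var[1:4]),
      col = as.integer(condition), pch = 19)
```

If the groups separate along, say, PC3 rather than PC1 or PC2, that
would suggest the treatment effect is present but smaller than some
other source of variation.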
I do understand that you are "characterizing" a response to a drug, but
your underlying assumption is that part of that response is differences
in gene expression that can be observed at the time point you are
measuring. It could simply be that the differences between being drug
resistant and sensitive have nothing to do with gene expression
differences at the steady state, and that's why you don't get any
significant p-values. Positive controls assure you that there are
differences you can measure. Have you plotted the p-value distribution?
You can find how to do it in the Nature Protocols tutorial:
http://www.nature.com/nprot/journal/v8/n9/full/nprot.2013.099.html
Lucia
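A p-value histogram of the kind Lucia mentions takes only a few lines
of base R. This is a minimal sketch on simulated values: `pvals` is a
hypothetical stand-in for the raw (unadjusted) p-value column of a
results table, built here as a mixture of uniform "null" p-values and
a smaller set skewed towards zero.

```r
set.seed(1)

# Simulated stand-in (assumption) for the raw p-values of a real analysis:
# 9000 null genes (uniform) plus 1000 genes with some signal (near zero).
p_null <- runif(9000)
p_alt  <- rbeta(1000, 0.5, 10)
pvals  <- c(p_null, p_alt)

# Drop NA values first; a real results table may contain them.
pvals <- pvals[!is.na(pvals)]

# Twenty equal-width bins on [0, 1].
h <- hist(pvals, breaks = seq(0, 1, length.out = 21), plot = FALSE)

plot(h, col = "grey", main = "Distribution of raw p-values",
     xlab = "p-value")
```

A roughly flat histogram is what pure noise looks like, whereas a
spike in the leftmost bin hints at real differences even when no
individual gene survives multiple-testing correction.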