Question

error in DESeq : analysis without replicates

Andreia Fonseca ▴ 810

@andreia-fonseca-3796

Last seen 7.2 years ago

Dear list,

I am trying analysis without replicates for a large data set of small transcripts and I am getting the following error:

    Error: condB %in% levels(conditions(cds)) is not TRUE

The code that I have used is:

    myvars_2 <- c("transcript", "T1", "T2")
    first_T1_2 <- first[myvars_2]
    countsTable_T1_2 <- first_T1_2
    countsTable_T1_2 <- countsTable_T1_2[, -1]
    rownames(countsTable_T1_2) <- first_T1_2$transcript

    # script for DEA
    library(DESeq)

    # estimate dispersions
    conds <- c("NS", "NS_HIV1")

    cds_T1_2 <- newCountDataSet(countsTable_T1_2, conds)
    cds_T1_2 <- estimateSizeFactors(cds_T1_2)
    sizeFactors(cds_T1_2)

    cds_T1_2 <- estimateDispersions(cds_T1_2,
        method="blind", fitType="local", sharingMode="fit-only")

The dataset has 3965346 transcripts.

Can someone help?

Thanks

    R version 2.15.0 (2012-03-30)
    Platform: x86_64-redhat-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] DESeq_1.8.3 locfit_1.5-8 Biobase_2.16.0 BiocGenerics_0.2.0
    [5] reshape_0.8.4 plyr_1.7.1

    loaded via a namespace (and not attached):
     [1] annotate_1.34.1   AnnotationDbi_1.18.1 DBI_0.2-5
     [4] genefilter_1.38.0 geneplotter_1.34.0   grid_2.15.0
     [7] IRanges_1.14.4    lattice_0.20-10      RColorBrewer_1.0-5
    [10] RSQLite_0.11.1    splines_2.15.0       stats4_2.15.0
    [13] survival_2.36-14  tools_2.15.0         XML_3.9-4
    [16] xtable_1.7-0

--
Andreia J. Amaral, PhD
BioFIG - Center for Biodiversity, Functional and Integrative Genomics
Instituto de Medicina Molecular
University of Lisbon
Tel: +352 217500000 (ext. office: 28253)
email: andreiaamaral@fm.ul.pt ; andreiaamaral@fc.ul.pt

• 1.1k views

updated 11.6 years ago by Wolfgang Huber ★ 13k • written 11.6 years ago by Andreia Fonseca ▴ 810

score 0 · Answer 1 · 2012-09-03

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 17 days ago

EMBL European Molecular Biology Laborat…

Dear Andreia,

have a look at the error message that was produced; it might point you to the fact that the argument 'condB' that you provided to 'nbinomTest' does not match the condition names that you used for the cds_T1_2 object.

It would be unusual if the results of such an analysis of these data were to make a lot of sense; if they do not, please do not blame the tool (DESeq) for it.

Btw, please provide a reproducible example in the future (in your post, you did not tell us how you called the function 'nbinomTest', and the object 'first' was undefined).

Best wishes
Wolfgang

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
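The check named in the error message can be reproduced in base R without DESeq. The following is only a minimal sketch: the mistyped label `"HIV1"` and the helper `check_condition` are hypothetical, since the original call to `nbinomTest` was never posted.

```r
# Conditions as given in the question; a factor exposes them via levels().
conds <- factor(c("NS", "NS_HIV1"))

# Hypothetical labels: the actual nbinomTest call was not shown in the thread.
cond_a <- "NS"
cond_b <- "HIV1"   # assumed typo, for illustration only

# Hypothetical helper performing the same kind of membership test
# that the error message reports as failing.
check_condition <- function(label, conditions) {
  label %in% levels(conditions)
}

print(check_condition(cond_a, conds))
print(check_condition(cond_b, conds))

# Taking the label directly from the factor levels avoids a mismatch.
cond_b_fixed <- levels(conds)[2]
print(check_condition(cond_b_fixed, conds))

# With DESeq loaded, the matching call would then look like:
# res <- nbinomTest(cds_T1_2, "NS", "NS_HIV1")
```

Printing `levels(conditions(cds_T1_2))` before calling `nbinomTest` is a quick way to see which labels are accepted.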

11.6 years ago Wolfgang Huber ★ 13k

Dear Wolfgang,

thanks for the reply. In the meantime I solved the problem by eliminating the transcripts that had <2000 reads in one of the conditions. Then it worked without any problems.

Kind regards,
Andreia

PS: why would I blame the package? I must say that we have been getting many reproducible results from analyses without replicates, and I am satisfied with it. Of course there can be a lot of false positives among transcripts with low read counts and small fold changes, and I am aware of that.
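The filtering step described in this reply was not posted, so the following is only a sketch of one way to do it in base R. The toy counts, the transcript names and the variable `min_reads` are assumptions for illustration; only the threshold of 2000 comes from the reply.

```r
# Toy count table standing in for countsTable_T1_2 (values are made up).
counts <- data.frame(
  T1 = c(2500, 150, 4000, 1999),
  T2 = c(3100, 5000, 2000, 10),
  row.names = c("tr1", "tr2", "tr3", "tr4")
)

# Threshold from the reply: drop transcripts with <2000 reads
# in at least one of the conditions.
min_reads <- 2000

# A row is kept only if no column falls below the threshold.
keep <- rowSums(counts < min_reads) == 0
filtered <- counts[keep, , drop = FALSE]

print(filtered)
```

Filtering before `newCountDataSet` also shrinks the table considerably, which matters with several million transcripts.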

11.6 years ago Andreia Fonseca ▴ 810

Dear Andreia,

I am glad to hear it worked out. What I meant to say: please do not take the p-values that you get from this analysis seriously. For ranking, it may be useful. (Although one would not really need DESeq for that.)

Good luck with the follow-up analysis -

Wolfgang
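As a rough illustration of ranking without DESeq, transcripts can be ordered by the absolute log2 fold change of scaled counts. This is a sketch under stated assumptions, not the method used in the thread: the counts are made up, the scaling uses total library size rather than DESeq's size factors, and the pseudocount of 1 is an arbitrary choice.

```r
# Toy counts for two unreplicated samples (values are made up).
counts <- matrix(
  c(2500, 4000, 3000,
    3100, 2000, 12000),
  ncol = 2,
  dimnames = list(c("tr1", "tr2", "tr3"), c("T1", "T2"))
)

# Simple library-size scaling (an assumption; DESeq estimates size factors differently).
lib_size <- colSums(counts)
norm <- sweep(counts, 2, lib_size / mean(lib_size), "/")

# Log2 fold change of T2 over T1, with a pseudocount to avoid division by zero.
pseudo <- 1
lfc <- log2((norm[, "T2"] + pseudo) / (norm[, "T1"] + pseudo))

# Rank transcripts by the magnitude of the change, largest first.
ranking <- data.frame(transcript = rownames(counts), log2FC = lfc)
ranking <- ranking[order(-abs(ranking$log2FC)), ]

print(ranking)
```

Such a ranking carries no statistical significance; it only orders transcripts by effect size.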

11.6 years ago Wolfgang Huber ★ 13k