Question: easyRNASeq with Ensembl GRCh37 help
4.2 years ago, Aki Hoji wrote:
Hi,

I've been trying to generate input for DESeq2 with easyRNASeq. The input file is a BAM file generated by TopHat2/Bowtie2 against Ensembl GRCh37.72, which was part of Illumina's iGenomes package. I followed the easyRNASeq overview and the examples posted on the BioC mailing list, and ran the following:

    testcount <- easyRNASeq(filesDirectory=getwd(), organism="Hsapiens", chr.sizes="auto", readLength=100L, annotationMethod="gtf", annotationFile="Ensemble.gtf", count="exons", outputFormat="DESeq", filenames="4673Bsorted.bam")

Then I got this error:

    Checking arguments...
    Fetching annotations...
    Read 2280612 records
    Error in easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
      The number of conditions: 0 did not correspond to the number of samples: 1
    In addition: Warning messages:
    1: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
      You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it.
    2: In .Method(..., deparse.level = deparse.level) :
      number of columns of result is not a multiple of vector length (arg 1)
    3: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
      There are 966272 features/exons defined in your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
    4: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
      You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it.

As far as I can tell, I am not enforcing the UCSC chromosome convention, and chr.sizes can be set to "auto" since the input is a BAM file. I am stuck at this point, and any help or pointers would be really appreciated.

Thanks.
AH

    > sessionInfo()
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] easyRNASeq_1.6.0       ShortRead_1.18.0       latticeExtra_0.6-26
     [4] RColorBrewer_1.0-5     Rsamtools_1.12.4       DESeq_1.12.1
     [7] lattice_0.20-23        locfit_1.5-9.1         BSgenome_1.28.0
    [10] GenomicRanges_1.12.5   Biostrings_2.28.0      IRanges_1.18.3
    [13] edgeR_3.2.4            limma_3.16.7           biomaRt_2.16.0
    [16] Biobase_2.20.1         genomeIntervals_1.16.0 BiocGenerics_0.6.0
    [19] intervals_0.14.0       BiocInstaller_1.10.3

    loaded via a namespace (and not attached):
     [1] annotate_1.38.0      AnnotationDbi_1.22.6 bitops_1.0-6
     [4] DBI_0.2-7            genefilter_1.42.0    geneplotter_1.38.0
     [7] grid_3.0.1           hwriter_1.3          RCurl_1.95-4.1
    [10] RSQLite_0.11.4       splines_3.0.1        stats4_3.0.1
    [13] survival_2.37-4      tools_3.0.1          XML_3.95-0.2
    [16] xtable_1.7-1         zlibbioc_1.6.0
(Modified 4.2 years ago by delhomme@embl.de.)
4.2 years ago, delhomme@embl.de wrote:
Hej Aki Hoji!

You can indeed ignore the warnings. The error is this one:

    The number of conditions: 0 did not correspond to the number of samples: 1

To get DESeq output, you need to specify the conditions, one per sample; see the ?easyRNASeq help page and the easyRNASeq and DESeq vignettes (e.g. vignette("easyRNASeq")) for more details on the arguments and on how to use DESeq.

Even if you provide a condition, easyRNASeq is bound to fail again, as DESeq cannot work with a single sample.

Finally, note that easyRNASeq currently returns only DESeq output, not DESeq2 output (i.e. a CountDataSet, not a SummarizedExperiment). DESeq2 support is planned for the next release, expected in early October.

Best,
Nico

Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
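The two requirements in the answer (one condition label per sample, and more than one sample for DESeq) can be checked in plain R before calling easyRNASeq. The sketch below is a minimal illustration under stated assumptions: the helpers make_conditions() and has_enough_samples() are hypothetical and not part of easyRNASeq, "otherSample.bam" is a placeholder file name, and naming each label by its BAM file is an assumption about what easyRNASeq expects (check ?easyRNASeq for your version).

```r
# Hypothetical helper (not part of easyRNASeq): pair each BAM file with a
# condition label, failing early when the counts differ, as easyRNASeq did.
make_conditions <- function(filenames, labels) {
  if (length(labels) != length(filenames)) {
    stop(sprintf(
      "number of conditions (%d) does not match number of samples (%d)",
      length(labels), length(filenames)))
  }
  # Assumption: labels are named by BAM file name.
  stats::setNames(labels, filenames)
}

# Hypothetical helper: DESeq cannot work with a single sample.
has_enough_samples <- function(conditions) {
  length(conditions) >= 2
}

# Only one BAM file, as in the question: not enough for DESeq.
single <- make_conditions("4673Bsorted.bam", "A")
has_enough_samples(single)  # FALSE

# Adding a second (placeholder) BAM file with its own label.
pair <- make_conditions(c("4673Bsorted.bam", "otherSample.bam"), c("A", "B"))
has_enough_samples(pair)  # TRUE
print(pair)
```

If the checks pass, such a vector would presumably be passed to easyRNASeq through its conditions argument, with all BAM files listed in filenames; confirm the argument name and expected format in ?easyRNASeq before relying on this.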