reads of different lengths (easyRNASeq)
0
0
Entering edit mode
@delhommeemblde-3232
Last seen 9.6 years ago
Dear Mark, I'm cc'ing the Bioconductor mailing list in case it helps others. I've implemented the support for variable read length (version 1.3.7 of easyRNASeq, available in SVN now and in a couple of days from Bioc). Thanks for your reproducible use case. Below is the code I ran: library(easyRNASeq) counts <- easyRNASeq(organism="Dmelanogaster", annotationMethod="gtf", annotationFile="Drosophila_melanogaster.BDGP5.67.gtf", gapped=TRUE, count="genes", summarization="geneModels", pattern=".subread_s.bam$", filesDirectory=".") It took 4 minutes on my machine. Having variable read length does affect the performance but I'm working on that. In the command line, note that I removed the argument 'format' as bam is now the default. Since you're using 'bam', the arguments chr.sizes is not necessary either as the chromosome lengths are extracted from the BAM file. Using the chr.map is not necessary for Dmelanogaster as easyRNASeq should correctly convert the chromosome names. To list which organism easyRNASeq support that conversion for, I have added a method: knownOrganisms that lists them. If your organism is not part of that list but the chromosome/scaffold names are identical between the BAM file and the annotation file, it will work smoothly too. Thanks again, Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On Jun 13, 2012, at 2:21 PM, Mark Robinson wrote: > Dear Nico, > > I get this error (see below for full details): > > Error in fetchCoverage(obj, format = format, filename = file, filter = filter, : > The file ./SRR031714.subread_s.bam contains reads of different sizes: 74, 70 . We cannot deal with such data at the moment. Please contact the authors to add this functionality. > > With read trimming and so on, won't this be a common occurrence? Your error message says to ask you to add this functionality, so I ask. > > Best, > Mark > > >> # setup >> gtf <- "Drosophila_melanogaster.BDGP5.67.gtf" >> fn <- dir(,".subread_s.bam$") # file names >> >> nm <- names( seqlengths(Dmelanogaster) ) >> chr.map <- data.frame(from=nm,to=gsub("chr","",nm)) >> #chr.map <- data.frame(from=nm,to=gsub("chr","",nm)) >> >> counts <- easyRNASeq(organism="Dmelanogaster", chr.sizes=as.list(seqlengths(Dmelanogaster)), > + readLength=74L, annotationMethod="gtf", annotationFile=gtf, format="bam", gapped=TRUE, > + count="exons", filenames=fn, filesDirectory=".", chr.map=chr.map) > Checking arguments... > Fetching annotations... > Read 308180 records > Summarizing counts... > Processing SRR031708.subread_s.bam > Processing SRR031709.subread_s.bam > Processing SRR031710.subread_s.bam > Processing SRR031711.subread_s.bam > Processing SRR031712.subread_s.bam > Processing SRR031713.subread_s.bam > Processing SRR031714.subread_s.bam > Error in fetchCoverage(obj, format = format, filename = file, filter = filter, : > The file ./SRR031714.subread_s.bam contains reads of different sizes: 74, 70 . We cannot deal with such data at the moment. Please contact the authors to add this functionality. > In addition: There were 23 warnings (use warnings() to see them) >> sessionInfo() > R version 2.15.0 (2012-03-30) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8 > [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] GenomicFeatures_1.8.1 > [2] AnnotationDbi_1.18.1 > [3] BSgenome.Dmelanogaster.UCSC.dm3_1.3.17 > [4] easyRNASeq_1.2.3 > [5] ShortRead_1.14.4 > [6] latticeExtra_0.6-19 > [7] RColorBrewer_1.0-5 > [8] lattice_0.20-6 > [9] Rsamtools_1.8.5 > [10] DESeq_1.8.3 > [11] locfit_1.5-8 > [12] BSgenome_1.24.0 > [13] GenomicRanges_1.8.6 > [14] Biostrings_2.24.1 > [15] IRanges_1.14.3 > [16] edgeR_2.6.7 > [17] limma_3.12.1 > [18] biomaRt_2.12.0 > [19] Biobase_2.16.0 > [20] genomeIntervals_1.12.0 > [21] BiocGenerics_0.2.0 > [22] intervals_0.13.3 > > loaded via a namespace (and not attached): > [1] annotate_1.34.0 bitops_1.0-4.1 DBI_0.2-5 genefilter_1.38.0 > [5] geneplotter_1.34.0 grid_2.15.0 hwriter_1.3 RCurl_1.91-1 > [9] RSQLite_0.11.1 rtracklayer_1.16.1 splines_2.15.0 stats4_2.15.0 > [13] survival_2.36-14 tools_2.15.0 XML_3.9-4 xtable_1.7-0 > [17] zlibbioc_1.2.0 > > > > > ---------- > Prof. Dr. Mark Robinson > Bioinformatics > Institute of Molecular Life Sciences > University of Zurich > Winterthurerstrasse 190 > 8057 Zurich > Switzerland > > v: +41 44 635 4848 > f: +41 44 635 6898 > e: mark.robinson at imls.uzh.ch > o: Y11-J-16 > w: http://tiny.cc/mrobin >
Annotation Organism BSgenome convert BSgenome easyRNASeq Annotation Organism BSgenome • 883 views
ADD COMMENT

Login before adding your answer.

Traffic: 974 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6