Possible problem with featureCounts() and scipen

0

Entering edit mode

Oscar Rueda ▴ 10

@oscar-rueda-6611

Last seen 9.6 years ago

Dear list, I've noticed a problem counting reads using the Rsubread package. I can not include a reproducible example for lack of a bam file, but I hope the following code explains the problem: Suppose I have this genomic region: > regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand=1) Then I want to count the reads in this region: > featureCounts("MyReadsfile.bam", annot.ext=regions, annot.inbuilt="hg19", ignoreDup=T) I get a huge number of reads, but if I look at the output, I see that the annotation is wrong: $counts X.MyReadsfile.bam G1 715291 $annotation GeneID Chr Start End Strand Length 1 Error chr2 2 202002100 + 202002099 That is, the start is 2 instead of 202000000. If I print my regions I see > regions GeneID Chr Start End Strand 1 Error chr2 2.02e+08 202002100 1 That is, the scientific notation is not taken properly. I can fix this doing > par(scipen=10) But I wanted to ask if anyone has noticed this behaviour and if this is expected by the function. Thanks a lot for your comments, Oscar Oscar M. Rueda, PhD. Postdoctoral Research Fellow, Caldas Lab, Breast Cancer Functional Genomics. University of Cambridge. Cancer Research UK Cambridge Institute. Li Ka Shing Centre, Robinson Way. Cambridge CB2 0RE England > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Rsubread_1.14.1 GenomicAlignments_1.0.1 BSgenome_1.32.0 [4] Rsamtools_1.16.1 Biostrings_2.32.0 XVector_0.4.0 [7] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.9 [10] BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.1 bitops_1.0-6 [5] brew_1.0-6 codetools_0.2-8 DBI_0.2-7 digest_0.6.4 [9] fail_1.2 foreach_1.4.2 iterators_1.0.7 plyr_1.8.1 [13] Rcpp_0.11.2 RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0 [17] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0 [[alternative HTML version deleted]]

Annotation Cancer Breast Rsubread Annotation Cancer Breast Rsubread • 1.0k views

ADD COMMENT • link updated 9.9 years ago by Wei Shi ★ 3.6k • written 9.9 years ago by Oscar Rueda ▴ 10

0

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 14 days ago

Australia/Melbourne/Olivia Newton-John …

Dear Oscar, I have fixed the bug and committed to bioc (1.14.2). Also, you'd better refine your commands like below: regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand="+") featureCounts("MyReadsfile.bam", annot.ext=regions, ignoreDup=T) Best wishes, Wei On Jun 17, 2014, at 9:00 PM, Oscar Rueda wrote: > Dear list, > I've noticed a problem counting reads using the Rsubread package. I can not include a reproducible example for lack of a bam file, but I hope the following code explains the problem: > > Suppose I have this genomic region: > >> regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand=1) > > Then I want to count the reads in this region: > >> featureCounts("MyReadsfile.bam", annot.ext=regions, annot.inbuilt="hg19", ignoreDup=T) > > I get a huge number of reads, but if I look at the output, I see that the annotation is wrong: > > $counts > X.MyReadsfile.bam > G1 715291 > > $annotation > GeneID Chr Start End Strand Length > 1 Error chr2 2 202002100 + 202002099 > > That is, the start is 2 instead of 202000000. If I print my regions I see > >> regions > GeneID Chr Start End Strand > 1 Error chr2 2.02e+08 202002100 1 > > That is, the scientific notation is not taken properly. > I can fix this doing >> par(scipen=10) > > But I wanted to ask if anyone has noticed this behaviour and if this is expected by the function. > > Thanks a lot for your comments, > Oscar > > Oscar M. Rueda, PhD. > Postdoctoral Research Fellow, Caldas Lab, Breast Cancer Functional > Genomics. > University of Cambridge. Cancer Research UK Cambridge Institute. > Li Ka Shing Centre, Robinson Way. > Cambridge CB2 0RE > England > > >> sessionInfo() > R version 3.1.0 (2014-04-10) > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > locale: > [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] Rsubread_1.14.1 GenomicAlignments_1.0.1 BSgenome_1.32.0 > [4] Rsamtools_1.16.1 Biostrings_2.32.0 XVector_0.4.0 > [7] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.9 > [10] BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.1 bitops_1.0-6 > [5] brew_1.0-6 codetools_0.2-8 DBI_0.2-7 digest_0.6.4 > [9] fail_1.2 foreach_1.4.2 iterators_1.0.7 plyr_1.8.1 > [13] Rcpp_0.11.2 RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0 > [17] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0 > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}

ADD COMMENT • link 9.9 years ago Wei Shi ★ 3.6k

Login before adding your answer.