Possible problem with featureCounts() and scipen
1
0
Entering edit mode
Oscar Rueda ▴ 10
@oscar-rueda-6611
Last seen 9.6 years ago
Dear list, I've noticed a problem counting reads using the Rsubread package. I can not include a reproducible example for lack of a bam file, but I hope the following code explains the problem: Suppose I have this genomic region: > regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand=1) Then I want to count the reads in this region: > featureCounts("MyReadsfile.bam", annot.ext=regions, annot.inbuilt="hg19", ignoreDup=T) I get a huge number of reads, but if I look at the output, I see that the annotation is wrong: $counts X.MyReadsfile.bam G1 715291 $annotation GeneID Chr Start End Strand Length 1 Error chr2 2 202002100 + 202002099 That is, the start is 2 instead of 202000000. If I print my regions I see > regions GeneID Chr Start End Strand 1 Error chr2 2.02e+08 202002100 1 That is, the scientific notation is not taken properly. I can fix this doing > par(scipen=10) But I wanted to ask if anyone has noticed this behaviour and if this is expected by the function. Thanks a lot for your comments, Oscar Oscar M. Rueda, PhD. Postdoctoral Research Fellow, Caldas Lab, Breast Cancer Functional Genomics. University of Cambridge. Cancer Research UK Cambridge Institute. Li Ka Shing Centre, Robinson Way. Cambridge CB2 0RE England > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Rsubread_1.14.1 GenomicAlignments_1.0.1 BSgenome_1.32.0 [4] Rsamtools_1.16.1 Biostrings_2.32.0 XVector_0.4.0 [7] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.9 [10] BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.1 bitops_1.0-6 [5] brew_1.0-6 codetools_0.2-8 DBI_0.2-7 digest_0.6.4 [9] fail_1.2 foreach_1.4.2 iterators_1.0.7 plyr_1.8.1 [13] Rcpp_0.11.2 RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0 [17] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0 [[alternative HTML version deleted]]
Annotation Cancer Breast Rsubread Annotation Cancer Breast Rsubread • 1.0k views
ADD COMMENT
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 14 days ago
Australia/Melbourne/Olivia Newton-John …
Dear Oscar, I have fixed the bug and committed to bioc (1.14.2). Also, you'd better refine your commands like below: regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand="+") featureCounts("MyReadsfile.bam", annot.ext=regions, ignoreDup=T) Best wishes, Wei On Jun 17, 2014, at 9:00 PM, Oscar Rueda wrote: > Dear list, > I've noticed a problem counting reads using the Rsubread package. I can not include a reproducible example for lack of a bam file, but I hope the following code explains the problem: > > Suppose I have this genomic region: > >> regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand=1) > > Then I want to count the reads in this region: > >> featureCounts("MyReadsfile.bam", annot.ext=regions, annot.inbuilt="hg19", ignoreDup=T) > > I get a huge number of reads, but if I look at the output, I see that the annotation is wrong: > > $counts > X.MyReadsfile.bam > G1 715291 > > $annotation > GeneID Chr Start End Strand Length > 1 Error chr2 2 202002100 + 202002099 > > That is, the start is 2 instead of 202000000. If I print my regions I see > >> regions > GeneID Chr Start End Strand > 1 Error chr2 2.02e+08 202002100 1 > > That is, the scientific notation is not taken properly. > I can fix this doing >> par(scipen=10) > > But I wanted to ask if anyone has noticed this behaviour and if this is expected by the function. > > Thanks a lot for your comments, > Oscar > > Oscar M. Rueda, PhD. > Postdoctoral Research Fellow, Caldas Lab, Breast Cancer Functional > Genomics. > University of Cambridge. Cancer Research UK Cambridge Institute. > Li Ka Shing Centre, Robinson Way. > Cambridge CB2 0RE > England > > >> sessionInfo() > R version 3.1.0 (2014-04-10) > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > locale: > [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] Rsubread_1.14.1 GenomicAlignments_1.0.1 BSgenome_1.32.0 > [4] Rsamtools_1.16.1 Biostrings_2.32.0 XVector_0.4.0 > [7] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.9 > [10] BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.1 bitops_1.0-6 > [5] brew_1.0-6 codetools_0.2-8 DBI_0.2-7 digest_0.6.4 > [9] fail_1.2 foreach_1.4.2 iterators_1.0.7 plyr_1.8.1 > [13] Rcpp_0.11.2 RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0 > [17] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0 > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD COMMENT

Login before adding your answer.

Traffic: 661 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6