Question: Possible problem with featureCounts() and scipen
0
gravatar for Oscar Rueda
5.3 years ago by
Oscar Rueda10
Oscar Rueda10 wrote:
Dear list, I've noticed a problem counting reads using the Rsubread package. I can not include a reproducible example for lack of a bam file, but I hope the following code explains the problem: Suppose I have this genomic region: > regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand=1) Then I want to count the reads in this region: > featureCounts("MyReadsfile.bam", annot.ext=regions, annot.inbuilt="hg19", ignoreDup=T) I get a huge number of reads, but if I look at the output, I see that the annotation is wrong: $counts X.MyReadsfile.bam G1 715291 $annotation GeneID Chr Start End Strand Length 1 Error chr2 2 202002100 + 202002099 That is, the start is 2 instead of 202000000. If I print my regions I see > regions GeneID Chr Start End Strand 1 Error chr2 2.02e+08 202002100 1 That is, the scientific notation is not taken properly. I can fix this doing > par(scipen=10) But I wanted to ask if anyone has noticed this behaviour and if this is expected by the function. Thanks a lot for your comments, Oscar Oscar M. Rueda, PhD. Postdoctoral Research Fellow, Caldas Lab, Breast Cancer Functional Genomics. University of Cambridge. Cancer Research UK Cambridge Institute. Li Ka Shing Centre, Robinson Way. Cambridge CB2 0RE England > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Rsubread_1.14.1 GenomicAlignments_1.0.1 BSgenome_1.32.0 [4] Rsamtools_1.16.1 Biostrings_2.32.0 XVector_0.4.0 [7] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.9 [10] BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.1 bitops_1.0-6 [5] brew_1.0-6 codetools_0.2-8 DBI_0.2-7 digest_0.6.4 [9] fail_1.2 foreach_1.4.2 iterators_1.0.7 plyr_1.8.1 [13] Rcpp_0.11.2 RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0 [17] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0 [[alternative HTML version deleted]]
ADD COMMENTlink modified 5.3 years ago by Wei Shi3.2k • written 5.3 years ago by Oscar Rueda10
Answer: Possible problem with featureCounts() and scipen
0
gravatar for Wei Shi
5.3 years ago by
Wei Shi3.2k
Australia
Wei Shi3.2k wrote:
Dear Oscar, I have fixed the bug and committed to bioc (1.14.2). Also, you'd better refine your commands like below: regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand="+") featureCounts("MyReadsfile.bam", annot.ext=regions, ignoreDup=T) Best wishes, Wei On Jun 17, 2014, at 9:00 PM, Oscar Rueda wrote: > Dear list, > I've noticed a problem counting reads using the Rsubread package. I can not include a reproducible example for lack of a bam file, but I hope the following code explains the problem: > > Suppose I have this genomic region: > >> regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand=1) > > Then I want to count the reads in this region: > >> featureCounts("MyReadsfile.bam", annot.ext=regions, annot.inbuilt="hg19", ignoreDup=T) > > I get a huge number of reads, but if I look at the output, I see that the annotation is wrong: > > $counts > X.MyReadsfile.bam > G1 715291 > > $annotation > GeneID Chr Start End Strand Length > 1 Error chr2 2 202002100 + 202002099 > > That is, the start is 2 instead of 202000000. If I print my regions I see > >> regions > GeneID Chr Start End Strand > 1 Error chr2 2.02e+08 202002100 1 > > That is, the scientific notation is not taken properly. > I can fix this doing >> par(scipen=10) > > But I wanted to ask if anyone has noticed this behaviour and if this is expected by the function. > > Thanks a lot for your comments, > Oscar > > Oscar M. Rueda, PhD. > Postdoctoral Research Fellow, Caldas Lab, Breast Cancer Functional > Genomics. > University of Cambridge. Cancer Research UK Cambridge Institute. > Li Ka Shing Centre, Robinson Way. > Cambridge CB2 0RE > England > > >> sessionInfo() > R version 3.1.0 (2014-04-10) > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > locale: > [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] Rsubread_1.14.1 GenomicAlignments_1.0.1 BSgenome_1.32.0 > [4] Rsamtools_1.16.1 Biostrings_2.32.0 XVector_0.4.0 > [7] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.9 > [10] BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.1 bitops_1.0-6 > [5] brew_1.0-6 codetools_0.2-8 DBI_0.2-7 digest_0.6.4 > [9] fail_1.2 foreach_1.4.2 iterators_1.0.7 plyr_1.8.1 > [13] Rcpp_0.11.2 RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0 > [17] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0 > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD COMMENTlink written 5.3 years ago by Wei Shi3.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 277 users visited in the last hour