About Rsubread
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Dear list, I have a question about the arguments of the align function in the Rsubread package. I have mapped my RNA-seq SOLiD data (single-end, 16 samples, 50bp long reads, human) with Rsubread using the align() function in 3 versions: -default parameters (1) -unique=TRUE and tieBreakQS=TRUE (2) -unique=TRUE (3) For my surprise, the percentage of mapped reads is ordered like this: (1)>(2)>(3), for all samples. Why is it, that when unique and tieBreakQS=TRUE (2) is used, I get more mapped reads than only with unique (3)? tieBreaksQS argument should only decide, when two reads are equally optimally aligned, which read has to be kept. I expected something like this: (1)>(2)=(3) approximately. Where is my reasoning mistake? On the other hand, after the counting procedure using the featureCounts() function (gtf only with genes), I retrieved in some samples more genes with alignment (2) than (1). I thought that the set of mapped reads of (2) should be contained in (1)? Is this also wrong? It does not happen in many samples, and the difference is not that big, but is unexpected for me. So, if anyone could help and sees where my thinking mistake is, I would be very thankful! Cheers, Luc??a -- output of sessionInfo(): sessionInfo() +R version 2.15.0 (2012-03-30) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=es_ES.UTF-8 [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Rsubread_1.6.3 loaded via a namespace (and not attached): [1] tools_2.15.0 -- Sent via the guest posting facility at bioconductor.org.
Alignment Rsubread Alignment Rsubread • 961 views
ADD COMMENT
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 17 days ago
Australia/Melbourne/Olivia Newton-John …
Dear Lucia, For your version 2 of alignment, align() function firstly used mapping quality scores to break ties if more than 1 best locations were found for a read. If one of the mapping locations was found to have a higher mapping score than others, that location will be selected and reported for that read. However, this read will be reported as a unmapped read in your version 3 of alignment, because it was mapped to multiple locations and the '-unique=TRUE' option instructed align() to not report it. So 'tieBreakQS=TRUE' changed a multi-mapping read to a unique-mapping read. This is why you got more mapped reads in (2) than (3). I would recommend setting 'tieBreakQS=TRUE" if you are worried about the multi-mapping reads. For you second question, you are right in that the set of mapped reads reported in version 2 should be subset of mapped reads reported in version 1. However, the mapping locations could be different for the same read if it was mapped to more than 1 locations. For version 1, the mapping location for such a read was randomly chosen. However, the location with best mapping quality score was chosen in version 2. If the mapping locations of such reads reported by version 2 fell within the gene region in your annotation but their locations reported by version 1 did not, these reads will counted in read summarization 2 but not counted in summarization 1. So the align() and featureCounts() function did what they are supposed to do for your different versions of running. Let me know if this is unclear to you. Cheers, Wei On Jun 15, 2012, at 4:55 AM, Lucia Spangenberg [guest] wrote: > > Dear list, > I have a question about the arguments of the align function in the Rsubread package. I have mapped my RNA-seq SOLiD data (single-end, 16 samples, 50bp long reads, human) with Rsubread using the align() function in 3 versions: > > -default parameters (1) > -unique=TRUE and tieBreakQS=TRUE (2) > -unique=TRUE (3) > > For my surprise, the percentage of mapped reads is ordered like this: (1)>(2)>(3), for all samples. > Why is it, that when unique and tieBreakQS=TRUE (2) is used, I get more mapped reads than only with unique (3)? tieBreaksQS argument should only decide, when two reads are equally optimally aligned, which read has to be kept. I expected something like this: > (1)>(2)=(3) approximately. > Where is my reasoning mistake? > > On the other hand, after the counting procedure using the featureCounts() function (gtf only with genes), I retrieved in some samples more genes with alignment (2) than (1). I thought that the set of mapped reads of (2) should be contained in (1)? Is this also wrong? It does not happen in many samples, and the difference is not that big, but is unexpected for me. > > So, if anyone could help and sees where my thinking mistake is, I would be very thankful! > > Cheers, > Luc?a > > > > > -- output of sessionInfo(): > > sessionInfo() > +R version 2.15.0 (2012-03-30) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C > [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=es_ES.UTF-8 > [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] Rsubread_1.6.3 > > loaded via a namespace (and not attached): > [1] tools_2.15.0 > > > -- > Sent via the guest posting facility at bioconductor.org. ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD COMMENT

Login before adding your answer.

Traffic: 1052 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6