The optimal minMQS parameter in featureCounts for RNA-Seq quantification
1
1
Entering edit mode
Gary ▴ 20
@gary-7967
Last seen 5.8 years ago

I have asked this question on Biostars (https://www.biostars.org/p/361800/), and have not got any response. Many thanks for any suggestions and helps.

We have six RNA-Seq data, including three wild type and three knock-out mice. I use STAR 2.5.3a (built-in Partek Flow) for the alignment and featureCounts (Rsubread_1.30.9) for the quantification. The command I run featureCoutns is below.

files <- c("Chuong753.bam", "Chuong754.bam", "Chuong755.bam",
  "Chuong756.bam", "Chuong757.bam", "Chuong758.bam")
rc <- featureCounts(files, annot.ext = "mm10ncbiRefSeqCurated.gtf",
  isGTFAnnotationFile = TRUE, GTF.featureType = "exon",
  GTF.attrType = "gene_id", minMQS = 10, strandSpecific = 0,
  nthreads = 6, verbose = TRUE)

I found that there are a lot of Unassigned_MappingQuality reads (about 26.9% of total alignments, the detail below). Should I set minMQS=3 or 0 to increase the number of Assigned reads? Many thanks.

rc$stat
                          Status Chuong753 Chuong754 Chuong755 Chuong756 Chuong757 Chuong758
1                       Assigned  28552790  19795064  26274194  26601820  21264775  21703604
2            Unassigned_Unmapped         0         0         0         0         0         0
3      Unassigned_MappingQuality  10233734   6718392  10369784   9299763   9249801  12905108
4             Unassigned_Chimera         0         0         0         0         0         0
5      Unassigned_FragmentLength         0         0         0         0         0         0
6           Unassigned_Duplicate         0         0         0         0         0         0
7        Unassigned_MultiMapping         0         0         0         0         0         0
8           Unassigned_Secondary         0         0         0         0         0         0
9         Unassigned_Nonjunction         0         0         0         0         0         0
10         Unassigned_NoFeatures   2596367   1732967   2577948   1882419   2235492   2757615
11 Unassigned_Overlapping_Length         0         0         0         0         0         0
12          Unassigned_Ambiguity    237389    166711    228953    216951    178357    218899

In addition, I am confused about the mapping quality score. The range of mapping quality score seems:

  1. 0 to 255 for STAR (http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf)

  2. 0 to 40 for Rsubread v1.32.0 (https://bioconductor.org/packages/release/bioc/vignettes/Rsubread/inst/doc/SubreadUsersGuide.pdf)

  3. 0 to read length (i.e. 75 in my case) for Rsubread v1.20.1 (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.140&rep=rep1&type=pdf)

  4. 0 to 200 for Rsubread v1.11.10 (http://citeseerx.ist.psu.edu/viewdochelpnload?doi=10.1.1.366.8312&rep=rep1&type=pdf)

Could you hlep me? Thanks a lot.

RNA-Seq featureCounts STAR minMQS mapping quality score • 2.5k views
ADD COMMENT
2
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 1 day ago
Australia/Melbourne

Please post your questions regarding Rsubread (featureCounts is part of it) to this Support Site as Rsubread is a Bioconductor package.

In general I don't recommend throwing away reads with low MAQ value when you count them to genes. This is because an RNA-seq read with low MAQ is likely to be correctly mapped if it hits an annotated gene (more precisely it hits an exon of a gene). The transcriptome only accounts for ~2% of the genome, therefore the chance that a read incorrectly mapped to the genome happens to map to the transcriptome is very low.

Regarding MAQ values reported by Rsubread, we have recently changed them to the range of 0 to 40 to make it be in line with the Phred score range.

ADD COMMENT
0
Entering edit mode

Hi Wei Shi, Thank you so much. Your suggestion is very helpful.

ADD REPLY

Login before adding your answer.

Traffic: 791 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6