Hello I am beginning in bioinformatics and I am analyzing RNAseq data mapped to a genome using RSEM+STAR and also only STAR. In RSEM log the mapping percentages for one sample look like this:
Number of input reads | 60070942
Average input read length | 200
UNIQUE READS:
Uniquely mapped reads number | 34093455
Uniquely mapped reads % | 56.76%
Average mapped length | 190.57
Number of splices: Total | 3020472
Number of splices: Annotated (sjdb) | 1918800
Number of splices: GT/AG | 2970169
Number of splices: GC/AG | 13559
Number of splices: AT/AC | 1783
Number of splices: Non-canonical | 34961
Mismatch rate per base, % | 1.17%
Deletion rate per base | 0.03%
Deletion average length | 2.10
Insertion rate per base | 0.02%
Insertion average length | 1.78
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 2860068
% of reads mapped to multiple loci | 4.76%
Number of reads mapped to too many loci | 57489
% of reads mapped to too many loci | 0.10%
Since I had forgotten to output the BAM file, I mapped again using only STAR. However when I run featureCounts, for the same sample:
featureCounts(files = BAM1, isPairedEnd = TRUE, allowMultiOverlap = TRUE,
countMultiMappingReads = TRUE, annot.ext = annotation1,
isGTFAnnotationFile = TRUE, GTF.featureType = c("gene"),
GTF.attrType = c("ID"), useMetaFeatures = TRUE, nthreads = 8)
Process BAM file sample1Aligned.out.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 47486220 ||
|| Successfully assigned alignments : 13191085 (27.8%)
And I also added a few more genes in the annotation and mapped again using RSEM, and the log looks the same as mentioned above. This time I outputted the .genome.bam file, and for featureCounts:
featureCounts(files = BAM2, isPairedEnd = TRUE, allowMultiOverlap = TRUE,
countMultiMappingReads = TRUE, annot.ext = annotation2,
isGTFAnnotationFile = TRUE, GTF.featureType = c("gene"),
GTF.attrType = c("ID"), useMetaFeatures = TRUE, nthreads = 8)
Process BAM file sample1.genome.bam... ||
|| Paired-end reads are included. ||
|| Total alignments : 26163546 ||
|| Successfully assigned alignments : 3046125 (11.6%)
I am having a hard time understanding why these percentages are so different between each other (if the logs are the same), and also different from the logs. I read here that in the logs we have reads and in featureCounts we have alignments, and for multiple mapping reads we can have more than one alignment. Ok, but it also says that the featureCounts is always higher which is not the case.. I am very confused. I am posting this here because here I found this other post I mentioned, but I can also post it in somewhere else (where?).
sessionInfo:
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] png_0.1-8 ggdendro_0.1.23 ggrepel_0.9.3 dplyr_1.1.2 dendextend_1.17.1 Rsubread_2.8.2 ggfortify_0.4.16 ggplot2_3.4.2 limma_3.50.3 stringr_1.5.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 pillar_1.9.0 compiler_4.1.2 viridis_0.6.3 tools_4.1.2 digest_0.6.31 evaluate_0.21 viridisLite_0.4.2 lattice_0.21-8 lifecycle_1.0.3 tibble_3.2.1
[12] gtable_0.3.3 pkgconfig_2.0.3 rlang_1.1.1 Matrix_1.5-4.1 cli_3.6.1 rstudioapi_0.14 yaml_2.3.7 xfun_0.39 fastmap_1.1.1 gridExtra_2.3 withr_2.5.0
[23] knitr_1.43 generics_0.1.3 vctrs_0.6.2 tidyselect_1.2.0 glue_1.6.2 R6_2.5.1 fansi_1.0.4 rmarkdown_2.22 tidyr_1.3.0 purrr_1.0.1 magrittr_2.0.3
[34] MASS_7.3-60 scales_1.2.1 htmltools_0.5.5 colorspace_2.1-0 utf8_1.2.3 stringi_1.7.12 munsell_0.5.0
You might also include the output from running samtools idxstats on each BAM file.
Oh wait. I meant
samtools flagstat
.