Hi,
For microRNA (miRNA) data, the following aligners are recommended specifically for such short sequences:
MicroRazerS www.seqan.de/projects/microrazers/
mrFAST mrfast.sourceforge.net/
mrsFAST mrsfast.sourceforge.net/Home
PatMaN bioinf.eva.mpg.de/patman/
Does anyone know how the Rsubread align function compares to these? Has anyone performed any comparisons? I use Rsubread for RNA-seq and it would be convenient to use it for miRNA-seq as well, but I am a little concerned and wonder whether I need to invest time in running some comparisons myself.
I have just noticed one potential problem with the Rsubread align function when applied to miRNA-seq: when I use the annotation file from miRBase (hsa.gff3) instead of the built-in annotation or the Ensembl GTF file, the gene IDs in the counts (rownames) and in the annotation output from Rsubread align are all NA (see code below).
counts_TH14_uniqtrue_annotMirBmature.out <- featureCounts(files=mapped.flist,
    annot.inbuilt="hg38", chrAliases=NULL,
    # use miRBase GFF3 file and feature = miRNA (mature miRNA)
    annot.ext="/home/inah/Rsubread_miRNA/RefGTF/hsa.gff3",
    isGTFAnnotationFile=TRUE,
    GTF.featureType="miRNA", GTF.attrType="miRNA", useMetaFeatures=FALSE, ...
Many thanks, Ina
sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=en_US.UTF-8
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rsubread_1.22.3

loaded via a namespace (and not attached):
[1] tools_3.3.1

Correction to the 2nd half of my first email:
I have noticed one potential problem with the Rsubread featureCounts function when applied to miRNA-seq: when I use the annotation file from miRBase (hsa.gff3) instead of the built-in annotation or the Ensembl GTF file, the gene IDs in the counts (rownames) and in the annotation output from Rsubread featureCounts are all NA (see code below).
counts_TH14_uniqtrue_annotMirBmature.out <- featureCounts(files=mapped.flist,
    annot.inbuilt="hg38", chrAliases=NULL,
    # use miRBase GFF3 file and feature = miRNA (mature miRNA)
    annot.ext="/home/inah/Rsubread_miRNA/RefGTF/hsa.gff3",
    isGTFAnnotationFile=TRUE,
    GTF.featureType="miRNA", GTF.attrType="miRNA", useMetaFeatures=FALSE, ...
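
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: in the call above GTF.attrType is set to "miRNA", and GTF.attrType is expected to name an attribute key in the ninth column of the annotation file. The sketch below lists the attribute keys present for a given feature type. The two sample lines are illustrative and only assume the usual GFF3 key=value layout with keys such as ID, Alias, Name and Derives_from; the helper name gff_attr_keys is hypothetical and not part of Rsubread.

```r
# Two illustrative annotation lines (assumed layout: 9 tab-separated columns,
# attributes in column 9 written as key=value pairs separated by ';').
gff_lines <- c(
  "chr1\t.\tmiRNA_primary_transcript\t17369\t17436\t.\t-\t.\tID=MI0022705;Alias=MI0022705;Name=hsa-mir-6859-1",
  "chr1\t.\tmiRNA\t17409\t17431\t.\t-\t.\tID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"
)

# Hypothetical helper: return the attribute keys found in column 9
# for all rows whose feature type (column 3) equals feature_type.
gff_attr_keys <- function(lines, feature_type) {
  lines  <- lines[!grepl("^#", lines)]              # drop header/comment lines
  fields <- strsplit(lines, "\t", fixed = TRUE)     # split into columns
  keep   <- vapply(fields,
                   function(f) length(f) >= 9 && f[3] == feature_type,
                   logical(1))
  attrs  <- vapply(fields[keep], function(f) f[9], character(1))
  pairs  <- unlist(strsplit(attrs, ";", fixed = TRUE))
  unique(sub("=.*$", "", trimws(pairs)))            # keep the part before '='
}

keys <- gff_attr_keys(gff_lines, "miRNA")
print(keys)
print("miRNA" %in% keys)   # is "miRNA" available as an attribute key?
```

To run the same check on the real annotation, pass readLines() of the hsa.gff3 path to gff_attr_keys. If "miRNA" is not among the keys, setting GTF.attrType to one that is listed (for example "Name" or "ID", assuming they are present) might be enough to replace the NA row names.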
Many thanks, Ina