Question: how to extract read alignments from reads aligning in mulitple locations
gravatar for Robert Castelo
15 months ago by
Robert Castelo2.2k
Spain/Barcelona/Universitat Pompeu Fabra
Robert Castelo2.2k wrote:


i have a 4Gb BAM file with RNA-seq reads aligned with STAR to the hg38 version of the human genome and where, according to STAR, an important fraction of them (~25%) aligned to multiple loci. I"m interested in finding the genes that overlap these multimapping reads to have an idea of the origin of these reads. Could anyone suggest me a Rsamtools/GenomicAlignments/GenomicFeatures route to extract these multimapped alignments and genes overlapping them?



ADD COMMENTlink modified 15 months ago by Dario Strbenac1.4k • written 15 months ago by Robert Castelo2.2k
gravatar for Dario Strbenac
15 months ago by
Dario Strbenac1.4k
Dario Strbenac1.4k wrote:

It's easy. For example,

ambiguousReads <- readGAlignmentPairs("/users/robert/project/example.bam", param = ScanBamParam(flag = scanBamFlag(isSecondaryAlignment = TRUE))) # Only import multi-mapping reads.

genesExonsTranscripts <- import("/users/robert/databases/hg38/GENCODEversion26.gtf") # GRanges with metadata.
genes <- subset(genesExonsTranscripts, type == "gene")

genesAlignments <- countOverlaps(genes, ambiguousReads)
mcols(GENCODEgenes)[genesAlignments > 0, "gene_name"] # Gene symbols of genes with non-zero counts.

You may like to assign such alignments to a particular gene using RSEM.

ADD COMMENTlink written 15 months ago by Dario Strbenac1.4k

Thanks a lot. I've tried out and the 'GAlignmentPairs' object 'ambiguousReads' has about 20 million pairs, however, STAR (the read mapper) tells me that there are about 9 million "reads mapped to multiple loci". Could you think of any reason responsible for this discrepancy? Maybe I'm missing some additional flag when reading the alignments?

ADD REPLYlink written 15 months ago by Robert Castelo2.2k

You imported alignments whereas STAR's summary is in regard to reads. If a read can have more than one alignment, then 9 million reads can result in 20 million alignments because there is a 1:many relationship between them.

ADD REPLYlink written 15 months ago by Dario Strbenac1.4k

True, using the 'use.names' argument and counting unique read identifiers i get 8.7 million reads, which is similar to the number given by STAR. thanks!!

ADD REPLYlink written 15 months ago by Robert Castelo2.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 389 users visited in the last hour