how to extract read alignments from reads aligning in mulitple locations
1
1
Entering edit mode
Robert Castelo ★ 3.3k
@rcastelo
Last seen 4 days ago
Barcelona/Universitat Pompeu Fabra

hi,

i have a 4Gb BAM file with RNA-seq reads aligned with STAR to the hg38 version of the human genome and where, according to STAR, an important fraction of them (~25%) aligned to multiple loci. I"m interested in finding the genes that overlap these multimapping reads to have an idea of the origin of these reads. Could anyone suggest me a Rsamtools/GenomicAlignments/GenomicFeatures route to extract these multimapped alignments and genes overlapping them?

thanks!!

robert.

rsamtools genomicalignments genomicfeatures • 2.4k views
ADD COMMENT
3
Entering edit mode
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Last seen 2 hours ago
Australia

It's easy. For example,

library(GenomicAlignments)
ambiguousReads <- readGAlignmentPairs("/users/robert/project/example.bam", param = ScanBamParam(flag = scanBamFlag(isSecondaryAlignment = TRUE))) # Only import multi-mapping reads.

library(rtracklayer)
genesExonsTranscripts <- import("/users/robert/databases/hg38/GENCODEversion26.gtf") # GRanges with metadata.
genes <- subset(genesExonsTranscripts, type == "gene")

library(GenomicRanges)
genesAlignments <- countOverlaps(genes, ambiguousReads)
mcols(GENCODEgenes)[genesAlignments > 0, "gene_name"] # Gene symbols of genes with non-zero counts.

You may like to assign such alignments to a particular gene using RSEM.

ADD COMMENT
0
Entering edit mode

Thanks a lot. I've tried out and the 'GAlignmentPairs' object 'ambiguousReads' has about 20 million pairs, however, STAR (the read mapper) tells me that there are about 9 million "reads mapped to multiple loci". Could you think of any reason responsible for this discrepancy? Maybe I'm missing some additional flag when reading the alignments?

ADD REPLY
1
Entering edit mode

You imported alignments whereas STAR's summary is in regard to reads. If a read can have more than one alignment, then 9 million reads can result in 20 million alignments because there is a 1:many relationship between them.

ADD REPLY
0
Entering edit mode

True, using the 'use.names' argument and counting unique read identifiers i get 8.7 million reads, which is similar to the number given by STAR. thanks!!

ADD REPLY

Login before adding your answer.

Traffic: 697 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6