Question: Why extractList fails to relist paired-end RNA-seq data when using pcoverageByTranscript?
0
gravatar for Jiping Wang
5 weeks ago by
Jiping Wang70
Jiping Wang70 wrote:

I am trying to use pcoverageByTranscript function to calculate coverage score on each genes. I have a paired-end RNA-seq data. I read in with readGAlignments function as gal1 and readGAlignmentPairs function as gal2 separately.

> which <- GRanges(seqnames = c("chr1"))
> gr = as(seqinfo(bf), "GRanges")
> param = ScanBamParam(which = gr["chr1"])
> gal1=readGAlignments("SRR873822.bam",param=param)
> bf = BamFile("SRR873822.bam", asMates = TRUE, qnameSuffixStart = ".")
> gal2=readGAlignmentPairs(bf, param = param)

I have parsed transcript file dm3_transcripts from .gtf file. Following the sample codes from the tutorial of coverageByTranscript page, I got the results I want for gal1.

However for gal2, the extractList function fails to relist the alignmentPairs object according to the groups (i.e. under each gene name no compatible reads paired were listed as should be). For the paired-end data, how should I relist the compatible reads for coverage calculation? Thanks for help!!

> compat_hits <- findCompatibleOverlaps(gal1, dm3_transcripts)
> compat_hits
Hits object with 4263975 hits and 0 metadata columns:
            queryHits subjectHits
            <integer>   <integer>
        [1]         1         455
        [2]         2         455
        [3]         3         455
        [4]         4         455
        [5]         5         455
        ...       ...         ...
  [4263974]   6020051        1755
  [4263975]   6020052        1755
  -------
  queryLength: 6020257 / subjectLength: 2570
> tx2reads <- setNames(as(t(compat_hits), "List"), names(dm3_transcripts))
> compat_reads_by_tx <- extractList(gal1, tx2reads)
> compat_reads_by_tx
GAlignmentsList object of length 2570:
$A3GALT2 
GAlignments object with 0 alignments and 0 metadata columns:
     seqnames strand cigar qwidth start end width njunc

$AADACL3 
GAlignments object with 0 alignments and 0 metadata columns:
     seqnames strand cigar qwidth start end width njunc

...
<2567 more elements>
-------
seqinfo: 25 sequences from an unspecified genome
> 
> ## find compatible overlap between reads and genes for gal2
> 
> compat_hits <- findCompatibleOverlaps(gal2, dm3_transcripts)
> compat_hits
Hits object with 1815917 hits and 0 metadata columns:
            queryHits subjectHits
            <integer>   <integer>
        [1]         1         455
        [2]         2         455
        [3]         3         455
        [4]         4         455
        [5]         5         455
        ...       ...         ...
  [1815916]   2787062        1755
  [1815917]   2787064        1755
  -------
  queryLength: 2787118 / subjectLength: 2570
> tx2reads <- setNames(as(t(compat_hits), "List"), names(dm3_transcripts))
> compat_reads_by_tx <- extractList(gal2, tx2reads)
> compat_reads_by_tx
List of length 2570
names(2570): A3GALT2 AADACL3 AADACL4 ABCA4 ABCB10 ABCD3 ... ZSCAN20 ZSWIM5 ZYG11A ZYG11B ZZZ3
ADD COMMENTlink modified 4 weeks ago • written 5 weeks ago by Jiping Wang70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 93 users visited in the last hour