Question

Pysam pileup and Rsamtools pileup output discrepancy

0

Entering edit mode

Xinqi • 0

@b0780501

Last seen 22 months ago

United States

I have mRNA sequencing data that I've aligned to a genome. Specifically, I am interested in determining the total number of reads at each base and the percentage of occurrences of A, T, G, C, deletions, and insertions at these bases. I have utilized both pysam pileup and Rsamtools pileup to achieve this. However, I noticed that their outputs are inconsistent, even though I used the same BAM file as input for both tools. Below is the code I implemented:

Pysam:

bamfile = AlignmentFile(bam_dir, "rb")
pileups = bamfile.pileup('chr1', 944202, 959308, truncate=True, min_mapping_quality=20)
columns=['chr', 'position', "coverage", 'A', 'T', 'G', 'C', '-', '+'] 
df = pd.DataFrame(columns=columns)
for pileupcolumn in pileups:
    data = []
    for pileupread in pileupcolumn.pileups:
        if pileupread.is_del:
            data.append('-')
        elif pileupread.indel:
            data.append('+')
        else:
            data.append(pileupread.alignment.query_sequence[pileupread.query_position])
    base_counts = Counter(data)  
    row = {
        'chr': 'chr1',
        'position': pileupcolumn.pos+1,
        'coverage': pileupcolumn.n,
        'A': base_counts['A'],
        'T': base_counts['T'],
        'G': base_counts['G'],
        'C': base_counts['C'],
        '-': base_counts['-'],
        '+': base_counts['+']
    }
    df = df.append(row, ignore_index=True)

Rsamtools

param <- ScanBamParam(which=GRanges(strand = "-",
                                seqnames = "chr1",
                                ranges = IRanges(start=944203, 
                                                 end=959309)))
pilup_params =  Rsamtools::PileupParam(max_depth = 102500,
                                   min_mapq = 20,
                                   distinguish_nucleotides = T,
                                   ignore_query_Ns = F, 
                                   include_insertions =  T,
                                   distinguish_strands = F)
bam.pileup = pileup(bamfile,
                scanBamParam=param,
                pileupParam = pilup_params)
bam.pileup = bam.pileup[order(bam.pileup$pos), ]

The read count outputs from pysam and Rsamtools vary significantly, as can be seen in the attached pictures. Have I made any critical errors in my approach?

enter image description here

Pysam pileup Rsamtools RNASeq • 1.1k views

ADD COMMENT • link 2.3 years ago Xinqi • 0