Pysam pileup and Rsamtools pileup output discrepancy
0
0
Entering edit mode
Xinqi • 0
@b0780501
Last seen 8 months ago
United States

I have mRNA sequencing data that I've aligned to a genome. Specifically, I am interested in determining the total number of reads at each base and the percentage of occurrences of A, T, G, C, deletions, and insertions at these bases. I have utilized both pysam pileup and Rsamtools pileup to achieve this. However, I noticed that their outputs are inconsistent, even though I used the same BAM file as input for both tools. Below is the code I implemented:

Pysam:

bamfile = AlignmentFile(bam_dir, "rb")
pileups = bamfile.pileup('chr1', 944202, 959308, truncate=True, min_mapping_quality=20)
columns=['chr', 'position', "coverage", 'A', 'T', 'G', 'C', '-', '+'] 
df = pd.DataFrame(columns=columns)
for pileupcolumn in pileups:
    data = []
    for pileupread in pileupcolumn.pileups:
        if pileupread.is_del:
            data.append('-')
        elif pileupread.indel:
            data.append('+')
        else:
            data.append(pileupread.alignment.query_sequence[pileupread.query_position])
    base_counts = Counter(data)  
    row = {
        'chr': 'chr1',
        'position': pileupcolumn.pos+1,
        'coverage': pileupcolumn.n,
        'A': base_counts['A'],
        'T': base_counts['T'],
        'G': base_counts['G'],
        'C': base_counts['C'],
        '-': base_counts['-'],
        '+': base_counts['+']
    }
    df = df.append(row, ignore_index=True)

Rsamtools

param <- ScanBamParam(which=GRanges(strand = "-",
                                seqnames = "chr1",
                                ranges = IRanges(start=944203, 
                                                 end=959309)))
pilup_params =  Rsamtools::PileupParam(max_depth = 102500,
                                   min_mapq = 20,
                                   distinguish_nucleotides = T,
                                   ignore_query_Ns = F, 
                                   include_insertions =  T,
                                   distinguish_strands = F)
bam.pileup = pileup(bamfile,
                scanBamParam=param,
                pileupParam = pilup_params)
bam.pileup = bam.pileup[order(bam.pileup$pos), ]

The read count outputs from pysam and Rsamtools vary significantly, as can be seen in the attached pictures. Have I made any critical errors in my approach?

enter image description here enter image description here

Pysam pileup Rsamtools RNASeq • 581 views
ADD COMMENT

Login before adding your answer.

Traffic: 908 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6