The tumor mutational burden (TMB) plots that you describe, which appear linear and lack consistent values across samples, could arise from several problems in your variant calling or post-processing pipeline. Because your inbuilt filtering loop ran without errors up to the generation of the variant call format (VCF) files, the problem more likely lies in the TMB calculation or plotting stage than in the initial variant calling.
First, verify that your filtering loop correctly identifies somatic variants. In whole exome sequencing data from tumor tissue without matched normal samples, distinguishing somatic mutations from germline variants is challenging. If your loop did not apply appropriate filters (such as variant allele frequency thresholds or exclusion of variants common in population databases like gnomAD, after annotation with tools like ANNOVAR or VEP), the VCF files may include excessive non-somatic variants, leading to inflated or uniform TMB estimates. Re-examine your loop code to ensure that it incorporates somatic-specific criteria.
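As a rough illustration of the tumor-only filtering described above, the base R sketch below applies a population-frequency cutoff and a minimum variant allele fraction to a toy annotated variant table. The column names (gnomAD_AF, t_vaf), the helper filter_putative_somatic(), and both thresholds are hypothetical placeholders, not recommended values; adapt them to whatever your annotation tool actually outputs.

```r
# Toy annotated variant table; column names and values are hypothetical.
variants <- data.frame(
  sample    = c("S1", "S1", "S1", "S2", "S2"),
  gnomAD_AF = c(NA, 0.25, 0.0001, 0.00005, 0.48),  # NA = absent from gnomAD
  t_vaf     = c(0.32, 0.51, 0.02, 0.18, 0.49),     # tumor variant allele fraction
  stringsAsFactors = FALSE
)

# Placeholder thresholds (assumptions); tune these for your own cohort and assay.
max_pop_af <- 0.001
min_vaf    <- 0.05

# Keep variants that are rare (or absent) in the population database
# and that have enough support in the tumor.
filter_putative_somatic <- function(df, max_pop_af, min_vaf) {
  rare      <- is.na(df$gnomAD_AF) | df$gnomAD_AF < max_pop_af
  supported <- df$t_vaf >= min_vaf
  df[rare & supported, , drop = FALSE]
}

somatic <- filter_putative_somatic(variants, max_pop_af, min_vaf)

# Number of retained variants per sample
print(table(somatic$sample))
```

A filter of this kind only enriches for somatic variants; without a matched normal, some rare germline variants will still pass.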
Second, the TMB calculation itself may be incorrect. TMB is typically computed as the number of non-synonymous somatic mutations per megabase of the exome. If you omitted division by the effective exome size (often around 30-50 megabases for capture kits), or if you included all variants without restricting them to coding regions, the resulting values will be inflated and can be misleading when plotted. In R/Bioconductor, if you are using a package like maftools for TMB estimation, ensure that your code resembles the following:
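library(maftools)
# Load the MAF file derived from your VCFs
maf <- read.maf(maf = "your_maf_file.maf")
# Calculate TMB; captureSize is the targeted region size in Mb (adjust to your kit)
# tmb() draws its own plot and returns a per-sample table
tmb_values <- tmb(maf = maf, captureSize = 50)
# Inspect the per-sample values (total and total_perMB columns)
head(tmb_values)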
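Adjust the captureSize parameter to match your sequencing kit's targeted region size in megabases. An incorrect value rescales every sample's TMB by the same factor, so the absolute values will be wrong even though the ordering of the samples is unchanged.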
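To make the arithmetic explicit, here is a minimal base R sketch of the per-megabase calculation, independent of any package. The sample names, mutation counts, the 50 Mb capture size, and the helper compute_tmb() are illustrative assumptions only.

```r
# Toy per-sample counts of non-synonymous somatic mutations (hypothetical values).
nonsyn_counts <- c(S1 = 120, S2 = 45, S3 = 600)

# Assumed capture size in megabases; replace with your kit's target size.
capture_size_mb <- 50

# TMB = number of non-synonymous somatic mutations / capture size in Mb
compute_tmb <- function(counts, capture_size_mb) {
  stopifnot(capture_size_mb > 0)
  counts / capture_size_mb
}

tmb_per_mb <- compute_tmb(nonsyn_counts, capture_size_mb)
print(tmb_per_mb)
```

Because the denominator is the same for every sample, comparing these values against the raw counts is a quick way to check whether the division was actually applied in your own pipeline.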
Third, inspect your plotting code. If the samples are sorted by TMB value before plotting, whether deliberately or inadvertently, the points form a monotonic curve that can look like a line rather than a bar plot or boxplot. Use ggplot2 for explicit control:
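library(ggplot2)
# One bar per sample, with TMB (mutations per Mb) on the y axis
ggplot(tmb_values, aes(x = Tumor_Sample_Barcode, y = total_perMB)) +
  geom_bar(stat = "identity") +
  labs(x = "Sample", y = "Mutations per Mb") +
  theme_minimal()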
Finally, confirm that mutation counts actually vary across samples by running bcftools stats on a subset of your VCF files. If the counts are identical, revisit the FASTQ alignment and variant calling steps, as uniform depth or contamination could be factors.
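If you would rather do a quick first pass from within R, the sketch below counts the variant records (non-header lines) in each VCF and flags the case where every sample has the same count. It assumes plain-text, uncompressed VCF files; the helper count_vcf_records() and the two toy files written to a temporary directory are stand-ins for your real data.

```r
# Count variant records (non-header, non-empty lines) in a plain-text VCF file.
count_vcf_records <- function(path) {
  lines <- readLines(path)
  sum(!startsWith(lines, "#") & nzchar(lines))
}

# Build two tiny toy VCFs in a temporary directory (stand-ins for real files).
vcf_dir <- tempfile("vcfs_")
dir.create(vcf_dir)
header <- c("##fileformat=VCFv4.2",
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO")
writeLines(c(header,
             "chr1\t100\t.\tA\tT\t.\tPASS\t.",
             "chr1\t200\t.\tG\tC\t.\tPASS\t."),
           file.path(vcf_dir, "S1.vcf"))
writeLines(c(header,
             "chr2\t300\t.\tC\tT\t.\tPASS\t."),
           file.path(vcf_dir, "S2.vcf"))

# Count the records in every VCF found in the directory.
vcf_files <- list.files(vcf_dir, pattern = "\\.vcf$", full.names = TRUE)
record_counts <- vapply(vcf_files, count_vcf_records, integer(1))
names(record_counts) <- basename(vcf_files)
print(record_counts)

# Identical counts across every sample would be a red flag.
all_identical <- length(unique(record_counts)) == 1
```

To use it on your own data, point vcf_dir at the directory that holds your per-sample VCFs and skip the toy-file step.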
Kevin