Hi,
I'm trying to perform a CNV analysis on 12 BAM files from WES data using cn.mops
package. BAM files were aligned to GChr38, markduplicated and recalibrated following the GATK best practices.
BAMFiles <- grep('84|93|117|134|350|454', list.files(path = 'results/Preprocessing/', pattern = '*.recal.bam$', recursive = T, full.names = T), value = T)
segments <- read.table("data/exons_plus50.bed",sep="\t",as.is=TRUE)
gr <- GRanges(segments[,1],IRanges(segments[,2],segments[,3]))
gr2 <- tidyChromosomes(
gr,
keep.X = T,
keep.Y = T,
keep.M = FALSE,
keep.nonstandard = F,
genome = NULL
)
X <- getSegmentReadCountsFromBAM(BAMFiles,GR=gr2, parallel = 16)
The targets bed file (data/exons_plus50.bed
) contains all the exons plus their flanking 50 bp for the corresponding human genome version.
I had to exclude the M and the non-standard (haplotypes) chromosomes because the getSegmentReadCountsFromBAM
function raises an error otherwise.
So far it ran as expected, without errors.
However, when running exomecn.mops
function, it prompts the following:
resCNMOPS <- exomecn.mops(X, parallel = 16)
Normalizing...
Starting local modeling, please be patient...
Reference sequence: chr1
Reference sequence: chr2
Reference sequence: chr3
Reference sequence: chr4
Reference sequence: chr5
Reference sequence: chr6
Reference sequence: chr7
Reference sequence: chr8
Reference sequence: chr9
Reference sequence: chrX
Reference sequence: chrY
Reference sequence: chr10
Reference sequence: chr11
Reference sequence: chr12
Reference sequence: chr13
Reference sequence: chr14
Reference sequence: chr15
Reference sequence: chr16
Reference sequence: chr17
Reference sequence: chr18
Reference sequence: chr19
Reference sequence: chr20
Reference sequence: chr21
Reference sequence: chr22
Starting segmentation algorithm...
Using "fastseg" for segmentation.
Error in .width_as_unnamed_integer(width, msg = "an end that is greater or equal to its start minus one") :
each range must have an end that is greater or equal to its start minus one
Setting the algorithm to DNACopy
gives the same error.
I've tried with the lifted over (GRCh37 -> GRCh38) version of the baits BED file provided by the vendor (the original one has the hg19 version) for the corresponding kit (here) instead of the targets file (containing all exons), but the error persists.
I appreciate any clue about this issue.
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] BRGenomics_1.6.0 rtracklayer_1.54.0 cn.mops_1.40.0 GenomicRanges_1.46.1
[5] GenomeInfoDb_1.30.1 IRanges_2.28.0 S4Vectors_0.32.4 BiocGenerics_0.40.0
Sorry for the delay. Based on the error message, it looks like at least one of your end coordinates is lower than the corresponding start position. The Truseq bed file I downloaded from the link you provided, does not have this issue, but maybe it is introduced in the liftover step. Can you share the actual bed file you used?
Sure, you can download for the next week on this link: https://we.tl/t-xFqjQLfzKl
While there is interval in your bed file with higher start compared to end, some of the intervals are really small (5-10 bp). I would suggest you remove those and try again since read counting will not work well anyway for such small intervals. For a first test I would filter out all with a width of less than 100bp, you can try adding some of them back later if it works.