Errors using wavClusteR package - "range cannot be determined from the supplied arguments (too many NAs)"
3
1
Entering edit mode
S.k ▴ 10
@sk-10106
Last seen 8.0 years ago

I am working on PAR-CLIP analysis using the wavClusteR package. When I try to load a bam file, I get the following error:

 

> filename<-("test.bam")
> Bam <- readSortedBam(filename = filename)
> Bam
> countTable <- getAllSub( Bam, minCov = 10 )
Loading required package: doMC
Considering substitutions, n = 1999, processing in 2 chunks
   chunk #: 1
   chunk #: 2
Error in { :
  task 1 failed - "solving row 247: range cannot be determined from the supplied arguments (too many NAs)"
In addition: Warning message:
executing %dopar% sequentially: no parallel backend registered

 

Any ideas on what could be causing this? I suspect this it has something to do with GRanges. Here is the portion of the file that is being noted in the error message:

 

> Bam[247]
GRanges object with 1 range and 2 metadata columns:
      seqnames               ranges strand |
         <Rle>            <IRanges>  <Rle> |
  [1]     chr1 [91853066, 91853115]      + |
                                                    qseq          MD
                                          <DNAStringSet> <character>
  [1] ACTGAGCTCGCCTTAGGACACCTGCGTTACCGTTTGACAGGTGTACCGCC        45A3
  -------
  seqinfo: 25 sequences from an unspecified genome; no seqlengths

 

This seems pretty normal, though. Suggestions?

 

Session info:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.3 (Final)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] wavClusteR_2.4.1     Rsamtools_1.22.0     Biostrings_2.38.4
[4] XVector_0.10.0       GenomicRanges_1.22.4 GenomeInfoDb_1.6.3
[7] IRanges_2.4.8        S4Vectors_0.8.11     BiocGenerics_0.16.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4                wmtsa_2.0-0
 [3] compiler_3.2.3             RColorBrewer_1.1-2
 [5] futile.logger_1.4.1        plyr_1.8.3
 [7] GenomicFeatures_1.22.13    bitops_1.0-6
 [9] futile.options_1.0.0       iterators_1.0.8
[11] tools_3.2.3                zlibbioc_1.16.0
[13] rpart_4.1-10               biomaRt_2.26.1
[15] mclust_5.2                 lattice_0.20-33
[17] RSQLite_1.0.0              gtable_0.2.0
[19] foreach_1.4.3              DBI_0.3.1
[21] gridExtra_2.2.1            cluster_2.0.3
[23] rtracklayer_1.30.4         stringr_1.0.0
[25] ifultools_2.0-1            ade4_1.7-4
[27] nnet_7.3-12                grid_3.2.3
[29] Biobase_2.30.0             AnnotationDbi_1.32.3
[31] survival_2.38-3            XML_3.98-1.4
[33] BiocParallel_1.4.3         foreign_0.8-66
[35] latticeExtra_0.6-28        Formula_1.2-1
[37] seqinr_3.1-3               ggplot2_2.1.0
[39] lambda.r_1.1.7             splus2R_1.2-0
[41] magrittr_1.5               splines_3.2.3
[43] Hmisc_3.17-3               MASS_7.3-45
[45] scales_0.4.0               codetools_0.2-14
[47] GenomicAlignments_1.6.3    SummarizedExperiment_1.0.2
[49] colorspace_1.2-6           stringi_1.0-1
[51] acepack_1.3-3.3            RCurl_1.95-4.8
[53] munsell_0.4.3
wavClusteR • 2.7k views
ADD COMMENT
2
Entering edit mode
@federicocomoglio-4524
Last seen 6.7 years ago
Switzerland

Hi Stephen,

to generate a substitution count table, wavClusteR splits a sorted Bam file by strand, flags all entries with an MD value other than perfect matches, and generates a GRangesList object where each slot contains the set of genomic regions that share the same MD. Elements of this list are then processed in chunks of size 1000 (using multiple cores if enabled). 

You sent me a chunk of the Bam file that was sufficient to generate the error you reported. This error is caused by the alignment:

$10^A38 
GRanges object with 1 range and 2 metadata columns:
      seqnames             ranges strand |                                               qseq          MD
         <Rle>          <IRanges>  <Rle> |                                     <DNAStringSet> <character>
  [1]    chr21 [9827478, 9827527]      + | CCGCAGGCGCGCAATTACCCACTCCCGACCCGGGGAGGTAGTGACGAAAA      10^A38
 

because the MD value 10^A38 (10 matches followed by deletion of one A and 38 matches) contains an indel. wavCluster does not support indels. How did you align you PAR-CLIP data? 

 

Let me know. 

Hope this helps,

Federico

ADD COMMENT
0
Entering edit mode

This was a paired-end alignment to mm10 with bowtie2, the command used was something like this:

bowtie2 --threads "$THREADS" --local -x "$Genome_path" -q -1 "${FASTQ_R1}.trim" -2 "${FASTQ_R2}.trim" | samtools view -@ "$THREADS" -Sb1 - | samtools sort -m 10G -@ "$THREADS" - "$SAMPLEID"
ADD REPLY
0
Entering edit mode

Thank you. The problem was then generated by Bowtie2, which allows gapped alignments. I would suggest realigning your data with bowtie. Let me know.

ADD REPLY
0
Entering edit mode

Federico, thanks for your help. I repeated the alignment on that test sample with bowtie, and was able to progress past the pipeline step in the OP which was previously giving an error. This is the alignment command I used (using only one of the paired reads):

zcat ${tmp_fastq1_gz} | bowtie "$tmp_genome_path" --threads 4 -v 2 -m 10 --best --strata - -S ${tmp_sampleID}.1.sam

However I am currently having issues downstream in the pipeline, at the filterClusters step. If I am unable to troubleshoot it, I will create a new support thread on here about it.

ADD REPLY
0
Entering edit mode

Sure, keep me posted.

ADD REPLY
0
Entering edit mode

Hi Federico,

Thanks for your help. I am having another issue, which I have posted here: Error with wavClusteR annotateClusters function

 

ADD REPLY
0
Entering edit mode
@federicocomoglio-4524
Last seen 6.7 years ago
Switzerland

Hi Stephen,

thank you for posting your issue on the support site. I realised the tag for the threads related to wavClusteR has been consistently misspelled to wavcluter. Is it possible to edit it?

Re. your issue, would it be possible for you to try out v2.5.2 in BioC devel and report back? As we discussed earlier today by email, the issue you reported has nothing to do with line 247 in your Bam file. Rather, it refers to the row 247 in the count table.

I will work on it and report back asap.

 

Let me know.

Federico

 

 

ADD COMMENT
0
Entering edit mode

Thanks for looking into this. I tried to change the tag but I think that the forum here is intentionally using lowercase for the tags.

ADD REPLY
0
Entering edit mode

Federico mentioned that there was a tag 'wavcluter' (note spelling); this tag has now been removed.

ADD REPLY
0
Entering edit mode

Thank you, Martin. I am still working on the issue and I hope to get back with a proper answer in 2-3 days.

ADD REPLY
0
Entering edit mode
@federicocomoglio-4524
Last seen 6.7 years ago
Switzerland

Hi Stephen,

did you have the chance to run your code with version 2.5.2 of the package?

ADD COMMENT
0
Entering edit mode

Sorry I haven't figured out yet how to force a download for v2.5.2, I see it listed here:

https://www.bioconductor.org/packages/3.3/bioc/html/wavClusteR.html

But when I run the commands listed :

source("https://bioconductor.org/biocLite.R")
biocLite("wavClusteR")

It gives me the 2.4.1 version instead. Perhaps because I am using Bioconductor 3.2 instead of Bioconductor 3.3 (developer version)? I haven't been able to find clear instructions on how to set that up either.

ADD REPLY
0
Entering edit mode

Federico,

I installed the developer version of R 3.3 for OS X and tried it again, and got the same error with the same data set, using version 2.5.2 of wavClusteR.

> countTable <- getAllSub( Bam, minCov = 10 )
Loading required package: doMC
Considering substitutions, n = 1999, processing in 2 chunks
   chunk #: 1
   chunk #: 2
Error in { :
  task 1 failed - "solving row 247: range cannot be determined from the supplied arguments (too many NAs)"
In addition: Warning message:
executing %dopar% sequentially: no parallel backend registered

 

> sessionInfo()
R version 3.3.0 beta (2016-04-21 r70529)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] wavClusteR_2.5.2      Rsamtools_1.23.8      Biostrings_2.39.14
 [4] XVector_0.11.8        GenomicRanges_1.23.27 GenomeInfoDb_1.7.6
 [7] IRanges_2.5.46        S4Vectors_0.9.51      BiocGenerics_0.17.5
[10] BiocInstaller_1.21.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4.5               wmtsa_2.0-0
 [3] compiler_3.3.0              RColorBrewer_1.1-2
 [5] plyr_1.8.3                  GenomicFeatures_1.23.30
 [7] bitops_1.0-6                iterators_1.0.8
 [9] tools_3.3.0                 zlibbioc_1.17.1
[11] rpart_4.1-10                biomaRt_2.27.2
[13] mclust_5.2                  lattice_0.20-33
[15] RSQLite_1.0.0               gtable_0.2.0
[17] Matrix_1.2-5                foreach_1.4.3
[19] DBI_0.3.1                   gridExtra_2.2.1
[21] cluster_2.0.4               rtracklayer_1.31.10
[23] stringr_1.0.0               ifultools_2.0-1
[25] ade4_1.7-4                  nnet_7.3-12
[27] grid_3.3.0                  Biobase_2.31.3
[29] AnnotationDbi_1.33.12       survival_2.39-2
[31] XML_3.98-1.4                BiocParallel_1.5.21
[33] foreign_0.8-66              latticeExtra_0.6-28
[35] Formula_1.2-1               seqinr_3.1-3
[37] ggplot2_2.1.0               splus2R_1.2-0
[39] magrittr_1.5                splines_3.3.0
[41] Hmisc_3.17-3                MASS_7.3-45
[43] scales_0.4.0                codetools_0.2-14
[45] GenomicAlignments_1.7.21    SummarizedExperiment_1.1.25
[47] colorspace_1.2-6            stringi_1.0-1
[49] acepack_1.3-3.3             RCurl_1.95-4.8
[51] munsell_0.4.3
ADD REPLY
0
Entering edit mode

Thanks a lot, I will get back to you asap.

ADD REPLY

Login before adding your answer.

Traffic: 917 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6