first(aln) on a GAlignmentPairs object always returns only 2 entries
@marco-blanchette-17000
US/Santa Cruz/Dovetail Genomics

Am I misunderstanding the usage of first() and second() on GAlignmentPairs objects? My understanding is that they should return the first and second mates of each pair, right? I think this behavior is fairly recent, as I have used these functions before without this problem. I'm also seeing it with other BAM files.

Many thanks

aln <- readGAlignmentPairs("https://dovetail-public.s3-us-west-2.amazonaws.com/TPC-29-37.bam")

length(aln)
# [1] 3164606

length(second(aln))
# [1] 3164606

length(first(aln))
# [1] 2

sessionInfo()

 R version 4.0.3 (2020-10-10)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 18.04.5 LTS

 Matrix products: default
 BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

 locale:
  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
 [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats4    parallel  stats     graphics  grDevices utils     datasets
 [8] methods   base

 other attached packages:
  [1] coolR_0.0.0.9000            rhdf5_2.32.4
  [3] InteractionSet_1.16.0       tidyr_1.1.2
  [5] dplyr_1.0.2                 ggplot2_3.3.2
  [7] rtracklayer_1.48.0          GenomicAlignments_1.24.0
  [9] Rsamtools_2.4.0             Biostrings_2.56.0
 [11] XVector_0.28.0              SummarizedExperiment_1.18.2
 [13] DelayedArray_0.14.1         matrixStats_0.57.0
 [15] Biobase_2.48.0              GenomicRanges_1.40.0
 [17] GenomeInfoDb_1.24.2         IRanges_2.22.2
 [19] S4Vectors_0.26.1            BiocGenerics_0.34.0

 loaded via a namespace (and not attached):
  [1] Rcpp_1.0.5             compiler_4.0.3         pillar_1.4.6
  [4] bitops_1.0-6           tools_4.0.3            zlibbioc_1.34.0
  [7] lifecycle_0.2.0        tibble_3.0.3           gtable_0.3.0
 [10] lattice_0.20-41        pkgconfig_2.0.3        rlang_0.4.7
 [13] Matrix_1.2-18          GenomeInfoDbData_1.2.3 withr_2.3.0
 [16] generics_0.0.2         vctrs_0.3.4            tidyselect_1.1.0
 [19] grid_4.0.3             glue_1.4.2             R6_2.4.1
 [22] XML_3.99-0.5           BiocParallel_1.22.0    Rhdf5lib_1.10.1
 [25] purrr_0.3.4            magrittr_1.5           scales_1.1.1
 [28] ellipsis_0.3.1         colorspace_1.4-1       RCurl_1.98-1.2
 [31] munsell_0.5.0          crayon_1.3.4
readGAlignmentPairs GenomicAlignments • 936 views

I wonder if first() is being masked by a similarly named function in another package? What does length(GenomicAlignments::first(aln)) return?
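
One way to check the masking hypothesis is to ask R which package supplies the function that a bare call resolves to. The sketch below uses only base R; `who_owns()` is a hypothetical helper introduced here for illustration and is not part of any package.

```r
# Hypothetical helper (not from any package): report the namespace of the
# function that a bare call to `name` resolves to from the global environment.
who_owns <- function(name) {
  fn <- get(name, envir = globalenv(), mode = "function")
  # Primitives such as sum() have no enclosing environment; they live in base.
  if (is.primitive(fn)) return("base")
  # topenv() walks up to the enclosing namespace, which also covers S4 generics.
  environmentName(topenv(environment(fn)))
}

# Sanity check with a function from an always-attached base package.
who_owns("sd")   # "stats"

# Packages on the search path that define an object with this name; the first
# entry is the one a bare call uses.
find("sd")

# In the session from the question, the analogous calls would be
# who_owns("first") and find("first").
```

If more than one package shows up in `find("first")`, the one listed first is the one a bare `first(aln)` call uses.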

@james-w-macdonald-5106
United States

From ?readGAlignmentPairs

> bamfile <- system.file("extdata", "ex1.bam", package="Rsamtools",
+                        mustWork=TRUE)
> galp1 <- readGAlignmentPairs(bamfile)
> length(galp1)
[1] 1572
> length(first(galp1))
[1] 1572
> length(second(galp1))
[1] 1572

You should probably check your install though, using

library(BiocManager)
valid()

That's an old version of GenomicAlignments for your version of R.
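
The version gap can also be seen by comparing the GenomicAlignments version strings from the two sessionInfo() listings in this thread. This is only a minimal base-R sketch, and the variable names are made up for illustration; `BiocManager::valid()` remains the fuller check.

```r
# Version strings copied from the two sessionInfo() outputs in this thread.
question_version <- numeric_version("1.24.0")  # GenomicAlignments in the question
answer_version   <- numeric_version("1.26.0")  # GenomicAlignments in this answer

# numeric_version objects compare component by component, not as text.
question_version < answer_version   # TRUE

# For an installed package, packageVersion() returns the same kind of object,
# so it can be compared against a version string in the same way.
packageVersion("stats") >= "3.6.0"
```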

> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] GenomicAlignments_1.26.0    Rsamtools_2.6.0            
 [3] Biostrings_2.58.0           XVector_0.30.0             
 [5] SummarizedExperiment_1.20.0 Biobase_2.50.0             
 [7] MatrixGenerics_1.2.0        matrixStats_0.57.0         
 [9] GenomicRanges_1.42.0        GenomeInfoDb_1.26.1        
[11] IRanges_2.24.0              S4Vectors_0.28.0           
[13] BiocGenerics_0.36.0        

loaded via a namespace (and not attached):
 [1] lattice_0.20-41        crayon_1.3.4           bitops_1.0-6          
 [4] grid_4.0.0             zlibbioc_1.36.0        Matrix_1.2-18         
 [7] BiocParallel_1.24.1    tools_4.0.0            RCurl_1.98-1.2        
[10] DelayedArray_0.16.0    compiler_4.0.0         GenomeInfoDbData_1.2.4
@marco-blanchette-17000
US/Santa Cruz/Dovetail Genomics

Thanks James, didn't realize I was out of sync with the current BioC version. Didn't fix the issue though.

Thanks Martin, as usual you nailed it. This is a good reminder that I should have pasted a full working example, not lines of code from an already running session. Again, dplyr is throwing me a curveball. As much as I like some of dplyr's functionality, I think its behavior here is unacceptable: the package should not try to work on any object it gets thrown. This has been happening way too frequently. Anyhow, here is a full working example showing the issue:

library(GenomicAlignments)

aln <- readGAlignmentPairs("https://dovetail-public.s3-us-west-2.amazonaws.com/TPC-29-37.bam")

length(aln)
## [1] 3164606

length(first(aln))
## [1] 3164606

library(dplyr)

length(first(aln))
## [1] 2

length(GenomicAlignments::first(aln))
## [1] 3164606

Thanks again. My solution from now on is to NOT attach dplyr and to call its functions with fully qualified names instead (i.e. dplyr::first()).

Marco
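
For anyone who would still rather attach dplyr, a possible alternative is to leave only the clashing name off the search path using the `exclude` argument of `library()` (available since R 3.6.0). This is a sketch, not something tested against the BAM file above. It assumes dplyr is installed and must run in a fresh session, because `library()` does nothing for a package that is already attached.

```r
# The guard only keeps the sketch from failing when dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Attach dplyr but leave its first() off the search path (needs R >= 3.6.0).
  library(dplyr, exclude = "first")

  # first() is no longer among the names that dplyr attaches ...
  print("first" %in% ls("package:dplyr"))   # FALSE

  # ... but it stays reachable with an explicit namespace prefix.
  print(dplyr::first(c(10, 20, 30)))        # 10
}
```

With dplyr attached this way, a bare `first(aln)` should resolve to whichever other attached package defines `first()`, for example GenomicAlignments. Other clashing names can be added to the `exclude` vector as needed.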

 sessionInfo()
 R version 4.0.3 (2020-10-10)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 18.04.5 LTS

 Matrix products: default
 BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

 locale:
  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
 [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats4    parallel  stats     graphics  grDevices utils     datasets
 [8] methods   base

 other attached packages:
  [1] dplyr_1.0.2                 GenomicAlignments_1.24.0
  [3] Rsamtools_2.4.0             Biostrings_2.56.0
  [5] XVector_0.28.0              SummarizedExperiment_1.18.2
  [7] DelayedArray_0.14.1         matrixStats_0.57.0
  [9] Biobase_2.48.0              GenomicRanges_1.40.0
 [11] GenomeInfoDb_1.24.2         IRanges_2.22.2
 [13] S4Vectors_0.26.1            BiocGenerics_0.34.0

 loaded via a namespace (and not attached):
  [1] magrittr_2.0.1         zlibbioc_1.34.0        tidyselect_1.1.0
  [4] BiocParallel_1.22.0    lattice_0.20-41        R6_2.5.0
  [7] rlang_0.4.9            tools_4.0.3            grid_4.0.3
 [10] ellipsis_0.3.1         tibble_3.0.4           lifecycle_0.2.0
 [13] crayon_1.3.4           Matrix_1.2-18          GenomeInfoDbData_1.2.3
 [16] purrr_0.3.4            vctrs_0.3.5            bitops_1.0-6
 [19] RCurl_1.98-1.2         glue_1.4.2             pillar_1.4.7
 [22] compiler_4.0.3         generics_0.1.0         pkgconfig_2.0.3