Visualising read alignments in
0
0
Entering edit mode
Joshua • 0
@joshua-25027
Last seen 3.0 years ago

Hello community,

I have some DNA sequencing data that I am analyzing using R (I am a fairly inexperienced user). For downstream analysis, I need to check if the reads found within a .bam file are aligned correctly. Currently I am performing the following:

library(Rsamtools)
library(GenomicAlignments)

param <- ScanBamParam(what="seq") #sets up the parameter to read

gal <- readGAlignments("file.bam", param=param) #opens the BAM file
qseq <- mcols(gal)$seq # extracts the sequencing data
qseq_on_ref <- sequenceLayer(qseq, cigar(gal)) # aligns the sequencing data
qseq_on_ref

This gives me the following (see picture - colours are automatically applied?)

>qseq_on_ref
DNAStringSet object of length 47135:
        width seq
    [1]   280 TTCACGCACGTGA-CTATGAACTCAGGAGTCGACGATTT...AGGCGGCCGTGACTCTAGATCATAATCAGCCATACCAC
    [2]   274 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTT...CTTAATAGGCGGCCGCGACTCTAGATCATAATCAGCCA
    [3]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTT...AGGCGGCCGCGAATCTAGATCATAATCAGCCATACCAC
    [4]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTT...AGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC
    [5]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTT...AGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC
    ...   ... ...
[47131]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTT...AGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC
[47132]   180 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTT...ATCGAGGGATTCATCGAAAACGGATGGGAAGGCATGAT
[47133]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTT...AGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC
[47134]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTT...AGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC
[47135]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTT...AGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC

Coloured output of sequencing alignment in R

I am stuck here because I can't expand the ' ... ' in the middle of the sequence. I also can't display more than 10 sequences on the screen.

q_seq_on_ref[1:20]

The above only displays 1-5 then 16-20:

[1]   280 TTCACGCACGTGA-CTATGAACTCAGGAGTCGACGATTTTGACCTTGACATGCTCCCCGGGGGATC...GCCCAGGGCGAGGGCACGGCCGCTTAATAGGCGGCCGTGACTCTAGATCATAATCAGCCATACCAC
 [2]   274 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTTTGACCTTGACATGCTCCCCGGGGGATC...CAGAACGCCCAGGGCGAGGGCACGGCCGCTTAATAGGCGGCCGCGACTCTAGATCATAATCAGCCA
 [3]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTTTGACCTTGACATGCTCCCCGGGGGATC...GCCCAGGGCGAGGGCACGGCCGCTTAATAGGCGGCCGCGAATCTAGATCATAATCAGCCATACCAC
 [4]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTTTGACCTTGACATGCTCCCCGGGGGATC...GCCCAGGGCGAGGGCACGGCCGCTTAATAGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC
 [5]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTTTGACCTTGACATGCTCCCCGGGGGATC...GCCCAGGGCGAGGGCACGGCCGCTTAATAGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC
 ...   ... ...
[16]   279 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTTTGACCTTGACATGCTCCCCGGGGGATC...CGCCCAGGGCGAGGGCACGGCCGCTTAATAGGCGGCCGCGACTCTAGATCATAATCAGCCATACCA
[17]   280 TTCACGCACGTGACCTATGATCTCAGGAGTCGACGATTTTGACCTTGACATGCTCCCCGGGGGATC...GCCCAGGGCGAGGGCACGGCCGCTTAATAGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC
[18]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTTTGACCTTGACATGCTCCCCGGGGGATC...GCCCAGGGCGAGGGCACGGCCGCTTAATAGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC
[19]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTTTGACCTTGACATGCTCCCCGGGGGATC...GCCCAGGGCGAGGGCACGGCCGCTTAATAGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC
[20]   280 TTCACGCACGTGACCTATGAACTCAGGAGTCGACGATTTTGACCTTGACATGCTCCCCGGGGGATC...GCCCAGGGCGAGGGCACGGCCGCTTAATAGGCGGCCGCGACTCTAGATCATAATCAGCCATACCAC

I suspect the answers to this will be fairly trivial - I am not used to working with this type of ouput in R.

Any suggestions on

  1. Viewing the full range (1 to 47135) of sequencing reads.
  2. Viewing full length of sequencing reads.
  3. Alternative methods to view data.

Would be greatly appreciated.

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] FastqCleaner_1.8.0          shinyBS_0.61                shiny_1.6.0                 stringr_1.4.0               ShortRead_1.48.0           
 [6] GenomicAlignments_1.26.0    SummarizedExperiment_1.20.0 Biobase_2.50.0              MatrixGenerics_1.2.1        matrixStats_0.58.0         
[11] Rsamtools_2.6.0             GenomicRanges_1.42.0        GenomeInfoDb_1.26.2         Biostrings_2.58.0           XVector_0.30.0             
[16] IRanges_2.24.1              S4Vectors_0.28.1            BiocParallel_1.24.1         BiocGenerics_0.36.0        

loaded via a namespace (and not attached):
 [1] bslib_0.2.4            lattice_0.20-41        htmltools_0.5.1.1      yaml_2.2.1             rlang_0.4.10           jquerylib_0.1.3       
 [7] later_1.1.0.1          withr_2.4.1            RColorBrewer_1.1-2     jpeg_0.1-8.1           GenomeInfoDbData_1.2.4 lifecycle_1.0.0       
[13] zlibbioc_1.36.0        hwriter_1.3.2          htmlwidgets_1.5.3      latticeExtra_0.6-29    fastmap_1.1.0          crosstalk_1.1.1       
[19] httpuv_1.5.5           Rcpp_1.0.6             xtable_1.8-4           promises_1.2.0.1       DT_0.17                BiocManager_1.30.10   
[25] cachem_1.0.4           DelayedArray_0.16.2    jsonlite_1.7.2         mime_0.10              png_0.1-7              digest_0.6.27         
[31] stringi_1.5.3          grid_4.0.3             tools_4.0.3            bitops_1.0-6           magrittr_2.0.1         sass_0.3.1            
[37] RCurl_1.98-1.2         crayon_1.4.1           ellipsis_0.3.1         Matrix_1.3-2           rstudioapi_0.13        R6_2.5.0              
[43] compiler_4.0.3
GenomicAlignments Rsamtools • 886 views
ADD COMMENT
0
Entering edit mode

Can you say more about what your goals are? Trying to look at 47,000 sequences in order to see if they are 'aligned correctly' doesn't seem like a useful exercise. I mean that's a lot of sequences. And what's the definition/criteria for correct alignments in this context, and how would you know if you have fulfilled those criteria by eyeballing a bunch of letters on your screen?

The idea behind lots of the functions in Bioconductor (like the show method for a DNAStringSet) is that A.) It's not useful to crank out a bunch of stuff on the console because how would you ever even look at that, and B.) when dealing with large data it's easy enough to say 'hmm, let me look at this' and end up with R compliantly printing out a metric gobton of data while you are maniacally punching on the keyboard trying to get it to just stop. So by definition showing the data will just provide the small snippet you see, which is probably more useful than any other alternative.

ADD REPLY

Login before adding your answer.

Traffic: 764 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6