sequence positions change in seqPattern plots with few matching sites
danlu • 0

Hi,

I noticed that in seqPattern plots, the positions of sequences seem to change in plots that have fewer matches. The plots below show motif matches for the same input sequences; from left to right, the motifs increase in complexity. The dots (motif matches) in each plot should be a subset of the dots, at exactly the same positions, in any plot to its left. However, the section above the top dashed line (the line positions were chosen arbitrarily as references) differs across plots, while the section below the bottom dashed line looks consistent. From the biology of the sequences, the CACACT plot is what I would expect. I only noticed this behavior with this particular motif, which has many more sequences with no matching site at all compared with other motifs.
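
To make the subset expectation concrete, here is a minimal base-R sketch that checks it independently of the plots. It is only an illustration: the toy sequences and the helper name `match_ends` are made up for this example and are not taken from the real input or from seqPattern. Matches are compared by their end position, on the assumption that the nested motifs share the same 3' core. The degenerate motif YGGYMACACT is left out because M can be A as well as C, so its matches need not all fall on CACACT sites.

```r
# Toy sequences (hypothetical, not the real input file)
seqs <- c(
  s1 = "AAGGTCACACTTT",
  s2 = "TTTCACACTAAAA",
  s3 = "ACGTACGTACGTA",
  s4 = "CCCACACTGGGGG"
)

# Return one "sequence:end_position" key per motif match.
# The lookahead makes gregexpr() report overlapping matches as well;
# each reported position is the start of a match.
match_ends <- function(motif, x) {
  hits <- gregexpr(paste0("(?=", motif, ")"), x, perl = TRUE)
  keys <- lapply(seq_along(hits), function(i) {
    starts <- as.integer(hits[[i]])
    starts <- starts[starts > 0]  # gregexpr() returns -1 when nothing matches
    paste(rep(names(x)[i], length(starts)),
          starts + nchar(motif) - 1L, sep = ":")
  })
  as.character(unlist(keys))
}

core <- match_ends("CACACT", seqs)
mid  <- match_ends("TCACACT", seqs)
long <- match_ends("GGTCACACT", seqs)

# Each longer motif should only match where the shorter one does
print(all(mid %in% core))
print(all(long %in% mid))
```

On real data the same comparison could presumably be done on the `DNAStringSet` with Biostrings' `vmatchPattern()`, which I have not tried here. If the nesting holds in the raw matches but not in the images, that would point to the plotting step rather than the matching step.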

I tried every version of R/Bioconductor available on our cluster, and the result is the same. Any suggestions would be appreciated.

Input sequences download

Script:

library(seqPattern)
library(Biostrings)

w <- 800
h <- 2500

# Motifs to plot, in the order the plots are generated
motifs <- c("YGGYMACACT", "GGTCACACT", "TCACACT", "CACACT")

# Draw one density map per motif for the sequences in `fa`;
# `nm` is the input file name and is used as the output file prefix
plot_pattern <- function(fa, nm) {
    for (motif in motifs) {
        plotPatternDensityMap(regionsSeq = fa, patterns = motif, color = "gray",
            outFile = paste0(nm, "Density_ohler1_", motif),
            plotWidth = w, plotHeight = h, addReferenceLine = FALSE,
            plotScale = FALSE, cexAxis = 6, xTicksAt = c(1, 150, 300),
            xTicks = c("-150", "TSS", "150"), addPatternLabel = FALSE)
    }
}

# list.files() expects a regular expression, not a shell glob
fa_list <- list.files(pattern = "\\.fa$")  # the list has only 1 .fa file in this case
print(fa_list)

for (nm in fa_list) {
    fa <- readDNAStringSet(nm, format = "fasta")
    plot_pattern(fa, nm)
}

Run log and sessionInfo:

Currently Loaded Modules:
  1) gcc/7.3.0-centos_7   2) r/3.5.0



Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
    colnames, colSums, dirname, do.call, duplicated, eval, evalq,
    Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
    lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
    rowSums, sapply, setdiff, sort, table, tapply, union, unique,
    unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

    strsplit

[1] "intq_mid_spgnchr_g1_g2BH_Up_g2BH_DOWN_no_break.fa"

Getting oligonucleotide occurrence matrix...

Calculating density...
->YGGYMACACT

Plotting...
->YGGYMACACT

Getting oligonucleotide occurrence matrix...

Calculating density...
->GGTCACACT

Plotting...
->GGTCACACT

Getting oligonucleotide occurrence matrix...

Calculating density...
->TCACACT

Plotting...
->TCACACT

Getting oligonucleotide occurrence matrix...

Calculating density...
->CACACT

Plotting...
->CACACT
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /scg/apps/software/r/3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Biostrings_2.50.2   XVector_0.22.0      IRanges_2.16.0     
[4] S4Vectors_0.20.1    BiocGenerics_0.28.0 seqPattern_1.14.0  

loaded via a namespace (and not attached):
[1] zlibbioc_1.28.0        compiler_3.5.0         GenomicRanges_1.34.0  
[4] GenomeInfoDbData_1.2.0 RCurl_1.95-4.12        KernSmooth_2.23-15    
[7] plotrix_3.7-5          GenomeInfoDb_1.18.2    bitops_1.0-6          