Question: sequence positions change in seqPattern plots with few matching sites
danlu0 wrote (3 months ago):

Hi,

I noticed that in seqPattern plots, the positions of the sequences seem to change in plots that have fewer matches. The plots below show motif matches for the same input sequences, with motif complexity increasing from left to right. The dots (motif matches) in each plot should be a subset of the dots, at exactly the same positions, in any plot to its left. However, the section above the top dashed line differs across plots, while the section below the bottom dashed line is consistent (the line positions were chosen arbitrarily as a reference). From the biology of the sequences, the CACACT plot is what I would expect. I have only noticed this behavior with this particular motif, which has many more sequences with no matching site at all than the other motifs.
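
The subset expectation can be checked independently of the plotting code. The following is a minimal sketch, not part of the original script: the toy sequences and the helper names `match_ends` and `is_nested` are made up for illustration. It compares match end positions rather than start positions, because the motifs here share the suffix CACACT, so a match of the longer motif should end where a match of the shorter one ends.

```r
library(Biostrings)

# Toy sequences (hypothetical, not the real input)
seqs <- DNAStringSet(c(
    "AAGGTCACACTAA",   # matches both TCACACT and CACACT
    "AAAACACACTAAA",   # matches CACACT only
    strrep("A", 13)    # matches neither motif
))

# End positions of every match, one list element per sequence.
# fixed = FALSE lets IUPAC codes (e.g. Y, M) act as wildcards.
match_ends <- function(motif, seqs) {
    endIndex(vmatchPattern(motif, seqs, fixed = FALSE))
}

# For each sequence: TRUE if every match of `longer` ends at a
# position where a match of `shorter` also ends
is_nested <- function(longer, shorter, seqs) {
    mapply(function(a, b) all(a %in% b),
           match_ends(longer, seqs), match_ends(shorter, seqs))
}

nested <- is_nested("TCACACT", "CACACT", seqs)
print(nested)
```

If every element is TRUE for the real sequences, the matches themselves are nested as expected, and any discrepancy between plots would have to arise later, in the density calculation or the plotting.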

I tried every version of R/Bioconductor available on our cluster and the result is the same. Any suggestions would be appreciated.

Input sequences download

Script:

library(seqPattern)
library(Biostrings)

# Plot dimensions passed to plotWidth/plotHeight
w <- 800
h <- 2500

# Motifs to plot, in the order they are run
motifs <- c("YGGYMACACT", "GGTCACACT", "TCACACT", "CACACT")

# Draw one density map per motif for the sequences in `fa`;
# `nm` is the input file name, used as the output file prefix
plot_pattern <- function(fa, nm) {
    for (motif in motifs) {
        plotPatternDensityMap(regionsSeq = fa, patterns = motif, color = "gray",
            outFile = paste0(nm, "Density_ohler1_", motif),
            plotWidth = w, plotHeight = h,
            addReferenceLine = FALSE, plotScale = FALSE, cexAxis = 6,
            xTicksAt = c(1, 150, 300), xTicks = c("-150", "TSS", "150"),
            addPatternLabel = FALSE)
    }
}

# `pattern` is a regular expression: match file names ending in ".fa"
fa_list <- list.files(pattern = "\\.fa$")  # the list has only 1 .fa file in this case
print(fa_list)

for (nm in fa_list) {
    fa <- readDNAStringSet(nm, format = "fasta")
    plot_pattern(fa, nm)
}
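
To put a number on how many sequences have no matching site for each motif, something like the following could be run on the same DNAStringSet. This is a sketch only: the toy `fa` object stands in for the real input, and `n_no_match` is a name introduced here for illustration. It also checks that all sequences have the same length, which plotPatternDensityMap expects.

```r
library(Biostrings)

# Toy stand-in for the real input (hypothetical sequences of equal length)
fa <- DNAStringSet(c(
    "ACGGTCACACTAA",   # matches all four motifs
    "AAAACACACTAAA",   # matches CACACT only
    strrep("A", 13)    # matches none of the motifs
))

motifs <- c("YGGYMACACT", "GGTCACACT", "TCACACT", "CACACT")

# All sequences should have the same length
stopifnot(length(unique(width(fa))) == 1)

# Number of sequences with zero matches, per motif.
# fixed = FALSE lets IUPAC codes such as Y and M act as wildcards.
n_no_match <- sapply(motifs, function(m) {
    sum(vcountPattern(m, fa, fixed = FALSE) == 0)
})
print(n_no_match)
```

Comparing these counts, and the row indices of the sequences without matches, against the plots may help show whether the shifted region coincides with runs of sequences that have no match.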

Run log and sessionInfo:

Currently Loaded Modules:
  1) gcc/7.3.0-centos_7   2) r/3.5.0



Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
    colnames, colSums, dirname, do.call, duplicated, eval, evalq,
    Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
    lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
    rowSums, sapply, setdiff, sort, table, tapply, union, unique,
    unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

    strsplit

[1] "intq_mid_spgnchr_g1_g2BH_Up_g2BH_DOWN_no_break.fa"

Getting oligonucleotide occurrence matrix...

Calculating density...
->YGGYMACACT

Plotting...
->YGGYMACACT

Getting oligonucleotide occurrence matrix...

Calculating density...
->GGTCACACT

Plotting...
->GGTCACACT

Getting oligonucleotide occurrence matrix...

Calculating density...
->TCACACT

Plotting...
->TCACACT

Getting oligonucleotide occurrence matrix...

Calculating density...
->CACACT

Plotting...
->CACACT

R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /scg/apps/software/r/3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Biostrings_2.50.2   XVector_0.22.0      IRanges_2.16.0     
[4] S4Vectors_0.20.1    BiocGenerics_0.28.0 seqPattern_1.14.0  

loaded via a namespace (and not attached):
[1] zlibbioc_1.28.0        compiler_3.5.0         GenomicRanges_1.34.0  
[4] GenomeInfoDbData_1.2.0 RCurl_1.95-4.12        KernSmooth_2.23-15    
[7] plotrix_3.7-5          GenomeInfoDb_1.18.2    bitops_1.0-6          