Identify tandem repeats with Bioconductor
@vinicius-henrique-da-silva-6713

I would like to identify the regions with repeated patterns in a given genome. Let's say that I need to identify [TA]n regions, where 'n' is a variable number of repeats.

I thought of using a loop to solve the problem; however, it would take a long time and would produce redundant regions, since each shorter pattern also matches inside every longer repeat. Thus, I would like to know if there is an efficient way to do this.

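library("Biostrings")
G <- readDNAStringSet("any.fa")

# Repeat counts to try: 1 to 1000 copies of the unit
seqAll <- seq_len(1000)
# Pre-allocate a list; match results are objects, not atomic values
ali <- vector("list", length(seqAll))

for (k in seq_along(seqAll)) {
  nx <- seqAll[k]

  # Build the pattern, e.g. "ATATAT" for nx = 3
  patx <- paste(rep("AT", nx), collapse = "")

  # Exact matches of the pattern in every sequence of G
  ali[[k]] <- vmatchPattern(DNAString(patx), G, max.mismatch = 0)
}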
@herve-pages-1542

Hi Vinicius,

You might want to check this post for a more efficient approach:

    A: Is there any package helps finding Tandem Repeats ?
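
The linked post is not reproduced here. As a rough illustration of how the loop over 'n' can be avoided, here is a minimal base R sketch (not necessarily the approach from that post) that uses a single regular expression to find each maximal run of the unit in one pass, so shorter patterns are never reported inside longer ones. The helper `find_unit_runs`, its arguments, and the toy input are assumptions made for illustration; it also assumes the unit contains no regex metacharacters and that the sequences fit in memory as plain character strings.

```r
# Find maximal runs of a repeat unit in plain character sequences.
# `find_unit_runs` is a hypothetical helper, not part of Biostrings.
find_unit_runs <- function(seqs, unit = "TA", min_copies = 2) {
  # Regex for at least `min_copies` consecutive copies of the unit;
  # greedy matching makes each hit a maximal run
  pattern <- sprintf("(?:%s){%d,}", unit, min_copies)
  hits <- gregexpr(pattern, seqs, perl = TRUE)
  runs <- lapply(seq_along(seqs), function(i) {
    start <- as.integer(hits[[i]])
    if (start[1] == -1L) return(NULL)  # no match in this sequence
    width <- attr(hits[[i]], "match.length")
    data.frame(seq = i, start = start, end = start + width - 1L,
               copies = width %/% nchar(unit))
  })
  # Drop sequences without hits, then stack the per-sequence tables
  do.call(rbind, Filter(Negate(is.null), runs))
}

# Toy input; for a DNAStringSet G one could try as.character(G)
toy <- c("GGTATATACCTATAG", "ACGT")
res <- find_unit_runs(toy)
print(res)
```

Each row gives the sequence index, the 1-based start and end of a run, and the number of copies of the unit. For whole chromosomes the conversion to character strings may be memory-hungry, so treat this as a starting point rather than a tuned solution.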

Cheers,

H.
