Given a multiFASTA file, shuffle the middle part of each sequence while leaving the flanks intact, and vice versa
@dimitris-polychronopoulos-9192
Last seen 6.7 years ago
United Kingdom

Hi,

I was wondering whether any Bioconductor package has a function which, given a multiFASTA file, shuffles the middle part of each sequence while leaving the flanks intact, and vice versa (shuffling the flanks while leaving the middle intact). I guess "middle" and "flanks" would have to be specified by the user.

Thanks,

Dimitris

shuffle regioner seqinr • 1.2k views
@martin-morgan-1513
Last seen 6 weeks ago
United States

seqinr isn't a Bioconductor package. You could, however, use Biostrings to read the FASTA file

library(Biostrings)
dna = readDNAStringSet("your.fasta")

then select the part that you want to 'shuffle', split it into individual characters, sample the characters, and paste them back together

mid = subseq(dna, 5, 10)
# strsplit() needs a character vector, and the replacement below needs a DNAStringSet
shuffled = DNAStringSet(sapply(strsplit(as.character(mid), ""), function(elt) paste(sample(elt), collapse="")))

then update the original

subseq(dna, 5, 10) = shuffled

There are more flexible ways of choosing the 'middle', but that requires more information on how you'd like to define that.
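One possible definition, sketched here as an assumption rather than something stated in the question, is "everything except a fixed number of bases at each end". The helper name `shuffle_middle` and its `flank` argument are hypothetical; the sketch relies on `subseq()` accepting a per-sequence `end`, so sequences of different lengths each get their own middle.

```r
library(Biostrings)

# Hypothetical helper: shuffle each sequence's middle, keeping `flank`
# bases untouched at both ends. Assumes every sequence has at least
# 2 * flank bases.
shuffle_middle <- function(dna, flank) {
    stopifnot(all(width(dna) >= 2 * flank))
    start <- flank + 1L
    end <- width(dna) - flank            # per-sequence end of the middle
    mid <- as.character(subseq(dna, start, end))
    shuffled <- vapply(
        strsplit(mid, ""),
        function(elt) paste(sample(elt), collapse = ""),
        character(1),
        USE.NAMES = FALSE
    )
    # replace each middle with its shuffled version
    subseq(dna, start, end) <- DNAStringSet(shuffled)
    dna
}
```

For example, `shuffle_middle(dna, 4)` would leave the first and last 4 bases of every sequence as they are and permute the rest within each sequence.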

If the purpose is instead to randomize which middle part goes with which flanks (that is, to permute the middle parts among the sequences), then maybe

subseq(dna, 5, 10) = sample(subseq(dna, 5, 10))
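For the opposite case in the question (shuffle the flanks, keep the middle), a minimal sketch along the same lines follows. The helper names `shuffle_piece` and `shuffle_flanks` and the `flank` argument are hypothetical, and the sketch assumes each flank is shuffled on its own rather than pooling the two flanks together.

```r
library(Biostrings)

# Hypothetical helper: shuffle the letters within each sequence of a set.
shuffle_piece <- function(x) {
    chars <- strsplit(as.character(x), "")
    DNAStringSet(vapply(
        chars,
        function(elt) paste(sample(elt), collapse = ""),
        character(1),
        USE.NAMES = FALSE
    ))
}

# Hypothetical helper: shuffle `flank` bases at each end separately and
# leave the middle intact. Assumes every sequence has at least
# 2 * flank bases.
shuffle_flanks <- function(dna, flank) {
    stopifnot(all(width(dna) >= 2 * flank))
    w <- width(dna)
    left <- shuffle_piece(subseq(dna, 1, flank))
    mid <- subseq(dna, flank + 1L, w - flank)
    right <- shuffle_piece(subseq(dna, w - flank + 1L, w))
    out <- xscat(left, mid, right)       # concatenate element-wise
    names(out) <- names(dna)             # restore the original names
    out
}
```

Calling `shuffle_flanks(dna, 4)` would then permute the first 4 and the last 4 bases of each sequence independently.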