DNAStringSet to DNAStringSetList according to pattern in sequence names
s.ghignone ▴ 10
@sghignone-7573
Last seen 6.0 years ago
European Union/Italy/Turin/CNR

Given a very simple DNAStringSet, built like this:

afastafile <- DNAStringSet(c("GCAAATGGG", "CCCGGGTT", "AAAGGGTT", "TTTGGGCC"))
names(afastafile) <- c("ABC1_1", "ABC2_1", "ABC3_1", "ABC1_2")

I would like to get a DNAStringSetList in which the sequences are grouped by a pattern in the sequence name.
In this example, the result should be a list of 3 elements (one per ABC* prefix), with the first element containing sequences #1 and #4 (ABC1_1 and ABC1_2), and so on.

s.ghignone ▴ 10
@sghignone-7573
Last seen 6.0 years ago
European Union/Italy/Turin/CNR

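This code works for the given example. The grouping vector is built by stripping the trailing "_<number>" suffix from each sequence name, which gives one key per sequence:

splitAsList(afastafile, sub("_\\d+$", "", names(afastafile)))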
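
As a quick check of the grouping logic, here is a minimal sketch that derives the keys in base R and only calls Biostrings if it is installed. The variable names (seqs, keys, groups, grouped, dna_list) are made up for illustration, and factor(keys, levels = unique(keys)) is used as a base R stand-in for an "in order of appearance" factor.

```r
# Same four sequences as in the question, as a named character vector
seqs <- c(ABC1_1 = "GCAAATGGG", ABC2_1 = "CCCGGGTT",
          ABC3_1 = "AAAGGGTT", ABC1_2 = "TTTGGGCC")

# One grouping key per sequence: drop the trailing "_<digits>" suffix
keys <- sub("_\\d+$", "", names(seqs))

# Factor whose levels follow the order of first appearance
groups <- factor(keys, levels = unique(keys))

# Base R split() shows which sequences end up together
grouped <- split(seqs, groups)
print(grouped)

# With Biostrings available, the same factor yields a DNAStringSetList
if (requireNamespace("Biostrings", quietly = TRUE)) {
  dna <- Biostrings::DNAStringSet(seqs)
  dna_list <- S4Vectors::splitAsList(dna, groups)
  print(dna_list)
}
```

The important point is that the grouping vector has the same length as the DNAStringSet, so each sequence is assigned to a group by its own name.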

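For datasets with a more complex sequence naming scheme, the following code works. Here the group key is everything after the third underscore in the name, and fct_inorder() from the forcats package keeps the groups in order of first appearance: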

( all.cds.list<-splitAsList(all.cds, fct_inorder(sub('(^[^_]+_[^_]+_[^_]+)_(.*)$', "\\2", names(all.cds)))) )
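
To see what that regular expression extracts, here is a small base R sketch on made-up names (the nms vector is hypothetical and not taken from the real all.cds data). It also uses factor(keys, levels = unique(keys)), which should give the same level order as fct_inorder() for a character vector.

```r
# Hypothetical names: three underscore-separated fields, then the grouping key
nms <- c("sp1_chr1_001_geneB", "sp1_chr1_002_geneA", "sp2_chr3_001_geneB")

# Keep only the second capture group, i.e. the text after the third underscore
keys <- sub("(^[^_]+_[^_]+_[^_]+)_(.*)$", "\\2", nms)

# Levels in order of first appearance rather than alphabetical order
groups <- factor(keys, levels = unique(keys))

print(keys)
print(levels(groups))
print(split(nms, groups))
```

A plain factor(keys) would sort the levels alphabetically, so the order of the list elements would no longer follow the order of the input file.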

Hope it helps,

s.-
