Exclude masked rownames in DNAMultipleAlignments
ben.ward ▴ 30
@benward-7169
United Kingdom

Hi, why does rownames() still return the name of every row when a sequence alignment has a row mask? Is there an easy way to get only the names of the sequences that are left unmasked?

e.g.

library(Biostrings)

# Read the example Clustal alignment shipped with Biostrings
origMAlign <- readDNAMultipleAlignment(
    filepath = system.file("extdata", "msx2_mRNA.aln", package = "Biostrings"),
    format = "clustal")

rowmask(origMAlign, invert = TRUE) <- c(1,2)

rownames(origMAlign)

In the above example I'd like to get only the first two sequence names.

I can do it like so:

rownames(origMAlign)[which(!as.logical(coverage(rowmask(origMAlign))))]

But that feels hacky.

Thanks,

Ben.

@valerie-obenchain-4275
United States

I don't see an exported function that helps with this and the expert is out of town. Not sure this solution is any less hacky than yours:

rownames(origMAlign)[unlist(as.list(rowmask(origMAlign)))]

Valerie


@Val: your indexing returns the names of the masked sequences. To get only the names of the unmasked sequences, which is what Ben is after, you need a minus before the unlist():

rownames(origMAlign)[-unlist(as.list(rowmask(origMAlign)))]

@Ben: a slightly simpler way is:

rownames(origMAlign)[-as.integer(rowmask(origMAlign))]

H.
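
One caveat with the negative-index idiom above: it only works when the mask covers at least one row. In base R, indexing with a zero-length negative vector, as in x[-integer(0)], returns an empty result rather than every element, so an alignment with no row mask would give back no names at all. Below is a minimal sketch of a wrapper that sidesteps this by using setdiff(). The name unmaskedRownames is hypothetical and not part of Biostrings. The sketch assumes, as in the reply above, that as.integer() on the row mask yields the positions of the masked rows, and it sets the mask with an IRanges object rather than c(1,2).

```r
library(Biostrings)

# Hypothetical helper (not a Biostrings function): names of the rows that
# are NOT covered by the row mask.
unmaskedRownames <- function(x) {
  nms <- rownames(x)
  masked <- as.integer(rowmask(x))   # positions of the masked rows
  # setdiff() keeps every position when 'masked' is empty, whereas
  # nms[-masked] would return character(0) in that case.
  nms[setdiff(seq_along(nms), masked)]
}

aln <- readDNAMultipleAlignment(
  filepath = system.file("extdata", "msx2_mRNA.aln", package = "Biostrings"),
  format = "clustal")

# Mask every row except the first two
rowmask(aln, invert = TRUE) <- IRanges(start = 1, end = 2)
kept <- unmaskedRownames(aln)
kept

# Remove the row mask: all names are returned
rowmask(aln) <- NULL
unmaskedRownames(aln)
```

With a non-empty mask this should return the same names as the one-liners above; the only intended difference is the behaviour when nothing is masked.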
