ben.ward ▴ 30
@benward-7169
United Kingdom

Hi, why do sequence alignments with rowmasks still return the names of every row? Is there an easy way to get only the names of the sequences affected by the mask?

e.g.

origMAlign <- readDNAMultipleAlignment(
    filepath = system.file("extdata", "msx2_mRNA.aln", package = "Biostrings"),
    format = "clustal")

rowmask(origMAlign, invert = TRUE) <- c(1,2)

rownames(origMAlign)

In the above example I'd like to get only the first two sequence names.

I can do it like so:

rownames(origMAlign)[which(!as.logical((coverage(rowmask(origMAlign)))))]

But that feels hacky.

Thanks,

Ben.

@valerie-obenchain-4275
United States

I don't see an exported function that helps with this and the expert is out of town. Not sure this solution is any less hacky than yours:

rownames(origMAlign)[unlist(as.list(rowmask(origMAlign)))]

Valerie

@Val: that returns the names of the masked rows. To get only the names of the rows left unmasked (the two Ben wants), you need a minus before the unlist():

rownames(origMAlign)[-unlist(as.list(rowmask(origMAlign)))]

@Ben: a slightly simpler way is:

rownames(origMAlign)[-as.integer(rowmask(origMAlign))]

H.
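
One caveat with the negative-index forms above: when no rows are masked the index is empty, and in R x[-integer(0)] returns nothing rather than everything. Below is a minimal base-R sketch that avoids this, assuming the masked row indices come from as.integer(rowmask(origMAlign)) as in the reply above. The names unmasked_names and masked_names are hypothetical helpers for illustration, not Biostrings functions.

```r
# Hypothetical helpers (not part of Biostrings): split row names by mask status.
# all_names:  character vector, e.g. rownames(origMAlign)
# masked_idx: integer indices of masked rows, e.g. as.integer(rowmask(origMAlign))
unmasked_names <- function(all_names, masked_idx) {
  # setdiff() keeps every index when masked_idx is empty,
  # whereas all_names[-integer(0)] would return character(0)
  all_names[setdiff(seq_along(all_names), masked_idx)]
}

masked_names <- function(all_names, masked_idx) {
  all_names[masked_idx]
}

# Toy data standing in for an alignment with rows 3 and 4 masked
nms <- c("seq1", "seq2", "seq3", "seq4")
print(unmasked_names(nms, 3:4))         # "seq1" "seq2"
print(masked_names(nms, 3:4))           # "seq3" "seq4"
print(unmasked_names(nms, integer(0)))  # all four names
```

With the alignment from the question, the call would presumably be unmasked_names(rownames(origMAlign), as.integer(rowmask(origMAlign))).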