Hi,
I was wondering if there is a way of specifying an argument to readDNAStringSet to ignore non-DNA characters? I have a set of .fasta files from IMGT that include their preferred gap notation, i.e. sequences that have some DNA strings followed by ..... followed by more DNA.
`>sequence1
GCTTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCTCCAGGAGCACCTCTGGGGGCACAGC.............GGCCCTGGGCTGCCTGGTCAAGGACTACTTCCCCGAACCGGTGACGGTGTC
So far I have used readDNAStringSet, followed by as.character, followed by str_remove, and finally DNAStringSet again. See below.
sequence1 <- readDNAStringSet("sequence1.fa")
sequence1 <- sequence1 |>
as.character() |>
str_remove(pattern = "\\.") |>
DNAStringSet() |>
`names<-`(names(sequence1))
Feels rather clunky. Is there a way of just skipping over (ignoring) non-DNA charcters when reading in? I've tried to add the seqtype because in other functions it seems to do what I need
readDNAStringSet("sequence1.fa", seqtype = "DNA")
but just get this error:
Error in readDNAStringSet("sequence1.fa", seqtype = "DNA") : unused argument (seqtype = "DNA")
Thanks, Miha

I would try to avoid converting to and from
characterobjects, you'll lose the main point of having objects asXString/XStringSet.'.'is a valid character for DNAString objects (you can see them all by printingDNA_ALPHABET), so it won't create an error here.This is also discouraged -- I would recommend reading the help pages for functions to see what the arguments are. In this case, you can see (by checking
?readDNAStringSet) thatreadDNAStringSetdoesn't accept an argument called "seqtype", so trying to pass it will create an error.There isn't currently a way to skip over gap characters when reading in the sequence. You can remove them without converting to
characterusingDECIPHER::RemoveGaps, e.g.:If you're trying to just skip over any character not in
DNA_ALPHABET, you won't be able to do that out of the box. Non-valid characters will throw an error, so you'll have to usereadBStringSetto read it in as aBStringSetand then work with it from there. If that's your use-case feel free to send more info; there's probably a better way but it'll depend on what you're trying to do. For this example you're sharing, the above should be enough.DECIPHER does the trick, thank you!