I have a dataframe with a column 'sequences' that contains stretches of nucleotides. I would like to create a new column that introduces a mutation into each sequence, for example replacing "ACA" with "ATA". Importantly, I would like to do this at a specific position, for example position 2. The sequence "ACAACA" would therefore become "ATAACA". If the sequence does not contain the pattern "ACA" at that position, I would like it to remain unchanged.
I can see that with replaceAt() you can specify x (in this case a DNAStringSet object built from the 'sequences' column), the position (IRanges(1, 3) would be the range from position 1 to 3 in the sequence), and the replacement ("ATA"), but it will replace whatever sequence is at this position. Any idea how to make this specific to a particular sequence? I think I could write an ifelse statement with a grep/regex in it to achieve this, but eventually I would like to build a loop and replace the static "ACA" and "ATA" with either vectors or dataframe columns holding lists of mutations to iterate through. Any help would be very welcome!
# Example data for convenience
library(Biostrings)  # provides DNAStringSet() and replaceAt(); also loads IRanges

Read <- c("1", "2", "3", "4")
Sequences <- c("ATACCCACG", "AAAGGGAAT", "GCCGATGCG", "ACCAAATCC")
df <- data.frame(Read, Sequences)

# Almost works, but replaces positions 1-3 in every sequence, whatever they contain
df$Mut <- replaceAt(DNAStringSet(df$Sequences), IRanges(1, 3), "ATA")

# sessionInfo()
# R version 4.0.3 (2020-10-10)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 19041)
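
For reference, here is a rough base R sketch of the ifelse-style idea mentioned above. It is only one possible approach, not a Biostrings solution. The helper name replace_if_match, the muts table and its column names (start, from, to) are all made up for illustration, and the sketch assumes that the pattern and replacement have the same length and that start is the first position of the pattern window (so start = 1 for "ACA" covers positions 1 to 3).

```r
# Hypothetical helper (not part of Biostrings): replace `pattern` with
# `replacement` only in sequences where `pattern` sits exactly at `start`.
# Assumes pattern and replacement have the same number of characters.
replace_if_match <- function(seqs, start, pattern, replacement) {
  stopifnot(nchar(pattern) == nchar(replacement))
  end <- start + nchar(pattern) - 1
  # which() drops NA comparisons, so missing sequences are left untouched
  hit <- which(substr(seqs, start, end) == pattern)
  substr(seqs[hit], start, end) <- replacement
  seqs
}

# Same example data as in the question
Read <- c("1", "2", "3", "4")
Sequences <- c("ATACCCACG", "AAAGGGAAT", "GCCGATGCG", "ACCAAATCC")
df <- data.frame(Read, Sequences, stringsAsFactors = FALSE)

# Single mutation: only read 4 starts with "ACC", so only it changes
single <- replace_if_match(df$Sequences, 1, "ACC", "ATC")

# Iterating over a made-up table of mutations, applied one after another
muts <- data.frame(start = c(1, 4),
                   from  = c("ACC", "GAT"),
                   to    = c("ATC", "GTT"),
                   stringsAsFactors = FALSE)

mutated <- df$Sequences
for (i in seq_len(nrow(muts))) {
  mutated <- replace_if_match(mutated, muts$start[i], muts$from[i], muts$to[i])
}
df$Mut <- mutated
```

With the example from the top of the post, replace_if_match("ACAACA", 1, "ACA", "ATA") gives "ATAACA", while a sequence without "ACA" at positions 1 to 3 comes back unchanged. Because the result is a plain character vector, it can still be wrapped in DNAStringSet() afterwards if a Biostrings object is needed.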