Error in .Call2("new_XStringSet_from_CHARACTER", class(x0), elementType(x0), : key 51 (char '3') not in lookup table

Eman

Hi,

I need to rename the DNA sequences in a phyloseq object to ASV1, ASV2, ASV3, ... and then attach the taxonomy to each new ASV name, so that I can export the renamed sequences as a FASTA file. I did this successfully with a phyloseq object of PacBio sequences generated with DADA2 in R. However, when I used the same code on a phyloseq object of MiSeq sequences built from files exported from QIIME 2, it did not work and I got the error below. I checked the sequences and they do not contain the character 3 mentioned in the error. This is the printed refseq that I want to rename:

```
DNAStringSet object of length 300:
      width seq                                                                           names
  [1]   422 AATTCCAGCTCCAGTAGCGTATATTAAAGTTGTTGCA...ATTGATCAAGAACGAAGGTTAGGGGATCAAAAACGAT dbb34b29572a04b85...
  [2]   256 CACCGGCGGCCCGAGTGGTGATCGTGATTATTGGGTC...GACGGTGAGGGACGAAAGCTGGGGGCACGAACCGGAT 2e526fb2f04bd2256...
  [3]   256 AATACCAGCACCCCGAGTGGTCGGGACGATTATTGGG...CGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGA 4d7c52dbb5a1ec9e1...
  [4]   255 AATACCAGCACCCCGAGTGGTCGGGACGTTTATTGGG...TCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGG ccde686df510e9ac2...
  [5]   254 AATACGAGAGGGGTAAGCATTATTCATCATTAATGGG...GACGCTGAGGTACGAAAGTATGGGGAGCAAAACGGAT 930ed6799694562ae...
  ...   ... ...
[296]   255 AATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGG...CTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGG 16a154a51bcfba90e...
[297]   253 TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCG...CTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGG 5673ae66819e3bdbb...
[298]   253 TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCG...CTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGG 6110f28a1de867a99...
[299]   253 TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCG...CTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGG 119b7061439276460...
[300]   253 TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCG...CTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGG 945184b6386c192c0...
```



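```r
# Build a DNAStringSet from the taxa names (this assumes the names ARE the sequences)
dna <- Biostrings::DNAStringSet(taxa_names(ps.gen))
names(dna) <- taxa_names(ps.gen)
# Add the sequences to the phyloseq object as its refseq slot
ps.gen <- merge_phyloseq(ps.gen, dna)
# Replace the long names with short IDs: ASV1, ASV2, ...
taxa_names(ps.gen) <- paste0("ASV", seq(ntaxa(ps.gen)))
ps.gen
```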

I got the following error from the first line of the code:

```
Error in .Call2("new_XStringSet_from_CHARACTER", class(x0), elementType(x0),  :
  key 51 (char '3') not in lookup table
```
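
The code above follows the renaming idiom from the DADA2 tutorial, where it works because DADA2 uses each exact sequence as the taxon name, so `taxa_names()` returns DNA. A quick base-R check, assuming only that you want to know whether the names are DNA before converting them; `looks_like_dna()` is a hypothetical helper, and its alphabet copies Biostrings' `DNA_ALPHABET` (IUPAC codes plus `-`, `+`, `.`):

```r
# Hypothetical helper: TRUE when a string is non-empty and every character is
# in Biostrings' DNA alphabet (IUPAC codes plus "-", "+", "."), ignoring case.
looks_like_dna <- function(x) {
  grepl("^[ACGTMRWSYKVHDBN.+-]+$", toupper(x))
}

ids <- c("AATTCCAGCTCCAGTAGCGTATATTAAAGTTGTTGCA",  # sequence-style name (DADA2)
         "dbb34b29572a04b85dc4566611fe65ec")       # ID-style name, from the FASTA below
looks_like_dna(ids)
# [1]  TRUE FALSE

# On the real object (needs phyloseq):
# all(looks_like_dna(phyloseq::taxa_names(ps.gen)))
```

If that last check is `FALSE`, the names are identifiers and the sequences have to come from somewhere else in the object.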



Are you sure you want to use `taxa_names()` in `dna <- Biostrings::DNAStringSet(taxa_names(ps.gen))`? I don't know the phyloseq package, but it seems more likely that you'd want a function that extracts the sequences rather than the names here.
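
In phyloseq, the accessor for stored sequences is `refseq()`. A minimal sketch of the rename-and-export step built on it, assuming `ps.gen` already carries its sequences in the refseq slot (as the printed `DNAStringSet` in the question suggests) and that its `tax_table` has a column named `"Genus"`. The helpers `label_with_taxonomy()` and `export_asv_fasta()` are hypothetical names, and the output file name is a placeholder:

```r
# Hypothetical helper: append one taxonomic rank to each taxon ID,
# e.g. "ASV1" -> "ASV1_<genus>". `tax` is a matrix whose rownames are taxon IDs.
label_with_taxonomy <- function(ids, tax, rank = "Genus") {
  paste(ids, tax[ids, rank], sep = "_")
}

# Hypothetical wrapper; needs phyloseq and Biostrings installed when called.
# Assumes `ps` has a refseq slot and a tax_table containing the column `rank`.
export_asv_fasta <- function(ps, fasta_file, rank = "Genus") {
  # Rename all components (otu_table, tax_table, refseq, ...) to ASV1, ASV2, ...
  phyloseq::taxa_names(ps) <- paste0("ASV", seq_len(phyloseq::ntaxa(ps)))
  seqs <- phyloseq::refseq(ps)                    # the sequences, not the names
  tax  <- as(phyloseq::tax_table(ps), "matrix")   # rownames are now ASV IDs
  names(seqs) <- label_with_taxonomy(names(seqs), tax, rank)
  Biostrings::writeXStringSet(seqs, fasta_file)   # FASTA headers like ASV1_<genus>
  invisible(ps)                                   # renamed object, for reuse
}

# Usage, with ps.gen in the session:
# ps.gen <- export_asv_fasta(ps.gen, "asv_refseq.fasta")
```

With this sketch, taxa that have no assignment at `rank` get `_NA` in their header; choose another rank or replace the `NA`s first if that matters.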

@james-w-macdonald-5106

You show a DNAStringSet that cannot be the result of your first line of code (because that line errors out), so nobody can help beyond saying that there is definitely a 3 somewhere in whatever `taxa_names(ps.gen)` returns.

```r
> DNAStringSet("ATCCTCGCG3TTC")
Error in .Call2("new_XString_from_CHARACTER", class(x0), string, start,  :
  key 51 (char '3') not in lookup table
```
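
In the message, `key 51` is simply the character code of `'3'`. Also note that the first feature ID printed in the question, `dbb34b…`, starts with `d`, `b`, `b`, which are valid IUPAC codes, so `3` is the first character in it that the DNA lookup table cannot map. That is consistent with the names, not the sequences, being converted. A base-R sketch that lists the rejected characters per string; `non_dna_chars()` is a hypothetical helper whose alphabet copies Biostrings' `DNA_ALPHABET`, and it upper-cases its input first (an assumption about how Biostrings treats lowercase letters):

```r
# "key 51" in the error is the character code of "3"
utf8ToInt("3")
# [1] 51

# Hypothetical helper: for each string, the distinct characters (upper-cased)
# that are not in Biostrings' DNA alphabet, in order of first appearance.
non_dna_chars <- function(x) {
  dna_alphabet <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                    "V", "H", "D", "B", "N", "-", "+", ".")
  lapply(strsplit(toupper(x), ""), function(ch) unique(ch[!ch %in% dna_alphabet]))
}

bad <- non_dna_chars(c("ATCCTCGCG3TTC",                      # the test string above
                       "dbb34b29572a04b85dc4566611fe65ec"))  # first ID in the FASTA below
sapply(bad, `[`, 1)   # first rejected character in each string
# [1] "3" "3"
```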

Eman

I checked the sequence FASTA file and did not find any digits in the sequences. I am not sure whether the feature IDs (sequence IDs) could cause the problem, but I don't think they should (see below):

```
>dbb34b29572a04b85dc4566611fe65ec
AATTCCAGCTCCAGTAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTCGAACTTCGGGCCTGGCGGGACGGTC
CGCCTTACGGTGTGTACTGTCCGGCCGGGTCTTACCTCTTGGTGAGCCCGTATGCCCTTTACTGGGTGTGCGGTGGAACC
AAGAATTTTACCTTGAGAAAATTAGAGTGTTCAAAGCAGGCATAAGCCCGAATACATTAGCATGGAATAATAGAATAGGA
CGTGCGGTTCTATTTTGTTGGTTTCTAGGATCGCCGTAATGATTAATAGGGACGGTCGGGGGCATTAGTATTCAGTTGCT
AGAGGTGAAATTCTTAGATTTACTGAAGACTAACTTCTGCGAAAGCATTTGCCAAGGACGTTTTCATTGATCAAGAACGA
AGGTTAGGGGATCAAAAACGAT
>2e526fb2f04bd2256bddbc35953efda6
CACCGGCGGCCCGAGTGGTGATCGTGATTATTGGGTCTAAAGGGTCCGTAGCCGGTTTGGTCAGTCCTCCGGGAAATCTG
ATAGCTCAACTATTAGGCTTTCGGGGGATACTGCCAGACTTGGAACCGGGAGAGGTAAGAGGTACTACAGGGGTAGGAGT
GAAATCTTGTAATCCCTGTGGGACCACCTGTGGCGAAGGCGTCTTACCAGAACGGGTTCGACGGTGAGGGACGAAAGCTG
GGGGCACGAACCGGAT
```
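
Both headers are 32-character lowercase hexadecimal strings. As far as I know, that is the default feature-ID format of QIIME 2's denoising plugins (an MD5 digest of each sequence); this is background knowledge, not something stated in the thread. If `taxa_names(ps.gen)` returns IDs of this shape, it is returning these identifiers rather than the sequences. A base-R check, with `is_hashed_id()` as a hypothetical helper:

```r
# Hypothetical helper: TRUE for strings that look like 32-character
# lowercase hex digests (the shape of the FASTA headers above).
is_hashed_id <- function(x) grepl("^[0-9a-f]{32}$", x)

is_hashed_id(c("dbb34b29572a04b85dc4566611fe65ec",
               "2e526fb2f04bd2256bddbc35953efda6",
               "AATTCCAGCTCCAGTAGCGTATATTAAAGTTGTTGCA"))
# [1]  TRUE  TRUE FALSE

# With ps.gen loaded: do the taxa names look like feature IDs?
# all(is_hashed_id(phyloseq::taxa_names(ps.gen)))
```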


I have also found that other users get the same error, but I could not find a clear solution on the forums. I am happy to share the sequence file here if that helps.


You could share the sequence file if you like. If it's public, just point to where it is. But I don't think it's the names. Here's a fake fasta file I made:

```
> cat(readLines("fakefasta.fa"), sep = "\n")
> 3049fje0r9uejf03e49u
ACACATACATAGAGAGATCTCGATCGTAG
> 3485f9eufje9ufhe4
GCTCGCTAGCTGATCGATGTGATAGCTG

DNAStringSet object of length 2:
    width seq                                               names
[1]    29 ACACATACATAGAGAGATCTCGATCGTAG                      3049fje0r9uejf03...
[2]    28 GCTCGCTAGCTGATCGATGTGATAGCTG                       3485f9eufje9ufhe4
```
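
As I understand it, the fake FASTA loads without complaint because a FASTA reader keeps each header line as a name and only checks the sequence lines against the DNA alphabet, so digits in the IDs never reach the lookup table. A base-R sketch of that separation, assuming a simple layout where every header is followed by at least one sequence line; `read_fasta_base()` is hypothetical and not a substitute for `Biostrings::readDNAStringSet()`:

```r
# Hypothetical base-R reader: headers become names, sequence lines are joined.
read_fasta_base <- function(path) {
  lines <- readLines(path)
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)                   # record number for every line
  ids <- trimws(sub("^>", "", lines[is_header]))
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")
  setNames(unname(seqs), ids)
}

# Rebuild the fake FASTA from the post in a temporary file
fa <- tempfile(fileext = ".fa")
writeLines(c(">3049fje0r9uejf03e49u", "ACACATACATAGAGAGATCTCGATCGTAG",
             ">3485f9eufje9ufhe4",    "GCTCGCTAGCTGATCGATGTGATAGCTG"), fa)

recs <- read_fasta_base(fa)
nchar(recs)                    # widths 29 and 28, as in the printout above
grepl("^[ACGTN]+$", recs)      # the sequences are DNA only
grepl("[0-9]", names(recs))    # the names contain digits, which is harmless here
```

The failing call in the question differs in that it hands the names themselves to `DNAStringSet()`, so every character of every name goes through the DNA lookup.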