Error: ambiguity characters in sequences
Bruno • 0

Hi all,

I am trying to compile the updated genome of the model plant Arabidopsis thaliana, from TAIR10. I am using the function forgeBSgenomeDataPkgFromNCBI(), but I am running into an error saying that the sequences contain ambiguity characters. I tried Biostrings::replaceAmbiguities(), but I am not sure how to save the updated sequences or what to do from that point.

forgeBSgenomeDataPkgFromNCBI(assembly_accession="GCF_000001735.4", pkg_maintainer="Bruno Guillotin", organism="Arabidopsis thaliana", destdir=tempdir())
Warning in .extract_NCBI_assembly_info(assembly_accession, chrominfo, organism = organism,  :
  "GCF_000001735.4" is a registered NCBI assembly for organism
  "Arabidopsis thaliana" --> ignoring supplied 'organism' argument
trying URL ''
Content type 'application/x-gzip' length 37482399 bytes (35.7 MB)
downloaded 35.7 MB

Error in .local(object, con, format, ...) : 
  One or more strings contain unsupported ambiguity characters.
Strings can contain only A, C, G, T or N.
See Biostrings::replaceAmbiguities().

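#### What I did so far
library(BSgenomeForge)  # provides downloadGenomicSequencesFromNCBI()
library(Biostrings)     # provides readDNAStringSet() and replaceAmbiguities()
filepath <- downloadGenomicSequencesFromNCBI("GCF_000001735.4", destdir=tempdir())
genomic_sequences <- readDNAStringSet(filepath)
# Replace every IUPAC ambiguity code with N
genomic_sequences2 <- replaceAmbiguities(genomic_sequences, new="N")
# Then ?....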

I would also like to rename the sequences in the DNAStringSet, since the chromosomes currently have names such as NC_003070.9 rather than chr1, chr2, etc.
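For reference, here is a minimal sketch of what the rename-and-save step might look like. It uses a toy two-sequence DNAStringSet in place of the downloaded genome so that it runs on its own. The second accession, the accession-to-name map (`chrom_map`) and the output file name are made up for illustration; only NC_003070.9 comes from the real data. It also assumes the FASTA headers start with the accession followed by a space and a description.

```r
library(Biostrings)

# Toy stand-in for the downloaded genome. The second record is a placeholder,
# not a real accession.
genomic_sequences <- DNAStringSet(c(
  "NC_003070.9 Arabidopsis thaliana chromosome 1 sequence" = "ACGTRYACGT",
  "hypothetical_accession.1 placeholder record"            = "ACGTMKACGT"
))

# Replace IUPAC ambiguity codes (R, Y, M, K, ...) with N.
genomic_sequences2 <- replaceAmbiguities(genomic_sequences, new = "N")

# Keep only the accession, i.e. the text before the first space in each header.
accessions <- sub(" .*$", "", names(genomic_sequences2))

# Hypothetical accession-to-name map; the real one has to be filled in by hand
# (or taken from the assembly report) for every sequence in the assembly.
chrom_map <- c("NC_003070.9" = "chr1", "hypothetical_accession.1" = "chr2")
new_names <- unname(chrom_map[accessions])
stopifnot(!anyNA(new_names))  # fail early if an accession is missing from the map
names(genomic_sequences2) <- new_names

# Save the cleaned, renamed sequences as a FASTA file.
out_fasta <- file.path(tempdir(), "genome_clean.fa")
writeXStringSet(genomic_sequences2, out_fasta, format = "fasta")
```

Whether the resulting FASTA file can then be fed into the seed-file-based forgeBSgenomeDataPkg() workflow is something I have not verified, so that part remains an open question.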

Thanks in advance and sorry if it is an obvious question. Bruno

BSgenomeForge • 65 views

