Error: ambiguity characters in sequences

Bruno

France

Hi all,

I am trying to build a BSgenome data package for the updated genome of the model plant Arabidopsis thaliana (TAIR10.1, derived from TAIR10). I am using forgeBSgenomeDataPkgFromNCBI(), but I am running into an error saying that one or more sequences contain unsupported ambiguity characters. I used Biostrings::replaceAmbiguities(), but I am not sure how to save the updated sequences, and I don't know what to do from that point.

forgeBSgenomeDataPkgFromNCBI(assembly_accession="GCF_000001735.4", pkg_maintainer="Bruno Guillotin", organism="Arabidopsis thaliana", destdir=tempdir())
Warning in .extract_NCBI_assembly_info(assembly_accession, chrominfo, organism = organism,  :
  "GCF_000001735.4" is a registered NCBI assembly for organism
  "Arabidopsis thaliana" --> ignoring supplied 'organism' argument
trying URL 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1/GCF_000001735.4_TAIR10.1_genomic.fna.gz'
Content type 'application/x-gzip' length 37482399 bytes (35.7 MB)
==================================================
downloaded 35.7 MB

Error in .local(object, con, format, ...) : 
  One or more strings contain unsupported ambiguity characters.
Strings can contain only A, C, G, T or N.
See Biostrings::replaceAmbiguities().
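A minimal sketch (not part of the original post) for checking which sequences trigger this error, assuming only Biostrings is installed. The toy sequences `seqs` and their names are hypothetical stand-ins for `genomic_sequences`; it counts, per sequence, the IUPAC ambiguity letters other than N, since the message says only A, C, G, T and N are accepted.

```r
library(Biostrings)

# Toy stand-in for the downloaded genome (hypothetical sequences and names).
seqs <- DNAStringSet(c("ACGTRYACGT", "ACGTNNACGT"))
names(seqs) <- c("seq1", "seq2")

# IUPAC codes other than A, C, G, T and N (M, R, W, S, Y, K, V, H, D, B).
ambiguity_codes <- setdiff(names(IUPAC_CODE_MAP), c("A", "C", "G", "T", "N"))

# One row per sequence, one column per letter of DNA_ALPHABET.
freq <- alphabetFrequency(seqs)

# Number of unsupported ambiguity letters in each sequence.
n_ambiguous <- rowSums(freq[, ambiguity_codes, drop = FALSE])
names(n_ambiguous) <- names(seqs)
n_ambiguous  # expected: seq1 = 2, seq2 = 0
```

With the real data, `seqs` would be replaced by `genomic_sequences`, and `names(n_ambiguous)[n_ambiguous > 0]` would list the offending sequences.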

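# What I did:
library(BSgenomeForge)   # for downloadGenomicSequencesFromNCBI()
library(Biostrings)      # for readDNAStringSet() and replaceAmbiguities()

filepath <- downloadGenomicSequencesFromNCBI("GCF_000001735.4", destdir=tempdir())
genomic_sequences <- readDNAStringSet(filepath)
genomic_sequences
# Replace every IUPAC ambiguity code (R, Y, K, ...) with N
genomic_sequences2 <- replaceAmbiguities(genomic_sequences, new="N")
# Then what?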
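One possible way to fill in the "Then what?" step, sketched on toy data (the sequence names and output file names are hypothetical): save the cleaned DNAStringSet as FASTA with Biostrings::writeXStringSet(), and/or as 2bit with rtracklayer::export(). The error above appears to come from a 2bit export step, which only accepts A, C, G, T and N, so exporting the cleaned object should no longer fail. How to feed the cleaned file back into BSgenomeForge to build the package is not shown here.

```r
library(Biostrings)
library(rtracklayer)

# Toy stand-in for genomic_sequences (hypothetical sequences and names).
seqs <- DNAStringSet(c(chr1 = "ACGTRYACGT", chr2 = "ACGTNNACGT"))

# Replace every IUPAC ambiguity code other than N with N.
clean <- replaceAmbiguities(seqs, new = "N")

# Save the cleaned sequences as FASTA ...
fasta_file <- file.path(tempdir(), "cleaned.fa")
writeXStringSet(clean, fasta_file)

# ... and as 2bit (format inferred from the .2bit extension).
twobit_file <- file.path(tempdir(), "cleaned.2bit")
export(clean, twobit_file)
```

Reading the FASTA back with readDNAStringSet(fasta_file) should give sequences containing only A, C, G, T and N.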

I would also like to rename the sequences in the DNAStringSet, because each chromosome is named with its RefSeq accession (e.g. NC_003070.9) rather than chr1, chr2, etc.
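A hedged sketch for the renaming, assuming the names returned by readDNAStringSet() are full FASTA header lines of the form "<accession> ... chromosome <n> ...". It extracts the chromosome number where present and falls back to the accession otherwise. Apart from NC_003070.9, the header strings below are hypothetical. GenomeInfoDb::getChromInfoFromNCBI() may be a more robust source for mapping accessions to sequence names, but that route is not shown.

```r
library(Biostrings)

# Toy sequences whose names mimic NCBI FASTA headers (mostly hypothetical).
seqs <- DNAStringSet(rep("ACGT", 3))
names(seqs) <- c("NC_003070.9 Arabidopsis thaliana chromosome 1 sequence",
                 "HYPOTHETICAL_2 Arabidopsis thaliana chromosome 2 sequence",
                 "HYPOTHETICAL_3 unplaced scaffold sequence")

# TRUE for headers that mention "chromosome <something>".
is_chrom <- grepl("chromosome [^ ]+", names(seqs))

# First word of each header, i.e. the accession.
accession <- sub(" .*$", "", names(seqs))

# "chromosome 1" -> "chr1"; non-chromosomal sequences keep their accession.
chrom_name <- sub("^.*chromosome ([^ ]+).*$", "chr\\1", names(seqs))
names(seqs) <- ifelse(is_chrom, chrom_name, accession)

names(seqs)  # expected: "chr1" "chr2" "HYPOTHETICAL_3"
```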

Thanks in advance and sorry if it is an obvious question. Bruno

Tag: BSgenomeForge