Assembly report error for hg19 in GenomeInfoDb
@roshniroy16-22026

I am working with the HTGTS Transloc-pipeline. Using the example dataset, I managed to run the first two steps (TranslocPreprocess.pl and TranslocWrapper.pl) and got the expected output files (.tlx files):

TranslocPreprocess.pl tutorialmetadata.txt preprocess/ --read1 pooledR1.fq.gz --read2 pooledR2.fq.gz
TranslocWrapper.pl tutorialmetadata.txt preprocess/ results/ --threads 2

In the next step, I ran TranslocHotSpots.R with the command

$ TranslocHotSpots.R /scratch/royr6/results/RAG1CSRep1/RAG1CSRep1result.tlx /scratch/royr6/results/RAG1CSRep1/output

and kept getting this error message:

Error in .make_assembly_report_URL(assembly_accession) :
  don't know where to find assembly report for GCF_000001405.13
Calls: Seqinfo ... FUN -> fetch_assembly_report -> .make_assembly_report_URL

Looking around, I found that this error comes from GenomeInfoDb (https://rdrr.io/bioc/GenomeInfoDb/src/R/assembly-utils.R): I get the same message when I run the following in R:

> BiocManager::install(c("GenomeInfoDb","BSgenome"))
> options(download.file.method="libcurl")
> library("GenomeInfoDb")
> library("BSgenome")
> GenomeInfoDb::Seqinfo(genome = "hg19")

Error in .make_assembly_report_URL(assembly_accession) :
  don't know where to find assembly report for GCF_000001405.13

I am stuck and would really appreciate any help in this regard.

Tags: software error, GenomeInfoDb, HTGTS Transloc-pipeline, TranslocHotspots.R

Hi,

Two problems with your post:

  1. The tag you used (software error) is too general. Please use a package-specific tag (GenomeInfoDb in this case). This will help other users of the support site find questions and answers about the package, and will notify the GenomeInfoDb maintainers that a question was asked about this package.

  2. Please show the code that generates the error. Ideally, provide a minimal, self-contained, reproducible example. And don't forget to include your sessionInfo().

More details about these points are in our Posting Guide (make sure you read it).

Thanks,

H.

@martin-morgan-1513

This follow-up was posted through a different channel:

    > sessionInfo()
    R version 3.6.0 (2019-04-26)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: CentOS Linux 7 (Core)

    Matrix products: default
    BLAS/LAPACK: /usr/local/intel/compilers_and_libraries_2019.1.144/linux/mkl/lib/intel64_lin/libmkl_rt.so

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats4    parallel  stats     graphics  grDevices utils     datasets
    [8] methods   base

    other attached packages:
    [1] GenomeInfoDb_1.21.2 IRanges_2.18.3      S4Vectors_0.22.1
    [4] BiocGenerics_0.31.6

    loaded via a namespace (and not attached):
    [1] compiler_3.6.0         GenomeInfoDbData_1.2.1 RCurl_1.95-4.12
    [4] bitops_1.0-6

The code tries to read this NCBI FTP directory:

url = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/"
xx = RCurl::getURL(url)

For me, this returns a directory listing:

> cat(xx)
dr-xr-xr-x   2 ftp      anonymous     4096 Oct 13  2016 GCF_000001405.10_NCBI34
dr-xr-xr-x   2 ftp      anonymous     4096 Oct 13  2016 GCF_000001405.11_NCBI35
dr-xr-xr-x   2 ftp      anonymous     4096 Oct 13  2016 GCF_000001405.12_NCBI36
dr-xr-xr-x   2 ftp      anonymous     4096 Oct 13  2016 GCF_000001405.13_GRCh37
...

What do you get? (please use the 'ADD COMMENT' button below this post to reply...)
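To make the failure mode concrete, here is a hypothetical sketch (not GenomeInfoDb's internal code; both function names are made up for illustration) of the lookup this thread describes: derive the NCBI directory from an accession such as GCF_000001405.13, then search its listing for the single entry that starts with GCF_000001405.13_ (GCF_000001405.13_GRCh37 in the listing above). The directory layout is assumed from the URL used in this post. If the text that comes back is not a real directory listing, nothing matches, which is consistent with the "don't know where to find assembly report" error.

```r
# Hypothetical sketch, NOT GenomeInfoDb's internal implementation.
# Assumption: NCBI's genomes/all layout splits the 9 accession digits into
# three 3-digit directories, as in the URL used in this thread.
ncbi_accession_dir <- function(accession) {
  prefix <- substr(accession, 1, 3)                              # "GCF" or "GCA"
  digits <- sub("^[A-Z]{3}_([0-9]{9})\\..*$", "\\1", accession)  # "000001405"
  paste0("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/", prefix, "/",
         substr(digits, 1, 3), "/", substr(digits, 4, 6), "/",
         substr(digits, 7, 9), "/")
}

# Find the single listing entry for a versioned accession,
# e.g. "GCF_000001405.13" -> "GCF_000001405.13_GRCh37".
find_assembly_dir <- function(listing, accession) {
  lines <- strsplit(listing, "\r?\n")[[1]]     # tolerate CRLF line endings
  entries <- sub("^.*[[:space:]]", "", lines)  # keep the last field (entry name)
  hits <- entries[startsWith(entries, paste0(accession, "_"))]
  if (length(hits) != 1L)
    stop("no unique directory for ", accession, " in the listing")
  hits
}

# Usage (needs network access; the same RCurl call as above):
# url <- ncbi_accession_dir("GCF_000001405.13")
# find_assembly_dir(RCurl::getURL(url), "GCF_000001405.13")
```

Fed the HTML page that a proxy returns instead of a listing, find_assembly_dir() stops with an error, much like the real lookup does.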


Thank you for your reply, Martin. When I run the commands

url = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/"
xx = RCurl::getURL(url)
cat(xx)

I get this huge block of HTML (I removed a few lines due to space constraints):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
<meta type="copyright" content="Copyright (C) 1996-2016 The Squid Software Foundation and contributors">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Directory: <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/">ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/</a>
<style type="text/css">

</body></html>

Well, I know what's going on, but not how to solve it. Your institution is using Squid, a proxy that caches web pages. For some reason, it has cached the FTP request as an HTML document, or it is trying to say that it can't display the document, perhaps because a port is blocked. You could try pasting the URL into a browser and see what happens.
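As a quick check on this diagnosis, a small hypothetical helper (not part of GenomeInfoDb or RCurl) can tell a Unix-style FTP directory listing apart from an HTML page like the one Squid returned. The two heuristics (an html tag, versus lines that start with a permission string such as dr-xr-xr-x) are assumptions based only on the two outputs shown in this thread.

```r
# Hypothetical helper: classify the text returned for an FTP directory URL.
classify_ftp_response <- function(txt) {
  if (grepl("<html", txt, ignore.case = TRUE))
    return("html")  # e.g. a page generated by a proxy such as Squid
  lines <- strsplit(txt, "\r?\n")[[1]]
  # Unix-style listings start each entry with a permission string, e.g. dr-xr-xr-x
  if (any(grepl("^[dl-][rwx-]{9}", lines)))
    return("ftp_listing")
  "unknown"
}

# Usage (needs network access):
# xx <- RCurl::getURL("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/")
# classify_ftp_response(xx)
```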

My advice is to reach out to your local help desk with the minimal example

url = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/"
xx = RCurl::getURL(url)

explaining that you are trying to use curl to access an FTP site. The command-line equivalent and its expected output are simply:

$ curl ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/
dr-xr-xr-x   2 ftp      anonymous     4096 Oct 13  2016 GCF_000001405.10_NCBI34
dr-xr-xr-x   2 ftp      anonymous     4096 Oct 13  2016 GCF_000001405.11_NCBI35
dr-xr-xr-x   2 ftp      anonymous     4096 Oct 13  2016 GCF_000001405.12_NCBI36
...
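When contacting the help desk, it may also help to report which proxy settings your R session sees. The sketch below is a hypothetical helper that only reads environment variables with Sys.getenv(); it assumes the standard libcurl variable names (http_proxy, https_proxy, ftp_proxy, all_proxy, no_proxy and their upper-case forms). An empty result does not rule out a proxy: Squid can also intercept traffic transparently, with nothing configured on your machine.

```r
# Hypothetical helper: list the proxy-related environment variables that are
# set in this R session. libcurl (used by RCurl) reads these names; the
# lower-case form takes precedence, and http_proxy is only read in lower case.
proxy_env <- function(vars = c("http_proxy", "https_proxy", "HTTPS_PROXY",
                               "ftp_proxy", "FTP_PROXY",
                               "all_proxy", "ALL_PROXY",
                               "no_proxy", "NO_PROXY")) {
  vals <- Sys.getenv(vars, unset = NA)  # named character vector, NA if unset
  vals[!is.na(vals) & nzchar(vals)]     # keep only variables that are set
}

# Usage: proxy_env()
```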

I will definitely get in touch with the help desk. Thank you for your feedback.


Hi Martin, I'm running into a similar issue: a simple seqlevelsStyle(x) call triggers a connection to the NCBI FTP server, which fails behind certain proxy settings. Is there a way to avoid this network access? In my case, all I need is to rename chr1, ..., chrX, chrY to 1, ..., X, Y.


Just adding:

seqlevelsStyle(x) <- "NCBI"
Error in .form_assembly_report_url(assembly_accession) : 
  don't know where to find assembly report for GCF_000001405.38
seqlevelsStyle(x) <- "Ensembl" # works

(This is with the latest stable release, and BiocManager::valid() passes.)
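For the renaming described in this question (chr1, ..., chrX, chrY to 1, ..., X, Y), one possible workaround that should not need network access is to rename the seqlevels directly instead of switching styles. This is a sketch under assumptions: chr_to_plain is a hypothetical helper, it only strips a leading "chr" (so chrM would become M, not MT), and it changes names only, not the genome() metadata. It builds the old-to-new mapping as a named character vector, the form GenomeInfoDb's renameSeqlevels() accepts.

```r
# Hypothetical helper: map UCSC-style names to plain ones by dropping a
# leading "chr" (e.g. "chr1" -> "1", "chrX" -> "X"). Names of the result are
# the old seqlevels, values are the new ones.
chr_to_plain <- function(levels) {
  setNames(sub("^chr", "", levels), levels)
}

# Usage with a GRanges (or any object with seqlevels) 'x'; requires GenomeInfoDb:
# library(GenomeInfoDb)
# x <- renameSeqlevels(x, chr_to_plain(seqlevels(x)))
```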
