Package needs internet access
firestar ▴ 20
@rmf-13755
Sweden

I am using the BSgenome.Hsapiens.UCSC.hg38 package in a container on a compute cluster without internet access. When I use it, it attempts to download data and fails:

Error in download.file(url, destfile, quiet = TRUE) :
  cannot open URL 'https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/chromInfo.txt.gz'
Calls: seqlevelsStyle<- ... .fetch_chrom_sizes_from_UCSC_database -> fetch_table_dump_from_UCSC -> fetch_table_from_url
Execution halted

Is there a way to point to a local path? What is the best way to deal with such issues in an offline environment?

Update: Added more complete code.

This is my code for running Cicero on Seurat objects based on this Signac vignette.

library(Seurat)
library(Signac)
library(cicero)
library(BSgenome.Hsapiens.UCSC.hg38)
library(dplyr)

seqlevelsStyle(BSgenome.Hsapiens.UCSC.hg38) <- "NCBI"
seqnames(BSgenome.Hsapiens.UCSC.hg38) <- BSgenome.Hsapiens.UCSC.hg38@seqinfo@seqnames

sf <- readRDS(file.path(path,"seurat.rds"))
mo <- SeuratWrappers::as.cell_data_set(sf)
co <- make_cicero_cds(mo, reduced_coordinates = reducedDims(mo)$UMAP)

# get the chromosome sizes from the BSgenome object (first 25 sequences)
genome <- as.data.frame(seqinfo(BSgenome.Hsapiens.UCSC.hg38)) %>%
  tibble::rownames_to_column("chr") %>%
  select(chr,seqlengths) %>%
  slice(1:25)

conns <- run_cicero(co, genomic_coords = genome, sample_num = 100)
ccans <- generate_ccans(conns)
links <- ConnectionsToLinks(conns = conns, ccans = ccans)
Links(sf) <- links

It's not clear to me why (or that) seqlevelsStyle should be called on a BSgenome object. You will need to provide more code that precedes the error so we can understand what you are trying to do.


Updated with more code.

@james-w-macdonald-5106
United States

It's going to need internet access. When you switch to NCBI seqlevels, you are calling this function:

.fetch_chrom_sizes_from_UCSC_database <- function(genome,
    goldenPath.url=getOption("UCSC.goldenPath.url"))
{
    col2class <- c(chrom="character", size="integer", fileName="NULL")
    ans <- fetch_table_dump_from_UCSC(genome, "chromInfo",
                                      col2class=col2class,
                                      goldenPath.url=goldenPath.url)
    ## Some sanity checks that should never fail.
    in_what <- paste0("\"chromInfo\" table for UCSC genome ", genome)
    .check_chrom_sizes(ans, in_what)
    ans
}

That function is in the GenomeInfoDb package. If you had a local UCSC genome browser DB running, you could use that, but it would be easier to modify the seqlevels on that BSgenome object on a machine with internet access (a laptop or whatever), save the result as an RDS file, and then copy it over to your air-gapped computer, on a thumb drive if need be.
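
A minimal sketch of that two-step workflow, assuming all the offline job needs is the chromosome-size table passed to run_cicero() in the question. The helper names (seqinfo_to_genome_table, export_genome_table) and the file name genome_ncbi.rds are made up for illustration; they are not part of any package.

```r
# Pure helper: turn as.data.frame(seqinfo(x)) into the two-column table
# (chr, seqlengths) that the question passes to run_cicero().
seqinfo_to_genome_table <- function(si_df, n_seqs = 25) {
  out <- data.frame(chr = rownames(si_df),
                    seqlengths = si_df$seqlengths,
                    stringsAsFactors = FALSE)
  head(out, n_seqs)
}

# Step 1: run once on a machine WITH internet access.
# Assumes BSgenome.Hsapiens.UCSC.hg38 is installed there.
export_genome_table <- function(path) {
  library(BSgenome.Hsapiens.UCSC.hg38)
  bs <- BSgenome.Hsapiens.UCSC.hg38
  seqlevelsStyle(bs) <- "NCBI"  # this is the call that downloads from UCSC
  saveRDS(seqinfo_to_genome_table(as.data.frame(seqinfo(bs))), path)
  invisible(path)
}

# Step 2: on the offline cluster, read the file instead of touching BSgenome.
# genome <- readRDS("genome_ncbi.rds")   # hypothetical file name
# conns  <- run_cicero(co, genomic_coords = genome, sample_num = 100)
```

On the connected machine you would call export_genome_table("genome_ncbi.rds") once and copy the resulting file into the container. The offline script then never calls seqlevelsStyle<-, so nothing tries to reach UCSC.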


That worked! Thanks! I created the genome file locally and uploaded that. I didn't really think that BSgenome.Hsapiens.UCSC.hg38 was only used to create that genome file.


I am not sure what you mean by 'only used to create that genome file'. What's happening is that you are converting the seqlevels of an existing BSgenome object from the default UCSC style to NCBI style. For the main chromosomes that more or less amounts to removing the chr prefix from the names, but the haplotype and unplaced scaffold names get changed as well. I am assuming you are doing this because your Seurat data are based on either NCBI or Ensembl data, so the chromosome names don't have the chr prefix?

> library(BSgenome.Hsapiens.UCSC.hg38)
> seqinfo(Hsapiens)
Seqinfo object with 711 sequences (1 circular) from hg38 genome:
  seqnames             seqlengths isCircular genome
  chr1                  248956422      FALSE   hg38
  chr2                  242193529      FALSE   hg38
  chr3                  198295559      FALSE   hg38
  chr4                  190214555      FALSE   hg38
  chr5                  181538259      FALSE   hg38
  ...                         ...        ...    ...
  chr22_KQ759761v1_alt     145162      FALSE   hg38
  chrX_KV766199v1_alt      188004      FALSE   hg38
  chrX_MU273395v1_alt      619716      FALSE   hg38
  chrX_MU273396v1_alt      294119      FALSE   hg38
  chrX_MU273397v1_alt      330493      FALSE   hg38
> seqlevelsStyle(Hsapiens) <- "NCBI"
> seqinfo(Hsapiens)
Seqinfo object with 711 sequences (1 circular) from 2 genomes (GRCh38.p14, GRCh38.p13):
  seqnames       seqlengths isCircular     genome
  1               248956422      FALSE GRCh38.p14
  2               242193529      FALSE GRCh38.p14
  3               198295559      FALSE GRCh38.p14
  4               190214555      FALSE GRCh38.p14
  5               181538259      FALSE GRCh38.p14
  ...                   ...        ...        ...
  HSCHR22_8_CTG1     145162      FALSE GRCh38.p14
  HSCHRX_3_CTG7      188004      FALSE GRCh38.p14
  HSCHRX_1_CTG14     619716      FALSE GRCh38.p14
  HSCHRX_2_CTG14     294119      FALSE GRCh38.p14
  HSCHRX_3_CTG3      330493      FALSE GRCh38.p14
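
If only the primary chromosomes are needed (the question keeps the first 25 sequences), the renaming can also be sketched in base R without any download. This is not what seqlevelsStyle<- does internally. It is a string edit under the assumption that the input contains only chr1 to chr22, chrX, chrY and chrM. The function name ucsc_to_ncbi_primary is made up for illustration. Haplotype and scaffold names such as chr22_KQ759761v1_alt cannot be derived this way, so the function rejects them.

```r
# Rename primary hg38 chromosomes from UCSC style to NCBI style offline.
# Anything outside chr1-chr22, chrX, chrY, chrM raises an error, because
# those NCBI names (e.g. HSCHR22_8_CTG1) need a lookup table.
ucsc_to_ncbi_primary <- function(chr) {
  primary <- paste0("chr", c(1:22, "X", "Y", "M"))
  unknown <- setdiff(chr, primary)
  if (length(unknown) > 0) {
    stop("not a primary chromosome: ", paste(unknown, collapse = ", "))
  }
  out <- sub("^chr", "", chr)
  out[out == "M"] <- "MT"  # NCBI style names the mitochondrial genome MT
  out
}

# Seqlengths copied from the seqinfo() output above.
genome <- data.frame(
  chr = c("chr1", "chr2", "chr3", "chr4", "chr5"),
  seqlengths = c(248956422, 242193529, 198295559, 190214555, 181538259),
  stringsAsFactors = FALSE
)
genome$chr <- ucsc_to_ncbi_primary(genome$chr)
```

The lengths are unchanged by the conversion, as the two seqinfo() listings show, so only the chr column needs editing.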
