Question

recount3 - download bigwig files for TCGA data

Xuebing ▴ 10


How can I download all bigwig files for TCGA samples? I noticed an answer was provided previously for recount2 but it doesn’t work for recount3:

Recount2 Bigwigs for TCGA

Thanks!

recount3 recount TCGA • 1.5k views

updated 2.8 years ago by Leonardo Collado Torres ★ 1.1k • written 2.8 years ago by Xuebing ▴ 10

score 0 · Answer 1 · 2022-03-29

Hi,

Thank you for your interest in recount3 and recount2. The easiest way to find the URLs for the BigWig files in recount3 is to use recount3::create_rse(). The object it returns has a colData() column called BigWigURL, as shown at https://github.com/LieberInstitute/recount3/issues/21#issuecomment-1074156958. Here's a short extract:

as.data.frame(colData(rse)[1, c("external_id", "study", "BigWigURL")])
#>                                                 external_id study
#> GTEX-T6MN-0011-R1A-SM-32QOY.1 GTEX-T6MN-0011-R1A-SM-32QOY.1 BRAIN
#>                                                                                                                                                             BigWigURL
#> GTEX-T6MN-0011-R1A-SM-32QOY.1 http://duffel.rail.bio/recount3/human/data_sources/gtex/base_sums/IN/BRAIN/OY/gtex.base_sums.BRAIN_GTEX-T6MN-0011-R1A-SM-32QOY.1.ALL.bw
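The extract above is from GTEx. For TCGA, the same idea could look roughly like the sketch below. It assumes that TCGA projects are listed by recount3::available_projects() with file_source "tcga" and that "BRCA" is one of the project names; both are assumptions for illustration, so check the table on your side. Note that create_rse() downloads the gene counts as well as the metadata.

```r
library("recount3")

## List the human projects and keep the ones hosted under the TCGA data source.
## Assumption: TCGA rows have file_source == "tcga".
human_projects <- available_projects(organism = "human")
tcga_projects <- subset(
    human_projects,
    file_source == "tcga" & project_type == "data_sources"
)

## Build the gene-level RSE for a single TCGA project.
## "BRCA" is an assumed project name, used only as an example.
proj_info <- subset(tcga_projects, project == "BRCA")
rse <- create_rse(proj_info, type = "gene")

## One BigWig URL per sample, stored in the sample metadata.
bw_urls <- colData(rse)$BigWigURL
head(bw_urls)
```

To cover all of TCGA you would repeat this for every row of tcga_projects, for example with lapply().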

You could also use recount3::locate_url(). However, as noted at https://github.com/LieberInstitute/recount3/issues/21#issuecomment-1074156958, that function doesn't guarantee that the result is a valid URL, for programmatic reasons on the data host side (IDIES at JHU).

Using recount3::create_rse() at the gene level might mean downloading too much data for a large project such as TCGA (which, like GTEx, is split by tissue). You might instead prefer to dive into the internal code of recount3::create_rse_manual() and reuse the relevant lines, https://github.com/LieberInstitute/recount3/blob/6eb14b844062ebdf45fe5a356577e3ea0483c97e/R/create_rse_manual.R#L156-L165, after downloading the TCGA metadata files.
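A minimal sketch of that lighter route, which only downloads the metadata files, is shown here. It is an illustration of the idea rather than a copy of the package internals. The project name "BRCA" and the project_home value "data_sources/tcga" are assumptions, and the locate_url() caveat above still applies: the resulting URLs are not guaranteed to be valid.

```r
library("recount3")

## Assumed values for illustration; adjust to the TCGA project you need.
project <- "BRCA"
project_home <- "data_sources/tcga"

## Locate, download (and cache) the metadata files, then read them as one table.
metadata_urls <- locate_url(
    project = project,
    project_home = project_home,
    type = "metadata"
)
metadata <- read_metadata(file_retrieve(url = metadata_urls))

## Build one candidate BigWig URL per sample from the sample identifiers.
bw_urls <- locate_url(
    project = project,
    project_home = project_home,
    type = "bw",
    sample = metadata$external_id
)
head(bw_urls)
```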

As you can see, there are a few different options, with different degrees of complexity.

Once you have located the URLs, you can use recount3::file_retrieve(), which internally uses BiocFileCache::bfcrpath() https://github.com/LieberInstitute/recount3/blob/6eb14b844062ebdf45fe5a356577e3ea0483c97e/R/file_retrieve.R#L80. Alternatively, you can download them some other way, for example with recount::download_retry(), which internally uses downloader::download() https://github.com/leekgroup/recount/blob/10f29f9d44906f798aa3a7655ae40ac269c36ae5/R/download_retry.R#L39.
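As a rough sketch of the file_retrieve() route, the code below downloads each URL in turn and records NA for any URL that fails, since not every constructed URL is guaranteed to exist. The single URL here is the GTEx one from the extract earlier in this answer and only stands in for your own vector of TCGA URLs. BigWig files can be large, so try a handful before looping over a whole project.

```r
library("recount3")

## Placeholder input: replace with your own character vector of BigWig URLs.
bw_urls <- c(
    "http://duffel.rail.bio/recount3/human/data_sources/gtex/base_sums/IN/BRAIN/OY/gtex.base_sums.BRAIN_GTEX-T6MN-0011-R1A-SM-32QOY.1.ALL.bw"
)

## Download one file at a time through the recount3 cache.
## A failed download is reported and stored as NA instead of stopping the loop.
local_bw <- vapply(bw_urls, function(u) {
    tryCatch(
        unname(file_retrieve(url = u)),
        error = function(e) {
            message("Could not retrieve ", u, ": ", conditionMessage(e))
            NA_character_
        }
    )
}, character(1))

## Local paths of the files that were retrieved successfully.
local_bw[!is.na(local_bw)]
```

Files retrieved this way are kept in the recount3 cache, so re-running the loop should not download them again.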

Best, Leo