Question: Inconsistent geo_char for rse_tx
Jacques.van-Helden wrote (10 months ago):

There is a bug in the pheno tables of the rse_tx objects. It occurs with several recount IDs, but not all of them.

For several experiments, the "characteristics" column of the DataFrame returned by colData(rse_tx) contains strangely placed quotes that break the parsing. Below is a minimal example that reproduces the bug.

Has anyone run into this bug before? Is there a trick to circumvent it?

 

#### Gene-wise counts (this first part works fine) ####

## Load recount, which provides download_study() and geo_characteristics()
library("recount")

## Download data in rse-gene format
recountID <- "SRP056295"
gene_url <- download_study(project = recountID, type = "rse-gene", download = TRUE)
print(gene_url)

## Load the rse_gene object in memory
load(file.path(recountID, 'rse_gene.Rdata'))

## Extract GEO characteristics from the rse_gene object
gene_geochar <- recount::geo_characteristics(colData(rse_gene))
head(gene_geochar)
table(gene_geochar)

#### Transcript-wise counts #####

## Download the rse-tx object
tx_url <- download_study(project = recountID, type = "rse-tx", download = TRUE)
print(tx_url)

## Inconsistency: the following line fails on Linux systems because the file
## extension is RData for transcripts, whereas it is Rdata for genes.
## It works on Mac OS X because its default file system is case-insensitive.
load(file.path(recountID, 'rse_tx.Rdata'))

## This works on Linux as well as Mac OS X
load(file.path(recountID, 'rse_tx.RData'))

## Extract GEO characteristics from the rse_tx object
tx_geochar <- recount::geo_characteristics(colData(rse_tx))
head(tx_geochar)
table(tx_geochar)

## The bug apparently comes from the pheno table
head(colData(rse_tx)$characteristics)
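Regarding the Rdata/RData extension mismatch in the script above, one way to avoid hard-coding the case is to look the file up case-insensitively with base R's list.files(). This is a minimal sketch; find_rdata() is a hypothetical helper, not part of recount, and it assumes exactly one matching file sits in the download directory.

```r
## Hypothetical helper (not part of recount): locate <prefix>.Rdata or
## <prefix>.RData regardless of the extension's case, so that load() also
## works on case-sensitive file systems such as the usual Linux ones.
## Note: 'prefix' is used inside a regular expression.
find_rdata <- function(dir, prefix) {
    pattern <- paste0("^", prefix, "\\.rdata$")
    hits <- list.files(dir, pattern = pattern, ignore.case = TRUE,
                       full.names = TRUE)
    if (length(hits) != 1L) {
        stop("Expected exactly one ", prefix, ".Rdata/.RData file in ", dir,
             " but found ", length(hits))
    }
    hits
}

## Usage with the objects from the question (after download_study()):
## load(find_rdata(recountID, "rse_gene"))
## load(find_rdata(recountID, "rse_tx"))
```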

Tags: bug, recount

It would help if you tagged the package that this object comes from, not just 'bug', so that the maintainers are notified.

Reply by shepherl (10 months ago)

It's not a bug in the software per se; instead, some of the RangedSummarizedExperiment objects that you can download seem to have malformed colData slots:

> head(colData(rse_gene)$characteristics)
CharacterList of length 6
[[1]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells
[[2]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells
[[3]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells
[[4]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells
[[5]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells
[[6]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells

> head(colData(rse_tx)$characteristics)
[1] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
[2] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
[3] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
[4] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
[5] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
[6] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"

So those might need to be regenerated. In the meantime, you could convert the characteristics column in the rse_tx colData to a CharacterList, after which it would work just like rse_gene.
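The rse_tx strings printed above look like deparsed R character vectors, so each one can be parsed back into a vector before building the CharacterList. The fix that was later committed (quoted further down) does this with eval(str2lang(s)); the sketch below is a more defensive variant that only accepts a bare string or a c() call whose arguments are all strings, so no other code stored in the metadata is executed. parse_deparsed_chr() is a hypothetical helper, and the handling of a lone string or a deparsed NA is an assumption about what other studies might contain.

```r
## Hypothetical helper (not from recount): turn strings such as
##   "c(\"tissue: Bone marrow\", \"cell type: AML cells\")"
## back into character vectors without eval()-ing arbitrary code.
## str2lang() requires R >= 3.6.0.
parse_deparsed_chr <- function(x) {
    lapply(x, function(s) {
        expr <- str2lang(s)
        ## A lone string (assumed case: a single characteristic)
        if (is.character(expr)) return(expr)
        ## A deparsed NA (assumed case: no characteristics at all)
        if (is.logical(expr) && length(expr) == 1L && is.na(expr)) {
            return(NA_character_)
        }
        ## Otherwise only c("...", "...") is accepted
        if (!is.call(expr) || !identical(expr[[1]], as.name("c"))) {
            stop("Not a c(...) call: ", s)
        }
        args <- as.list(expr)[-1]
        if (!all(vapply(args, is.character, logical(1)))) {
            stop("Non-string argument in: ", s)
        }
        as.character(unlist(args, use.names = FALSE))
    })
}

## Example with one of the strings printed above for SRP056295:
x <- "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
parse_deparsed_chr(x)

## Applying the suggested conversion (needs IRanges and the rse_tx object):
## colData(rse_tx)$characteristics <- IRanges::CharacterList(
##     parse_deparsed_chr(colData(rse_tx)$characteristics))
```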

Reply by James W. MacDonald (10 months ago)
Answer: Inconsistent geo_char for rse_tx
Leonardo Collado Torres wrote (11 weeks ago):

Hi,

I never got an email for this thread since the recount tag was not used initially.

In any case, I updated recount::geo_characteristics() in version 1.10.12 (BioC 3.9, the current release) and 1.11.12 (BioC 3.10, the current devel) so that, with your code, the following now runs:

stopifnot(identical(
    geo_characteristics(colData(rse_gene)),
    geo_characteristics(colData(rse_tx))
))

Updating the R package was easier than updating the data itself for now.

Thanks @shepherl and @James W. MacDonald for your replies!

Best, Leo

PS: The change is recorded at https://github.com/leekgroup/recount/commit/4a9e36f8b65461a829000040e1f422b51a778fd0, which implements James' answer:

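if (is.character(pheno$characteristics)) {
    ## Solves https://support.bioconductor.org/p/116480/
    ## Parse each deparsed string, e.g. 'c("tissue: ...", "cell type: ...")',
    ## back into a character vector (str2lang() requires R >= 3.6.0)
    pheno$characteristics <- IRanges::CharacterList(
        lapply(lapply(pheno$characteristics, str2lang), eval)
    )
}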
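To see why the malformed strings "perturb the parsing", it helps to split "key: value" entries into one column per key, which is the kind of per-sample table the question tabulates with table(). The sketch below is a hypothetical helper written in base R, not recount's implementation of geo_characteristics(): with one character vector per sample it gives clean columns, whereas a single deparsed string yields a bogus key such as c("tissue.

```r
## Hypothetical helper, NOT recount's implementation: build a data.frame
## with one column per "key: value" characteristic.
## 'chars' is a list holding one character vector per sample.
chars_to_df <- function(chars) {
    rows <- lapply(chars, function(v) {
        key <- sub(":.*$", "", v)             # text before the first colon
        val <- trimws(sub("^[^:]*:", "", v))  # text after the first colon
        setNames(as.list(val), key)
    })
    keys <- unique(unlist(lapply(rows, names)))
    ## One column per key; samples lacking a key get NA
    out <- lapply(keys, function(k) {
        vapply(rows, function(r) if (is.null(r[[k]])) NA_character_ else r[[k]],
               character(1))
    })
    names(out) <- keys
    data.frame(out, check.names = FALSE, stringsAsFactors = FALSE)
}

## Well-formed input, as in rse_gene (or rse_tx after the fix):
good <- list(c("tissue: Bone marrow", "cell type: AML cells"),
             c("tissue: Blood", "cell type: AML cells"))
chars_to_df(good)

## Malformed input, as in the original rse_tx colData:
bad <- list("c(\"tissue: Bone marrow\", \"cell type: AML cells\")")
names(chars_to_df(bad))
```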
Answer: Inconsistent geo_char for rse_tx
Jacques.van-Helden wrote (4 weeks ago):

Hi Leonardo,

Thanks for the fix, the test now runs fine.

And thanks for recount, a great package that gives researchers instant access to thousands of RNA-seq datasets.

Best regards,

Jacques
