Question: Inconsistent geo_char for rse_tx
Asked 8 months ago by Jacques.van-Helden:

There is a bug in the pheno tables of the rse_tx objects. It occurs with several recount project IDs, but not all of them.

For several experiments, the "characteristics" column of the DataFrame returned by colData(rse_tx) contains oddly placed quotes that break the parsing. Below is a minimal example that reproduces the bug.

Has anyone run into this bug before? Is there a trick to work around it?

#### Gene-wise counts (this first part works fine) ####

library(recount)

## Download the data in rse-gene format
recountID <- "SRP056295"
gene_url <- download_study(project = recountID, type = "rse-gene", download = TRUE)
print(gene_url)

## Load the rse_gene object in memory
load(file.path(recountID, 'rse_gene.Rdata'))

## Extract GEO characteristics from the rse_gene object
gene_geochar <- recount::geo_characteristics(colData(rse_gene))
head(gene_geochar)
table(gene_geochar)

#### Transcript-wise counts ####

## Download the rse-tx object
tx_url <- download_study(project = recountID, type = "rse-tx", download = TRUE)
print(tx_url)

## Inconsistency: the following line fails on Linux because the file extension
## is .RData for transcripts but .Rdata for genes.
## It works on macOS only because the default file system there is case-insensitive.
load(file.path(recountID, 'rse_tx.Rdata'))

## This works on Linux as well as Mac OS X
load(file.path(recountID, 'rse_tx.RData'))

## Extract GEO characteristics from the rse_tx object
tx_geochar <- recount::geo_characteristics(colData(rse_tx))
head(tx_geochar)
table(tx_geochar)

## The bug apparently comes from the pheno table
head(colData(rse_tx)$characteristics)
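Because the rse-tx file is saved as rse_tx.RData while the rse-gene file is rse_gene.Rdata, one way to sidestep the case mismatch on case-sensitive file systems is to look the file up case-insensitively before loading it. This is a minimal base-R sketch; the helper name find_rdata() is hypothetical and not part of recount.

```r
## Hypothetical helper (not part of recount): find "<prefix>.rdata" in dir,
## ignoring the case of the extension, so the same code works on Linux and macOS.
find_rdata <- function(dir, prefix) {
  pattern <- paste0("^", prefix, "\\.rdata$")
  hits <- list.files(dir, pattern = pattern, ignore.case = TRUE,
                     full.names = TRUE)
  if (length(hits) != 1) {
    stop("Expected exactly one ", prefix, ".rdata file in ", dir,
         ", found ", length(hits))
  }
  hits
}

## Usage with the objects from the question (requires the downloaded study):
## load(find_rdata(recountID, "rse_gene"))
## load(find_rdata(recountID, "rse_tx"))
```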

Tags: bug, recount (question last modified 16 days ago by Leonardo Collado Torres)

It would help to tag the package this object comes from (recount), not just 'bug', so that the maintainers are notified.

Comment by shepherl, 8 months ago

This is not a bug in the software per se; rather, the colData slots of some of the RangedSummarizedExperiment objects that you can download appear to be malformed:

> head(colData(rse_gene)$characteristics)
CharacterList of length 6
[[1]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells
[[2]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells
[[3]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells
[[4]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells
[[5]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells
[[6]] tissue: Bone marrow cell type: acute myeloid leukemia (AML) cells

> head(colData(rse_tx)$characteristics)
[1] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
[2] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
[3] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
[4] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
[5] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"
[6] "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")"

So those objects may need to be regenerated. In the meantime, you can convert the characteristics column of the rse_tx colData to a CharacterList, after which geo_characteristics() works just as it does for rse_gene.
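A minimal sketch of that interim workaround in base R (str2lang() needs R >= 3.6.0), run on a toy vector that copies two of the rse_tx values shown above rather than on a real download. Each string is parsed as an R call and evaluated, which turns it back into a character vector; wrapping the result with IRanges::CharacterList() then matches the rse_gene layout.

```r
## Toy stand-in for colData(rse_tx)$characteristics, copied from the output above
tx_characteristics <- rep(
  "c(\"tissue: Bone marrow\", \"cell type: acute myeloid leukemia (AML) cells\")",
  2
)

## Parse each string into a call such as c("...", "...") and evaluate it,
## giving one character vector per sample
fixed_list <- lapply(tx_characteristics, function(s) eval(str2lang(s)))

## With IRanges installed, convert to a CharacterList like the rse_gene column;
## on a real object this would be assigned back to colData(rse_tx)$characteristics
if (requireNamespace("IRanges", quietly = TRUE)) {
  fixed_cl <- IRanges::CharacterList(fixed_list)
}
```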

Reply by James W. MacDonald, 8 months ago
Answer: Inconsistent geo_char for rse_tx
Answered 16 days ago by Leonardo Collado Torres:

Hi,

I never got an email notification about this thread because the recount tag was not used initially.

In any case, I updated recount::geo_characteristics() in versions 1.10.12 (BioC 3.9, current release) and 1.11.12 (BioC 3.10, current devel) so that, with your code, the following now runs without error:

stopifnot(identical(
    geo_characteristics(colData(rse_gene)),
    geo_characteristics(colData(rse_tx))
))
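Since the fix shipped in recount 1.10.12 (release) and 1.11.12 (devel), one way to check whether an installed copy already includes it is to compare version numbers. This is a minimal sketch; the helper name is hypothetical, and it assumes every later version keeps the fix. Note that a plain `>= "1.10.12"` test would wrongly accept early devel versions such as 1.11.0.

```r
## Hypothetical helper: TRUE if a recount version string includes the
## geo_characteristics() fix (1.10.12+ on the 1.10 branch, or 1.11.12 and later).
## Assumes every version after 1.11.12 retains the fix.
has_tx_geochar_fix <- function(version) {
  v <- package_version(version)
  (v >= "1.10.12" && v < "1.11.0") || v >= "1.11.12"
}

## Usage (requires recount to be installed):
## has_tx_geochar_fix(as.character(packageVersion("recount")))
```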

For now, updating the R package was easier than regenerating the data itself.

Thanks @shepherl and @James W. MacDonald for your replies!

Best, Leo

PS: The change is recorded at https://github.com/leekgroup/recount/commit/4a9e36f8b65461a829000040e1f422b51a778fd0 and implements James' answer:

if (is.character(pheno$characteristics)) {
    ## Solves https://support.bioconductor.org/p/116480/
    pheno$characteristics <- IRanges::CharacterList(
        lapply(lapply(pheno$characteristics, str2lang), eval)
    )
}
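The patch above calls eval() on strings that come from downloaded metadata. If you apply a similar workaround yourself, a more defensive variant (a swapped-in approach, not what recount does) is to parse each string and accept it only when it is a call to c() whose arguments are all string literals, extracting those literals without evaluating anything. Strings of any other shape are kept unchanged. The helper name is hypothetical.

```r
## Hypothetical helper (not recount's code): turn 'c("a", "b")' into c("a", "b")
## without eval(); anything that is not c() of string literals is returned as is.
parse_characteristics <- function(s) {
  expr <- tryCatch(str2lang(s), error = function(e) NULL)
  if (is.call(expr) && identical(expr[[1]], as.name("c"))) {
    args <- as.list(expr)[-1]  # drop the function name, keep the arguments
    if (length(args) > 0 && all(vapply(args, is.character, logical(1)))) {
      return(unlist(args, use.names = FALSE))
    }
  }
  s
}

## Apply to every sample, e.g.
## lapply(colData(rse_tx)$characteristics, parse_characteristics)
```

Rejecting anything that is not a literal avoids running code embedded in the metadata, at the cost of leaving unusual entries as raw strings for manual inspection.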