There is a bug with the pheno tables of the rse_tx objects. It occurs with several recount IDs, but not all.
For several experiments, the "characteristics" column of the DataFrame returned by colData(rse_tx) contains oddly placed quotes that break the parsing. Below is a minimal example that reproduces the bug.
Has anyone encountered this bug before? Is there a trick to circumvent it?
#### Gene-wise counts (this first part works fine) ####
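## Load the recount package (it also attaches SummarizedExperiment, which provides colData)
library(recount)
## Download data in rse-gene format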
recountID <- "SRP056295"
gene_url <- download_study(project = recountID, type = "rse-gene", download = TRUE)
print(gene_url)
## Load the rse_gene object in memory
load(file.path(recountID, 'rse_gene.Rdata'))
## Extract GEO characteristics from the rse_gene object
gene_geochar <- recount::geo_characteristics(colData(rse_gene))
head(gene_geochar)
table(gene_geochar)
#### Transcript-wise counts #####
## Download the rse-tx object
tx_url <- download_study(project = recountID, type = "rse-tx", download = TRUE)
print(tx_url)
## Inconsistency: the following line fails on Linux systems because the file extension
## is .RData for transcripts, whereas it is .Rdata for genes.
## It works on Mac OS X because its default file system is case-insensitive.
## try() lets the script continue past the error on Linux.
try(load(file.path(recountID, 'rse_tx.Rdata')))
## This works on Linux as well as Mac OS X
load(file.path(recountID, 'rse_tx.RData'))
## Extract GEO characteristics from the rse_tx object
tx_geochar <- recount::geo_characteristics(colData(rse_tx))
head(tx_geochar)
table(tx_geochar)
## The bug apparently comes from the pheno table
head(colData(rse_tx)$characteristics)
It would help if you tagged the package that this object comes from, not just 'bug', so that the maintainers are notified.
It's not a bug in the software per se, but instead seems to be malformed colData slots in some of the RangedSummarizedExperiments that you can download.
So those might need to be regenerated. In the interim you could convert the characteristics column in the rse_tx colData to a CharacterList, in which case it would work just like the rse_gene.
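A minimal sketch of that interim conversion, in case it helps. It assumes that each malformed entry is a single string holding a deparsed R vector, such as 'c("tissue: liver", "age: 4")'; that format is a guess based on the stray quotes described in the question, so check head(colData(rse_tx)$characteristics) first. The helper parse_characteristics is a made-up name, not part of recount, and the toy input stands in for the real column.

```r
library(IRanges)

## Hypothetical helper (not part of recount): turn strings such as
## 'c("tissue: liver", "age: 4")' into a CharacterList with one
## character vector per sample.
parse_characteristics <- function(x) {
    x <- as.character(x)
    pieces <- lapply(x, function(entry) {
        if (is.na(entry)) return(NA_character_)
        ## Pull out every double-quoted chunk of the string
        quoted <- regmatches(entry, gregexpr('"[^"]*"', entry))[[1]]
        ## No quotes found: assume a plain single value and keep it as-is
        if (length(quoted) == 0) return(entry)
        ## Drop the surrounding double quotes
        gsub('^"|"$', "", quoted)
    })
    CharacterList(pieces)
}

## Toy input mimicking the assumed malformed format
toy <- c('c("tissue: liver", "age: 4")', "tissue: brain", NA)
toy_list <- parse_characteristics(toy)
toy_list

## On the real object (not run here):
## colData(rse_tx)$characteristics <-
##     parse_characteristics(colData(rse_tx)$characteristics)
## tx_geochar <- recount::geo_characteristics(colData(rse_tx))
```

The regular expression is used instead of eval(parse(text = ...)) so that no downloaded text gets evaluated as code. It would split incorrectly if a value contained escaped double quotes, so treat it as a stopgap until the objects are regenerated.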