Getting pheno tables from recount datasets without NA values in the characteristics and geo_accession fields and without duplicated row.names
Dear Bioconductor developers,

I have a basic question about the datasets in the recount repository. I am working on advanced statistical analyses of most of the recount datasets.

We noticed that most of the pheno tables in recount have NA values in the characteristics and geo_accession fields.

Could anyone please help me get the pheno tables for all the projects in recount without any NA values in the characteristics and geo_accession fields? I have also run into a serious obstacle with duplicated “row.names”. Could anyone advise me on how to overcome this problem, please?

Thanks so much to anyone who can suggest or give me any practical guidance.


recount summarizedexperiment
Hi Mustafa,

Nearly 13k of the SRA samples don't have any characteristics or GEO accession numbers, as shown with the code below. There's nothing we can really do about it. Sometimes updates in SRAdb include new GEO accession numbers. Incomplete sample metadata is a problem that Shannon Ellis and others have tried to address in different ways. Check SHARQ beta and elsewhere.


Regarding the row.names issue, if you have some reproducible code then I bet other people could help you out. And if you could highlight what step is actually failing that'd be great too. In any case, if you are combining rows, you could set the row names to be unique before combining them. 
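
As a minimal sketch of that last suggestion, assuming the pheno tables are plain data frames with the same columns, base R's make.unique() can deduplicate the row names before rbind(). The tables pheno_a and pheno_b and their contents are made up for illustration; they are not recount data.

```r
# Two toy pheno tables (hypothetical data) that share the row name "SRR1".
pheno_a <- data.frame(project = c("P1", "P1"), row.names = c("SRR1", "SRR2"))
pheno_b <- data.frame(project = c("P2", "P2"), row.names = c("SRR1", "SRR3"))

# Collect the row names in the order the rows will be combined and make
# them unique; a repeated name gets a numeric suffix such as ".1".
new_names <- make.unique(c(rownames(pheno_a), rownames(pheno_b)))

# Assign the unique names back to each table before combining.
rownames(pheno_a) <- new_names[seq_len(nrow(pheno_a))]
rownames(pheno_b) <- new_names[nrow(pheno_a) + seq_len(nrow(pheno_b))]

combined <- rbind(pheno_a, pheno_b)
rownames(combined)
```

Renaming before combining keeps the suffix predictable, so a renamed row can still be traced back to its original sample identifier.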





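> library(recount)
> m <- all_metadata()
> # TRUE: the sample's characteristics consist of exactly one NA
> table(sum(is.na(m$characteristics)) == 1)
FALSE  TRUE 
37278 12821 
> # TRUE: the sample has no GEO accession number
> table(is.na(m$geo_accession))
FALSE  TRUE 
37395 12704 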
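
The missing values cannot be filled in this way, but if the goal is a pheno table with no NA in either field, the incomplete samples can be dropped. The following is only a sketch on a toy table: m_toy and its values are hypothetical, and a base R list column stands in for the characteristics column of the real metadata, which holds one character vector per sample.

```r
# Toy stand-in for the metadata table (hypothetical values, not recount data).
m_toy <- data.frame(
    run = c("SRR1", "SRR2", "SRR3"),
    geo_accession = c("GSM1", NA, "GSM3"),
    stringsAsFactors = FALSE
)
# One character vector per sample, stored as a list column.
m_toy$characteristics <- list(
    c("tissue: liver", "age: 40"),
    NA_character_,
    NA_character_
)

# A sample has no characteristics when every entry of its vector is NA.
no_chars <- vapply(m_toy$characteristics, function(x) all(is.na(x)), logical(1))
no_geo <- is.na(m_toy$geo_accession)

# Keep only the samples that have both fields.
m_complete <- m_toy[!no_chars & !no_geo, ]
m_complete$run
```

On the real table the same idea would apply to m, but note that with the counts above it would discard roughly a quarter of the samples.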

