Is GTEx age and sex data available via the recount package?
abe • 0
@abe-18006
Last seen 5.7 years ago

Apologies for such a simple question. I can find age and sex data for TCGA samples via the recount package, but I can't find the same data for GTEx.

> library(recount)
> download_study("SRP012682", type = "rse-gene")
> load(file.path("SRP012682", "rse_gene.Rdata"))
> colnames(colData(rse_gene))

[1] "project"
[2] "sample"
[3] "experiment"
[4] "run"
[5] "read_count_as_reported_by_sra"
[6] "reads_downloaded"
[7] "proportion_of_reads_reported_by_sra_downloaded"
[8] "paired_end"
[9] "sra_misreported_paired_end"
[10] "mapped_read_count"
[11] "auc"
[12] "sharq_beta_tissue"
[13] "sharq_beta_cell_type"
[14] "biosample_submission_date"
[15] "biosample_publication_date"
[16] "biosample_update_date"
[17] "avg_read_length"
[18] "geo_accession"
[19] "bigwig_file"
[20] "sampid"
[21] "smatsscr"
[22] "smcenter"
[23] "smpthnts"
[24] "smrin"
[25] "smts"
[26] "smtsd"
[27] "smubrid"
[28] "smtspax"
[29] "smtstptref"
[30] "smnabtch"
[31] "smnabtcht"
[32] "smnabtchd"
[33] "smgebtch"
[34] "smafrze"
[35] "smgtc"
[36] "sme2mprt"
[37] "smchmprs"
[38] "smntrart"
[39] "smnumgps"
[40] "smmaprt"
[41] "smexncrt"
[42] "sm550nrm"
[43] "smgnsdtc"
[44] "smunmprt"
[45] "sm350nrm"
[46] "smrdlgth"
[47] "smmncpb"
[48] "sme1mmrt"
[49] "smsflgth"
[50] "smestlbs"
[51] "smmppd"
[52] "smnterrt"
[53] "smrrnanm"
[54] "smrdttl"
[55] "smvqcfl"
[56] "smmncv"
[57] "smtrscpt"
[58] "smmppdpr"
[59] "smcglgth"
[60] "smgappct"
[61] "smunpdrd"
[62] "smntrnrt"
[63] "smmpunrt"
[64] "smexpeff"
[65] "smmppdun"
[66] "sme2mmrt"
[67] "sme2anti"
[68] "smaltalg"
[69] "sme2snse"
[70] "smmflgth"
[71] "sme1anti"
[72] "smspltrd"
[73] "smbsmmrt"
[74] "sme1snse"
[75] "sme1pcts"
[76] "smrrnart"
[77] "sme1mprt"
[78] "smnum5cd"
[79] "smdpmprt"
[80] "sme2pcts"
[81] "title"
[82] "characteristics"


# Search for sex data
> which(as.data.frame(colData(rse_gene)) == "Male", arr.ind = TRUE)
    row col

# Double-check for additional metadata
> rse_gene
class: RangedSummarizedExperiment
dim: 58037 9662
metadata(0):
assays(1): counts
rownames(58037): ENSG00000000003.14 ENSG00000000005.5 ...
  ENSG00000283698.1 ENSG00000283699.1
rowData names(3): gene_id bp_length symbol
colnames(9662): SRR660824 SRR2166176 ... SRR612239 SRR615898
colData names(82): project sample ... title characteristics
recount gtex • 3.7k views
@lcolladotor
Last seen 5 days ago
United States

Hi @abe,

Under the metadata section in https://f1000research.com/articles/6-1558/v1 we state that "we compiled metadata for GTEx using the v6 phenotype information available at gtexportal.org". This is also mentioned in the supplementary material of the recount2 paper (https://www.nature.com/articles/nbt.3838#supplementary-information, https://media.nature.com/original/nature-assets/nbt/journal/v35/n4/extref/nbt.3838-S1.pdf): "For GTEx we included the metadata from the file “GTEx Data V6 Annotations SampleAttributesDS.txt” available from http://www.gtexportal.org/home/datasets." Similarly, if you load the recount Bioconductor package and look at the help file for "all_metadata", you'll see: "Note that for subset = 'gtex', there are more variables than the ones we have for 'sra'. This information corresponds to file GTEx_Data_V6_Annotations_SampleAttributesDS.txt available at http://www.gtexportal.org/home/datasets. There you can find the information describing these variables." In other words, the GTEx colData in recount carries the sample-level attributes only, so subject-level variables such as sex and age are not part of it.
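A quick way to confirm that no age or sex column is present is to scan the column names rather than the cell values. The sketch below is only an illustration: `find_demographic_cols` is a hypothetical helper (it is not part of recount), and `col_names` holds a handful of names copied from the listing in the question. In a live session it would be `colnames(colData(rse_gene))`.

```r
# A few column names copied from the listing in the question; with the data
# loaded this would be colnames(colData(rse_gene)).
col_names <- c("project", "sample", "avg_read_length", "sampid",
               "smts", "smtsd", "biosample_update_date", "characteristics")

# Hypothetical helper (not a recount function): return the names in which
# "age", "sex" or "gender" appears as a whole underscore-delimited token.
find_demographic_cols <- function(x) {
  grep("(^|_)(age|sex|gender)(_|$)", x, ignore.case = TRUE, value = TRUE)
}

find_demographic_cols(col_names)
# character(0)
```

An empty result here is consistent with the empty `which(...)` output in the question: the sample attributes table simply has no such column.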

If you go to the GTEx portal you can find a data dictionary for the variables (see https://imgur.com/a/sixZeNk). The file you want to look at is called "GTEx_Data_V6_Annotations_SampleAttributesDD.xlsx". You can use the information in "sampid" to derive the subject ID and match it against the GTEx v6 subject annotation table "GTEx_Data_V6_Annotations_SubjectPhenotypesDS.txt" (also from the GTEx portal: https://storage.googleapis.com/gtex_analysis_v6p/annotations/GTEx_Data_V6_Annotations_SubjectPhenotypesDS.txt), which has the sex and age information.
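The matching step could look roughly like the sketch below. It runs on toy data so that it is self-contained: the sample and subject IDs are made up, and several things are assumptions to check against the GTEx data dictionary rather than facts taken from this thread, namely that the subject ID is the first two hyphen-separated fields of "sampid", that the subject table has columns named SUBJID, GENDER and AGE, and that GENDER is coded 1 = male, 2 = female.

```r
# Toy stand-in for as.data.frame(colData(rse_gene)); IDs are invented and only
# mimic the assumed GTEX-<donor>-<...> layout of "sampid".
samples <- data.frame(
  run    = c("SRR0000001", "SRR0000002", "SRR0000003"),
  sampid = c("GTEX-AAAA-0001-SM-XXXXX",
             "GTEX-BBBBB-0002-SM-YYYYY",
             "GTEX-AAAA-0003-SM-ZZZZZ"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the subject phenotype table (assumed column names).
subjects <- data.frame(
  SUBJID = c("GTEX-AAAA", "GTEX-BBBBB"),
  GENDER = c(1L, 2L),
  AGE    = c("60-69", "40-49"),
  stringsAsFactors = FALSE
)

# Assumption: the subject ID is the first two hyphen-separated fields.
samples$SUBJID <- sub("^([^-]+-[^-]+).*$", "\\1", samples$sampid)

# Look up each sample's subject; match() keeps the sample order and gives NA
# for samples whose derived ID is not in the subject table.
idx <- match(samples$SUBJID, subjects$SUBJID)
samples$GENDER <- subjects$GENDER[idx]
samples$AGE    <- subjects$AGE[idx]

# Assumed coding: 1 = male, 2 = female.
samples$sex <- c("Male", "Female")[samples$GENDER]

samples[, c("run", "SUBJID", "sex", "AGE")]
```

With the real data, `samples` would come from `colData(rse_gene)` and `subjects` from reading the downloaded subject phenotype file, for example with `read.delim()`. The resulting columns can then be assigned back to `colData(rse_gene)`, since the row order is preserved.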

Best,

Leonardo