Hi,
The text files are missing the gene IDs. I realize this is an inconvenience if you don't want to use R.
This information is much better organized in the RangedSummarizedExperiment (RSE) objects that you can download from https://jhubiostatistics.shinyapps.io/recount/ or via the recount Bioconductor package; see Figure 2 of https://f1000research.com/articles/6-1558/v1. In fact, that workflow and the recount vignette at http://bioconductor.org/packages/release/bioc/vignettes/recount/inst/doc/recount-quickstart.html are the best places to get started and become familiar with recount.
Since the genes are all the same regardless of the study, you can use:
> library(recount)
> rowRanges(rse_gene_SRP009615)
GRanges object with 58037 ranges and 3 metadata columns:
                     seqnames                 ranges strand |            gene_id bp_length          symbol
                        <Rle>              <IRanges>  <Rle> |        <character> <integer> <CharacterList>
  ENSG00000000003.14     chrX [100627109, 100639991]      - | ENSG00000000003.14      4535          TSPAN6
   ENSG00000000005.5     chrX [100584802, 100599885]      + |  ENSG00000000005.5      1610            TNMD
  ENSG00000000419.12    chr20 [ 50934867,  50958555]      - | ENSG00000000419.12      1207            DPM1
  ENSG00000000457.13     chr1 [169849631, 169894267]      - | ENSG00000000457.13      6883           SCYL3
  ENSG00000000460.16     chr1 [169662007, 169854080]      + | ENSG00000000460.16      5967        C1orf112
                 ...      ...                    ...    ... .                ...       ...             ...
   ENSG00000283695.1    chr19 [ 52865369,  52865429]      - |  ENSG00000283695.1        61              NA
   ENSG00000283696.1     chr1 [161399409, 161422424]      + |  ENSG00000283696.1       997              NA
   ENSG00000283697.1     chrX [149548210, 149549852]      - |  ENSG00000283697.1      1184    LOC101928917
   ENSG00000283698.1     chr2 [112439312, 112469687]      - |  ENSG00000283698.1       940              NA
   ENSG00000283699.1    chr10 [ 12653138,  12653197]      - |  ENSG00000283699.1        60         MIR4481
  -------
  seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths
You can then save that information in a text table.
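For example, here is a minimal sketch of that last step. It assumes the example object rse_gene_SRP009615 that ships with the recount package; the output file name recount_gene_info.tsv and the ';' separator are just placeholders I picked. The symbol column is a CharacterList (a gene can have more than one symbol), so it needs to be collapsed into a single string per gene before it can be written to a flat table:

```r
library(recount)

# Gene annotation shared by the recount gene-level RSE objects
gr <- rowRanges(rse_gene_SRP009615)

# Build a plain data.frame; 'symbol' is a CharacterList, so collapse
# multiple symbols per gene into one ';'-separated string
# (genes without a symbol end up as the string "NA")
gene_table <- data.frame(
    gene_id = gr$gene_id,
    chr = as.character(seqnames(gr)),
    start = start(gr),
    end = end(gr),
    strand = as.character(strand(gr)),
    bp_length = gr$bp_length,
    symbol = vapply(as.list(gr$symbol), paste, character(1), collapse = ';'),
    stringsAsFactors = FALSE
)

# Write a tab-separated text table (file name is a placeholder)
write.table(gene_table, file = 'recount_gene_info.tsv', sep = '\t',
    quote = FALSE, row.names = FALSE)
```

The rows are in the same order as the rows of the RSE object, so you should be able to match them to the count rows by position or by gene_id.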
Best,
Leonardo
> packageVersion('recount')
[1] ‘1.2.3’