Recount TCGA data
@rajesha1986-13918

Hello, thanks for providing a great tool for accessing all these datasets. I downloaded the TCGA gene counts file (counts_gene.tsv) through https://jhubiostatistics.shinyapps.io/recount/ --> TCGA --> gene counts (duffel.rail.bio/recount/TCGA/counts_gene.tsv.gz). I am interested in the lung data, so I extracted the counts for the lung cancer patients. However, the count matrix contains no gene IDs. I am new to this, so please let me know whether I am missing something.

Thanks

Rajesha 
@lcolladotor

Hi,

The text files are missing the gene ids. I realize this is an inconvenience if you don't want to use R.

This information is much better organized in the RangedSummarizedExperiment (RSE) objects that you can download from https://jhubiostatistics.shinyapps.io/recount/ or via the recount Bioconductor package. See Figure 2 of https://f1000research.com/articles/6-1558/v1. In fact, that workflow and the recount vignette http://bioconductor.org/packages/release/bioc/vignettes/recount/inst/doc/recount-quickstart.html are the best places to get started and become familiar with recount.

Since the genes are all the same regardless of the study, you can use:

> library(recount)

> rowRanges(rse_gene_SRP009615)
GRanges object with 58037 ranges and 3 metadata columns:
                     seqnames                 ranges strand |            gene_id bp_length          symbol
                        <Rle>              <IRanges>  <Rle> |        <character> <integer> <CharacterList>
  ENSG00000000003.14     chrX [100627109, 100639991]      - | ENSG00000000003.14      4535          TSPAN6
   ENSG00000000005.5     chrX [100584802, 100599885]      + |  ENSG00000000005.5      1610            TNMD
  ENSG00000000419.12    chr20 [ 50934867,  50958555]      - | ENSG00000000419.12      1207            DPM1
  ENSG00000000457.13     chr1 [169849631, 169894267]      - | ENSG00000000457.13      6883           SCYL3
  ENSG00000000460.16     chr1 [169662007, 169854080]      + | ENSG00000000460.16      5967        C1orf112
                 ...      ...                    ...    ... .                ...       ...             ...
   ENSG00000283695.1    chr19 [ 52865369,  52865429]      - |  ENSG00000283695.1        61              NA
   ENSG00000283696.1     chr1 [161399409, 161422424]      + |  ENSG00000283696.1       997              NA
   ENSG00000283697.1     chrX [149548210, 149549852]      - |  ENSG00000283697.1      1184    LOC101928917
   ENSG00000283698.1     chr2 [112439312, 112469687]      - |  ENSG00000283698.1       940              NA
   ENSG00000283699.1    chr10 [ 12653138,  12653197]      - |  ENSG00000283699.1        60         MIR4481
  -------
  seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths

Then save that information to a text table.
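A minimal sketch of that saving step, assuming the example object rse_gene_SRP009615 that ships with the recount package. The output file name recount_gene_info.tsv and the ";" separator for genes with several symbols are arbitrary choices for illustration. The symbol column is a CharacterList, so it is collapsed to plain strings before writing:

```r
library(recount)  # also attaches GenomicRanges via SummarizedExperiment

# Gene annotation from the example RSE object included in the package
gene_info <- rowRanges(rse_gene_SRP009615)

# Build a flat data.frame; symbol is a CharacterList (a gene may have
# several symbols or none), so collapse each entry into one string
gene_table <- data.frame(
    gene_id = gene_info$gene_id,
    chr = as.character(seqnames(gene_info)),
    start = start(gene_info),
    end = end(gene_info),
    strand = as.character(strand(gene_info)),
    bp_length = gene_info$bp_length,
    symbol = vapply(as.list(gene_info$symbol), paste, character(1),
        collapse = ";", USE.NAMES = FALSE),
    stringsAsFactors = FALSE
)

# Write a tab-separated text table (file name is hypothetical)
write.table(gene_table, file = "recount_gene_info.tsv", sep = "\t",
    quote = FALSE, row.names = FALSE)
```

If the rows of counts_gene.tsv follow the same order as the rows of the RSE object (an assumption worth checking, for example by comparing the number of rows), the gene_id column can then be attached to the count matrix.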

Best,

Leonardo

> packageVersion('recount')
[1] ‘1.2.3’
