Question: Recount TCGA data
rajesha1986 wrote, 4 months ago:

Hello, thanks for providing a great tool for accessing all these datasets. I downloaded the gene counts file (counts_gene.tsv) for TCGA through https://jhubiostatistics.shinyapps.io/recount/ --> TCGA --> gene counts (duffel.rail.bio/recount/TCGA/counts_gene.tsv.gz). Since I am interested in lung data, I then separated out the counts for the lung cancer patients. However, the count matrix has no gene IDs. I am new to this; could you tell me whether I am missing something?

Thanks

Rajesha 

Leonardo Collado Torres (United States) wrote, 4 months ago:

Hi,

The text files do not include the gene IDs. I realize this is inconvenient if you don't want to use R.

This information is much better organized in the RangedSummarizedExperiment (RSE) objects that you can download from https://jhubiostatistics.shinyapps.io/recount/ or via the recount Bioconductor package. See Figure 2 of https://f1000research.com/articles/6-1558/v1. That workflow and the recount vignette http://bioconductor.org/packages/release/bioc/vignettes/recount/inst/doc/recount-quickstart.html are the best places to get started with recount and become familiar with it.
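For reference, a minimal sketch of the package route, following the download/load pattern in the recount quickstart vignette. It assumes the recount package is installed and the download server is reachable; SRP009615 is only the vignette's example study, not the TCGA data. The point is that the count matrix inside an RSE object already carries the gene IDs as row names.

```r
library(recount)  # also attaches SummarizedExperiment (assays(), rowRanges())

## Download the gene-level RSE file to ./SRP009615/rse_gene.Rdata,
## skipping the download if the file is already there
if (!file.exists(file.path("SRP009615", "rse_gene.Rdata"))) {
  download_study("SRP009615", type = "rse-gene", outdir = "SRP009615")
}
load(file.path("SRP009615", "rse_gene.Rdata"))  # creates the object rse_gene

## Unlike the text files, the counts come with gene IDs as row names
counts <- assays(rse_gene)$counts
head(rownames(counts))
```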

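Since the genes are the same in every study, you can take the gene information from any RSE object, for example the rse_gene_SRP009615 example object included in the recount package: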

> library(recount)
> rowRanges(rse_gene_SRP009615)
GRanges object with 58037 ranges and 3 metadata columns:
                     seqnames                 ranges strand |            gene_id bp_length          symbol
                        <Rle>              <IRanges>  <Rle> |        <character> <integer> <CharacterList>
  ENSG00000000003.14     chrX [100627109, 100639991]      - | ENSG00000000003.14      4535          TSPAN6
   ENSG00000000005.5     chrX [100584802, 100599885]      + |  ENSG00000000005.5      1610            TNMD
  ENSG00000000419.12    chr20 [ 50934867,  50958555]      - | ENSG00000000419.12      1207            DPM1
  ENSG00000000457.13     chr1 [169849631, 169894267]      - | ENSG00000000457.13      6883           SCYL3
  ENSG00000000460.16     chr1 [169662007, 169854080]      + | ENSG00000000460.16      5967        C1orf112
                 ...      ...                    ...    ... .                ...       ...             ...
   ENSG00000283695.1    chr19 [ 52865369,  52865429]      - |  ENSG00000283695.1        61              NA
   ENSG00000283696.1     chr1 [161399409, 161422424]      + |  ENSG00000283696.1       997              NA
   ENSG00000283697.1     chrX [149548210, 149549852]      - |  ENSG00000283697.1      1184    LOC101928917
   ENSG00000283698.1     chr2 [112439312, 112469687]      - |  ENSG00000283698.1       940              NA
   ENSG00000283699.1    chr10 [ 12653138,  12653197]      - |  ENSG00000283699.1        60         MIR4481
  -------
  seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths

Then save that information to a text table.
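A sketch of both steps: exporting the gene information shown by rowRanges() to a tab-separated file, and using it to label the rows of counts_gene.tsv. It assumes the rows of the TCGA text file are genes in the same order as rowRanges(), with a header line of sample IDs; that is an assumption, and the check below only confirms that the number of rows matches, not the order. The helper add_gene_ids() and the file name recount_gene_info.tsv are hypothetical, not part of recount. The symbol column is a CharacterList (zero or more symbols per gene), so it is collapsed to one comma-separated string per gene before writing.

```r
library(recount)        # provides rse_gene_SRP009615; attaches SummarizedExperiment
library(GenomicRanges)  # seqnames(), start(), end(), strand()

## 1. Export the gene annotation shared by the recount gene-level RSE objects
gr <- rowRanges(rse_gene_SRP009615)
gene_info <- data.frame(
  gene_id   = gr$gene_id,
  seqnames  = as.character(seqnames(gr)),
  start     = start(gr),
  end       = end(gr),
  strand    = as.character(strand(gr)),
  bp_length = gr$bp_length,
  ## symbol is a CharacterList: join multiple symbols per gene with ","
  symbol    = vapply(as.list(gr$symbol), paste, character(1),
                     collapse = ",", USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
write.table(gene_info, file = "recount_gene_info.tsv", sep = "\t",
            quote = FALSE, row.names = FALSE)

## 2. Hypothetical helper: label count-matrix rows with gene IDs.
## ASSUMPTION: rows are genes in rowRanges() order; only the length is checked.
add_gene_ids <- function(counts, gene_ids) {
  if (nrow(counts) != length(gene_ids)) {
    stop("count matrix has ", nrow(counts), " rows but there are ",
         length(gene_ids), " genes")
  }
  rownames(counts) <- gene_ids
  counts
}

if (file.exists("counts_gene.tsv.gz")) {
  ## read.delim() reads .gz files directly; check.names = FALSE keeps sample IDs as-is
  tcga_counts <- read.delim("counts_gene.tsv.gz", check.names = FALSE)
  tcga_counts <- add_gene_ids(tcga_counts, gene_info$gene_id)
}
```

The same recount_gene_info.tsv file can also be opened outside R, which may be the simplest route if you prefer not to work with the RSE objects.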

Best,

Leonardo

> packageVersion('recount')
[1] ‘1.2.3’