Question: Recount TCGA data
10 weeks ago, rajesha1986 wrote:

Hello, thanks for providing a great tool for accessing all these datasets. I downloaded the gene counts file (counts_gene.tsv) for TCGA through https://jhubiostatistics.shinyapps.io/recount/ --> TCGA --> gene counts (duffel.rail.bio/recount/TCGA/counts_gene.tsv.gz). I am interested in the lung data, so I extracted the counts for the lung cancer patients. However, the count matrix contains no gene IDs. I am new to this; could you please tell me whether I am missing something?

Thanks

Rajesha

10 weeks ago, Leonardo Collado Torres (United States) wrote:

Hi,

The text files are missing the gene IDs. I realize this is an inconvenience if you don't want to use R.

This information is much better organized in the RangedSummarizedExperiment (RSE) objects that you can download from https://jhubiostatistics.shinyapps.io/recount/ or via the recount Bioconductor package. See Figure 2 of https://f1000research.com/articles/6-1558/v1. That workflow and the recount vignette http://bioconductor.org/packages/release/bioc/vignettes/recount/inst/doc/recount-quickstart.html are the best places to get started and become familiar with recount.
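
As a rough illustration of the package route, the quick start vignette downloads a study with download_study() and then loads the resulting file. The sketch below uses the small example study SRP009615. Passing "TCGA" as the project ID instead is an assumption here, and that download would be far larger.

```r
library(recount)

# Download the gene-level RSE file for one study into a folder named
# after the project (requires an internet connection).
download_study("SRP009615", type = "rse-gene")

# The downloaded file contains a single object called rse_gene.
load(file.path("SRP009615", "rse_gene.Rdata"))

# Printing the object shows its dimensions (genes x samples) and metadata.
rse_gene
```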

Since the genes are all the same regardless of the study, you can use:

> library(recount)

> rowRanges(rse_gene_SRP009615)
GRanges object with 58037 ranges and 3 metadata columns:
                     seqnames                 ranges strand |            gene_id bp_length          symbol
                        <Rle>              <IRanges>  <Rle> |        <character> <integer> <CharacterList>
  ENSG00000000003.14     chrX [100627109, 100639991]      - | ENSG00000000003.14      4535          TSPAN6
   ENSG00000000005.5     chrX [100584802, 100599885]      + |  ENSG00000000005.5      1610            TNMD
  ENSG00000000419.12    chr20 [ 50934867,  50958555]      - | ENSG00000000419.12      1207            DPM1
  ENSG00000000457.13     chr1 [169849631, 169894267]      - | ENSG00000000457.13      6883           SCYL3
  ENSG00000000460.16     chr1 [169662007, 169854080]      + | ENSG00000000460.16      5967        C1orf112
                 ...      ...                    ...    ... .                ...       ...             ...
   ENSG00000283695.1    chr19 [ 52865369,  52865429]      - |  ENSG00000283695.1        61              NA
   ENSG00000283696.1     chr1 [161399409, 161422424]      + |  ENSG00000283696.1       997              NA
   ENSG00000283697.1     chrX [149548210, 149549852]      - |  ENSG00000283697.1      1184    LOC101928917
   ENSG00000283698.1     chr2 [112439312, 112469687]      - |  ENSG00000283698.1       940              NA
   ENSG00000283699.1    chr10 [ 12653138,  12653197]      - |  ENSG00000283699.1        60         MIR4481
  -------
  seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths

 

Then save that information as a text table.
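
One possible way to do that is sketched below, using the example object rse_gene_SRP009615 that ships with the recount package. The output file name recount_gene_info.tsv is made up for illustration. Whether the rows are in the same order as counts_gene.tsv is an assumption, so please check it before relying on the result.

```r
library(recount)

# Gene annotation from the example RSE object bundled with recount.
gr <- rowRanges(rse_gene_SRP009615)

# Build a flat table. The symbol column is a CharacterList (a gene can
# have several symbols), so collapse each entry into one string.
gene_info <- data.frame(
    gene_id = gr$gene_id,
    bp_length = gr$bp_length,
    symbol = vapply(as.list(gr$symbol),
        function(s) paste(s, collapse = ";"), character(1)),
    chr = as.character(seqnames(gr)),
    start = start(gr),
    end = end(gr),
    strand = as.character(strand(gr)),
    stringsAsFactors = FALSE
)

# Write a tab-separated text file (hypothetical file name).
write.table(gene_info, file = "recount_gene_info.tsv",
    sep = "\t", quote = FALSE, row.names = FALSE)
```

If the row order does match, the gene_id column of this table can be used as the row labels of the count matrix.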

Best,

Leonardo

> packageVersion('recount')
[1] ‘1.2.3’

 
