Question: TCGA isoform expression quantification file - how to reshape data
mario.zanfardino (Naples, Italy) wrote 5 weeks ago:

I have this R data.frame:

barcode                         miRNA_region                sum_read_count
TCGA-18-3406-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   11867
TCGA-18-3407-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   12821
TCGA-18-3408-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   9703
TCGA-18-3410-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   5376
TCGA-18-3411-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   7630
TCGA-18-3412-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   7301
TCGA-18-3414-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   11296
TCGA-18-3415-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   9681
TCGA-18-3416-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   5531
TCGA-18-3417-01A-01T-1442-13    hsa-let-7a-1_MIMAT0000062   9262
TCGA-18-3419-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   2650
TCGA-18-3421-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   10711
TCGA-18-4086-01A-01T-1557-13    hsa-let-7a-1_MIMAT0000062   7531
TCGA-18-4721-01A-01T-1442-13    hsa-let-7a-1_MIMAT0000062   9683
TCGA-18-5592-01A-01T-1634-13    hsa-let-7a-1_MIMAT0000062   7604
TCGA-18-5595-01A-01T-1634-13    hsa-let-7a-1_MIMAT0000062   5872
TCGA-21-1072-01A-01T-1557-13    hsa-let-7a-1_MIMAT0000062   11132

The data also contain other miRNA_region values that are not shown here.

I want each distinct value of the miRNA_region column to become a new column of the data.frame. The result should have one row per unique barcode and, for each barcode, the sum_read_count value for each miRNA_region.

Any suggestions?

Thank you!

Answer: TCGA isoform expression quantification file - how to reshape data
Steve Lianoglou (Denali) wrote 5 weeks ago:

There are a number of ways to reshape data from wide to long and back again. Base-R has the reshape function, and the reshape2 and tidyr packages have been written to handle these types of things as well.

There is a plethora of tutorials around the web that will help you come up to speed with using those packages. Reading through and understanding them will be time well spent because these types of problems arise often.

That having been said, here are a few approaches you can use. Let's assume your data is in a data.frame named dat:

  • reshape2: reshape2::dcast(dat, barcode ~ miRNA_region, value.var = "sum_read_count")
  • tidyr: tidyr::spread(dat, miRNA_region, sum_read_count)
  • Base-R reshape: reshape(dat, direction = "wide", idvar = "barcode", timevar = "miRNA_region")
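As a minimal sketch of what the reshaped result looks like, here is the base-R variant run on a small toy data.frame. The two barcodes and the first miRNA_region are taken from the question; the second region name (`hypothetical_region_B`) and its counts are made up purely for illustration.

```r
# Toy data in the same long format as the question.
# "hypothetical_region_B" and its counts are invented for illustration only.
dat <- data.frame(
  barcode = rep(c("TCGA-18-3406-01A-01T-0981-13",
                  "TCGA-18-3407-01A-01T-0981-13"), times = 2),
  miRNA_region = rep(c("hsa-let-7a-1_MIMAT0000062",
                       "hypothetical_region_B"), each = 2),
  sum_read_count = c(11867, 12821, 40, 35),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per barcode, one column per miRNA_region.
wide <- reshape(dat, direction = "wide",
                idvar = "barcode", timevar = "miRNA_region")

# reshape() prefixes each new column with "sum_read_count."; strip the prefix
# so the columns are named after the miRNA_region values alone.
names(wide) <- sub("^sum_read_count\\.", "", names(wide))
rownames(wide) <- NULL

print(wide)
```

A barcode that has no row for a given miRNA_region ends up with NA in that column, so it may be worth checking for NAs before any downstream analysis.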

I've never really used base reshape. I have been using reshape2 for a long time, but I am trying to shift toward tidyr.
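For readers on a newer tidyr (1.0.0 or later), `spread()` is marked as superseded there in favour of `pivot_wider()`. The sketch below shows the equivalent call, assuming tidyr is installed; the toy data reuse two barcodes from the question, and `hypothetical_region_B` with its counts is invented for illustration.

```r
# Toy long-format data; "hypothetical_region_B" and its counts are made up.
dat <- data.frame(
  barcode = rep(c("TCGA-18-3406-01A-01T-0981-13",
                  "TCGA-18-3407-01A-01T-0981-13"), times = 2),
  miRNA_region = rep(c("hsa-let-7a-1_MIMAT0000062",
                       "hypothetical_region_B"), each = 2),
  sum_read_count = c(11867, 12821, 40, 35),
  stringsAsFactors = FALSE
)

# Only run the tidyr call if the package is available.
if (requireNamespace("tidyr", quietly = TRUE)) {
  # names_from: column whose values become the new column names
  # values_from: column whose values fill the new columns
  wide <- tidyr::pivot_wider(dat,
                             names_from = miRNA_region,
                             values_from = sum_read_count)
  print(wide)
}
```

The result should match what `tidyr::spread(dat, miRNA_region, sum_read_count)` returns on the same input, except that `pivot_wider()` returns a tibble.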

Lastly, for future reference: even though this question is about data used in a bioinformatics context, it is not really about any Bioconductor software specifically, and it would be better asked on Stack Overflow or https://community.rstudio.com/
