TCGA isoform expression quantification file - how to reshape data
@mariozanfardino-15232
Last seen 3.7 years ago
Naples (Italy)

I have this R data.frame:

barcode                         miRNA_region                sum_read_count
TCGA-18-3406-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   11867
TCGA-18-3407-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   12821
TCGA-18-3408-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   9703
TCGA-18-3410-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   5376
TCGA-18-3411-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   7630
TCGA-18-3412-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   7301
TCGA-18-3414-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   11296
TCGA-18-3415-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   9681
TCGA-18-3416-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   5531
TCGA-18-3417-01A-01T-1442-13    hsa-let-7a-1_MIMAT0000062   9262
TCGA-18-3419-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   2650
TCGA-18-3421-01A-01T-0981-13    hsa-let-7a-1_MIMAT0000062   10711
TCGA-18-4086-01A-01T-1557-13    hsa-let-7a-1_MIMAT0000062   7531
TCGA-18-4721-01A-01T-1442-13    hsa-let-7a-1_MIMAT0000062   9683
TCGA-18-5592-01A-01T-1634-13    hsa-let-7a-1_MIMAT0000062   7604
TCGA-18-5595-01A-01T-1634-13    hsa-let-7a-1_MIMAT0000062   5872
TCGA-21-1072-01A-01T-1557-13    hsa-let-7a-1_MIMAT0000062   11132

The data also contain other miRNA_region values that are not shown here.

I want every value of the miRNA_region column to become a new column of the data.frame. The barcode column should hold the unique barcode values and, for each barcode, the sum_read_count value for each miRNA_region.

Any suggestions?

Thank you!

@steve-lianoglou-2771
Last seen 14 months ago
United States

There are a number of ways to reshape data from wide to long and back again. Base-R has the reshape function, and the reshape2 and tidyr packages have been written to handle these types of things as well.

There is a plethora of tutorials around the web that will help you come up to speed with using those packages. Reading through and understanding them will be time well spent because these types of problems arise often.

That having been said, here are a few approaches you can use. Let's assume your data is in a data.frame named dat:

  • reshape2: reshape2::dcast(dat, barcode ~ miRNA_region, value.var = "sum_read_count")
  • tidyr: tidyr::spread(dat, miRNA_region, sum_read_count)
  • Base-R reshape: reshape(dat, direction = "wide", idvar = "barcode", timevar = "miRNA_region")
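To make the one-liners above concrete, here is a minimal sketch in base R on a toy version of dat. The second miRNA_region name and its counts are made up for illustration; only the first two counts come from the post. It assumes each barcode/miRNA_region pair occurs only once, as in the question.

```r
# Toy long-format data in the same layout as the question.
# "example-region-2" and its counts are hypothetical placeholders.
dat <- data.frame(
  barcode = rep(c("TCGA-18-3406-01A-01T-0981-13",
                  "TCGA-18-3407-01A-01T-0981-13"), times = 2),
  miRNA_region = rep(c("hsa-let-7a-1_MIMAT0000062",
                       "example-region-2"), each = 2),
  sum_read_count = c(11867, 12821, 40, 55),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per barcode, one column per miRNA_region.
wide <- reshape(dat, direction = "wide",
                idvar = "barcode", timevar = "miRNA_region")

# reshape() prefixes each new column with "sum_read_count."; strip it
# so the columns are named after the miRNA_region values only.
names(wide) <- sub("^sum_read_count\\.", "", names(wide))
rownames(wide) <- NULL

print(wide)

# With tidyr >= 1.0.0, spread() is superseded by pivot_wider();
# the equivalent call (not run here) would be:
# tidyr::pivot_wider(dat, names_from = miRNA_region,
#                    values_from = sum_read_count)
```

Because the miRNA_region names contain hyphens, the resulting columns have to be accessed with wide[["hsa-let-7a-1_MIMAT0000062"]] or backticks rather than a bare wide$name.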

I've never really used base reshape. I have been using reshape2 for a long time, but I am trying to shift towards tidyr.

Lastly, for future reference: even though this question is about data used in a bioinformatics context, it's not really about any Bioconductor software specifically, and would be better asked on Stack Overflow or https://community.rstudio.com/
