Question

Filter Mutations?

ryan.hagenson • 0

@cd1dac3b

United States

When I download a study with cBioPortalData, the mutations assay is placed into a RaggedExperiment. Is there a way to reconstruct the underlying data.frame, as if I had read data_mutations.txt directly? Ultimately, I want to filter by type of mutation (e.g., remove silent mutations) and by percentage of patients affected. Without using the API, I did this by filtering on Variant_Classification, then grouping by Hugo_Symbol and counting Tumor_Sample_Barcode (i.e., patient ID), and finally keeping the Hugo_Symbol entries with enough patients affected. Reconstructing the data.frame would be great, but there might be a better way to produce a data.frame with just the original columns Hugo_Symbol, Tumor_Sample_Barcode, Variant_Classification, and Protein_position.
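For reference, the data.frame workflow described above can be sketched in base R. This is a minimal illustration on made-up data, not the asker's actual code: the toy values, the 0.5 threshold, and the choice to use the number of distinct barcodes in the table as the denominator are all assumptions.

```r
# Toy stand-in for data_mutations.txt; all values are invented for illustration.
mut <- data.frame(
    Hugo_Symbol = c("TP53", "TP53", "TP53", "KRAS", "KRAS", "EGFR"),
    Tumor_Sample_Barcode = c("S1", "S2", "S3", "S1", "S2", "S3"),
    Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                               "Silent", "Missense_Mutation", "Silent", "Silent"),
    Protein_position = c(175, 213, 36, 12, 61, 790),
    stringsAsFactors = FALSE
)

# Assumption: every sample of interest appears at least once in the table.
n_samples <- length(unique(mut$Tumor_Sample_Barcode))

# Step 1: drop silent mutations.
nonsilent <- mut[mut$Variant_Classification != "Silent", ]

# Step 2: count distinct affected samples per gene.
affected <- tapply(nonsilent$Tumor_Sample_Barcode, nonsilent$Hugo_Symbol,
                   function(x) length(unique(x)))

# Step 3: keep genes mutated in at least half of the samples (arbitrary cutoff).
frac <- affected / n_samples
keep <- names(frac)[frac >= 0.5]
filtered <- nonsilent[nonsilent$Hugo_Symbol %in% keep, ]
```

Counting distinct barcodes rather than rows avoids counting a sample twice when it carries several mutations in the same gene.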

RaggedExperiment cBioPortalData • 1.2k views

ADD COMMENT • link updated 23 months ago by Marcel Ramos 700 • written 23 months ago by ryan.hagenson • 0

score 0 · Answer 1 · 2022-08-18

Marcel Ramos 700

@marcel-ramos-7325

United States

Hi Ryan,

We use the RaggedExperiment class because it allows you to do much more than what a simple data.frame can do.

Have a look at the assay-functions example at:

https://code.bioconductor.org/browse/RaggedExperiment/blob/master/inst/scripts/assay-functions-Ex.R

To highlight some of the examples, you can filter by non-silent mutations and reduce to genic regions via qreduceAssay using gn as the reference:

nonsilent <- function(scores, ranges, qranges)
    any(scores != "Silent")
mutations <- qreduceAssay(mre, gn, nonsilent, "Variant_Classification")

To summarize the fraction of samples with a value in each row (e.g., 13947 rows have no mutation in any sample):

table(rowSums(!is.na(mutations)) / ncol(mutations))

                 0 0.0111111111111111 0.0222222222222222 0.0333333333333333 0.0444444444444444 0.0555555555555556 0.0666666666666667 0.0777777777777778 
             13947               5163               2034                838                356                166                123                 95 
0.0888888888888889                0.1  0.111111111111111  0.122222222222222  0.133333333333333  0.144444444444444  0.155555555555556  0.166666666666667 
                41                 43                 31                 27                 26                 10                 15                  9 
 0.177777777777778  0.188888888888889                0.2  0.211111111111111  0.222222222222222  0.233333333333333  0.244444444444444  0.255555555555556 
                15                  8                  7                 21                  7                  9                  5                  3 
 0.266666666666667  0.277777777777778  0.288888888888889                0.3  0.322222222222222  0.333333333333333  0.344444444444444  0.355555555555556 
                 4                  7                  4                  5                  1                  1                  1                  1 
 0.366666666666667  0.377777777777778  0.411111111111111  0.422222222222222  0.433333333333333  0.511111111111111  0.522222222222222 
                 1                  1                  3                  1                  1                  1                  2
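One caveat about the summary above, sketched on a toy matrix rather than real qreduceAssay output: `!is.na()` counts every cell that has any value, and a cell where only silent mutations overlap the gene would hold FALSE rather than NA (assuming cells with no overlapping range are filled with NA, which I believe is the default background). To count only non-silent hits, summing the TRUE values may be closer to what the question asks for. The matrix below is hypothetical.

```r
# Hypothetical matrix shaped like the qreduceAssay() result:
# TRUE = non-silent mutation, FALSE = silent only, NA = no mutation record.
mutations <- matrix(
    c(TRUE,  NA,    FALSE,
      NA,    NA,    NA,
      FALSE, FALSE, NA),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2", "S3"))
)

# Fraction of samples with any mutation record (silent or not).
any_record <- rowSums(!is.na(mutations)) / ncol(mutations)

# Fraction of samples with at least one non-silent mutation.
nonsilent_frac <- rowSums(mutations, na.rm = TRUE) / ncol(mutations)

# Keep genes with non-silent mutations in at least 30% of samples (arbitrary cutoff).
keep <- names(nonsilent_frac)[nonsilent_frac >= 0.3]
```

Here geneC has a record in two of three samples but no non-silent mutation, so the two summaries disagree for it.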

You may be able to reconstruct the data.frame from the data but I am not aware of the specific data organization (i.e., shape, variables, etc.) that you are looking for. Perhaps mcols() provides what you are looking for.

Best regards,

Marcel

ADD COMMENT • link 23 months ago Marcel Ramos 700

I have previously seen the example that you highlight, and it does not answer my question. The core of my question is whether I can reconstruct the underlying data.frame when given a RaggedExperiment of mutation data. There may be methods in RaggedExperiment that replicate what I was doing on the data.frame when reading data_mutations.txt directly, but my question was not about that.

"You may be able to reconstruct the data.frame from the data but I am not aware of the specific data organization (i.e., shape, variables, etc.) that you are looking for."

The specific data organization that I am looking for is a data.frame, built from a RaggedExperiment, that is equivalent to the one I get from data_mutations.txt. As far as I have seen, the RaggedExperiment does not provide what I need, while the data.frame is how I was solving the problem before exploring the cBioPortalData package.

ADD REPLY • link 23 months ago ryan.hagenson • 0

Hi Ryan,

We do not have a way to easily go backwards to a data.frame, but it can be done. We provide these data structures because they make use of the powerful GRanges / Bioconductor ecosystem.

Without a concrete example, I can only provide limited help as I don't know what kind of data.frame you are looking for.

If you're looking for the raw data, you could obtain that via the downloadStudy and untarStudy functions in cBioPortalData.
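A rough sketch of that route follows. The helper `read_study_mutations()` is hypothetical (it is not part of cBioPortalData), and the file name `data_mutations.txt` is taken from the question; some studies may name the file differently. The download calls are left as comments because they need network access, and "acc_tcga" is only an example study ID.

```r
# Hypothetical helper: read the mutation table from an extracted study folder.
# Lines starting with "#" (e.g., a version header) are treated as comments.
read_study_mutations <- function(study_dir, file = "data_mutations.txt") {
    path <- file.path(study_dir, file)
    stopifnot(file.exists(path))
    read.delim(path, comment.char = "#", quote = "", stringsAsFactors = FALSE)
}

# Intended use (requires cBioPortalData and an internet connection):
# study_file <- cBioPortalData::downloadStudy("acc_tcga")
# study_dir  <- cBioPortalData::untarStudy(study_file, exdir = tempdir())
# mut <- read_study_mutations(study_dir)
# mut[, c("Hugo_Symbol", "Tumor_Sample_Barcode",
#         "Variant_Classification", "Protein_position")]
```

The resulting data.frame can then go through the same filter, group, and count steps that were used before switching to cBioPortalData.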

Please be more specific with illustrative and reproducible examples.

See this link https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Best regards,

Marcel

ADD REPLY • link 23 months ago Marcel Ramos 700

"If you're looking for the raw data, you could obtain that via the downloadStudy, and untarStudy functions in cBioPortalData."

I was trying to avoid doing this if I could. I was hoping there was an "easy" way to go from RaggedExperiment to data.frame, but it seems that using the raw data via downloadStudy is the most fitting solution.

Thank you for your help!

ADD REPLY • link 23 months ago ryan.hagenson • 0

I was looking into this more and it seems like you could use the as.data.frame method.

There will likely be some information loss with this conversion.

suppressPackageStartupMessages({library(RaggedExperiment)})
example("RaggedExperiment", echo = FALSE)
as.data.frame(as(re3, "GRangesList"))
#>   group group_name seqnames start end width strand score
#> 1     1    sample1     chr1     1  10    10      -     1
#> 2     1    sample1     chr1    11  18     8      +     2
#> 3     2    sample2     chr2     1  10    10      -     3
#> 4     2    sample2     chr2    11  18     8      +     4
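Applied to mutation data, the same conversion might look like the sketch below. The RaggedExperiment here is a toy object built by hand, with made-up coordinates and values; it assumes the mutation columns (Hugo_Symbol, Variant_Classification) are stored as metadata columns on each sample's ranges, and that `group_name` in the result carries the sample name, as in the output above. Whether a study loaded through cBioPortalData keeps every original column is something to check on the real object.

```r
suppressPackageStartupMessages({
    library(GenomicRanges)
    library(RaggedExperiment)
})

# Toy per-sample mutation ranges; coordinates and values are illustrative only.
s1 <- GRanges(c("chr17:100-100", "chr12:200-200"),
              Hugo_Symbol = c("TP53", "KRAS"),
              Variant_Classification = c("Missense_Mutation", "Silent"))
s2 <- GRanges("chr17:150-150",
              Hugo_Symbol = "TP53",
              Variant_Classification = "Nonsense_Mutation")
mre <- RaggedExperiment(sampleA = s1, sampleB = s2,
                        colData = DataFrame(id = 1:2))

# Long-format data.frame: one row per mutation, sample name in group_name.
df <- as.data.frame(as(mre, "GRangesList"))

# Drop silent mutations and keep the columns of interest.
cols <- c("group_name", "Hugo_Symbol", "Variant_Classification")
nonsilent <- df[df$Variant_Classification != "Silent", cols]
```

From here the grouping and counting can proceed as on a table read from data_mutations.txt, with `group_name` standing in for Tumor_Sample_Barcode.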

ADD REPLY • link 23 months ago Marcel Ramos 700