Question: Using GenomicDataCommons R package to identify cases with both DNA methylation and gene expression data
28 days ago, RichardJActon wrote:

I'm trying to get a list of cases for which there is DNA methylation and gene expression data available for both normal and cancer tissue samples using the GenomicDataCommons R package (which I am using for the first time).

qCases <- cases() %>%
    filter( ~ samples.sample_type == "Solid Tissue Normal" | samples.sample_type == "Blood Derived Normal") %>%
    filter(~ files.type == 'gene_expression' & files.type == 'methylation_beta_value')
qCases %>% count()

> [1] 0

This returns no results.

Examining the response() of qCases when only the first filter is applied shows that there are definitely cases that have both gene expression and methylation beta value files.

qCases <- cases() %>%
    filter( ~ samples.sample_type == "Solid Tissue Normal" | samples.sample_type == "Blood Derived Normal") %>%
    GenomicDataCommons::select('files.type') %>%
    response()

> $results
> files

> copy_number_segment, ----->###gene_expression###<-------, simple_somatic_mutation,
annotated_somatic_mutation, biospecimen_supplement, clinical_supplement, biospecimen_supplement,
biospecimen_supplement, mirna_expression, aligned_reads, clinical_supplement,
aggregated_somatic_mutation, slide_image, simple_somatic_mutation, copy_number_segment,
clinical_supplement, clinical_supplement, biospecimen_supplement, clinical_supplement,
clinical_supplement, clinical_supplement, biospecimen_supplement, copy_number_segment, aligned_reads,
annotated_somatic_mutation, biospecimen_supplement, biospecimen_supplement,
annotated_somatic_mutation, biospecimen_supplement, clinical_supplement,
----->###methylation_beta_value###<-----, masked_somatic_mutation, simple_somatic_mutation,
biospecimen_supplement, slide_image, copy_number_segment, annotated_somatic_mutation,
masked_somatic_mutation, clinical_supplement, aggregated_somatic_mutation,
aggregated_somatic_mutation, aggregated_somatic_mutation, mirna_expression, clinical_supplement,
gene_expression, masked_somatic_mutation, gene_expression, biospecimen_supplement, aligned_reads,
biospecimen_supplement, biospecimen_supplement, aligned_reads, simple_somatic_mutation, masked_somatic_mutation

My guess is that what is going wrong here is the filter is looking at individual files entries and thus no one file is both type gene_expression and type methylation_beta_value. Is there a way to filter for cases that have files with a given set of types?
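A small local illustration of that guess, using base R only and made-up data (the case ids and the `files` table are hypothetical, and this does not show how the GDC API evaluates filters internally). It contrasts a per-file test, which can never match two types at once, with a per-case test over each case's set of file types:

```r
# Toy data: one row per file, as hypothetical case/file-type pairs.
files <- data.frame(
  case_id = c("case_A", "case_A", "case_B", "case_C", "case_C"),
  type    = c("gene_expression", "methylation_beta_value",
              "gene_expression",
              "methylation_beta_value", "slide_image"),
  stringsAsFactors = FALSE
)

# Per-file test: a single file cannot have two types at once,
# so this condition selects zero rows.
per_file <- files[files$type == "gene_expression" &
                  files$type == "methylation_beta_value", ]
nrow(per_file)

# Per-case test: collect each case's file types, then check that
# every wanted type appears somewhere in that case's set.
wanted <- c("gene_expression", "methylation_beta_value")
types_by_case <- split(files$type, files$case_id)
has_both <- vapply(types_by_case, function(x) all(wanted %in% x), logical(1))
cases_with_both <- names(has_both)[has_both]
cases_with_both
```

In this toy table only case_A has both file types, so the per-case test keeps it while the per-file test keeps nothing.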

I've been looking over the examples in the vignette but there don't seem to be any examples of composite queries like the one I'm trying to do. Any assistance would be appreciated!

NB Cross-posted from: https://www.biostars.org/p/344349/

28 days ago, Sean Davis (United States) wrote:

There is a sort of unusual issue with using pipes (%>%) and filter. Unlike in dplyr, applying a second filter clears the first. I should definitely change this behavior, but in the meantime I think what you are looking for is:

qCases <- cases() %>%
    filter( ~ (samples.sample_type == "Solid Tissue Normal" | samples.sample_type == "Blood Derived Normal") & (files.type == 'gene_expression' | files.type == 'methylation_beta_value'))
qCases %>% count()

Sean, thanks so much for answering. I directed the user here from Biostars: https://www.biostars.org/p/344349/#344978

Reply written 28 days ago by Kevin Blighe

Thanks Sean and Kevin. I was caught out by the 'filter()' behaviour, but picked up on it during a re-read of the vignette. However, if I have understood the above correctly, this will list all cases with either "Solid Tissue Normal" or "Blood Derived Normal" samples and either "gene_expression" __or__ "methylation_beta_value" files. The bit I'm having difficulty with is getting cases with "gene_expression" __and__ "methylation_beta_value" files. Switching '|' for '&' in this part of the expression: '(files.type == "gene_expression" | files.type == "methylation_beta_value")' returns 0 rows, even though there are definitely some cases with both expression and methylation data.

Reply written 28 days ago (modified 28 days ago) by RichardJActon

I don't think that the API supports that query. Instead, simply do two separate queries, get the case ids(), and intersect them. Then, perform a third cases() query and supply the ids(). 

> qCases1 <- cases() %>%
+         filter( ~ (samples.sample_type == "Solid Tissue Normal" | samples.sample_type == "Blood Derived Normal") & (files.type == 'gene_expression'))
> ids1 = qCases1 %>% ids()
> qCases2 <- cases() %>%
+         filter( ~ (samples.sample_type == "Solid Tissue Normal" | samples.sample_type == "Blood Derived Normal") & (files.type == 'methylation_beta_value'))
> ids2 = qCases2 %>% ids()
> length(intersect(ids1,ids2))
[1] 10146
> qCases = cases() %>% filter(~ case_id %in% intersect(ids1,ids2))
> qCases %>% count()
[1] 10146
> 
Reply written 28 days ago by Sean Davis
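The two-query-and-intersect approach above extends to any number of required file types by folding `intersect` over the per-type id vectors. The sketch below is base R only and runs offline: `cases_with_all_types`, `lookup`, and the ids in `fake_ids` are hypothetical names and made-up values. With GenomicDataCommons, the `ids_for_type` argument would presumably wrap the cases() / filter() / ids() pattern shown in the transcript, one query per file type; that wrapper is an assumption and is not shown here.

```r
# Intersect the case ids returned for each required file type.
# ids_for_type is a stand-in for "run one query for this type and
# return its case ids" (hypothetical; supplied by the caller).
cases_with_all_types <- function(types, ids_for_type) {
  id_sets <- lapply(types, ids_for_type)
  Reduce(intersect, id_sets)
}

# Offline stand-in with made-up ids, so the sketch runs without the API.
fake_ids <- list(
  gene_expression        = c("c1", "c2", "c3"),
  methylation_beta_value = c("c2", "c3", "c4"),
  mirna_expression       = c("c3", "c5")
)
lookup <- function(type) fake_ids[[type]]

# Cases that have both expression and methylation files.
both <- cases_with_all_types(
  c("gene_expression", "methylation_beta_value"), lookup
)

# Cases that have all three file types.
all_three <- cases_with_all_types(names(fake_ids), lookup)
```

The same id vectors can be combined with `union()` or `setdiff()` when the requirement is "either type" or "one type but not another".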

Thanks again. I was hoping that I was missing something and that the API would support more complex conditional queries, as I have some moderately complex requirements for the subset of samples I'm after.

Reply written 28 days ago by RichardJActon

Thankfully, we have all the power of R at our disposal! 

Reply written 28 days ago by Sean Davis

Just a note that in the most recent devel version (1.5.8) of GenomicDataCommons, filter chaining is now supported. Each filter in the %>% chain is "AND"ed with the previous filters.

Reply written 22 days ago (modified 22 days ago) by Sean Davis