I'm trying to get a list of cases for which there is DNA methylation and gene expression data available for both normal and cancer tissue samples using the GenomicDataCommons R package (which I am using for the first time).
qCases <- cases() %>%
filter( ~ samples.sample_type == "Solid Tissue Normal" | samples.sample_type == "Blood Derived Normal") %>%
filter(~ files.type == 'gene_expression' & files.type == 'methylation_beta_value')
qCases %>% count()
> [1] 0
This returns no results.
Examining the response() of qCases when only the first filter is applied reveals that there are definitely cases that have both gene expression and methylation beta value files.
qCases <- cases() %>%
filter( ~ samples.sample_type == "Solid Tissue Normal" | samples.sample_type == "Blood Derived Normal") %>%
GenomicDataCommons::select('files.type') %>%
response()
> $results
> files
> copy_number_segment, ----->###gene_expression###<-------, simple_somatic_mutation,
annotated_somatic_mutation, biospecimen_supplement, clinical_supplement, biospecimen_supplement,
biospecimen_supplement, mirna_expression, aligned_reads, clinical_supplement,
aggregated_somatic_mutation, slide_image, simple_somatic_mutation, copy_number_segment,
clinical_supplement, clinical_supplement, biospecimen_supplement, clinical_supplement,
clinical_supplement, clinical_supplement, biospecimen_supplement, copy_number_segment, aligned_reads,
annotated_somatic_mutation, biospecimen_supplement, biospecimen_supplement,
annotated_somatic_mutation, biospecimen_supplement, clinical_supplement,
----->###methylation_beta_value###<-----, masked_somatic_mutation, simple_somatic_mutation,
biospecimen_supplement, slide_image, copy_number_segment, annotated_somatic_mutation,
masked_somatic_mutation, clinical_supplement, aggregated_somatic_mutation,
aggregated_somatic_mutation, aggregated_somatic_mutation, mirna_expression, clinical_supplement,
gene_expression, masked_somatic_mutation, gene_expression, biospecimen_supplement, aligned_reads,
biospecimen_supplement, biospecimen_supplement, aligned_reads, simple_somatic_mutation, masked_somatic_mutation
My guess is that what is going wrong here is the filter is looking at individual files entries and thus no one file is both type gene_expression and type methylation_beta_value. Is there a way to filter for cases that have files with a given set of types?
I've been looking over the examples in the vignette but there don't seem to be any examples of composite queries like the one I'm trying to do. Any assistance would be appreciated!
NB Cross-posted from: https://www.biostars.org/p/344349/
Sean thanks so much for answering. I directed the user here from Biostars: https://www.biostars.org/p/344349/#344978
Thanks Sean and Kevin. I was caught out by the 'filter()' behaviour, but picked up on it during a re-read of the vignette. However, if I have understood the above correctly, this will list all cases with either "Solid Tissue Normal" or "Blood Derived Normal" and either "gene_expression" __or__ "methylation_beta_value" files. The bit I'm having difficulty with is getting cases with "gene_expression" __and__ "methylation_beta_value" files. Switching '|' for '&' in this part of the expression: '(files.type == "gene_expression" | files.type == "methylation_beta_value")' returns 0 rows when there are definitely some samples with both expression and methylation data.
I don't think that the API supports that query. Instead, simply do two separate queries, get the case ids(), and intersect them. Then, perform a third cases() query and supply the ids().
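A minimal sketch of that two-queries-then-intersect approach might look like the following. The helper names (case_ids_with_type, cases_with_all_types, fetch_cases) and their argument names are hypothetical, the field names are the ones used in the question, and I am assuming that results_all() followed by ids() returns the case ids for every matching record and that %in% accepts an R vector inside filter(). Treat it as a starting point rather than a tested recipe.

```r
library(GenomicDataCommons)

# Ids of cases that have a normal sample AND at least one file of `wanted_type`.
# `wanted_type` is deliberately not a GDC field name, so it is looked up as an
# ordinary R variable rather than being treated as a field.
case_ids_with_type <- function(wanted_type) {
  cases() %>%
    GenomicDataCommons::filter(
      ~ (samples.sample_type == "Solid Tissue Normal" |
           samples.sample_type == "Blood Derived Normal") &
        files.type == wanted_type
    ) %>%
    results_all() %>%   # fetch all matching records, not only the first page
    ids()
}

# Keep only the case ids that appear in every per-type result.
cases_with_all_types <- function(wanted_types) {
  Reduce(intersect, lapply(wanted_types, case_ids_with_type))
}

# Third query: retrieve the cases whose ids survived the intersection.
fetch_cases <- function(keep_ids) {
  cases() %>%
    GenomicDataCommons::filter(~ case_id %in% keep_ids) %>%
    results_all()
}
```

Usage would be along the lines of common_ids <- cases_with_all_types(c("gene_expression", "methylation_beta_value")) followed by fetch_cases(common_ids). Note that this only establishes that a case has a normal sample and files of both types; it does not by itself show that those files were derived from the normal sample.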
Thanks again. I was hoping that I was missing something and that the API would support more complex conditional queries, as I have some moderately complex requirements for the subset of samples I'm after.
Thankfully, we have all the power of R at our disposal!
Just a note that in the most recent devel version (1.5.8) of GenomicDataCommons, filter chaining is now supported. Each filter in the %>% chain is "AND"ed with the previous filters.
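As a rough illustration of what chaining means, the two query builders below should describe the same request in a version that supports it. The function names are hypothetical and the field names are taken from the question above.

```r
library(GenomicDataCommons)

# Chained form: the second filter is ANDed with the first
# (assumes a GenomicDataCommons version with filter chaining).
chained_query <- function() {
  cases() %>%
    GenomicDataCommons::filter(
      ~ samples.sample_type == "Solid Tissue Normal" |
        samples.sample_type == "Blood Derived Normal"
    ) %>%
    GenomicDataCommons::filter(~ files.type == "gene_expression")
}

# Single-filter form: the same condition written explicitly with `&`.
single_query <- function() {
  cases() %>%
    GenomicDataCommons::filter(
      ~ (samples.sample_type == "Solid Tissue Normal" |
           samples.sample_type == "Blood Derived Normal") &
        files.type == "gene_expression"
    )
}
```

Comparing chained_query() %>% count() with single_query() %>% count() is a quick way to check the behaviour on your installed version. Because chained filters are combined with AND, chaining one filter per file type would presumably run into the same zero-result problem described earlier in the thread, so the intersect approach is still needed for that case.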