Question: Use biomaRt to get pre-calculated transcription factor binding sites (TFBS) for GRCh37
goldberg.jm10 wrote (3 months ago):

Hi All,

I wish to use Bioconductor/biomaRt to get pre-calculated transcription factor binding site (TFBS) results for GRCh37.

To do this for GRCh38 in the Ensembl BioMart web interface (http://www.ensembl.org/biomart/martview/), I select "ENSEMBL REGULATION 92" under "-CHOOSE DATABASE-" and "Human Binding Motifs (GRCh38.p12)" under "-CHOOSE DATASET-".

As a convenient filter I check "Multiple regions..." and enter "1:0:20000". For this test I left "Attributes" at their defaults.

The resulting query is available at:
http://www.ensembl.org/biomart/martview/19ba1c438ef96a5100531e91647ab2b5?VIRTUALSCHEMANAME=default&ATTRIBUTES=hsapiens_motif_feature.default.binding_motifs.binding_matrix_id|hsapiens_motif_feature.default.binding_motifs.chromosome_name|hsapiens_motif_feature.default.binding_motifs.chromosome_start|hsapiens_motif_feature.default.binding_motifs.chromosome_end|hsapiens_motif_feature.default.binding_motifs.score|hsapiens_motif_feature.default.binding_motifs.feature_type_name&FILTERS=hsapiens_motif_feature.default.filters.chromosomal_region."1:0:20000"&VISIBLEPANEL=resultspanel
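For reference, the GRCh38 web query above can be written with biomaRt roughly as follows. This is a sketch under assumptions: the Regulation database is taken to be exposed as the ENSEMBL_MART_FUNCGEN mart, and the dataset, attribute and filter names are copied from the martview URL. Because www.ensembl.org serves the current release rather than release 92, those names may need checking with listDatasets(), listAttributes() and listFilters().

```r
library(biomaRt)

# Assumed names: the Regulation mart as ENSEMBL_MART_FUNCGEN and the dataset
# hsapiens_motif_feature (taken from the martview URL). www.ensembl.org serves
# the current release, which may differ from release 92.
motif_mart <- useMart(biomart = "ENSEMBL_MART_FUNCGEN",
                      dataset = "hsapiens_motif_feature",
                      host = "www.ensembl.org")

# Same attributes and region filter as the web query ("1:0:20000" = chr:start:end)
tfbs38 <- getBM(attributes = c("binding_matrix_id", "chromosome_name",
                               "chromosome_start", "chromosome_end",
                               "score", "feature_type_name"),
                filters = "chromosomal_region",
                values = "1:0:20000",
                mart = motif_mart)
head(tfbs38)
```

Pointing host at 'grch37.ensembl.org' is the obvious next step, but it can only work if that host actually serves a motif dataset.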

Here is my specific question: how do I write a Bioconductor/biomaRt query that retrieves the equivalent of "ENSEMBL REGULATION 92/Human Binding Motifs" for GRCh37?

Thank you!

Jon

Modified 3 months ago by James W. MacDonald; written 3 months ago by goldberg.jm10

I do know how to use biomaRt to access archived versions of "Ensembl Genes..." (see code below), just not for "Ensembl Regulation..."

library(biomaRt)
ensembl37 <- useMart(host = 'grch37.ensembl.org', biomart = 'ENSEMBL_MART_ENSEMBL', dataset = 'hsapiens_gene_ensembl') # Ensembl Genes, GRCh37 archive
James W. MacDonald wrote (3 months ago):

If you can't run the query directly on the Ensembl BioMart site (and, as far as I can tell, you can't for GRCh37), then you can't run it with biomaRt either. biomaRt is just a programmatic interface to that site, so it won't do anything that isn't available on the website.
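One way to check this from R is to ask the GRCh37 host which marts and datasets it serves; if no motif dataset is listed there, biomaRt has nothing to query. A minimal sketch, again assuming the regulation mart is named ENSEMBL_MART_FUNCGEN (the names returned by listMarts() are what actually count):

```r
library(biomaRt)

# Which marts does the GRCh37 archive host serve?
marts37 <- listMarts(host = "grch37.ensembl.org")
print(marts37)

# Assumption: the regulation mart is called ENSEMBL_MART_FUNCGEN.
if ("ENSEMBL_MART_FUNCGEN" %in% marts37$biomart) {
  funcgen37 <- useMart(biomart = "ENSEMBL_MART_FUNCGEN",
                       host = "grch37.ensembl.org")
  ds37 <- listDatasets(funcgen37)
  # Keep any dataset whose name or description mentions motifs
  is_motif <- grepl("motif", ds37$dataset, ignore.case = TRUE) |
              grepl("motif", ds37$description, ignore.case = TRUE)
  print(ds37[is_motif, ])
}
```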


That said, you could consider using liftOver to convert the GRCh38 TFBS coordinates to GRCh37.
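A minimal liftOver sketch with rtracklayer, assuming the GRCh38 sites are in a data.frame shaped like the getBM() output (chromosome_name, chromosome_start, chromosome_end plus other columns) and that the UCSC hg38ToHg19.over.chain file has been downloaded and uncompressed. lift_tfbs is a hypothetical helper name. UCSC chains use "chr"-prefixed names, so the Ensembl names are converted first; sites that fail to map, or that split into several pieces, are dropped.

```r
library(rtracklayer)    # import.chain(), liftOver()
library(GenomicRanges)  # GRanges(), IRanges(), DataFrame(), mcols<-

# Hypothetical helper: lift Ensembl-style GRCh38 TFBS rows to GRCh37.
# chain_path is assumed to be an uncompressed UCSC hg38ToHg19.over.chain file.
lift_tfbs <- function(df, chain_path) {
  coord_cols <- c("chromosome_name", "chromosome_start", "chromosome_end")
  chr <- as.character(df$chromosome_name)
  # Ensembl names ("1", "X", "MT") -> UCSC names ("chr1", "chrX", "chrM")
  ucsc_chr <- ifelse(chr == "MT", "chrM", paste0("chr", chr))
  gr <- GRanges(seqnames = ucsc_chr,
                ranges = IRanges(start = df$chromosome_start,
                                 end = df$chromosome_end))
  # Carry the remaining columns (matrix ID, score, ...) along as metadata
  mcols(gr) <- DataFrame(df[, setdiff(names(df), coord_cols), drop = FALSE])
  lifted <- liftOver(gr, import.chain(chain_path))
  # Keep only sites that map to exactly one GRCh37 interval
  unlist(lifted[lengths(lifted) == 1L])
}
```

Usage would be along the lines of tfbs37 <- lift_tfbs(tfbs38, "hg38ToHg19.over.chain"). Note that the lifted sites are not re-scored against the GRCh37 sequence.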

Thanks James. Before I try liftOver, I'll see if I can use TFBSTools (https://bioconductor.org/packages/release/bioc/html/TFBSTools.html) to calculate the sites myself by applying PSSMs to the GRCh37 sequence.
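A minimal TFBSTools sketch of scanning a sequence with a PWM. The 4-position count matrix and the short sequence are toy placeholders; in practice the matrices might come from a JASPAR data package and the sequence from BSgenome.Hsapiens.UCSC.hg19 (both assumptions, not something settled in this thread). The 80% cutoff is an arbitrary choice for illustration.

```r
library(TFBSTools)   # PFMatrix(), toPWM(), searchSeq(), writeGFF3()
library(Biostrings)  # DNAString()

# Toy position frequency matrix: rows A, C, G, T; one column per motif position.
# Filled column by column, so this motif is simply "ACGT".
counts <- matrix(as.integer(c(10, 0, 0, 0,
                              0, 10, 0, 0,
                              0, 0, 10, 0,
                              0, 0, 0, 10)),
                 nrow = 4,
                 dimnames = list(c("A", "C", "G", "T"), NULL))
pfm <- PFMatrix(ID = "toy", name = "toy_motif", profileMatrix = counts)

# Convert counts to a log-odds position weight matrix (PSSM)
pwm <- toPWM(pfm)

# Placeholder sequence; a real run would scan a GRCh37 region instead
subject <- DNAString("TTACGTTTACGTTT")

# Report windows with a relative score of at least 80%, on both strands
siteset <- searchSeq(pwm, subject, seqname = "toy_seq",
                     strand = "*", min.score = "80%")
hits <- writeGFF3(siteset)
print(hits)
```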

Best,

Jon