Question: Use biomaRt to get pre-calculated transcription factor binding sites (TFBS) for GRCh37
goldberg.jm10 wrote:

Hi All,

I wish to use Bioconductor/biomaRt to get pre-calculated transcription factor binding site (TFBS) results for GRCh37.

To do this for GRCh38 in the Ensembl BioMart web interface (http://www.ensembl.org/biomart/martview/), I select "ENSEMBL REGULATION 92" under "-CHOOSE DATABASE-" and "Human Binding Motifs (GRCh38.p12)" under "-CHOOSE DATASET-".

As a convenient filter, I check "Multiple regions..." and enter "1:0:20000". For this test I left the attributes at their defaults.

The resulting query URL is:
http://www.ensembl.org/biomart/martview/19ba1c438ef96a5100531e91647ab2b5?VIRTUALSCHEMANAME=default&ATTRIBUTES=hsapiens_motif_feature.default.binding_motifs.binding_matrix_id|hsapiens_motif_feature.default.binding_motifs.chromosome_name|hsapiens_motif_feature.default.binding_motifs.chromosome_start|hsapiens_motif_feature.default.binding_motifs.chromosome_end|hsapiens_motif_feature.default.binding_motifs.score|hsapiens_motif_feature.default.binding_motifs.feature_type_name&FILTERS=hsapiens_motif_feature.default.filters.chromosomal_region."1:0:20000"&VISIBLEPANEL=resultspanel
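For reference, the GRCh38 web query above can be sketched in biomaRt roughly as follows. This is a sketch, not a tested recipe: the mart name ENSEMBL_MART_FUNCGEN is an assumption (it is the BioMart name usually used for Ensembl Regulation), while the dataset, attribute and filter names are copied from the Ensembl 92 query URL and may differ in later releases. Running listAttributes() and listFilters() on the mart shows the names the server currently accepts.

```r
library(biomaRt)

# Regulation mart on the main (GRCh38) Ensembl site.
# ASSUMPTION: mart name ENSEMBL_MART_FUNCGEN; dataset, attribute and filter
# names are taken from the Ensembl 92 query URL and may have changed since.
motif_mart <- useMart(biomart = "ENSEMBL_MART_FUNCGEN",
                      dataset = "hsapiens_motif_feature",
                      host    = "www.ensembl.org")

# Same attributes as the web interface defaults, same region filter.
tfbs <- getBM(attributes = c("binding_matrix_id", "chromosome_name",
                             "chromosome_start", "chromosome_end",
                             "score", "feature_type_name"),
              filters = "chromosomal_region",
              values  = "1:0:20000",
              mart    = motif_mart)
head(tfbs)
```

Pointing the same call at host = "grch37.ensembl.org" only helps if that server actually offers an equivalent dataset, which is the open question in this thread.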

Here is my specific question: how do I write a Bioconductor/biomaRt query that gets me the equivalent of "ENSEMBL REGULATION 92/Human Binding Motifs" for GRCh37?

Thank you!

Jon


I do know how to use biomaRt to access archived versions of "Ensembl Genes..." (see code below), just not for "Ensembl Regulation..."

useMart(host = 'grch37.ensembl.org', biomart = 'ENSEMBL_MART_ENSEMBL', dataset = 'hsapiens_gene_ensembl')
James W. MacDonald wrote:

If you can't go to the Ensembl BioMart site directly and do the query (and so far as I can tell, you can't), then you cannot do the query using biomaRt either. The latter is just a programmatic way of querying the former, so it can't do anything that isn't available on the website.
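One way to check this from R is to list what the GRCh37 BioMart server offers. A minimal sketch, assuming the regulation mart would carry the same name (ENSEMBL_MART_FUNCGEN) on grch37.ensembl.org as on the main site: if no such mart is listed, or it has no motif dataset, there is nothing for biomaRt to query.

```r
library(biomaRt)

# Which marts does the GRCh37 server expose? (columns: biomart, version)
grch37_marts <- listMarts(host = "grch37.ensembl.org")
print(grch37_marts)

# ASSUMPTION: the regulation mart, if present, is named ENSEMBL_MART_FUNCGEN.
if ("ENSEMBL_MART_FUNCGEN" %in% grch37_marts$biomart) {
  funcgen37  <- useMart(biomart = "ENSEMBL_MART_FUNCGEN",
                        host    = "grch37.ensembl.org")
  funcgen_ds <- listDatasets(funcgen37)
  # Show only datasets whose name mentions motifs.
  print(funcgen_ds[grepl("motif", funcgen_ds$dataset), ])
}
```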

That said, you could consider using liftOver to convert the GRCh38 TFBS coordinates to GRCh37.
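A minimal sketch of that liftOver step with rtracklayer. Assumptions: the two-row input table is hypothetical toy data standing in for a GRCh38 getBM() result, and the UCSC chain file hg38ToHg19.over.chain.gz has already been downloaded and gunzipped into the working directory.

```r
library(GenomicRanges)   # GRanges, makeGRangesFromDataFrame
library(GenomeInfoDb)    # seqlevelsStyle
library(rtracklayer)     # import.chain, liftOver

# HYPOTHETICAL toy input shaped like a GRCh38 getBM() result
# (Ensembl-style chromosome names, 1-based inclusive coordinates).
tfbs_df <- data.frame(binding_matrix_id = c("motif_A", "motif_B"),
                      chromosome_name   = c("1", "1"),
                      chromosome_start  = c(15000L, 17000L),
                      chromosome_end    = c(15012L, 17010L))

tfbs38 <- makeGRangesFromDataFrame(tfbs_df,
                                   seqnames.field = "chromosome_name",
                                   start.field    = "chromosome_start",
                                   end.field      = "chromosome_end",
                                   keep.extra.columns = TRUE)

# ASSUMED local file: UCSC hg38ToHg19.over.chain.gz, already gunzipped.
chain <- import.chain("hg38ToHg19.over.chain")

# UCSC chain files use "chr1"-style names; Ensembl uses "1".
seqlevelsStyle(tfbs38) <- "UCSC"

lifted <- liftOver(tfbs38, chain)       # GRangesList, one element per input range
mapped_once <- lengths(lifted) == 1     # drop ranges that failed to map or were split
tfbs37 <- unlist(lifted[mapped_once])
tfbs37
```

Keeping only ranges that map to exactly one segment is a conservative choice for short motif sites; split or unmapped sites are simply dropped here.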

Thanks James. Before I try liftOver, I'll see if I can use TFBSTools (https://bioconductor.org/packages/release/bioc/html/TFBSTools.html) to calculate the sites directly by scanning the GRCh37 sequence with PSSMs.
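A rough sketch of that TFBSTools approach on GRCh37 (UCSC hg19). Illustrative assumptions, not part of the original post: the JASPAR2018 package as the matrix source, the CTCF matrix MA0139.1, and an 80% relative score threshold. The scan starts at chr1:10001 because the first 10,000 bases of hg19 chr1 are N.

```r
library(TFBSTools)
library(JASPAR2018)                     # ASSUMED matrix source
library(BSgenome.Hsapiens.UCSC.hg19)    # GRCh37/hg19 reference sequence

# ASSUMED example matrix: MA0139.1 (CTCF); substitute your own PFMs.
pfm <- getMatrixByID(JASPAR2018, ID = "MA0139.1")
pwm <- toPWM(pfm)                       # convert counts to a log-odds PWM

# Roughly the same test region as the BioMart query, skipping the leading Ns.
region <- subseq(BSgenome.Hsapiens.UCSC.hg19[["chr1"]], start = 10001, end = 20000)

# Scan both strands; keep hits scoring >= 80% of the PWM's score range.
hits <- searchSeq(pwm, region, seqname = "chr1",
                  min.score = "80%", strand = "*")

# Tabulate hits (GFF3-like data frame). Positions are relative to the
# scanned subsequence, so add 10000 to convert them to chr1 coordinates.
hits_df <- writeGFF3(hits)
head(hits_df)
```

Hits from a de novo scan like this will not match Ensembl's pre-calculated motif features one-for-one, since the matrices and thresholds differ.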

Best,

Jon