Question: biomaRT biotype TFs only
rbronste60 wrote:

Hi, I am trying to retrieve from biomaRt a GRanges of only TFs, with their gene identifiers as metadata, and was not sure how to do this. I assume it involves filtering on a particular biotype, but I'm not sure whether that applies to genes or to specific transcripts. Thank you!

modified 8 months ago by Mike Smith2.9k • written 8 months ago by rbronste60
EMBL Heidelberg / de.NBI
Mike Smith2.9k wrote:

I'm not sure biotype is quite the right field to query. A particular GO annotation may be more appropriate: GO:0003700 is 'DNA-binding transcription factor activity'. Being annotated with that term doesn't necessarily mean that something is a transcription factor, but I guess it would be hard for a transcription factor to fall outside that classification.

You can query for genes annotated directly with that term in biomaRt with something like:

library(biomaRt)
ensembl_mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

results <- getBM(attributes = c("ensembl_gene_id",
                                "external_gene_name",
                                "chromosome_name",
                                "start_position",
                                "end_position"),
                 filters = "go",
                 values = "GO:0003700",
                 mart = ensembl_mart)

If you want things annotated with that term, or anything below it in the ontology then it's slightly different:

results <- getBM(attributes = c("ensembl_gene_id",
                                "external_gene_name",
                                "chromosome_name",
                                "start_position",
                                "end_position"),
                 filters = "go_parent_term",
                 values = "GO:0003700",
                 mart = ensembl_mart)
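
Since the original question asked for a GRanges, here is a minimal sketch of turning a table shaped like `results` into one with `makeGRangesFromDataFrame()` from the GenomicRanges package. The toy data frame, its identifiers and its coordinates are made up for illustration so the example runs without querying Ensembl; in practice the `results` object returned by getBM() would take its place.

```r
library(GenomicRanges)

# Toy stand-in for the getBM() output; all values here are hypothetical.
results <- data.frame(
  ensembl_gene_id    = c("ENSG_A", "ENSG_B"),
  external_gene_name = c("TF_A", "TF_B"),
  chromosome_name    = c("1", "X"),
  start_position     = c(1000L, 5000L),
  end_position       = c(2000L, 7500L),
  stringsAsFactors   = FALSE
)

# Map the biomaRt column names onto the GRanges fields and keep the
# identifier columns as metadata. No strand column is present, so the
# strand of every range is left unspecified ("*").
tf_ranges <- makeGRangesFromDataFrame(
  results,
  seqnames.field     = "chromosome_name",
  start.field        = "start_position",
  end.field          = "end_position",
  keep.extra.columns = TRUE
)
```

The gene identifiers are then available through `mcols(tf_ranges)`. Adding "strand" to the attributes requested from getBM() should give strand-aware ranges, but the values would probably need recoding to "+" and "-" first.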

My aim here is to filter a full HTSeq list of counts for factors with DNA-binding activity, keeping only those above some cutoff, for example just DNA-binding factors with absolute counts above 100. Do you think importing that list into R as a data.frame and doing a simple intersect/matching operation would do it? Thanks again.
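
A simple matching operation along those lines could look like the sketch below, in base R. The `counts` table, its column names (`gene`, `count`) and the `min_count` cutoff are assumptions for illustration, and `results` is a toy stand-in for the biomaRt output; the sketch also assumes the counts are keyed by gene name. If the HTSeq counts are keyed by Ensembl gene ID instead, matching on `ensembl_gene_id` would be the analogous step.

```r
# Toy stand-in for the biomaRt result; values are hypothetical.
results <- data.frame(
  ensembl_gene_id    = c("ENSG_A", "ENSG_B", "ENSG_C"),
  external_gene_name = c("TF_A", "TF_B", "TF_C"),
  stringsAsFactors   = FALSE
)

# Toy stand-in for a per-gene count table read into a data.frame.
counts <- data.frame(
  gene             = c("TF_A", "GENE_X", "TF_B", "TF_C", "GENE_Y"),
  count            = c(250L, 900L, 40L, 101L, 5L),
  stringsAsFactors = FALSE
)

min_count <- 100  # assumed cutoff, taken from the example in the question

# Keep rows whose gene is in the TF list and whose count exceeds the cutoff.
is_tf     <- counts$gene %in% results$external_gene_name
tf_counts <- counts[is_tf & counts$count > min_count, ]
```

Using `%in%` keeps the original row order of `counts` and silently drops any row without a match in the TF list, so it is worth checking how many rows matched before applying the cutoff.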

I guess filtering one list vs. the other for "external_gene_name"