biomaRt biotype TFs only
rbronste ▴ 60
@rbronste-12189
Last seen 23 months ago

Hi, I am trying to retrieve from biomaRt a GRanges containing only TFs, together with their gene identifier metadata, and I am not sure how to do this. I assume it means filtering on a particular biotype, but I am not sure whether that applies to genes or to specific transcripts. Thank you!

biomart TF biotype ensembl mus musculus • 618 views
Mike Smith ★ 5.2k
@mike-smith
Last seen 5 hours ago
EMBL Heidelberg / de.NBI

I'm not sure biotype is quite the right field to query. Perhaps a particular GO annotation would be more appropriate. GO:0003700 is 'DNA-binding transcription factor activity'. Something annotated with that term isn't necessarily a transcription factor, but I guess it would be hard for a transcription factor to fall outside that classification.

You can query for genes annotated directly with that term in biomaRt with something like:

library(biomaRt)
# Human dataset shown here; for mouse, use dataset = "mmusculus_gene_ensembl"
ensembl_mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

results <- getBM(attributes = c("ensembl_gene_id", 
                                "external_gene_name",
                                "chromosome_name", 
                                "start_position",
                                "end_position"),
                 filters = "go",
                 values = "GO:0003700",
                 mart = ensembl_mart)

If you want genes annotated with that term or with any term below it in the ontology, the query is slightly different (note the go_parent_term filter):

results <- getBM(attributes = c("ensembl_gene_id", 
                                "external_gene_name",
                                "chromosome_name", 
                                "start_position",
                                "end_position"),
                 filters = "go_parent_term",
                 values = "GO:0003700",
                 mart = ensembl_mart)
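
Since the original question asked for a GRanges, here is a minimal sketch of one way to convert the data.frame returned by getBM() into one, assuming the GenomicRanges package is installed. The toy `results` table below is a made-up stand-in (hypothetical IDs and coordinates) so that the snippet runs without querying Ensembl; in practice you would pass the real `results` object. The query above does not request strand, so the ranges come out unstranded.

```r
library(GenomicRanges)

# Toy stand-in for the data.frame returned by getBM().
# IDs, names, and coordinates are made up for illustration only.
results <- data.frame(
  ensembl_gene_id    = c("TOY0001", "TOY0002"),
  external_gene_name = c("GeneA", "GeneB"),
  chromosome_name    = c("1", "X"),
  start_position     = c(1000L, 5000L),
  end_position       = c(2000L, 7500L),
  stringsAsFactors   = FALSE
)

# Map the biomaRt column names onto the GRanges core fields and
# keep the remaining columns (gene ID, gene name) as metadata.
tf_ranges <- makeGRangesFromDataFrame(
  results,
  seqnames.field     = "chromosome_name",
  start.field        = "start_position",
  end.field          = "end_position",
  keep.extra.columns = TRUE
)

tf_ranges
```

The gene identifiers are then available as metadata columns, for example `tf_ranges$ensembl_gene_id`.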

Very helpful, thank you!

My aim here is to filter a full HT-seq list of counts down to factors with DNA binding activity, keeping only those above some cutoff, for example DNA binding factors with absolute counts above 100. Do you think importing that list into R as a data.frame and doing a simple intersect/matching operation would do it? Thanks again.


I guess it comes down to filtering one list against the other on "external_gene_name".
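
A minimal sketch of that matching step in base R might look like the following. Both tables are made-up stand-ins: `tf_genes` plays the role of the getBM() result, and `counts` is a hypothetical HT-seq count table with assumed column names `gene` and `count`. If your count table is keyed by Ensembl gene IDs rather than gene names, matching on "ensembl_gene_id" instead would probably be the safer choice.

```r
# Toy stand-in for the getBM() result (hypothetical IDs and gene names).
tf_genes <- data.frame(
  ensembl_gene_id    = c("TOY0001", "TOY0002", "TOY0003"),
  external_gene_name = c("GeneA", "GeneB", "GeneC"),
  stringsAsFactors   = FALSE
)

# Toy stand-in for an HT-seq count table: one row per gene.
# The column names "gene" and "count" are assumptions.
counts <- data.frame(
  gene             = c("GeneA", "GeneB", "GeneD", "GeneC"),
  count            = c(250, 40, 900, 100),
  stringsAsFactors = FALSE
)

min_count <- 100  # keep genes with counts strictly above this cutoff

# TRUE for rows whose gene appears in the TF list.
is_tf <- counts$gene %in% tf_genes$external_gene_name

# Keep rows that are both in the TF list and above the cutoff.
tf_counts <- counts[is_tf & counts$count > min_count, ]

tf_counts
```

Using `%in%` keeps the count table's own row order and columns, which is usually more convenient than `intersect()` on the names alone.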