Question: biomaRt biotype TFs only
rbronste wrote, 6 months ago:

Hi, I am trying to retrieve from biomaRt a GRanges of only transcription factors (TFs), with their gene identifier metadata, and I am not sure how to do this. I assume it involves filtering on a particular biotype, but I am not sure whether that applies to genes or to specific transcripts. Thank you!

Mike Smith (EMBL Heidelberg / de.NBI) wrote, 6 months ago:

I'm not sure biotype is quite the right field to query. A particular GO annotation might be more appropriate: GO:0003700 is 'DNA-binding transcription factor activity'. A gene annotated with that term is not necessarily a transcription factor, but I guess it would be hard for a transcription factor to fall outside that classification.

You can query for genes annotated directly with that term in biomaRt with something like:

library(biomaRt)

# Connect to the Ensembl mart and select the human gene dataset
ensembl_mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

results <- getBM(attributes = c("ensembl_gene_id", 
                                "external_gene_name",
                                "chromosome_name", 
                                "start_position",
                                "end_position"),
                 filters = "go",
                 values = "GO:0003700",
                 mart = ensembl_mart)

If you want genes annotated with that term, or with any term below it in the ontology, use the "go_parent_term" filter instead:

results <- getBM(attributes = c("ensembl_gene_id", 
                                "external_gene_name",
                                "chromosome_name", 
                                "start_position",
                                "end_position"),
                 filters = "go_parent_term",
                 values = "GO:0003700",
                 mart = ensembl_mart)
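
Since the question asks for a GRanges, here is a minimal sketch of one way to convert such a result table, assuming the GenomicRanges package is installed. The data frame below is a hand-made stand-in with the same columns as the getBM() result (the IDs, names, and coordinates are placeholders, not real annotations), so it runs without querying Ensembl; the name tf_gr is just illustrative.

```r
library(GenomicRanges)

# Placeholder rows mimicking the columns returned by getBM(); not real annotations
results <- data.frame(
  ensembl_gene_id    = c("ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2"),
  external_gene_name = c("TF1", "TF2"),
  chromosome_name    = c("1", "X"),
  start_position     = c(1000L, 5000L),
  end_position       = c(2000L, 7500L),
  stringsAsFactors   = FALSE
)

# Map the biomaRt column names onto the GRanges fields and keep the
# identifier columns as metadata columns
tf_gr <- makeGRangesFromDataFrame(results,
                                  seqnames.field = "chromosome_name",
                                  start.field = "start_position",
                                  end.field = "end_position",
                                  keep.extra.columns = TRUE)
```

No strand column is supplied here, so the ranges are unstranded. Ensembl chromosome names also lack the "chr" prefix, so the sequence naming may need adjusting before comparing against UCSC-style ranges.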

Very helpful, thank you!

My aim here is to filter a full HTSeq count table for factors with DNA-binding activity, keeping only those above some cutoff, for example DNA-binding factors with absolute counts above 100. Do you think importing that list into R as a data.frame and doing a simple intersect/matching operation would do it? Thanks again.

Reply written 6 months ago by rbronste

I guess that would mean filtering one list against the other on "external_gene_name".
Reply written 6 months ago by rbronste
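
The matching step described in these replies could be sketched in base R roughly as follows. Both data frames are toy stand-ins with made-up IDs and counts: counts plays the role of the imported count table and tf_genes the role of the getBM() result. The sketch assumes the count table is keyed by Ensembl gene ID, which would be the case if counting was done against an Ensembl annotation; if the table is keyed by gene symbol instead, match on external_gene_name. The cutoff variable min_count is an illustrative name.

```r
# Toy stand-in for an imported count table (gene ID, raw count); values are made up
counts <- data.frame(
  gene_id = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  count   = c(250, 40, 1200, 300),
  stringsAsFactors = FALSE
)

# Toy stand-in for the getBM() result restricted to GO:0003700
tf_genes <- data.frame(
  ensembl_gene_id    = c("ENSG_A", "ENSG_B", "ENSG_D"),
  external_gene_name = c("TF1", "TF2", "TF4"),
  stringsAsFactors = FALSE
)

min_count <- 100  # example cutoff from the post

# Keep rows that are in the TF list and exceed the cutoff
is_tf <- counts$gene_id %in% tf_genes$ensembl_gene_id
tf_counts <- counts[is_tf & counts$count > min_count, ]

# Attach the gene names from the annotation table
tf_counts <- merge(tf_counts, tf_genes,
                   by.x = "gene_id", by.y = "ensembl_gene_id")
```

Using %in% keeps the row order of the count table and avoids trouble with duplicated symbols, which is one reason to prefer stable gene IDs over gene names as the matching key.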