closest-features equivalent in Bioconductor for finding closest gene(s)
3
1
Entering edit mode
@dimitris-polychronopoulos-9192
Last seen 6.8 years ago
United Kingdom

Dear all,

I would like to get from a GRanges object, where I have stored genomic coordinates of my differentially expressed genes (using biomart and mm9), the closest gene(s) for every DE gene in my list. Using closest-features in unix I could do it easily:

closest-features total_genes_gr.bed all_mouse_genes_gr.bed

and then merge the dataframes.

But I would like to be consistent and do everything in R. Ideally, I would like to have those closest features / genes as a metadata column. Could you please suggest any ways of doing this?

Best wishes,

Dimitris

 

 

genomicranges granges iranges annotatepeak • 2.2k views
ADD COMMENT
2
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States

Should be straight-forward with ?nearest.

ADD COMMENT
1
Entering edit mode
@dimitris-polychronopoulos-9192
Last seen 6.8 years ago
United Kingdom

Nearest will give me the actual differentially expressed gene each time if i put as query the granges object of my diff expressed genes and as subject all mouse genes. Unless there is a special argument that gives you the second closest hit i will end up with the same list of genes

0
Entering edit mode

Could just call nearest with one argument, all the genes. It will ignore self-hits. After annotating, subset to the DE genes.

ADD REPLY
0
Entering edit mode

Many thanks Michael for your suggestion. I could not make it work,though, but I think i found a way using follow. More specifically:

genes_index <- follow(total_DE_genes_gr,all_mouse_genes_gr,select=c("all")) #This gives me the second "best hit"

nearest_genes <- all_mouse_genes_gr[subjectHits(genes_index)]  #This is the list of mouse genes close to my DE genes

When I am trying to create a metadata column, though, having those genes side by side, I get the following error:

total_DE_genes_gr$nearest_gene <- all_mouse_genes_gr[subjectHits(genes_index)]
Error in `[[<-`(`*tmp*`, name, value = <S4 object of class "GRanges">) :
  20945 elements in value to replace 20956 elements

Probably because not all of my DE genes have a nearest gene, and therefore NAs are introduced? Any ways to circumvent this?

Dimitris

 

 

ADD REPLY
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States

Find nearest gene to each gene:

total_DE_genes_gr$nearest_gene <- all_mouse_genes_gr[nearest(all_mouse_genes_gr)[match(total_DE_genes_gr, all_mouse_genes_gr)]]
ADD COMMENT
0
Entering edit mode

Thanks again, Michael. Maybe I miss something, but when I run this, it returns:

total_DE_genes_gr$nearest_gene <- all_mouse_genes_gr[nearest(all_mouse_genes_gr)[match(total_DE_genes_gr, all_mouse_genes_gr)]]
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript contains NAs or out-of-bounds indices

 

 

 

ADD REPLY
1
Entering edit mode

Two possibilities:

1) You have genes in the DE set that are not in the total set.

2) There are genes all by themselves on a chromosome/strand.

Btw, you should pass ignore.strand=TRUE to nearest() unless you only want the nearest gene on the same strand.

There is no way to encode an NA GRanges element, so if there are cases where a gene has no neighbor, you will need to think about what to do.

ADD REPLY
0
Entering edit mode

Thanks Michael for the suggestions, it worked!

ADD REPLY

Login before adding your answer.

Traffic: 608 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6