Search
Question: trying to annotate overlaps with external file information
1
gravatar for chrisclarkson100
6 months ago by
chrisclarkson10030 wrote:

I have found the overlapping regions between the genomic regions I am interested in and genes with 'intersectBed'- external of R. 

intersectBed -a regions_of_interest.bed -b genes.bed > overlap_genes.bed

So now the file looks like this

>head overlap_genes.bed

chr1    180110925    180113591    Region_665    0    0

chr1    180110925    180113591    Region_665    0    0

chr1    180110925    180113591    Region_665    0    0

chr11    43113461    43117525    Region_837    0    0


This however is no good to me as I also need to include columns in this bed file which specify the gene that is being overlapped and where it starts and ends....

head genes.bed​

 

chr1    109408122    109422169    +    Serpinb2    

chr1    174405505    174408706    +    Slamf9    

chr1    133724589    133743546    +    Slc41a1    

chr1    173847813    173848810    +    Slamf6    

i.e.. what I would like to see is:

head overlap_genes.bed
chr1    180110925    180113591    Region_665    0    0    gene_name .....

chr1    180110925    180113591    Region_665    0    0    another_gene 

chr1    180110925    180113591    Region_665    0    0    one_more_gene

So how do I bind the according gene_names and their start, end and orientation to the overlap_genes object? My first thought was to iterate through each row and check if there were overlaps between the i-th range intervals and if so- bind the concerned gene-name and extra info to that i-th overlap_genes row.

genes=read.table('genes.bed', sep='\t', header=F, stringsAsFactors=F)
overlap_genes=read.table('overlap_genes.bed', sep='\t', header=F, stringsAsFactors=F)

informed=data.frame()
for(i in nrow(overlap_genes)){
  apply(genes,1, function(x){tryCatch(

                            if(intersect(seq(x[2], x[3]), seq(overlap_genes[i,1], overlap_genes[i,2])))

                            {rbind(informed,cbind(overlap_genes[i,],x))}

                            , error=function(e) e)})
  }

This however has not worked for me ... The informed data frame comes up empty.... which is obviously wrong as the 'overlap_genes' is the product of being overlapped with genes so all of its rows should overlap with 'genes'... I have tried using Iranges but it seems to posses the same functionality as intersectBed but cannot retain extra columns from external data.....

Does anyone know of a way around this?? Thanks

 

ADD COMMENTlink modified 6 months ago by Michael Lawrence9.8k • written 6 months ago by chrisclarkson10030
3
gravatar for Michael Lawrence
6 months ago by
United States
Michael Lawrence9.8k wrote:

It looks like you want what comes from the bedtools -wb option. It would write out the gene ranges after the intersections. The HelloRanges package maps bedtools command lines to IRanges code. For example,

> bedtools_intersect("-a regions_of_interest.bed -b genes.bed -wb")
{
    genome <- Seqinfo(genome = NA_character_)
    gr_a <- import("regions_of_interest.bed", genome = genome)
    gr_b <- import("genes.bed", genome = genome)
    pairs <- findOverlapPairs(gr_a, gr_b, ignore.strand = TRUE)
    ans <- Pairs(pintersect(pairs, ignore.strand = TRUE), second(pairs))
    ans
}

That gives a Pairs object that pairs up the intersections and genes (as GRanges). You could also just stick the gene GRanges as a meta column on the intersection GRanges, like:

intersection <- first(ans)
intersection$genes <- second(ans)

Hopefully this gets you on the right path. Oh, and note the call to Seqinfo(). You should fill in your genome identifier, eg, "GRCh38" to get the full benefit of the framework.

ADD COMMENTlink modified 6 months ago • written 6 months ago by Michael Lawrence9.8k

hi thank you- i should have looked through the intersectbed documentation and found out about the -wao option

ADD REPLYlink written 6 months ago by chrisclarkson10030
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 178 users visited in the last hour