Question

GenomicInteractions duplicate annotations

0

Entering edit mode

dogancan • 0

@dogancan-23548

Last seen 4.9 years ago

Hi,

I have 3 annotated feature for my GI object and when I annotated them with annotateInteractions() and checked the interactions with isInteractionType() I got duplicate interactions in the data.frame. For example if a connection between 2 annotated regions are in anchor1 and anchor2, respectively, I'm also seeing the same interaction in anchor2 and anchor1. How can I have only the unique interactions?

I'm calling specific interactions between two objects like this:

GIObject[isInteractionType(GIObject, "annotation1", "annotation2")]

Thank you very much.

duplicate GenomicInteractions • 1.4k views

ADD COMMENT • link 4.9 years ago dogancan • 0

score 0 · Answer 1 · 2020-05-19

0

Entering edit mode

liz.ingsimmons ▴ 140

@lizingsimmons-6935

Last seen 4.4 years ago

Germany

Hi,

Please use the GenomicInteractions tag for questions about GenomicInteractions, so that the maintainers get notified about them. It's only by chance that I was on the site to check something related to my own work today and saw this question.

I'm afraid that I'm still not really sure what you're trying to achieve, from this question and our email conversation. Nevertheless, I've put together some examples of how I would use linkOverlaps to make a data frame that contains information about all unique pairs of enhancers and promoters (for example) that are linked by interactions in a GenomicInteractions object. I was going to send this to you by email but I thought I'd put it here instead so that future questioners might also be able to find it.

These examples use the example data that's included in the package and used in the Hi-C analysis vignette.

library("GenomicInteractions")
data("hic_example_data")
data("mm9_refseq_promoters")
data("thymus_enhancers")

# remove names to avoid issues converting to data.frame later
thymus_enh$region_id <- names(thymus_enh)
names(thymus_enh) <- NULL
names(mm9_refseq_promoters) <- NULL

## APPROACH 1: linkOverlaps for a specific type of interaction (e.g., enhancer-promoter)

linked_regions <- linkOverlaps(hic_example_data, mm9_refseq_promoters, thymus_enh)

# use the indices of the interacting regions to extract data about them 
df1 <- as.data.frame(mm9_refseq_promoters[linked_regions$subject1])
names(df1) <- paste0("region1_", names(df1))
df2 <- as.data.frame(thymus_enh[linked_regions$subject2])
names(df2) <- paste0("region2_", names(df2))

linked_regions_df <- unique(cbind(df1, df2))
head(linked_regions_df)
#>   region1_seqnames region1_start region1_end region1_width region1_strand
#> 1            chr15      97549082    97554081          5000              -
#> 2            chr15      97549082    97554081          5000              -
#> 3            chr15      97549082    97554081          5000              -
#> 4            chr15      97549082    97554081          5000              -
#> 5            chr15      97549082    97554081          5000              -
#> 6            chr15      97549082    97554081          5000              -
#>   region1_tx_id region1_tx_name region1_id region1_geneSymbol region2_seqnames
#> 1         27426       NM_008902      19011              Endou            chr15
#> 2         27426       NM_008902      19011              Endou            chr15
#> 3         27426       NM_008902      19011              Endou            chr15
#> 4         27426       NM_008902      19011              Endou            chr15
#> 5         27426       NM_008902      19011              Endou            chr15
#> 6         27426       NM_008902      19011              Endou            chr15
#>   region2_start region2_end region2_width region2_strand
#> 1      97601250    97601749           500              *
#> 2      97604850    97605349           500              *
#> 3      97626150    97626649           500              *
#> 4      97632450    97632949           500              *
#> 5      97648750    97649249           500              *
#> 6      97668250    97668749           500              *
#>             region2_region_id
#> 1 ENH_chr15:97601250-97601749
#> 2 ENH_chr15:97604850-97605349
#> 3 ENH_chr15:97626150-97626649
#> 4 ENH_chr15:97632450-97632949
#> 5 ENH_chr15:97648750-97649249
#> 6 ENH_chr15:97668250-97668749

ADD COMMENT • link 4.9 years ago liz.ingsimmons ▴ 140

0

Entering edit mode

## APPROACH 2: linkOverlaps for any pair of features (e.g. enhancer-enhancer, enhancer-promoter, etc)

# make mcols the same so the GRanges can be combined easily
thymus_enh$region_type <- "enhancer"
mm9_refseq_promoters$region_type <- "promoter"
mm9_refseq_promoters$region_id <- mm9_refseq_promoters$tx_name
mcols(mm9_refseq_promoters) <- mcols(mm9_refseq_promoters)[c("region_id", "region_type")]

regions_of_interest <- c(thymus_enh, mm9_refseq_promoters)

linked_regions <- linkOverlaps(hic_example_data, regions_of_interest)

# use the indices of the interacting regions to extract data about them 
df1 <- as.data.frame(regions_of_interest[linked_regions$subject1])
names(df1) <- paste0("region1_", names(df1))
df2 <- as.data.frame(regions_of_interest[linked_regions$subject2])
names(df2) <- paste0("region2_", names(df2))

linked_regions_df <- unique(cbind(df1, df2))

linked_regions_df$interaction_type <- paste(linked_regions_df$region1_region_type, linked_regions_df$region2_region_type, sep = "-")
head(linked_regions_df)
#>   region1_seqnames region1_start region1_end region1_width region1_strand
#> 1            chr15      97601250    97601749           500              *
#> 2            chr15      97601250    97601749           500              *
#> 3            chr15      97601250    97601749           500              *
#> 4            chr15      97604850    97605349           500              *
#> 5            chr15      97604850    97605349           500              *
#> 6            chr15      97604850    97605349           500              *
#>             region1_region_id region1_region_type region2_seqnames
#> 1 ENH_chr15:97601250-97601749            enhancer            chr15
#> 2 ENH_chr15:97601250-97601749            enhancer            chr15
#> 3 ENH_chr15:97601250-97601749            enhancer            chr15
#> 4 ENH_chr15:97604850-97605349            enhancer            chr15
#> 5 ENH_chr15:97604850-97605349            enhancer            chr15
#> 6 ENH_chr15:97604850-97605349            enhancer            chr15
#>   region2_start region2_end region2_width region2_strand
#> 1      97545750    97546249           500              *
#> 2      97570150    97570649           500              *
#> 3      97594450    97594949           500              *
#> 4      97545750    97546249           500              *
#> 5      97570150    97570649           500              *
#> 6      97594450    97594949           500              *
#>             region2_region_id region2_region_type  interaction_type
#> 1 ENH_chr15:97545750-97546249            enhancer enhancer-enhancer
#> 2 ENH_chr15:97570150-97570649            enhancer enhancer-enhancer
#> 3 ENH_chr15:97594450-97594949            enhancer enhancer-enhancer
#> 4 ENH_chr15:97545750-97546249            enhancer enhancer-enhancer
#> 5 ENH_chr15:97570150-97570649            enhancer enhancer-enhancer
#> 6 ENH_chr15:97594450-97594949            enhancer enhancer-enhancer
table(linked_regions_df$interaction_type)
#> 
#> enhancer-enhancer promoter-enhancer promoter-promoter 
#>            166672            187554             69894

## APPROACH 3: annotateInteractions

# annotateInteractions assigns a class to each unique region in the GenomicInteractions object
# if a region overlaps multiple regions of interest, the class is assigned based 
# on the ordering given by the user

annotation.features <- list(promoter = mm9_refseq_promoters, enhancer = thymus_enh)
annotateInteractions(hic_example_data, annotation.features, id.col = "region_id")
#> Warning in annotateInteractions(hic_example_data, annotation.features, id.col =
#> "region_id"): Some features contain duplicate IDs which will result in duplicate
#> annotations
#> Annotating with promoter ...
#> Annotating with enhancer ...
categoriseInteractions(hic_example_data)
#>            category count
#> 1 promoter-promoter  5281
#> 2   promoter-distal   492
#> 3 promoter-enhancer  1302
#> 4     distal-distal   668
#> 5   distal-enhancer   115
#> 6 enhancer-enhancer   313

^{Created on 2020-05-19 by the reprex package (v0.3.0)}

As you can see, the numbers of interactions of each type returned by annotateInteractions and linkOverlaps are very different. This is because annotateInteractions is intended to assign a single class to each interacting anchor region (as explained in the vignette), not to identify all pairwise interactions between regions of interest.

ADD REPLY • link 4.9 years ago liz.ingsimmons ▴ 140

0

Entering edit mode

More directly related to the question you're asking here, is it possible that your GenomicInteractions object includes interactions between regionA-regionB and interactions between regionB-regionA? (see example below, where there is an interaction between chr1:1-10--chr1:11-20 that is also present as the reversed interaction between chr1:11-20--chr1:1-10). If so, you can use the swapAnchors and unique functions to remove these before using annotateInteractions. Again, see example below for how to do this.

library(GenomicInteractions)
gi <- GInteractions(anchor1 = GRanges(seqnames = "chr1", 
                                            ranges = IRanges(start = c(1, 1, 11),
                                                             end = c(10, 10, 20))),
                    anchor2 = GRanges(seqnames = "chr1",
                                            ranges = IRanges(start = c(11, 21, 1),
                                                             end = c(20, 30, 10))))
gi
#> GInteractions object with 3 interactions and 0 metadata columns:
#>       seqnames1   ranges1     seqnames2   ranges2
#>           <Rle> <IRanges>         <Rle> <IRanges>
#>   [1]      chr1      1-10 ---      chr1     11-20
#>   [2]      chr1      1-10 ---      chr1     21-30
#>   [3]      chr1     11-20 ---      chr1      1-10
#>   -------
#>   regions: 3 ranges and 0 metadata columns
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
swapAnchors(gi, mode = "order")
#> GInteractions object with 3 interactions and 0 metadata columns:
#>       seqnames1   ranges1     seqnames2   ranges2
#>           <Rle> <IRanges>         <Rle> <IRanges>
#>   [1]      chr1      1-10 ---      chr1     11-20
#>   [2]      chr1      1-10 ---      chr1     21-30
#>   [3]      chr1      1-10 ---      chr1     11-20
#>   -------
#>   regions: 3 ranges and 0 metadata columns
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
unique(swapAnchors(gi))
#> GInteractions object with 2 interactions and 0 metadata columns:
#>       seqnames1   ranges1     seqnames2   ranges2
#>           <Rle> <IRanges>         <Rle> <IRanges>
#>   [1]      chr1      1-10 ---      chr1     11-20
#>   [2]      chr1      1-10 ---      chr1     21-30
#>   -------
#>   regions: 3 ranges and 0 metadata columns
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

^{Created on 2020-05-19 by the reprex package (v0.3.0)}

If these examples don't help, perhaps you can create a reproducible example using the example data included in the package to demonstrate the problem. Otherwise it is very difficult to understand what you want to achieve and what the problem might be, without seeing your code or data.

ADD REPLY • link 4.9 years ago liz.ingsimmons ▴ 140

0

Entering edit mode

Thank you. That would work for me.

ADD REPLY • link 4.9 years ago dogancan • 0

0

Entering edit mode

Great! Could you please mark the answer as accepted if it solves your problem? Thanks!

ADD REPLY • link 4.9 years ago liz.ingsimmons ▴ 140