Question

A/B compartments computation: control settings

Konstantin Okonechnikov ▴ 40

@konstantin-okonechnikov-11325

Hi! I am using HiTC on HiC-Pro results to compute A/B compartments. The package documentation is detailed, but the tutorial shows only one example command for this procedure, so I have some additional questions.

My code is the following:

library(HiTC)

conFile <- "hic_results/matrix/s1/raw/50000/s1_50000.matrix"
binsFile <- "hic_results/matrix/s1/raw/50000/s1_50000_abs.bed"
# import the HiC-Pro contact matrix, keeping only intra-chromosomal maps
hicRes <- importC(conFile, binsFile, rm.trans=TRUE)
# example: chromosome 11, re-binned from 50 kb to 250 kb
hic.binned <- binningC(hicRes$chr11chr11, binsize=250000, method="mean")
# compartment calling: first principal component of the normalized map
pc <- pca.hic(hic.binned, normPerExpected=TRUE, method="loess", npc=1)

Here are my questions:

I am using raw contacts as input, since loess normalization is applied as part of the procedure. Would it be more suitable to use iced (ICE-normalized) data?
What is the optimal bin size for computing compartments? Existing studies typically use 250-500 kb, and results were similar between these bin sizes. I also tried smaller bins, but that failed with the messages "Contact map looks big. Use mean method instead..." and "Empty correlation matrix".
How should gene coordinates be used to assign compartments? Is a simple GRanges object enough? There are no examples in the tutorials.

Would be grateful for the help in these aspects.

HiTC • 1.3k views

updated 2.8 years ago by ahmed.abbaselmahdi • written 4.8 years ago by Konstantin Okonechnikov

Dear all, I am trying to use exactly the same commands as above, but I get the following error. Any suggestions?

Error in Matrix::sparseMatrix(i = pos1, j = pos2, x = cdata[, 3], dims = c(length(ygi), : NA's in (i,j) are not allowed
Traceback:

1. importC(conFile, xgi = binsFile, rm.trans = TRUE)
2. Matrix::sparseMatrix(i = pos1, j = pos2, x = cdata[, 3], dims = c(length(ygi), 
 .     length(xgi)), dimnames = list(id(ygi), id(xgi)))
3. stop("NA's in (i,j) are not allowed")

Thanks

Reply by ahmed.abbaselmahdi, 2.8 years ago

score 1 · Answer 1 · 2019-07-04

Hi,

Please find below my answers.

I am using raw contacts as input, since loess normalization is applied as part of the procedure. Would it be more suitable to use iced (ICE-normalized) data?

Yes, iced (ICE-normalized) maps should be better, although I do not expect large changes in the correlation maps.
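As a rough sketch of what switching to iced input could look like: only the matrix path changes, while the bin annotation file stays the same. The iced path below assumes the usual HiC-Pro output layout (an `iced/` directory next to `raw/`, with an `_iced.matrix` suffix); check your own `hic_results` directory, since this path is an assumption and not taken from the question.

```r
library(HiTC)

# Assumed HiC-Pro layout: ICE-normalized matrix under iced/, bins under raw/
conFile  <- "hic_results/matrix/s1/iced/50000/s1_50000_iced.matrix"
binsFile <- "hic_results/matrix/s1/raw/50000/s1_50000_abs.bed"

# Same import call as in the question, now reading the normalized counts
hicRes <- importC(conFile, binsFile, rm.trans = TRUE)
```

The rest of the workflow from the question (binningC, then pca.hic) can then be run unchanged on `hicRes`.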

What is the optimal bin size for computing compartments? Existing studies typically use 250-500 kb, and results were similar between these bin sizes. I also tried smaller bins, but that failed with the messages "Contact map looks big. Use mean method instead..." and "Empty correlation matrix".

That's a good question, and there is actually no magic answer. Chromosome compartments are expected to be quite large, so to me 250 kb is fine, although some recent studies look at compartments at a finer scale.

The message "Contact map looks big. Use mean method instead..." appears because there are two ways to estimate the expected contact frequency as a function of genomic distance. The first is based on a loess regression; the second uses the mean contact count for a given range of genomic distances. Loess was mainly used a few years ago for low-resolution data or for 5C data. I would recommend simply using the mean contact count per genomic distance: it is faster and usually works well.

Finally, the message "Empty correlation matrix" means that, at high resolution, the data are too sparse to compute correlation maps.
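To make the mean-per-distance estimate concrete, here is a minimal base-R sketch on a toy 4-bin map. It is not HiTC's internal code: the function name `expected_by_distance` and the counts are made up for illustration, and it averages over each exact bin distance rather than over ranges of distances.

```r
# Expected contact frequency = mean count at each bin distance (each diagonal).
expected_by_distance <- function(m) {
  d <- abs(row(m) - col(m))              # bin distance of every cell
  means <- tapply(m, d, mean)            # one mean per distance
  matrix(as.numeric(means[as.character(d)]), nrow(m), ncol(m))
}

# Toy symmetric contact map for 4 bins (made-up counts).
obs <- matrix(c(20, 10,  4,  2,
                10, 22,  8,  6,
                 4,  8, 18, 12,
                 2,  6, 12, 24), nrow = 4, byrow = TRUE)

exp_m <- expected_by_distance(obs)
oe <- obs / exp_m                        # observed / expected ratio
```

In the workflow from the question, the corresponding change would presumably be to pass method="mean" instead of method="loess" to pca.hic, i.e. pca.hic(hic.binned, normPerExpected=TRUE, method="mean", npc=1).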

How should gene coordinates be used to assign compartments? Is a simple GRanges object enough? There are no examples in the tutorials.

Again, this is open to discussion. The PCA gives you two compartments but does not tell you which one is active and which is inactive. To decide, people usually just use the number of genes in each compartment type: since A compartments are expected to be active, the compartment that is enriched in genes is labelled A (note that this ignores expression levels). So yes, a simple GRanges with gene coordinates is enough; it is overlapped with the compartment positions. If you have ChIP-seq profiles matching your Hi-C data, you should validate the compartment types by looking at the enrichment of active/repressive marks.
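The gene-density rule can be sketched in a few lines of base R. This is only an illustration of the idea, not HiTC's implementation: `orient_by_gene_density`, the PC1 scores and the per-bin gene counts are all hypothetical. In practice the gene counts per bin would come from overlapping your gene GRanges with the bin ranges, for example with countOverlaps from GenomicRanges.

```r
# Flip the sign of PC1 if needed, so positive values mark the gene-rich side.
orient_by_gene_density <- function(pc1, genes_per_bin) {
  pos <- pc1 > 0
  # Only compare when both signs are present, otherwise leave PC1 untouched
  if (any(pos) && any(!pos) &&
      mean(genes_per_bin[pos]) < mean(genes_per_bin[!pos])) {
    pc1 <- -pc1
  }
  pc1
}

# Toy inputs: one PC1 score and one gene count per bin (made-up numbers).
pc1   <- c(-0.4, -0.3, 0.2, 0.5, -0.1, 0.3)
genes <- c(  12,    9,   1,   2,    7,   0)

pc1_oriented <- orient_by_gene_density(pc1, genes)
compartment  <- ifelse(pc1_oriented > 0, "A", "B")
```

Here the bins with negative raw scores carry most of the genes, so the sign is flipped and those bins end up labelled A. The pca.hic help page also documents a gene.gr argument intended for this purpose; check ?pca.hic in your installed version before relying on it.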

Hope it helps,
Nicolas