Question: Applying norm.factors in RNA-seq analysis to other expression data by using edgeR
2002ymx020 (Germany, Dresden, TUD) wrote, 27 days ago:

Hi edgeR authors, I would like to apply the norm.factors calculated for gene expression with the edgeR pipeline to other expression data from the same samples, for example small RNA and transposon expression. After putting the counts and groups into a DGEList object, I assigned the norm.factors to samples$norm.factors and then estimated the dispersions. However, the custom norm.factors did not seem to work for dispersion estimation: I got a strange BCV plot showing only a few variables (there should be 1500) and many "Inf" values in fit$df.prior. I also tried running calcNormFactors(x) and then replacing the norm.factors with those from the RNA-seq analysis, but it still did not work.

We have different kinds of expression data (gene expression, transposons, small RNA and so on) from the same samples, and we would like to apply the norm.factors calculated in the RNA-seq DE analysis to the other expression data to account for lib.size and composition bias. How can I do this?

Regards, Mingxing Yang

Answer: Applying norm.factors in RNA-seq analysis to other expression data by using edgeR
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 27 days ago:

Why would you copy norm.factors from one type of expression data to another? There is no need to do that and it is never a correct thing to do.

edgeR should be run separately, with a separate calcNormFactors step, on each kind of expression data that you have. The norm.factors will be specific to each data type.
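To make this concrete, here is a rough Python sketch (not edgeR code; edgeR is an R package) of a simplified, unweighted TMM-style calculation. It rests on stated assumptions: `simple_tmm_factors` is a hypothetical helper, the first sample is always the reference, and the precision weights and A-value trimming of the real calcNormFactors are left out. Run on two hypothetical count matrices from the same two samples, it returns different factors for each data type.

```python
import numpy as np


def simple_tmm_factors(counts, ref_col=0, logratio_trim=0.3):
    """Hypothetical, simplified TMM-style normalization factors.

    counts: features x samples array of raw counts.
    Unlike edgeR's calcNormFactors, this uses no precision weights,
    no A-value trimming and a fixed reference column.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    props = counts / lib_size  # per-sample proportions of reads
    factors = np.ones(counts.shape[1])
    for j in range(counts.shape[1]):
        ok = (counts[:, j] > 0) & (counts[:, ref_col] > 0)
        # M-values: log2 ratios of proportions versus the reference sample
        m = np.sort(np.log2(props[ok, j] / props[ok, ref_col]))
        k = int(np.floor(len(m) * logratio_trim))  # drop k values from each end
        factors[j] = 2.0 ** m[k:len(m) - k].mean()
    # Rescale so the factors multiply to 1, the convention edgeR also uses
    return factors / np.exp(np.mean(np.log(factors)))


# Hypothetical counts for the same two samples, quantified two different ways
genes = np.array([[100, 200], [50, 100], [80, 160], [20, 40], [300, 600]])
transposons = np.array([[100, 100], [50, 50], [80, 80], [20, 20], [10, 1000]])

print(simple_tmm_factors(genes))        # [1. 1.]
print(simple_tmm_factors(transposons))  # roughly [2.19 0.456]
```

The gene matrix gives factors of 1 because its proportions are identical in both samples, while the transposon matrix needs a correction for the single feature that dominates sample 2. Copying the gene factors onto the transposon data would leave that composition bias uncorrected.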

Thank you for your time, Gordon. Maybe I was not clear in my question. These different expression data actually come from the same samples and the same sequencing files (.fasta). I obtained the different expression data by aligning the reads with different aligners to different genomic regions, to quantify specific kinds of expression (for example RNA and transposons). I assumed that I should normalize the different expression data with the same norm.factors, because they come from the same sequencing files. Could you give me your comments?

Best, Mingxing Yang


I already understood you the first time, and the answer remains the same. There are no circumstances in which it is appropriate to copy norm.factors from one dataset to another, whether derived from the same samples or not.

What I would have appreciated from you is some explanation of what you thought you were achieving by copying the norm.factors. Why would you think that is necessary or good?

Thank you, Gordon. I think the read depths are the same. Even though the library composition could be different, the trimmed mean could be similar, so I supposed that using the same norm.factors would be fine. If I run calcNormFactors separately on each type of expression data, I could get different norm.factors for the same samples, and it is difficult to understand why the same samples would have different norm.factors for different types of data.

Best, Mingxing

Read depth, library size and composition effects are all different things. There is no reason to think that different kinds of expression should share the same norm.factors.
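A toy illustration, with hypothetical numbers, of why library size and composition are separate things: both libraries below have exactly one million reads, yet in library B a single feature absorbs half the reads, so every feature assumed to be unchanged appears at half its CPM. The median CPM ratio used here is only a crude stand-in for the trimmed, weighted estimate that TMM makes, and `composition_ratio` is a hypothetical helper.

```python
import numpy as np


def composition_ratio(a, b):
    """Median ratio of CPM values (b over a) across features detected in both libraries.

    A crude composition estimate for illustration only; not edgeR's TMM.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    keep = (a > 0) & (b > 0)
    cpm_a = a[keep] / a.sum() * 1e6
    cpm_b = b[keep] / b.sum() * 1e6
    return float(np.median(cpm_b / cpm_a))


# Hypothetical libraries with identical total read counts (1,000,000 each).
# The first four features are assumed unchanged; the last is expressed only in B.
a = np.array([250_000, 250_000, 250_000, 250_000, 0])
b = np.array([125_000, 125_000, 125_000, 125_000, 500_000])

print(a.sum() == b.sum())       # True: same library size
print(composition_ratio(a, b))  # 0.5: the unchanged features look halved in B
```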

Many thanks. Now I understand why norm.factors are data-specific. The aim of the calcNormFactors function is to adjust the expression in each sample to a level at which all samples are comparable, through norm.factors*lib.size. If I use fixed norm.factors, the normalized data are still not comparable, because the lib.size values are different.

Best, Mingxing YANG
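In edgeR the effective library size is lib.size * norm.factors, and cpm() divides by it by default. The Python sketch below only mimics that arithmetic, using a hypothetical `normalized_cpm` helper on hypothetical transposon counts. The factors [1.0, 0.208] are what a simplified trimmed log-ratio calculation gives for these particular counts; rescaling them to multiply to 1 would multiply every CPM by the same constant and so would not change comparisons between samples. The factors [1.0, 1.0] stand in for factors copied from a dataset with a different composition.

```python
import numpy as np


def normalized_cpm(counts, norm_factors):
    """Counts per million using effective library size = lib.size * norm.factors."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    effective_lib_size = lib_size * np.asarray(norm_factors, dtype=float)
    return counts / effective_lib_size * 1e6


# Hypothetical transposon counts: features x samples
transposons = np.array([[100, 100], [50, 50], [80, 80], [20, 20], [10, 1000]])

# 0.208 = 260/1250, from a simplified trimmed log-ratio calculation on these counts
own = normalized_cpm(transposons, [1.0, 0.208])
# Hypothetical factors borrowed from another dataset
copied = normalized_cpm(transposons, [1.0, 1.0])

# The first four features have identical counts in both samples
print(np.allclose(own[:4, 0], own[:4, 1]))        # True
print(np.allclose(copied[:4, 0], copied[:4, 1]))  # False
```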