
Hi everybody,

I am working with RNA-seq data.

I am using *DESeq2*, and I would like to correct for GC-content and length bias using *cqn*.

When I checked the normalized counts with NOISeq to see whether the biases had been corrected, I noticed that when I divided the normalization factors by their row-wise geometric means (as suggested in the DESeq2 manual), the cqn correction became ineffective.
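To make the rescaling step concrete, here is a minimal base-R sketch of the same expression used in the posted code, `cqnNormFactors / exp(rowMeans(log(cqnNormFactors)))`. It runs on a made-up 3-gene by 4-sample factor matrix. The values and the names `nf`, `row_gm` and `nf_same_scale` are hypothetical and are not cqn output. The sketch shows that every gene ends up with factors whose geometric mean is 1. Any factor a gene shares across all samples is divided away, so rows that differ only by such a constant become identical.

```r
# Hypothetical toy normalization factors (NOT cqn output): rows = genes, columns = samples.
# gene1 = 2 x gene3 and gene2 = 0.5 x gene3, i.e. they differ only by a gene-level constant.
nf <- matrix(c(2.0, 2.2, 1.8, 2.0,
               0.5, 0.55, 0.45, 0.5,
               1.0, 1.1, 0.9, 1.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# Row-wise geometric mean: exp of the mean of the logs for each gene
row_gm <- exp(rowMeans(log(nf)))

# Divide each row by its own geometric mean. R recycles the length-3 vector
# down the columns, so element [i, j] is divided by row_gm[i].
nf_same_scale <- nf / row_gm

# Each rescaled row now has a geometric mean of 1
print(exp(rowMeans(log(nf_same_scale))))

# The gene-level constants are gone: all three rows are now the same
print(round(nf_same_scale, 4))
```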

In contrast, when I used the normalization factors without dividing out the geometric means (so they are on a different scale), the biases were markedly reduced.

Could you help me with this? I cannot understand why this is not working.

Here is the code I used:

```r
# > head(uCovar_filt)
#                    length GC_content
# ENSBTAG00000000005   2310      43.65
# ENSBTAG00000000009   1543      65.02
# ENSBTAG00000000010   1348      48.38
# ENSBTAG00000000012   1227      38.70
# ENSBTAG00000000013   4478      37.35
# ENSBTAG00000000014   1044      50.95
#
# > head(countData)
#                       6    8   10   11   16   20   27   28   29   33   42   14   17
# ENSBTAG00000000005  222  160  252  230  448  380  306  259  333  420  240  218  514
# ENSBTAG00000000009  103  139  138   97   57  198  161   57  138  149   60   89  303
# ENSBTAG00000000010  290  593  458  423  653  718  608  402  527  686  459  260  641
# ENSBTAG00000000012  390  369  318  245  425  342  372  274  392  391  376  217  291
# ENSBTAG00000000013 1667 2637 1862 1537 2001 2292 1730 1445 2182 2125 1686  940 2225
# ENSBTAG00000000014  614 1814 1276  751  959 1867 1733  865 1325 1777  882  674 1177
#                      19   21   25   32   34   38
# ENSBTAG00000000005  316  336  236  315  269  334
# ENSBTAG00000000009  171  140  121   97   78  117
# ENSBTAG00000000010  601  498  416  377  335  408
# ENSBTAG00000000012  285  213  286  295  264  291
# ENSBTAG00000000013 1858 1538 1606 1538 1456 1813
# ENSBTAG00000000014 1628 1465 1273  969  670  759

library(DESeq2)
library(cqn)

# input data
design <- ~ covariate1 + covariate2 + covariate3
dds <- DESeqDataSetFromMatrix(countData = countData, colData = pheno, design = design)

# estimate size factors to be used in cqn
dds <- estimateSizeFactors(dds)
sizefactors <- sizeFactors(dds)

# run cqn to generate the normalization factor matrix that accounts for GC and length bias
cqnObject <- cqn(counts = countData, x = uCovar_filt$GC_content,
                 lengths = uCovar_filt$length, sizeFactors = sizefactors)

# extract the normalization factors as suggested in the DESeq2 manual
cqnOffset <- cqnObject$glm.offset
cqnNormFactors <- exp(cqnOffset)

# build the normalization factor matrices with and without dividing out the geometric mean
normFactors <- cqnNormFactors
normFactors_sameScale <- cqnNormFactors / exp(rowMeans(log(cqnNormFactors)))

# create two more DESeqDataSet objects, using the normalization factors
# with (sameScale) and without dividing out the geometric mean

# dds1: normFactors_sameScale
dds1 <- DESeqDataSetFromMatrix(countData = countData, colData = pheno, design = design)
normalizationFactors(dds1) <- normFactors_sameScale

# dds2: normFactors
dds2 <- DESeqDataSetFromMatrix(countData = countData, colData = pheno, design = design)
normalizationFactors(dds2) <- normFactors

# extract the normalized counts
# dds  contains the size factors estimated by DESeq2
# dds1 contains the cqn normalization factor matrix, geometric means divided out
# dds2 contains the cqn normalization factor matrix, geometric means NOT divided out
counts_normalized       <- counts(dds,  normalized = TRUE)  # (Fig A)
counts_nobias_sameScale <- counts(dds1, normalized = TRUE)  # (Fig B)
counts_nobias           <- counts(dds2, normalized = TRUE)  # (Fig C)

# I tested these three count matrices with NOISeq; the resulting plots are attached (Fig A, Fig B, Fig C)
```
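As an illustration of how the Fig B and Fig C matrices relate, here is a small base-R sketch. It assumes that `counts(dds, normalized = TRUE)` divides the raw counts element-wise by the normalization factor matrix when one has been set. The counts are copied from the first two genes and first three samples of `countData` above. The factor values in `nf` are made up, and `norm_B`, `norm_C` and `ratio` are hypothetical names.

```r
# Raw counts: first two genes x first three samples of countData from the post
cnt <- matrix(c(222, 160, 252,
                103, 139, 138),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("ENSBTAG00000000005", "ENSBTAG00000000009"),
                              c("6", "8", "10")))

# Hypothetical normalization factors (made up, NOT cqn output)
nf <- matrix(c(1.5, 1.2, 1.8,
               0.4, 0.5, 0.3),
             nrow = 2, byrow = TRUE, dimnames = dimnames(cnt))

# Same rescaling as normFactors_sameScale in the posted code
row_gm <- exp(rowMeans(log(nf)))
nf_same_scale <- nf / row_gm

# Assumption: normalized counts = counts / normalization factors, element-wise
norm_C <- cnt / nf             # analogous to counts_nobias (Fig C)
norm_B <- cnt / nf_same_scale  # analogous to counts_nobias_sameScale (Fig B)

# Within each gene, the B/C ratio is the same in every sample: that gene's geometric mean
ratio <- norm_B / norm_C
print(round(ratio, 4))
print(round(row_gm, 4))
```

Under that assumption, the two matrices differ only by one constant per gene. Sample-to-sample ratios within a gene are therefore identical in Fig B and Fig C, while comparisons between genes are rescaled.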

Fig A (normalization with DESeq2)

Fig B (bias correction with cqn dividing out the geometric mean)

Fig C (bias correction with cqn WITHOUT dividing out the geometric mean)

Thank you.

Best regards,

Gianluca

written 18 months ago by glmazzo
