between lane normalization upper quantile
1
0
Entering edit mode
elesofi • 0
@30f67af1
Last seen 9 months ago
Belgium

Hi

I'm comparing the outputs of the betweenLaneNormalization(which="upper") function and the calcNormFactors(method="upperquartile") function. Technically both functions are based on the same method from Bullard. But I get completely opposite library sizes after I perform the normalization. Which one is right? Am I doing something wrong?

library(RUVSeq)
library(edgeR)
library(zebrafishRNASeq)
data(zfGenes)

filter <- apply(zfGenes, 1, function(x) length(x[x>5])>=2)
filtered <- zfGenes[filter,]
genes <- rownames(filtered)[grep("^ENS", rownames(filtered))]
spikes <- rownames(filtered)[grep("^ERCC", rownames(filtered))]

x <- as.factor(rep(c("Ctl", "Trt"), each=3))
set <- newSeqExpressionSet(as.matrix(filtered),
                           phenoData = data.frame(x, row.names=colnames(filtered)),)

set <- betweenLaneNormalization(set, which="upper",offset = T)
![enter image description here][1]
libsizes1 <- colSums(set@assayData$normalizedCounts)

design <- model.matrix(~x, data=pData(set))
y <- DGEList(counts=counts(set), group=x)
y <- calcNormFactors(y, method="upperquartile")

libsizes2 <- y$samples$lib.size*y$samples$norm.factors

plot(libsizes1,libsizes2)
    sessionInfo( )
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
 [1] zebrafishRNASeq_1.18.0      RUVSeq_1.32.0              
 [3] EDASeq_2.32.0               ShortRead_1.56.1           
 [5] GenomicAlignments_1.34.1    SummarizedExperiment_1.28.0
 [7] MatrixGenerics_1.10.0       matrixStats_1.2.0          
 [9] Rsamtools_2.14.0            GenomicRanges_1.50.2       
[11] Biostrings_2.66.0           GenomeInfoDb_1.34.9        
[13] XVector_0.38.0              IRanges_2.32.0             
[15] S4Vectors_0.36.2            BiocParallel_1.32.6        
[17] Biobase_2.58.0              BiocGenerics_0.44.0        
[19] edgeR_3.40.2                limma_3.54.2               

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           bit64_4.0.5           
 [3] filelock_1.0.3         RColorBrewer_1.1-3    
 [5] progress_1.2.3         httr_1.4.7            
 [7] tools_4.2.0            utf8_1.2.4            
 [9] R6_2.5.1               DBI_1.2.2             
[11] tidyselect_1.2.0       prettyunits_1.2.0     
[13] bit_4.0.5              curl_5.2.1            
[15] compiler_4.2.0         cli_3.6.2             
[17] xml2_1.3.6             DelayedArray_0.24.0   
[19] rtracklayer_1.58.0     rappdirs_0.3.3        
[21] stringr_1.5.1          digest_0.6.34         
[23] rmarkdown_2.25         R.utils_2.12.3        
[25] jpeg_0.1-10            pkgconfig_2.0.3       
[27] htmltools_0.5.7        dbplyr_2.4.0          
[29] fastmap_1.1.1          rlang_1.1.3           
[31] rstudioapi_0.15.0      RSQLite_2.3.5         
[33] BiocIO_1.8.0           generics_0.1.3        
[35] hwriter_1.3.2.1        dplyr_1.1.4           
[37] R.oo_1.26.0            RCurl_1.98-1.14       
[39] magrittr_2.0.3         GenomeInfoDbData_1.2.9
[41] interp_1.1-6           Matrix_1.6-5          
[43] Rcpp_1.0.12            fansi_1.0.6           
[45] lifecycle_1.0.4        R.methodsS3_1.8.2     
[47] stringi_1.8.3          yaml_2.3.8            
[49] MASS_7.3-56            zlibbioc_1.44.0       
[51] BiocFileCache_2.6.1    grid_4.2.0            
[53] blob_1.2.4             parallel_4.2.0        
[55] crayon_1.5.2           deldir_2.0-4          
[57] lattice_0.20-45        GenomicFeatures_1.50.4
[59] hms_1.1.3              KEGGREST_1.38.0       
[61] locfit_1.5-9.9         knitr_1.45            
[63] pillar_1.9.0           rjson_0.2.21          
[65] codetools_0.2-18       biomaRt_2.54.1        
[67] XML_3.99-0.16.1        glue_1.7.0            
[69] evaluate_0.23          latticeExtra_0.6-30   
[71] png_0.1-8              vctrs_0.6.5           
[73] cachem_1.0.8           xfun_0.42             
[75] aroma.light_3.28.0     restfulr_0.0.15       
[77] tibble_3.2.1           AnnotationDbi_1.60.2  
[79] memoise_2.0.1
EDASeq edgeR • 596 views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 16 hours ago
WEHI, Melbourne, Australia

You seem to be assuming that you can simply add up the normalizedCounts values from betweenLaneNormalization to get normalized library sizes. I am not familiar with EDASeq, but I suspect that assumption might not be correct. The normalizedCounts values might perhaps be counts-per-million, in which case the column sums wouldn't have any meaning. Unfortunately, the help page for betweenLaneNormalization doesn't give any information about the output object. According to Section 6.1 of the EDASeq vignette, you can extract log library sizes using the offset function.

ADD COMMENT
1
Entering edit mode

Gordon is correct. The column sums of the normalizedCounts matrix are meaningless and not intended to be used as size factors.

You can use offst(set) to extract a matrix of offsets to be added to downstream negative binomial models (e.g., in edgeR's glmFit) as illustrated in the EDASeq vignette. Note that the offsets extracted from EDASeq need to be multiplied by -1 before using them in edgeR (see section 7.3 of the EDASeq vignette for details).

ADD REPLY

Login before adding your answer.

Traffic: 664 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6