Question

EdgeR norm.factors input

Guest User ★ 13k

@guest-user-4897

Dear Gordon,

Thank you so much for your comments. This is exactly what I did for total read count normalization: I used norm.factors = 1 for total count (TC) normalization.

Then here comes the question. As I mentioned in my previous post, I would like to compare the performance of different normalization methods. Besides that, I would also like to compare the results of normalized data with the results of raw count (RC) data (without any normalization). According to our previous discussion, I skipped the normalization step for RC, but the results were the same for TC and RC. Should I use

norm.factors = 1/lib.size

for RC?

One more question: I have also considered the normalization method provided in the DESeq package. For this normalization method, what should be my input for the correction factor (norm.factors)? I have figured out the relation between the scaling factor (sizeFactors) of the DESeq package and the correction factor (norm.factors) of edgeR, which is given below:

lib.size*norm.factors/mean(lib.size*norm.factors) = sizeFactors

Now I know lib.size and sizeFactors, and I am trying to figure out what norm.factors is for the DESeq normalization method. This equation system involves n unknown variables with n-1 independent equations. Let X = norm.factors = (X1,X2,...,Xn)^T, lib.size = N = (N1,N2,...,Nn) and sizeFactors = S = (S1,S2,...,Sn), then

X2 = X1*(S2/S1)*(N1/N2)
.
.
.
Xn = X1*(Sn/S1)*(N1/Nn)

Here * means the regular product. I need one more condition to find these unknown variables (X1,X2,...,Xn). Do you happen to know whether there is an extra requirement that norm.factors needs to satisfy?

Thank you!

Yanzhu

----------------------------------------------------------

edgeR always takes the total read count into account, so

norm.factors = 1

is equivalent to total read count normalization.

Please read the section on normalization in the edgeR User's Guide.

Best wishes
Gordon

> Date: Mon, 10 Feb 2014 11:06:31 -0800 (PST)
> From: "Yanzhu [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, mlinyzh at gmail.com
> Subject: [BioC] EdgeR norm.factor input
>
> Dear Gordon,
>
> Thank you so much for your comments.
>
> One more question about the first question asked in my previous post,
> where I asked how to supply the correction factor in the
> normalization step.
>
> I would like to use the total read count normalization method to normalize
> the data and then use edgeR to test my multi-factor models as in my
> previous post. The total read count normalization is given as
>
> X_ij/(N_j/mean(N)) = X_ij*mean(N)/N_j,
>
> where X_ij is the read count of gene i in sample j, N_j is the library size
> of sample j, and mean(N) is the mean of the library sizes over all samples.
> My question is: what is the input for y$samples$norm.factors? Can I do
> the following: y$samples$norm.factors = N/mean(N)? Here N is the vector
> of library sizes of all samples, and mean(N) is the mean of the library sizes
> over all samples. Or could you please give me some suggestions? Thank you!
>
> Yanzhu
>
> ---------------------------------------------------
>
> Date: Fri, 7 Feb 2014 07:25:17 -0800 (PST)
>> From: "Yanzhu [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, mlinyzh at gmail.com
>> Subject: [BioC] EdgeR multi-factor testing questions
>>
>> Dear Gordon,
>>
>> Thank you so much for your comments. I have updated my code and get
>> different results for the TMM and upper quartile normalization methods.
>>
>> I have two more questions regarding the normalization issue. I have tried
>> different normalization methods and would like to compare their
>> performance. My questions are:
>>
>> 1. In the User's Guide, section 2.5.6, it mentions that normalization takes the
>> form of correction factors that enter into the statistical model. Such
>> correction factors are usually computed internally by edgeR functions,
>> but it is also possible for a user to supply them. I would like to supply
>> the correction factor to edgeR; how could I do this?
>
> Just enter in your own values:
>
> y$samples$norm.factors <- yourvalues
>
>> 2. I would also like to compare the testing results of normalized data
>> with the results of raw data (without normalizing the data). Could I
>> just skip the normalization step as below?
>
> Yes.
>
> Gordon
>
>> group<-paste(L,S,R,sep=".")
>> design<-model.matrix(~L+R+S+L:R+L:S+R:S+L:R:S)
>> y<-DGEList(counts=counts,group=group)
>> #y<-calcNormFactors(y,method="upperquartile",p=0.75) ##skip this step
>>
>> y<-estimateGLMCommonDisp(y,design)
>> y<-estimateGLMTagwiseDisp(y,design)
>>
>> fiteUQ_LRS<-glmFit(y,design,offset=offset )
>>
>> Thanks.
>>
>> Yanzhu

-- output of sessionInfo():

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

--
Sent via the guest posting facility at bioconductor.org.
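The quoted reply says that norm.factors = 1 already amounts to total read count normalization. A small base-R sketch may make that concrete. It assumes the effective library size is lib.size * norm.factors, as described in the edgeR User's Guide; all numbers and the names counts.gene, eff.lib, tc.scaled, tc.from.eff and eff.lib.double are made up for illustration and do not come from the thread.

```r
# Hypothetical library sizes N_j for three samples (illustrative values only).
lib.size <- c(1e6, 2e6, 4e6)

# Hypothetical counts of one gene with the same relative abundance in every sample.
counts.gene <- c(10, 20, 40)

# Default setting: norm.factors = 1, so the effective library size is just lib.size.
norm.factors <- rep(1, length(lib.size))
eff.lib <- lib.size * norm.factors

# Total count scaling as written in the post: X_ij * mean(N) / N_j.
tc.scaled <- counts.gene * mean(lib.size) / lib.size

# The same scaling expressed through the effective library sizes.
tc.from.eff <- counts.gene / (eff.lib / mean(eff.lib))

# Setting norm.factors = N / mean(N) would bring the library size in a second time:
# the effective library size becomes N^2 / mean(N) instead of N.
eff.lib.double <- lib.size * (lib.size / mean(lib.size))
```

Under these assumptions tc.scaled and tc.from.eff agree, which is why leaving norm.factors at 1 and "TC normalization" give the same results, while supplying N/mean(N) as norm.factors would count the library size twice.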

Normalization edgeR DESeq • 3.0k views

updated 10.2 years ago by Gordon Smyth 50k • written 10.2 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2014-02-11

Hi guys,

I just got this weird error with the CNTools package. Basically, I tried to map TCGA SNP6 level 3 segmentation data to the gene level. I have TCGA SNP6 level 3 segmentation data for more than 16000 samples. All processed well except for a block of 300 samples, which gave me the following error message:

> library(CNTools)
> load("tcgaSNP6_df.RData")
> sampleData <- data.frame(ID=df$sample,chrom=df$chromosome,loc.start=df$start,loc.end=df$stop,num.mark=df$count,seg.mean=df$mean)
> sampleData$ID <- as.character(sampleData$ID)
> sampleData$chrom <- as.character(sampleData$chrom)
> geneInfo <- read.delim("geneMap.txt")
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
 [4] LC_COLLATE=en_US LC_MONETARY=en_US LC_MESSAGES=en_US
 [7] LC_PAPER=en_US LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] CNTools_1.18.0 genefilter_1.44.0

loaded via a namespace (and not attached):
 [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0
 [4] BiocGenerics_0.8.0 DBI_0.2-7 IRanges_1.20.0
 [7] parallel_3.0.2 RSQLite_0.11.4 splines_3.0.2
[10] stats4_3.0.2 survival_2.37-4 XML_3.98-1.1
[13] xtable_1.7-1
> cnseg <- CNSeg(sampleData[which(is.element(sampleData[, "ID"], unique(sampleData[, "ID"])[7201:7500])), ])
> rdByGene <- getRS(cnseg, by = "gene", imput = FALSE, XY = FALSE, geneMap = geneInfo, what = "median")

 *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
 1: .C("getratios", as.character(map[, mapChrom]), as.double(map[, mapStart]), as.double(map[, mapEnd]), as.integer(nrow(map)), as.character(segData[, segChrom]), as.double(segData[, segStart]), as.double(segData[, segEnd]), as.integer(nrow(segData)), as.double(segData[, segMean]), as.character(what), as.double(segged), PACKAGE = "CNTools")
 2: FUN(X[[1L]], ...)
 3: lapply(splited, getGeneSegMean)
 4: do.call("cbind", args = lapply(splited, getGeneSegMean))
 5: cbind(map, do.call("cbind", args = lapply(splited, getGeneSegMean)))
 6: getReducedSeg(segList(segData), geneMap, what = what, segID = id(segData), segChrom = chromosome(segData), segStart = start(segData), segEnd = end(segData), segMean = segMean(segData), mapChrom = mapChrom, mapStart = mapStart, mapEnd = mapEnd)
 7: seg2RS(object, by, imput, XY, geneMap, what = what, mapChrom = mapChrom, mapStart = mapStart, mapEnd = mapEnd)
 8: getRS(cnseg, by = "gene", imput = FALSE, XY = FALSE, geneMap = geneInfo, what = "median")
 9: getRS(cnseg, by = "gene", imput = FALSE, XY = FALSE, geneMap = geneInfo, what = "median")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Any suggestions? Thanks a lot for the help!

Ying

score 0 · Answer 2 · 2014-02-12

> Yanzhu [guest] guest at bioconductor.org
> Tue Feb 11 15:38:03 CET 2014
>
> Dear Gordon,
>
> Thank you so much for your comments. This is exactly what I did for
> total read count normalization, I used norm.factors = 1 for total
> count (TC) normalization.
>
> Then here comes the question. As I mentioned in my previous post, I
> would like to compare the performance of different normalization
> methods. Besides that, I also would like to compare the results of
> normalized data with the results of raw count (RC) data (without
> taking care of any normalization). According to our previous
> discussion, I skipped the normalization step for RC, but the results
> were the same for TC and RC.

Well, of course. As I told you, edgeR always takes the total count into account, and the norm.factors are equal to 1 by default.

> Should I use
>
> norm.factors = 1/lib.size
>
> for RC?

Ignoring the library sizes is obviously crazy, and edgeR does not provide you with options to do crazy analyses. I will not provide advice as to how to do an analysis that can never be the right thing to do.

> One more question, I have also considered the normalization method
> provided in DESeq package. For this normalization method, what should
> be my input of correct factor (norm.factors)? I have figured out the
> relation between the scaling factor (sizeFactors ) of DESeq package
> and the correct factor (norm.factors) of edgeR which is given as
> below:

Have you read the help page for calcNormFactors? It explains that the DESeq normalization is provided as an option:

y <- calcNormFactors(y,method="RLE")

Gordon

> lib.size*norm.factors/mean(lib.size*norm.factors)=sizeFactors
>
> Now I know the lib.size and sizeFactors, I try to figure out what the
> norm.factors is for DESeq normalization method. This equation system
> involves n unknown variables with n-1 independent equations. Let X=
> norm.factors=(X1,X2,...,Xn)^T, lib.size=N=(N1,N2,...,Nn) and
> sizeFactors = S=(S1,S2,...,Sn), then
>
> X2=X1*(S2/S1)*(N1/N2)
> .
> .
> .
> Xn=X1*(Sn/S1)*(N1/Nn)
>
> Here * means the regular product. I need one more condition to find
> these unknown variables (X1,X2,...,Xn). Do you happen to know whether
> there is extra requirement that norm.factors needs to satisfy?
>
> Thank you!
>
> Yanzhu
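For the underdetermined system in the quoted question, one way to pin down the remaining degree of freedom is sketched below in base R. It assumes the extra condition is the one the calcNormFactors help page describes, namely that the normalization factors are adjusted to multiply to 1; treat that as an assumption to check against the help page of your edgeR version. The input values are invented for illustration, and in practice calcNormFactors(y, method="RLE") does this work for you.

```r
# Hypothetical inputs (illustrative values only, not taken from the thread).
lib.size    <- c(1.0e6, 2.0e6, 3.0e6)   # N_j, library sizes
sizeFactors <- c(0.5, 1.1, 1.6)         # S_j, e.g. DESeq size factors

# The relations X_j = X_1 * (S_j/S_1) * (N_1/N_j) say that X_j is
# proportional to S_j / N_j, which fixes X only up to a common constant.
f <- sizeFactors / lib.size

# Assumed extra condition: prod(X) = 1, i.e. geometric mean of X equals 1.
# Dividing by the geometric mean of f fixes the constant.
norm.factors <- f / exp(mean(log(f)))

# Effective library sizes are then proportional to the size factors.
eff.lib <- lib.size * norm.factors
```

The resulting vector could be supplied as y$samples$norm.factors, as suggested earlier in the thread. The overall scale of sizeFactors drops out, because only the ratios S_j/N_j and the product constraint determine norm.factors.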