EdgeR norm.factors input
Guest User ★ 13k
@guest-user-4897
Dear Gordon,

Thank you so much for your comments. This is exactly what I did for total read count normalization: I used norm.factors = 1 for total count (TC) normalization.

Then here comes the question. As I mentioned in my previous post, I would like to compare the performance of different normalization methods. Besides that, I would also like to compare the results of normalized data with the results of raw count (RC) data (without any normalization). According to our previous discussion, I skipped the normalization step for RC, but the results were the same for TC and RC. Should I use

norm.factors = 1/lib.size

for RC?

One more question: I have also considered the normalization method provided in the DESeq package. For this normalization method, what should my input for the correction factor (norm.factors) be? I have figured out the relation between the scaling factor (sizeFactors) of the DESeq package and the correction factor (norm.factors) of edgeR, which is given below:

lib.size*norm.factors/mean(lib.size*norm.factors) = sizeFactors

Now that I know lib.size and sizeFactors, I am trying to figure out what norm.factors is for the DESeq normalization method. This equation system involves n unknown variables with n-1 independent equations. Let X = norm.factors = (X1,X2,...,Xn)^T, N = lib.size = (N1,N2,...,Nn) and S = sizeFactors = (S1,S2,...,Sn), then

X2 = X1*(S2/S1)*(N1/N2)
.
.
.
Xn = X1*(Sn/S1)*(N1/Nn)

Here * means the regular product. I need one more condition to find these unknown variables (X1,X2,...,Xn). Do you happen to know whether there is an extra requirement that norm.factors needs to satisfy?

Thank you!

Yanzhu

----------------------------------------------------------

edgeR always takes the total read count into account, so

norm.factors = 1

is equivalent to total read count normalization.

Please read the section on normalization in the edgeR User's Guide.

Best wishes
Gordon

> Date: Mon, 10 Feb 2014 11:06:31 -0800 (PST)
> From: "Yanzhu [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, mlinyzh at gmail.com
> Subject: [BioC] EdgeR norm.factor input
>
> Dear Gordon,
>
> Thank you so much for your comments.
>
> One more question about the first question asked in my previous post,
> where I asked about how to supply the correction factor in the
> normalization step.
>
> I would like to use the total read count normalization method to normalize
> the data and then use edgeR to test my multi-factor models as in my
> previous post. The total read count normalization is given as
>
> X_ij/(N_j/mean(N)) = X_ij*mean(N)/N_j,
>
> where X_ij is the read count of gene i in sample j, N_j is the library size
> of sample j, and mean(N) is the mean of the library sizes over all samples.
> My question is: what is the input for y$samples$norm.factors? Can I do
> the following: y$samples$norm.factors = N/mean(N)? Here N is the vector
> of library sizes of all samples, and mean(N) is the mean of the library sizes
> over all samples. Or could you please give me some suggestion? Thank you!
>
> Yanzhu
>
> ---------------------------------------------------
>
> Date: Fri, 7 Feb 2014 07:25:17 -0800 (PST)
>> From: "Yanzhu [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, mlinyzh at gmail.com
>> Subject: [BioC] EdgeR multi-factor testing questions
>>
>> Dear Gordon,
>>
>> Thank you so much for your comments. I have updated my code and get
>> different results for the TMM and upper quartile normalization methods.
>>
>> I have two more questions regarding the normalization issue. I have tried
>> different normalization methods and would like to compare their
>> performance. My questions are:
>>
>> 1. Section 2.5.6 of the User's Guide mentions that normalization takes the
>> form of correction factors that enter into the statistical model. Such
>> correction factors are usually computed internally by edgeR functions,
>> but it is also possible for a user to supply them. I would like to supply
>> the correction factor to edgeR. How could I do this?
>
> Just enter in your own values:
>
> y$samples$norm.factors <- yourvalues
>
>> 2. I would also like to compare the testing results of normalized data
>> with the results of raw data (without normalizing the data). Could I
>> just skip the normalization step as below?
>
> Yes.
>
> Gordon
>
>> group<-paste(L,S,R,sep=".")
>> design<-model.matrix(~L+R+S+L:R+L:S+R:S+L:R:S)
>> y<-DGEList(counts=counts,group=group)
>> #y<-calcNormFactors(y,method="upperquartile",p=0.75) ##skip this step
>>
>> y<-estimateGLMCommonDisp(y,design)
>> y<-estimateGLMTagwiseDisp(y,design)
>>
>> fiteUQ_LRS<-glmFit(y,design,offset=offset )
>>
>> Thanks.
>>
>> Yanzhu

-- output of sessionInfo():

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
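The total-count formula quoted above can be checked with a minimal base-R sketch. The count matrix and the names counts, tc and eff.lib.size are made up for illustration and do not come from the thread. The sketch also suggests why the reply recommends norm.factors = 1 rather than N/mean(N): edgeR multiplies lib.size by norm.factors to obtain the effective library size, so supplying N/mean(N) would appear to account for the library size twice.

```r
# Hypothetical count matrix: 3 genes (rows) x 4 samples (columns).
# matrix() fills column by column, so each triple below is one sample.
counts <- matrix(c(10, 20, 30,
                   40, 80, 120,
                   15, 30, 45,
                   100, 200, 300), nrow = 3)

# Library sizes N_j: total count per sample.
N <- colSums(counts)

# Total-count normalization from the post: X_ij * mean(N) / N_j.
# sweep() divides each column j by N_j / mean(N).
tc <- sweep(counts, 2, N / mean(N), "/")

# After scaling, every sample has the same total, namely mean(N).
colSums(tc)

# With all norm.factors equal to 1, the effective library size
# (lib.size * norm.factors) is just the library size itself.
eff.lib.size <- N * rep(1, length(N))

# Assumption for illustration: using N / mean(N) as norm.factors instead
# would give an effective library size of N^2 / mean(N).
double.counted <- N * (N / mean(N))
```

The scaled matrix tc is only for inspection; the thread's point is that edgeR works from the raw counts and handles library size through the model offset.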
Normalization edgeR DESeq
ying chen ▴ 340
@ying-chen-5085
Hi guys,

I just got this weird error with the CNTools package. Basically, I tried to map TCGA SNP6 level3 segmentation data to gene level. I have TCGA SNP6 level3 segmentation data for more than 16000 samples. All processed well except for a block of 300 samples, which gave me the following error message:

> library(CNTools)
> load("tcgaSNP6_df.RData")
> sampleData <- data.frame(ID=df$sample,chrom=df$chromosome,loc.start=df$start,loc.end=df$stop,num.mark=df$count,seg.mean=df$mean)
> sampleData$ID <- as.character(sampleData$ID)
> sampleData$chrom <- as.character(sampleData$chrom)
> geneInfo <- read.delim("geneMap.txt")
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
 [4] LC_COLLATE=en_US LC_MONETARY=en_US LC_MESSAGES=en_US
 [7] LC_PAPER=en_US LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] CNTools_1.18.0 genefilter_1.44.0

loaded via a namespace (and not attached):
 [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0
 [4] BiocGenerics_0.8.0 DBI_0.2-7 IRanges_1.20.0
 [7] parallel_3.0.2 RSQLite_0.11.4 splines_3.0.2
[10] stats4_3.0.2 survival_2.37-4 XML_3.98-1.1
[13] xtable_1.7-1
> cnseg <- CNSeg(sampleData[which(is.element(sampleData[, "ID"], unique(sampleData[, "ID"])[7201:7500])), ])
> rdByGene <- getRS(cnseg, by = "gene", imput = FALSE, XY = FALSE, geneMap = geneInfo, what = "median")

 *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
 1: .C("getratios", as.character(map[, mapChrom]), as.double(map[, mapStart]), as.double(map[, mapEnd]), as.integer(nrow(map)), as.character(segData[, segChrom]), as.double(segData[, segStart]), as.double(segData[, segEnd]), as.integer(nrow(segData)), as.double(segData[, segMean]), as.character(what), as.double(segged), PACKAGE = "CNTools")
 2: FUN(X[[1L]], ...)
 3: lapply(splited, getGeneSegMean)
 4: do.call("cbind", args = lapply(splited, getGeneSegMean))
 5: cbind(map, do.call("cbind", args = lapply(splited, getGeneSegMean)))
 6: getReducedSeg(segList(segData), geneMap, what = what, segID = id(segData), segChrom = chromosome(segData), segStart = start(segData), segEnd = end(segData), segMean = segMean(segData), mapChrom = mapChrom, mapStart = mapStart, mapEnd = mapEnd)
 7: seg2RS(object, by, imput, XY, geneMap, what = what, mapChrom = mapChrom, mapStart = mapStart, mapEnd = mapEnd)
 8: getRS(cnseg, by = "gene", imput = FALSE, XY = FALSE, geneMap = geneInfo, what = "median")
 9: getRS(cnseg, by = "gene", imput = FALSE, XY = FALSE, geneMap = geneInfo, what = "median")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
>

Any suggestion? Thanks a lot for the help!

Ying
@gordon-smyth
WEHI, Melbourne, Australia
> Yanzhu [guest] guest at bioconductor.org
> Tue Feb 11 15:38:03 CET 2014
>
> Dear Gordon,
>
> Thank you so much for your comments. This is exactly what I did for
> total read count normalization, I used norm.factors = 1 for total
> count (TC) normalization.
>
> Then here comes the question. As I mentioned in my previous post, I
> would like to compare the performance of different normalization
> methods. Besides that, I also would like to compare the results of
> normalized data with the results of raw count (RC) data (without
> taking care of any normalization). According to our previous
> discussion, I skipped the normalization step for RC, but the results
> were the same for TC and RC.

Well of course. As I told you, edgeR always takes the total count into account, and the norm.factors are equal to 1 by default.

> Should I use
>
> norm.factors = 1/lib.size
>
> for RC?

Ignoring the library sizes is obviously crazy, and edgeR does not provide you with options to do crazy analyses. I will not provide advice as to how to do an analysis that can never be the right thing to do.

> One more question, I have also considered the normalization method
> provided in DESeq package. For this normalization method, what should
> be my input of correct factor (norm.factors)? I have figured out the
> relation between the scaling factor (sizeFactors) of DESeq package
> and the correct factor (norm.factors) of edgeR which is given as
> below:

Have you read the help page for calcNormFactors? It explains that the DESeq normalization is provided as an option:

y <- calcNormFactors(y,method="RLE")

Gordon

> lib.size*norm.factors/mean(lib.size*norm.factors)=sizeFactors
>
> Now I know the lib.size and sizeFactors, I try to figure out what the
> norm.factors is for DESeq normalization method. This equation system
> involves n unknown variables with n-1 independent equations. Let X=
> norm.factors=(X1,X2,...,Xn)^T, lib.size=N=(N1,N2,...,Nn) and
> sizeFactors = S=(S1,S2,...,Sn), then
>
> X2=X1*(S2/S1)*(N1/N2)
> .
> .
> .
> Xn=X1*(Sn/S1)*(N1/Nn)
>
> Here * means the regular product. I need one more condition to find
> these unknown variables (X1,X2,...,Xn). Do you happen to know whether
> there is extra requirement that norm.factors needs to satisfy?
>
> Thank you!
>
> Yanzhu
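The reply's recommended route is calcNormFactors(y, method="RLE"). For readers who still want to see how externally computed size factors would map onto norm.factors, here is a minimal base-R sketch. It assumes the convention, as I understand calcNormFactors to apply it, that norm.factors are rescaled so their product is 1, which would supply the one extra condition the question asks about. The numeric values and the names raw and eff.lib.size are hypothetical.

```r
# Hypothetical inputs for 4 samples (values made up for illustration).
lib.size    <- c(1.0e6, 2.0e6, 1.5e6, 3.0e6)   # N_j
sizeFactors <- c(0.60, 1.10, 0.90, 1.70)       # S_j, DESeq-style

# The ratio equations in the question, X_j = X_1 * (S_j/S_1) * (N_1/N_j),
# say that norm.factors are proportional to sizeFactors / lib.size.
raw <- sizeFactors / lib.size

# Assumed extra condition: rescale so the geometric mean, and hence the
# product, of the norm.factors equals 1.
norm.factors <- raw / exp(mean(log(raw)))

# Effective library sizes are then proportional to the size factors.
eff.lib.size <- lib.size * norm.factors

prod(norm.factors)                 # 1, up to floating-point error
eff.lib.size / eff.lib.size[1]     # same ratios as sizeFactors / sizeFactors[1]
```

Values obtained this way could be assigned with y$samples$norm.factors <- norm.factors, as described earlier in the thread, although calling calcNormFactors with method="RLE" avoids the manual step.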