Help needed with CopyNumber analysis for Affymetrix 250K arrays

0

Entering edit mode

Christian.Stratowa@vie.boehringer-ingel… ▴ 270

@christianstratowavieboehringer-ingelheimcom-545

Last seen 11.5 years ago

Dear all, Until now I have done all CopyNumber (and LOH) analysis using Affymetrix CNAT4. However, I would prefer to use Bioconductor for this purpose, thus I have a couple of questions: 1, Normalization and summarization of mapping array 50K and 250K CEL- files: Currently, there seem to be only two packages available, which are able to read mapping array CEL-files, namely: package "oligo" and packages "PLASQ" and "PLASQ500K", respectively. Using package "oligo" I can do: > library(oligo) > snprma250 <- justSNPRMA(cels250, phenoData=pheno250) Then I get the normalized intensities: > asTA250 <- antisenseThetaA(snprma250) > asTB250 <- antisenseThetaB(snprma250) > sTA250 <- senseThetaA(snprma250) > sTB250 <- senseThetaB(snprma250) Using package "PLASQ500K" I can do: > library(PLASQ500K) > ref <- celExtNorm("SND", "Sty") > sam <- celExtract("STD", "Sty") I get a matrix of normalized probe intensities for reference (ref) and samples (sam). Are there other packages available which can use mapping array CEL- files? 2, Genotyping: Package "oligo" can be used for genotyping: > crlmmOut250 <- justCRLMM(cels250, phenoData=pheno250) > genocall250 <- calls(crlmmOut250) > genoconf250 <- callsConfidence(crlmmOut250) However, the following results in an error: > snprma250 <- justSNPRMA(cels250, phenoData=pheno250) > crlmmOut250 <- crlmm(snprma250, correctionFile="outputEM.rda") see: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080128/50495 06c/att achment.pl Package "PLASQ500K" could also be used for genotyping: > geno <- EMSNP(???) Although I did not try it, this function seems to have a huge memory problem, see below. 3, CopyNumber analysis: Although there seem to be some packages which could use the output from the Affymetrix CNAT4 results, it seems that there is currently no package able to do copynumber analysis for Affymetrix mapping arrays. Is this correct? 3a, CNRLMM: In a Johns Hopkins Tech Report, Paper 122, 2006, Wang, Caravalho et al describe a new copynumber algorithm, which they want to make available at Bioconductor. Does anybody know when the CNRLMM algorithm will be available? 3b, PLASQ500K I tried to compute parent-specific copy number using PLASQ500K: > library(PLASQ500K) > psCN <- pscn(StyFolder="STD",normStyFolder="SND",betasSty=NULL,quantSty=NULL,b etasSty File="betasSty.Rdata",rawCNStyfile="rawCNSty.Rdata") Using only 18 250K Sty CEL-files it was impossible to finish this calculation. On a 32GB RAM Linux server the job got killed, since function EMSNP() which is called from function getBetas() used up all RAM. Starting the computation on our 64GB RAM Linux server, function EMSNP() could be executed, nevertheless, we had to kill the job, when it reached memory consumption of 74GB!!! at a later stage! 3c, Compute raw copy numbers for unpaired copynumber analysis: Using the results from justSNPRMA() I tried to compute the copynumbers in the following way: # Reference files snprma250ref <- justSNPRMA(cels250ref, phenoData=pheno250ref) # Sample files snprma250sam <- justSNPRMA(cels250sam, phenoData=pheno250sam) ## separate allels combined as in CNAT4, see cnat_4_algorithm_whitepaper.pdf, page 9: # TCN(sumLog) = log2(SamA/RefA) + log2(SamB/RefB) # Reference for allele A: # allele A as array ref250A <- array(NA, dim=c(nrow(antisenseThetaA(snprma250ref)),ncol(antisenseThetaA(snprma2 50ref)) , 2), dimnames=list(rownames(antisenseThetaA(snprma250ref)),colnames(antisen seTheta A(snprma250ref)),c("antisense","sense"))) ref250A[,,1] <- antisenseThetaA(snprma250ref) ref250A[,,2] <- senseThetaA(snprma250ref) # Reference A: rowMeans over sense and antisense strand refA <- sapply(1:dim(ref250A)[2],function(x)rowMeans(ref250A[,x,],na.rm=T)) colnames(refA) <- colnames(ref250A) # Reference for allele B: # allele B as array ref250B <- array(NA, dim=c(nrow(antisenseThetaB(snprma250ref)),ncol(antisenseThetaB(snprma2 50ref)) , 2), dimnames=list(rownames(antisenseThetaB(snprma250ref)),colnames(antisen seTheta B(snprma250ref)),c("antisense","sense"))) ref250B[,,1] <- antisenseThetaB(snprma250ref) ref250B[,,2] <- senseThetaB(snprma250ref) # Reference B: rowMeans over sense and antisense strand refB <- sapply(1:dim(ref250B)[2],function(x)rowMeans(ref250B[,x,],na.rm=T)) colnames(refB) <- colnames(ref250B) # Sample for allele A: # allele A as array sam250A <- array(NA, dim=c(nrow(antisenseThetaA(snprma250sam)),ncol(antisenseThetaA(snprma2 50sam)) , 2), dimnames=list(rownames(antisenseThetaA(snprma250sam)),colnames(antisen seTheta A(snprma250sam)),c("antisense","sense"))) sam250A[,,1] <- antisenseThetaA(snprma250sam) sam250A[,,2] <- senseThetaA(snprma250sam) # Sample A: rowMeans over sense and antisense strand samA <- sapply(1:dim(sam250A)[2],function(x)rowMeans(sam250A[,x,],na.rm=T)) colnames(samA) <- colnames(sam250A) # Sample for allele B: # allele B as array sam250B <- array(NA, dim=c(nrow(antisenseThetaB(snprma250sam)),ncol(antisenseThetaB(snprma2 50sam)) , 2), dimnames=list(rownames(antisenseThetaB(snprma250sam)),colnames(antisen seTheta B(snprma250sam)),c("antisense","sense"))) sam250B[,,1] <- antisenseThetaB(snprma250sam) sam250B[,,2] <- senseThetaB(snprma250sam) # Sample B: rowMeans over sense and antisense strand samB <- sapply(1:dim(sam250B)[2],function(x)rowMeans(sam250B[,x,],na.rm=T)) colnames(samB) <- colnames(sam250B) # Total CopyNumber TCN(sumLog), see cnat_4_algorithm_whitepaper.pdf, page 9 TCN.sL <- (samA - rowMeans(refA)) + (samB - rowMeans(refB)) # real copy number is: cn = 2^(2^cn) ?? (or 2^(cn+1) ??) cn.sL <- 2^(2^TCN.sL) headcn.sL) # CEU_NA06993_STY.CEL CEU_NA06994_STY.CEL CEU_NA07022_STY.CEL #SNP_A-1780271 1.801377 3.034645 2.314986 #SNP_A-1780274 2.017805 2.494345 2.370112 #SNP_A-1780277 1.558268 2.446690 2.983195 #SNP_A-1780278 1.879762 1.859002 1.697422 #SNP_A-1780283 2.064631 1.639300 1.912674 #SNP_A-1780290 2.142572 2.738094 2.029215 # or alternatively: cn = 2^cnA + 2^cnB ?? cn <- 2^(samA - rowMeans(refA)) + 2^(samB - rowMeans(refB)) head(cn) # CEU_NA06993_STY.CEL CEU_NA06994_STY.CEL CEU_NA07022_STY.CEL #SNP_A-1780271 1.859447 2.786287 2.369363 #SNP_A-1780274 2.160573 2.315243 2.271201 #SNP_A-1780277 3.203198 2.453341 2.773667 #SNP_A-1780278 1.908932 1.990323 1.748716 #SNP_A-1780283 2.046767 1.691257 1.937375 #SNP_A-1780290 2.098621 2.416547 2.020832 - Is this computation correct? - Is this way to compute the copynumbers a valuable option? - Are there any alternatives to compute the copynumbers using R packages? Thank you in advance Best regards Christian ============================================== Christian Stratowa, PhD Boehringer Ingelheim Austria Dept NCE Lead Discovery - Bioinformatics Dr. Boehringergasse 5-11 A-1121 Vienna, Austria Tel.: ++43-1-80105-2470 Fax: ++43-1-80105-2782 email: christian.stratowa at vie.boehringer-ingelheim.com

Normalization probe copynumber Normalization probe copynumber • 2.5k views

ADD COMMENT • link updated 18.1 years ago by Henrik Bengtsson ★ 2.4k • written 18.1 years ago by Christian.Stratowa@vie.boehringer-ingel… ▴ 270

0

Entering edit mode

Vincent J. Carey, Jr. 6.7k

@vincent-j-carey-jr-4

Last seen 4 days ago

United States

> > However, the following results in an error: > > snprma250 <- justSNPRMA(cels250, phenoData=pheno250) > > crlmmOut250 <- crlmm(snprma250, correctionFile="outputEM.rda") > see: > https://stat.ethz.ch/pipermail/bioconductor/attachments/20080128/504 9506c/att > achment.pl Can you put a sessionInfo() for the problem with this current computation? The pd.mapping package versions are up to 0.4.1 in devel, and i think you need to use them. The information transmitted in this electronic communica...{{dropped:9}}

ADD COMMENT • link 18.1 years ago Vincent J. Carey, Jr. 6.7k

0

Entering edit mode

Dear Vincent Updating to oligo_1.3.7 and pd.mapping_0.4.1 did not help. I get the same error as before which I can repeat with justCRLMM: > crlmmOut <- crlmm(snprmaResults, correctionFile="outputEM.rda") M correction not provided. Calculating. Will take several minutes. Fitting mixture model to 3 arrays. Epsilon must reach 50. Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: String or BLOB exceeded size limit) > tmp <- justCRLMM(cels, phenoData=pheno, batch_size=58960) Normalization: 100.00 percent done. Genotyping: 000.00 percent done.Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: String or BLOB exceeded size limit) > sessionInfo() R version 2.6.1 (2007-11-26) x86_64-unknown-linux-gnu locale: C attached base packages: [1] splines tools stats graphics grDevices utils datasets [8] methods base other attached packages: [1] pd.mapping50k.xba240_0.4.1 oligo_1.3.7 [3] oligoClasses_1.1.10 affxparser_1.10.1 [5] AnnotationDbi_1.0.6 preprocessCore_1.0.0 [7] RSQLite_0.6-4 DBI_0.2-4 [9] Biobase_1.16.1 loaded via a namespace (and not attached): [1] rcompgen_0.1-17 Best regards Christian ============================================== Christian Stratowa, PhD Boehringer Ingelheim Austria Dept NCE Lead Discovery - Bioinformatics Dr. Boehringergasse 5-11 A-1121 Vienna, Austria Tel.: ++43-1-80105-2470 Fax: ++43-1-80105-2782 email: christian.stratowa at vie.boehringer-ingelheim.com -----Original Message----- From: Vincent Carey 525-2265 [mailto:stvjc@channing.harvard.edu] Sent: Monday, February 04, 2008 6:33 PM To: Stratowa,Dr.,Christian FEX BIG-AT-V Cc: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] Help needed with CopyNumber analysis for Affymetrix 250K arrays > > However, the following results in an error: > > snprma250 <- justSNPRMA(cels250, phenoData=pheno250) crlmmOut250 <- > > crlmm(snprma250, correctionFile="outputEM.rda") > see: > https://stat.ethz.ch/pipermail/bioconductor/attachments/20080128/50495 > 06c/att > achment.pl Can you put a sessionInfo() for the problem with this current computation? The pd.mapping package versions are up to 0.4.1 in devel, and i think you need to use them. The information transmitted in this electronic communica...{{dropped:14}}

ADD REPLY • link 18.1 years ago Christian.Stratowa@vie.boehringer-ingel… ▴ 270

0

Entering edit mode

Henrik Bengtsson ★ 2.4k

@henrik-bengtsson-4333

Last seen 22 months ago

United States

Hi Christian, On Feb 4, 2008 1:45 AM, <christian.stratowa at="" vie.boehringer-ingelheim.com=""> wrote: > Dear all, > > Until now I have done all CopyNumber (and LOH) analysis using Affymetrix > CNAT4. > However, I would prefer to use Bioconductor for this purpose, thus I have a > couple of questions: > > > 1, Normalization and summarization of mapping array 50K and 250K CEL-files: > > Currently, there seem to be only two packages available, which are able to > read mapping array CEL-files, namely: > package "oligo" and packages "PLASQ" and "PLASQ500K", respectively. > > Using package "oligo" I can do: > > library(oligo) > > snprma250 <- justSNPRMA(cels250, phenoData=pheno250) > Then I get the normalized intensities: > > asTA250 <- antisenseThetaA(snprma250) > > asTB250 <- antisenseThetaB(snprma250) > > sTA250 <- senseThetaA(snprma250) > > sTB250 <- senseThetaB(snprma250) > > Using package "PLASQ500K" I can do: > > library(PLASQ500K) > > ref <- celExtNorm("SND", "Sty") > > sam <- celExtract("STD", "Sty") > I get a matrix of normalized probe intensities for reference (ref) and > samples (sam). > > Are there other packages available which can use mapping array CEL- files? The aroma.affymetrix package [http://www.braju.com/R/aroma.affymetrix/] works of CEL (and CDF) files. See URL for examples and documentation. > > > 2, Genotyping: > > Package "oligo" can be used for genotyping: > > crlmmOut250 <- justCRLMM(cels250, phenoData=pheno250) > > genocall250 <- calls(crlmmOut250) > > genoconf250 <- callsConfidence(crlmmOut250) > > However, the following results in an error: > > snprma250 <- justSNPRMA(cels250, phenoData=pheno250) > > crlmmOut250 <- crlmm(snprma250, correctionFile="outputEM.rda") > see: > https://stat.ethz.ch/pipermail/bioconductor/attachments/20080128/504 9506c/att > achment.pl > > Package "PLASQ500K" could also be used for genotyping: > > geno <- EMSNP(???) > Although I did not try it, this function seems to have a huge memory problem, > see below. > > > 3, CopyNumber analysis: > > Although there seem to be some packages which could use the output from the > Affymetrix CNAT4 results, it seems that there is currently no package able to > do copynumber analysis for Affymetrix mapping arrays. Is this correct? The aroma.affymetrix package can estimate paired & non-paired total raw copy numbers, cf. H. Bengtsson; R. Irizarry; B. Carvalho; T.P. Speed, Estimation and assessment of raw copy numbers at the single locus level, Bioinformatics, 2008. [pmid: 18204055] The package also implement other methods for estimating raw CNs. Currently GLAD and CBS are the supported segmentation methods, and it is not that hard to add wrappers for other segmentation methods. The aroma.affymetrix package does not do genotyping; at one stage there was a wrapper to call CRLMM in 'oligo' but due to lack of time it has become obsolete. It's on the (way to long) to do list to fix that. Best, Henrik > > 3a, CNRLMM: > In a Johns Hopkins Tech Report, Paper 122, 2006, Wang, Caravalho et al > describe > a new copynumber algorithm, which they want to make available at > Bioconductor. > Does anybody know when the CNRLMM algorithm will be available? > > 3b, PLASQ500K > I tried to compute parent-specific copy number using PLASQ500K: > > library(PLASQ500K) > > psCN <- > pscn(StyFolder="STD",normStyFolder="SND",betasSty=NULL,quantSty=NULL ,betasSty > File="betasSty.Rdata",rawCNStyfile="rawCNSty.Rdata") > > Using only 18 250K Sty CEL-files it was impossible to finish this > calculation. > On a 32GB RAM Linux server the job got killed, since function EMSNP() which > is > called from function getBetas() used up all RAM. Starting the computation on > our 64GB RAM Linux server, function EMSNP() could be executed, nevertheless, > we had to kill the job, when it reached memory consumption of 74GB!!! at a > later stage! > > > 3c, Compute raw copy numbers for unpaired copynumber analysis: > Using the results from justSNPRMA() I tried to compute the copynumbers > in the following way: > > # Reference files > snprma250ref <- justSNPRMA(cels250ref, phenoData=pheno250ref) > > # Sample files > snprma250sam <- justSNPRMA(cels250sam, phenoData=pheno250sam) > > ## separate allels combined as in CNAT4, see cnat_4_algorithm_whitepaper.pdf, > page 9: > # TCN(sumLog) = log2(SamA/RefA) + log2(SamB/RefB) > > # Reference for allele A: > # allele A as array > ref250A <- array(NA, > dim=c(nrow(antisenseThetaA(snprma250ref)),ncol(antisenseThetaA(snprm a250ref)) > , 2), > > dimnames=list(rownames(antisenseThetaA(snprma250ref)),colnames(antis enseTheta > A(snprma250ref)),c("antisense","sense"))) > ref250A[,,1] <- antisenseThetaA(snprma250ref) > ref250A[,,2] <- senseThetaA(snprma250ref) > > # Reference A: rowMeans over sense and antisense strand > refA <- sapply(1:dim(ref250A)[2],function(x)rowMeans(ref250A[,x,],na.rm=T)) > colnames(refA) <- colnames(ref250A) > > # Reference for allele B: > # allele B as array > ref250B <- array(NA, > dim=c(nrow(antisenseThetaB(snprma250ref)),ncol(antisenseThetaB(snprm a250ref)) > , 2), > > dimnames=list(rownames(antisenseThetaB(snprma250ref)),colnames(antis enseTheta > B(snprma250ref)),c("antisense","sense"))) > ref250B[,,1] <- antisenseThetaB(snprma250ref) > ref250B[,,2] <- senseThetaB(snprma250ref) > > # Reference B: rowMeans over sense and antisense strand > refB <- sapply(1:dim(ref250B)[2],function(x)rowMeans(ref250B[,x,],na.rm=T)) > colnames(refB) <- colnames(ref250B) > > # Sample for allele A: > # allele A as array > sam250A <- array(NA, > dim=c(nrow(antisenseThetaA(snprma250sam)),ncol(antisenseThetaA(snprm a250sam)) > , 2), > > dimnames=list(rownames(antisenseThetaA(snprma250sam)),colnames(antis enseTheta > A(snprma250sam)),c("antisense","sense"))) > sam250A[,,1] <- antisenseThetaA(snprma250sam) > sam250A[,,2] <- senseThetaA(snprma250sam) > > # Sample A: rowMeans over sense and antisense strand > samA <- sapply(1:dim(sam250A)[2],function(x)rowMeans(sam250A[,x,],na.rm=T)) > colnames(samA) <- colnames(sam250A) > > # Sample for allele B: > # allele B as array > sam250B <- array(NA, > dim=c(nrow(antisenseThetaB(snprma250sam)),ncol(antisenseThetaB(snprm a250sam)) > , 2), > > dimnames=list(rownames(antisenseThetaB(snprma250sam)),colnames(antis enseTheta > B(snprma250sam)),c("antisense","sense"))) > sam250B[,,1] <- antisenseThetaB(snprma250sam) > sam250B[,,2] <- senseThetaB(snprma250sam) > > # Sample B: rowMeans over sense and antisense strand > samB <- sapply(1:dim(sam250B)[2],function(x)rowMeans(sam250B[,x,],na.rm=T)) > colnames(samB) <- colnames(sam250B) > > # Total CopyNumber TCN(sumLog), see cnat_4_algorithm_whitepaper.pdf, page 9 > TCN.sL <- (samA - rowMeans(refA)) + (samB - rowMeans(refB)) > > # real copy number is: cn = 2^(2^cn) ?? (or 2^(cn+1) ??) > cn.sL <- 2^(2^TCN.sL) > headcn.sL) > # CEU_NA06993_STY.CEL CEU_NA06994_STY.CEL CEU_NA07022_STY.CEL > #SNP_A-1780271 1.801377 3.034645 2.314986 > #SNP_A-1780274 2.017805 2.494345 2.370112 > #SNP_A-1780277 1.558268 2.446690 2.983195 > #SNP_A-1780278 1.879762 1.859002 1.697422 > #SNP_A-1780283 2.064631 1.639300 1.912674 > #SNP_A-1780290 2.142572 2.738094 2.029215 > > # or alternatively: cn = 2^cnA + 2^cnB ?? > cn <- 2^(samA - rowMeans(refA)) + 2^(samB - rowMeans(refB)) > head(cn) > # CEU_NA06993_STY.CEL CEU_NA06994_STY.CEL CEU_NA07022_STY.CEL > #SNP_A-1780271 1.859447 2.786287 2.369363 > #SNP_A-1780274 2.160573 2.315243 2.271201 > #SNP_A-1780277 3.203198 2.453341 2.773667 > #SNP_A-1780278 1.908932 1.990323 1.748716 > #SNP_A-1780283 2.046767 1.691257 1.937375 > #SNP_A-1780290 2.098621 2.416547 2.020832 > > - Is this computation correct? > > - Is this way to compute the copynumbers a valuable option? > > - Are there any alternatives to compute the copynumbers using R packages? > > > Thank you in advance > Best regards > Christian > > ============================================== > Christian Stratowa, PhD > Boehringer Ingelheim Austria > Dept NCE Lead Discovery - Bioinformatics > Dr. Boehringergasse 5-11 > A-1121 Vienna, Austria > Tel.: ++43-1-80105-2470 > Fax: ++43-1-80105-2782 > email: christian.stratowa at vie.boehringer-ingelheim.com > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 18.1 years ago Henrik Bengtsson ★ 2.4k

0

Entering edit mode

Dear Henrik I have just downloaded and installed the aroma.affymetrix package, however, it seems that the package enforces a certain naming convention and directory structure, so I need some time to test it. Best regards Christian ============================================== Christian Stratowa, PhD Boehringer Ingelheim Austria Dept NCE Lead Discovery - Bioinformatics Dr. Boehringergasse 5-11 A-1121 Vienna, Austria Tel.: ++43-1-80105-2470 Fax: ++43-1-80105-2782 email: christian.stratowa at vie.boehringer-ingelheim.com -----Original Message----- From: henrik.bengtsson@gmail.com [mailto:henrik.bengtsson@gmail.com] On Behalf Of Henrik Bengtsson Sent: Tuesday, February 05, 2008 12:29 AM To: Stratowa,Dr.,Christian FEX BIG-AT-V Cc: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] Help needed with CopyNumber analysis for Affymetrix 250K arrays Hi Christian, On Feb 4, 2008 1:45 AM, <christian.stratowa at="" vie.boehringer-="" ingelheim.com=""> wrote: > Dear all, > > Until now I have done all CopyNumber (and LOH) analysis using > Affymetrix CNAT4. However, I would prefer to use Bioconductor for this > purpose, thus I have a couple of questions: > > > 1, Normalization and summarization of mapping array 50K and 250K > CEL-files: > > Currently, there seem to be only two packages available, which are > able to read mapping array CEL-files, namely: package "oligo" and > packages "PLASQ" and "PLASQ500K", respectively. > > Using package "oligo" I can do: > > library(oligo) > > snprma250 <- justSNPRMA(cels250, phenoData=pheno250) > Then I get the normalized intensities: > > asTA250 <- antisenseThetaA(snprma250) > > asTB250 <- antisenseThetaB(snprma250) > > sTA250 <- senseThetaA(snprma250) > > sTB250 <- senseThetaB(snprma250) > > Using package "PLASQ500K" I can do: > > library(PLASQ500K) > > ref <- celExtNorm("SND", "Sty") > > sam <- celExtract("STD", "Sty") > I get a matrix of normalized probe intensities for reference (ref) and > samples (sam). > > Are there other packages available which can use mapping array > CEL-files? The aroma.affymetrix package [http://www.braju.com/R/aroma.affymetrix/] works of CEL (and CDF) files. See URL for examples and documentation. > > > 2, Genotyping: > > Package "oligo" can be used for genotyping: > > crlmmOut250 <- justCRLMM(cels250, phenoData=pheno250) genocall250 <- > > calls(crlmmOut250) genoconf250 <- callsConfidence(crlmmOut250) > > However, the following results in an error: > > snprma250 <- justSNPRMA(cels250, phenoData=pheno250) crlmmOut250 <- > > crlmm(snprma250, correctionFile="outputEM.rda") > see: > https://stat.ethz.ch/pipermail/bioconductor/attachments/20080128/50495 > 06c/att > achment.pl > > Package "PLASQ500K" could also be used for genotyping: > > geno <- EMSNP(???) > Although I did not try it, this function seems to have a huge memory > problem, see below. > > > 3, CopyNumber analysis: > > Although there seem to be some packages which could use the output > from the Affymetrix CNAT4 results, it seems that there is currently no > package able to do copynumber analysis for Affymetrix mapping arrays. > Is this correct? The aroma.affymetrix package can estimate paired & non-paired total raw copy numbers, cf. H. Bengtsson; R. Irizarry; B. Carvalho; T.P. Speed, Estimation and assessment of raw copy numbers at the single locus level, Bioinformatics, 2008. [pmid: 18204055] The package also implement other methods for estimating raw CNs. Currently GLAD and CBS are the supported segmentation methods, and it is not that hard to add wrappers for other segmentation methods. The aroma.affymetrix package does not do genotyping; at one stage there was a wrapper to call CRLMM in 'oligo' but due to lack of time it has become obsolete. It's on the (way to long) to do list to fix that. Best, Henrik > > 3a, CNRLMM: > In a Johns Hopkins Tech Report, Paper 122, 2006, Wang, Caravalho et al > describe a new copynumber algorithm, which they want to make available > at Bioconductor. > Does anybody know when the CNRLMM algorithm will be available? > > 3b, PLASQ500K > I tried to compute parent-specific copy number using PLASQ500K: > > library(PLASQ500K) > > psCN <- > pscn(StyFolder="STD",normStyFolder="SND",betasSty=NULL,quantSty=NULL,b > etasSty > File="betasSty.Rdata",rawCNStyfile="rawCNSty.Rdata") > > Using only 18 250K Sty CEL-files it was impossible to finish this > calculation. On a 32GB RAM Linux server the job got killed, since > function EMSNP() which is > called from function getBetas() used up all RAM. Starting the computation on > our 64GB RAM Linux server, function EMSNP() could be executed, nevertheless, > we had to kill the job, when it reached memory consumption of 74GB!!! at a > later stage! > > > 3c, Compute raw copy numbers for unpaired copynumber analysis: Using > the results from justSNPRMA() I tried to compute the copynumbers in > the following way: > > # Reference files > snprma250ref <- justSNPRMA(cels250ref, phenoData=pheno250ref) > > # Sample files > snprma250sam <- justSNPRMA(cels250sam, phenoData=pheno250sam) > > ## separate allels combined as in CNAT4, see > cnat_4_algorithm_whitepaper.pdf, page 9: # TCN(sumLog) = > log2(SamA/RefA) + log2(SamB/RefB) > > # Reference for allele A: > # allele A as array > ref250A <- array(NA, > dim=c(nrow(antisenseThetaA(snprma250ref)),ncol(antisenseThetaA(snprma2 > 50ref)) > , 2), > > dimnames=list(rownames(antisenseThetaA(snprma250ref)),colnames(antisen > seTheta > A(snprma250ref)),c("antisense","sense"))) > ref250A[,,1] <- antisenseThetaA(snprma250ref) > ref250A[,,2] <- senseThetaA(snprma250ref) > > # Reference A: rowMeans over sense and antisense strand > refA <- > sapply(1:dim(ref250A)[2],function(x)rowMeans(ref250A[,x,],na.rm=T)) > colnames(refA) <- colnames(ref250A) > > # Reference for allele B: > # allele B as array > ref250B <- array(NA, > dim=c(nrow(antisenseThetaB(snprma250ref)),ncol(antisenseThetaB(snprma2 > 50ref)) > , 2), > > dimnames=list(rownames(antisenseThetaB(snprma250ref)),colnames(antisen > seTheta > B(snprma250ref)),c("antisense","sense"))) > ref250B[,,1] <- antisenseThetaB(snprma250ref) > ref250B[,,2] <- senseThetaB(snprma250ref) > > # Reference B: rowMeans over sense and antisense strand > refB <- > sapply(1:dim(ref250B)[2],function(x)rowMeans(ref250B[,x,],na.rm=T)) > colnames(refB) <- colnames(ref250B) > > # Sample for allele A: > # allele A as array > sam250A <- array(NA, > dim=c(nrow(antisenseThetaA(snprma250sam)),ncol(antisenseThetaA(snprma2 > 50sam)) > , 2), > > dimnames=list(rownames(antisenseThetaA(snprma250sam)),colnames(antisen > seTheta > A(snprma250sam)),c("antisense","sense"))) > sam250A[,,1] <- antisenseThetaA(snprma250sam) > sam250A[,,2] <- senseThetaA(snprma250sam) > > # Sample A: rowMeans over sense and antisense strand > samA <- > sapply(1:dim(sam250A)[2],function(x)rowMeans(sam250A[,x,],na.rm=T)) > colnames(samA) <- colnames(sam250A) > > # Sample for allele B: > # allele B as array > sam250B <- array(NA, > dim=c(nrow(antisenseThetaB(snprma250sam)),ncol(antisenseThetaB(snprma2 > 50sam)) > , 2), > > dimnames=list(rownames(antisenseThetaB(snprma250sam)),colnames(antisen > seTheta > B(snprma250sam)),c("antisense","sense"))) > sam250B[,,1] <- antisenseThetaB(snprma250sam) > sam250B[,,2] <- senseThetaB(snprma250sam) > > # Sample B: rowMeans over sense and antisense strand > samB <- > sapply(1:dim(sam250B)[2],function(x)rowMeans(sam250B[,x,],na.rm=T)) > colnames(samB) <- colnames(sam250B) > > # Total CopyNumber TCN(sumLog), see cnat_4_algorithm_whitepaper.pdf, > page 9 TCN.sL <- (samA - rowMeans(refA)) + (samB - rowMeans(refB)) > > # real copy number is: cn = 2^(2^cn) ?? (or 2^(cn+1) ??) cn.sL <- > 2^(2^TCN.sL) > headcn.sL) > # CEU_NA06993_STY.CEL CEU_NA06994_STY.CEL CEU_NA07022_STY.CEL > #SNP_A-1780271 1.801377 3.034645 2.314986 > #SNP_A-1780274 2.017805 2.494345 2.370112 > #SNP_A-1780277 1.558268 2.446690 2.983195 > #SNP_A-1780278 1.879762 1.859002 1.697422 > #SNP_A-1780283 2.064631 1.639300 1.912674 > #SNP_A-1780290 2.142572 2.738094 2.029215 > > # or alternatively: cn = 2^cnA + 2^cnB ?? > cn <- 2^(samA - rowMeans(refA)) + 2^(samB - rowMeans(refB)) > head(cn) > # CEU_NA06993_STY.CEL CEU_NA06994_STY.CEL CEU_NA07022_STY.CEL > #SNP_A-1780271 1.859447 2.786287 2.369363 > #SNP_A-1780274 2.160573 2.315243 2.271201 > #SNP_A-1780277 3.203198 2.453341 2.773667 > #SNP_A-1780278 1.908932 1.990323 1.748716 > #SNP_A-1780283 2.046767 1.691257 1.937375 > #SNP_A-1780290 2.098621 2.416547 2.020832 > > - Is this computation correct? > > - Is this way to compute the copynumbers a valuable option? > > - Are there any alternatives to compute the copynumbers using R > packages? > > > Thank you in advance > Best regards > Christian > > ============================================== > Christian Stratowa, PhD > Boehringer Ingelheim Austria > Dept NCE Lead Discovery - Bioinformatics > Dr. Boehringergasse 5-11 > A-1121 Vienna, Austria > Tel.: ++43-1-80105-2470 > Fax: ++43-1-80105-2782 > email: christian.stratowa at vie.boehringer-ingelheim.com > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD REPLY • link 18.1 years ago Christian.Stratowa@vie.boehringer-ingel… ▴ 270

0

Entering edit mode

On Feb 5, 2008 6:50 AM, <christian.stratowa at="" vie.boehringer-ingelheim.com=""> wrote: > Dear Henrik > > I have just downloaded and installed the aroma.affymetrix package, however, > it seems that the package enforces a certain naming convention and directory > structure, so I need some time to test it. Yes. That directory structure is there in order to help you - it was designed so that the package can detect common user mistakes but also make it easier for the user to see what analyses have been done from looking at the directory names. It's very straightforward once you get started. /Henrik > > Best regards > Christian > > ============================================== > Christian Stratowa, PhD > Boehringer Ingelheim Austria > Dept NCE Lead Discovery - Bioinformatics > Dr. Boehringergasse 5-11 > A-1121 Vienna, Austria > Tel.: ++43-1-80105-2470 > Fax: ++43-1-80105-2782 > email: christian.stratowa at vie.boehringer-ingelheim.com > > > > -----Original Message----- > From: henrik.bengtsson at gmail.com [mailto:henrik.bengtsson at gmail.com] On > Behalf Of Henrik Bengtsson > Sent: Tuesday, February 05, 2008 12:29 AM > To: Stratowa,Dr.,Christian FEX BIG-AT-V > Cc: bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] Help needed with CopyNumber analysis for Affymetrix 250K > arrays > > > > Hi Christian, > > On Feb 4, 2008 1:45 AM, <christian.stratowa at="" vie.boehringer-="" ingelheim.com=""> > wrote: > > Dear all, > > > > Until now I have done all CopyNumber (and LOH) analysis using > > Affymetrix CNAT4. However, I would prefer to use Bioconductor for this > > purpose, thus I have a couple of questions: > > > > > > 1, Normalization and summarization of mapping array 50K and 250K > > CEL-files: > > > > Currently, there seem to be only two packages available, which are > > able to read mapping array CEL-files, namely: package "oligo" and > > packages "PLASQ" and "PLASQ500K", respectively. > > > > Using package "oligo" I can do: > > > library(oligo) > > > snprma250 <- justSNPRMA(cels250, phenoData=pheno250) > > Then I get the normalized intensities: > > > asTA250 <- antisenseThetaA(snprma250) > > > asTB250 <- antisenseThetaB(snprma250) > > > sTA250 <- senseThetaA(snprma250) > > > sTB250 <- senseThetaB(snprma250) > > > > Using package "PLASQ500K" I can do: > > > library(PLASQ500K) > > > ref <- celExtNorm("SND", "Sty") > > > sam <- celExtract("STD", "Sty") > > I get a matrix of normalized probe intensities for reference (ref) and > > samples (sam). > > > > Are there other packages available which can use mapping array > > CEL-files? > > The aroma.affymetrix package [http://www.braju.com/R/aroma.affymetrix/] works > of CEL (and CDF) files. See URL for examples and documentation. > > > > > > > 2, Genotyping: > > > > Package "oligo" can be used for genotyping: > > > crlmmOut250 <- justCRLMM(cels250, phenoData=pheno250) genocall250 <- > > > calls(crlmmOut250) genoconf250 <- callsConfidence(crlmmOut250) > > > > However, the following results in an error: > > > snprma250 <- justSNPRMA(cels250, phenoData=pheno250) crlmmOut250 <- > > > crlmm(snprma250, correctionFile="outputEM.rda") > > see: > > https://stat.ethz.ch/pipermail/bioconductor/attachments/20080128/50495 > > 06c/att > > achment.pl > > > > Package "PLASQ500K" could also be used for genotyping: > > > geno <- EMSNP(???) > > Although I did not try it, this function seems to have a huge memory > > problem, see below. > > > > > > 3, CopyNumber analysis: > > > > Although there seem to be some packages which could use the output > > from the Affymetrix CNAT4 results, it seems that there is currently no > > package able to do copynumber analysis for Affymetrix mapping arrays. > > Is this correct? > > The aroma.affymetrix package can estimate paired & non-paired total raw copy > numbers, cf. > > H. Bengtsson; R. Irizarry; B. Carvalho; T.P. Speed, Estimation and assessment > of raw copy numbers at the single locus level, Bioinformatics, 2008. [pmid: > 18204055] > > The package also implement other methods for estimating raw CNs. Currently > GLAD and CBS are the supported segmentation methods, and it is not that hard > to add wrappers for other segmentation methods. > > The aroma.affymetrix package does not do genotyping; at one stage there was a > wrapper to call CRLMM in 'oligo' but due to lack of time it has become > obsolete. It's on the (way to long) to do list to fix that. > > Best, > > Henrik > > > > > 3a, CNRLMM: > > In a Johns Hopkins Tech Report, Paper 122, 2006, Wang, Caravalho et al > > describe a new copynumber algorithm, which they want to make available > > at Bioconductor. > > Does anybody know when the CNRLMM algorithm will be available? > > > > 3b, PLASQ500K > > I tried to compute parent-specific copy number using PLASQ500K: > > > library(PLASQ500K) > > > psCN <- > > pscn(StyFolder="STD",normStyFolder="SND",betasSty=NULL,quantSty=NULL,b > > etasSty > > File="betasSty.Rdata",rawCNStyfile="rawCNSty.Rdata") > > > > Using only 18 250K Sty CEL-files it was impossible to finish this > > calculation. On a 32GB RAM Linux server the job got killed, since > > function EMSNP() which is > > called from function getBetas() used up all RAM. Starting the computation > on > > our 64GB RAM Linux server, function EMSNP() could be executed, > nevertheless, > > we had to kill the job, when it reached memory consumption of 74GB!!! at a > > later stage! > > > > > > 3c, Compute raw copy numbers for unpaired copynumber analysis: Using > > the results from justSNPRMA() I tried to compute the copynumbers in > > the following way: > > > > # Reference files > > snprma250ref <- justSNPRMA(cels250ref, phenoData=pheno250ref) > > > > # Sample files > > snprma250sam <- justSNPRMA(cels250sam, phenoData=pheno250sam) > > > > ## separate allels combined as in CNAT4, see > > cnat_4_algorithm_whitepaper.pdf, page 9: # TCN(sumLog) = > > log2(SamA/RefA) + log2(SamB/RefB) > > > > # Reference for allele A: > > # allele A as array > > ref250A <- array(NA, > > dim=c(nrow(antisenseThetaA(snprma250ref)),ncol(antisenseThetaA(snprma2 > > 50ref)) > > , 2), > > > > dimnames=list(rownames(antisenseThetaA(snprma250ref)),colnames(antisen > > seTheta > > A(snprma250ref)),c("antisense","sense"))) > > ref250A[,,1] <- antisenseThetaA(snprma250ref) > > ref250A[,,2] <- senseThetaA(snprma250ref) > > > > # Reference A: rowMeans over sense and antisense strand > > refA <- > > sapply(1:dim(ref250A)[2],function(x)rowMeans(ref250A[,x,],na.rm=T)) > > colnames(refA) <- colnames(ref250A) > > > > # Reference for allele B: > > # allele B as array > > ref250B <- array(NA, > > dim=c(nrow(antisenseThetaB(snprma250ref)),ncol(antisenseThetaB(snprma2 > > 50ref)) > > , 2), > > > > dimnames=list(rownames(antisenseThetaB(snprma250ref)),colnames(antisen > > seTheta > > B(snprma250ref)),c("antisense","sense"))) > > ref250B[,,1] <- antisenseThetaB(snprma250ref) > > ref250B[,,2] <- senseThetaB(snprma250ref) > > > > # Reference B: rowMeans over sense and antisense strand > > refB <- > > sapply(1:dim(ref250B)[2],function(x)rowMeans(ref250B[,x,],na.rm=T)) > > colnames(refB) <- colnames(ref250B) > > > > # Sample for allele A: > > # allele A as array > > sam250A <- array(NA, > > dim=c(nrow(antisenseThetaA(snprma250sam)),ncol(antisenseThetaA(snprma2 > > 50sam)) > > , 2), > > > > dimnames=list(rownames(antisenseThetaA(snprma250sam)),colnames(antisen > > seTheta > > A(snprma250sam)),c("antisense","sense"))) > > sam250A[,,1] <- antisenseThetaA(snprma250sam) > > sam250A[,,2] <- senseThetaA(snprma250sam) > > > > # Sample A: rowMeans over sense and antisense strand > > samA <- > > sapply(1:dim(sam250A)[2],function(x)rowMeans(sam250A[,x,],na.rm=T)) > > colnames(samA) <- colnames(sam250A) > > > > # Sample for allele B: > > # allele B as array > > sam250B <- array(NA, > > dim=c(nrow(antisenseThetaB(snprma250sam)),ncol(antisenseThetaB(snprma2 > > 50sam)) > > , 2), > > > > dimnames=list(rownames(antisenseThetaB(snprma250sam)),colnames(antisen > > seTheta > > B(snprma250sam)),c("antisense","sense"))) > > sam250B[,,1] <- antisenseThetaB(snprma250sam) > > sam250B[,,2] <- senseThetaB(snprma250sam) > > > > # Sample B: rowMeans over sense and antisense strand > > samB <- > > sapply(1:dim(sam250B)[2],function(x)rowMeans(sam250B[,x,],na.rm=T)) > > colnames(samB) <- colnames(sam250B) > > > > # Total CopyNumber TCN(sumLog), see cnat_4_algorithm_whitepaper.pdf, > > page 9 TCN.sL <- (samA - rowMeans(refA)) + (samB - rowMeans(refB)) > > > > # real copy number is: cn = 2^(2^cn) ?? (or 2^(cn+1) ??) cn.sL <- > > 2^(2^TCN.sL) > > headcn.sL) > > # CEU_NA06993_STY.CEL CEU_NA06994_STY.CEL CEU_NA07022_STY.CEL > > #SNP_A-1780271 1.801377 3.034645 2.314986 > > #SNP_A-1780274 2.017805 2.494345 2.370112 > > #SNP_A-1780277 1.558268 2.446690 2.983195 > > #SNP_A-1780278 1.879762 1.859002 1.697422 > > #SNP_A-1780283 2.064631 1.639300 1.912674 > > #SNP_A-1780290 2.142572 2.738094 2.029215 > > > > # or alternatively: cn = 2^cnA + 2^cnB ?? > > cn <- 2^(samA - rowMeans(refA)) + 2^(samB - rowMeans(refB)) > > head(cn) > > # CEU_NA06993_STY.CEL CEU_NA06994_STY.CEL CEU_NA07022_STY.CEL > > #SNP_A-1780271 1.859447 2.786287 2.369363 > > #SNP_A-1780274 2.160573 2.315243 2.271201 > > #SNP_A-1780277 3.203198 2.453341 2.773667 > > #SNP_A-1780278 1.908932 1.990323 1.748716 > > #SNP_A-1780283 2.046767 1.691257 1.937375 > > #SNP_A-1780290 2.098621 2.416547 2.020832 > > > > - Is this computation correct? > > > > - Is this way to compute the copynumbers a valuable option? > > > > - Are there any alternatives to compute the copynumbers using R > > packages? > > > > > > Thank you in advance > > Best regards > > Christian > > > > ============================================== > > Christian Stratowa, PhD > > Boehringer Ingelheim Austria > > Dept NCE Lead Discovery - Bioinformatics > > Dr. Boehringergasse 5-11 > > A-1121 Vienna, Austria > > Tel.: ++43-1-80105-2470 > > Fax: ++43-1-80105-2782 > > email: christian.stratowa at vie.boehringer-ingelheim.com > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > >

ADD REPLY • link 18.1 years ago Henrik Bengtsson ★ 2.4k

Login before adding your answer.