Dear all,
Until now I have done all CopyNumber (and LOH) analysis using Affymetrix
CNAT4. However, I would prefer to use Bioconductor for this purpose, thus
I have a couple of questions:
1, Normalization and summarization of mapping array 50K and 250K CEL-files:
Currently, there seem to be only two packages available which are able to
read mapping array CEL-files, namely:
package "oligo" and packages "PLASQ" and "PLASQ500K", respectively.
Using package "oligo" I can do:
> library(oligo)
> snprma250 <- justSNPRMA(cels250, phenoData=pheno250)
Then I get the normalized intensities:
> asTA250 <- antisenseThetaA(snprma250)
> asTB250 <- antisenseThetaB(snprma250)
> sTA250 <- senseThetaA(snprma250)
> sTB250 <- senseThetaB(snprma250)
Using package "PLASQ500K" I can do:
> library(PLASQ500K)
> ref <- celExtNorm("SND", "Sty")
> sam <- celExtract("STD", "Sty")
I get a matrix of normalized probe intensities for reference (ref) and
samples (sam).
Are there other packages available which can use mapping array CEL-files?
2, Genotyping:
Package "oligo" can be used for genotyping:
> crlmmOut250 <- justCRLMM(cels250, phenoData=pheno250)
> genocall250 <- calls(crlmmOut250)
> genoconf250 <- callsConfidence(crlmmOut250)
However, the following results in an error:
> snprma250 <- justSNPRMA(cels250, phenoData=pheno250)
> crlmmOut250 <- crlmm(snprma250, correctionFile="outputEM.rda")
see:
https://stat.ethz.ch/pipermail/bioconductor/attachments/20080128/5049506c/attachment.pl
Package "PLASQ500K" could also be used for genotyping:
> geno <- EMSNP(???)
Although I did not try it, this function seems to have a huge memory
problem, see below.
3, CopyNumber analysis:
Although there seem to be some packages which could use the output from
the Affymetrix CNAT4 results, it seems that there is currently no package
able to do copynumber analysis for Affymetrix mapping arrays. Is this
correct?
3a, CNRLMM:
In a Johns Hopkins Tech Report, Paper 122, 2006, Wang, Carvalho et al.
describe a new copynumber algorithm, which they want to make available
at Bioconductor.
Does anybody know when the CNRLMM algorithm will be available?
3b, PLASQ500K
I tried to compute parent-specific copy number using PLASQ500K:
> library(PLASQ500K)
> psCN <- pscn(StyFolder="STD", normStyFolder="SND", betasSty=NULL,
+              quantSty=NULL, betasStyFile="betasSty.Rdata",
+              rawCNStyfile="rawCNSty.Rdata")
Using only 18 250K Sty CEL-files, it was impossible to finish this
calculation. On a 32GB RAM Linux server the job got killed, since function
EMSNP(), which is called from function getBetas(), used up all RAM.
On our 64GB RAM Linux server, function EMSNP() could be executed;
nevertheless, we had to kill the job at a later stage, when its memory
consumption reached 74GB!
3c, Compute raw copy numbers for unpaired copynumber analysis:
Using the results from justSNPRMA() I tried to compute the copynumbers
in the following way:
# Reference files
snprma250ref <- justSNPRMA(cels250ref, phenoData=pheno250ref)
# Sample files
snprma250sam <- justSNPRMA(cels250sam, phenoData=pheno250sam)
## separate alleles combined as in CNAT4, see
## cnat_4_algorithm_whitepaper.pdf, page 9:
# TCN(sumLog) = log2(SamA/RefA) + log2(SamB/RefB)
# Reference for allele A:
# allele A as array
ref250A <- array(NA,
  dim=c(nrow(antisenseThetaA(snprma250ref)),
        ncol(antisenseThetaA(snprma250ref)), 2),
  dimnames=list(rownames(antisenseThetaA(snprma250ref)),
                colnames(antisenseThetaA(snprma250ref)),
                c("antisense","sense")))
ref250A[,,1] <- antisenseThetaA(snprma250ref)
ref250A[,,2] <- senseThetaA(snprma250ref)
# Reference A: rowMeans over sense and antisense strand
refA <- sapply(1:dim(ref250A)[2],
               function(x) rowMeans(ref250A[,x,], na.rm=TRUE))
colnames(refA) <- colnames(ref250A)
# Reference for allele B:
# allele B as array
ref250B <- array(NA,
  dim=c(nrow(antisenseThetaB(snprma250ref)),
        ncol(antisenseThetaB(snprma250ref)), 2),
  dimnames=list(rownames(antisenseThetaB(snprma250ref)),
                colnames(antisenseThetaB(snprma250ref)),
                c("antisense","sense")))
ref250B[,,1] <- antisenseThetaB(snprma250ref)
ref250B[,,2] <- senseThetaB(snprma250ref)
# Reference B: rowMeans over sense and antisense strand
refB <- sapply(1:dim(ref250B)[2],
               function(x) rowMeans(ref250B[,x,], na.rm=TRUE))
colnames(refB) <- colnames(ref250B)
# Sample for allele A:
# allele A as array
sam250A <- array(NA,
  dim=c(nrow(antisenseThetaA(snprma250sam)),
        ncol(antisenseThetaA(snprma250sam)), 2),
  dimnames=list(rownames(antisenseThetaA(snprma250sam)),
                colnames(antisenseThetaA(snprma250sam)),
                c("antisense","sense")))
sam250A[,,1] <- antisenseThetaA(snprma250sam)
sam250A[,,2] <- senseThetaA(snprma250sam)
# Sample A: rowMeans over sense and antisense strand
samA <- sapply(1:dim(sam250A)[2],
               function(x) rowMeans(sam250A[,x,], na.rm=TRUE))
colnames(samA) <- colnames(sam250A)
# Sample for allele B:
# allele B as array
sam250B <- array(NA,
  dim=c(nrow(antisenseThetaB(snprma250sam)),
        ncol(antisenseThetaB(snprma250sam)), 2),
  dimnames=list(rownames(antisenseThetaB(snprma250sam)),
                colnames(antisenseThetaB(snprma250sam)),
                c("antisense","sense")))
sam250B[,,1] <- antisenseThetaB(snprma250sam)
sam250B[,,2] <- senseThetaB(snprma250sam)
# Sample B: rowMeans over sense and antisense strand
samB <- sapply(1:dim(sam250B)[2],
               function(x) rowMeans(sam250B[,x,], na.rm=TRUE))
colnames(samB) <- colnames(sam250B)
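
The four array blocks above repeat the same strand-averaging step. A more compact way to write it might look like the sketch below. strandMean() is a hypothetical helper of my own, not a function from oligo, and the toy matrices only stand in for the theta matrices returned by the accessors.

```r
# Hypothetical helper (not part of oligo): element-wise mean of the
# antisense and sense matrices, ignoring NAs, keeping the dimnames.
strandMean <- function(antisense, sense) {
  stopifnot(identical(dim(antisense), dim(sense)))
  # One row per SNP/sample cell, one column per strand
  avg <- rowMeans(cbind(as.vector(antisense), as.vector(sense)),
                  na.rm = TRUE)
  matrix(avg, nrow = nrow(antisense), dimnames = dimnames(antisense))
}

# Toy data standing in for antisenseThetaA()/senseThetaA() output
toyAnti  <- matrix(c(10, 11, NA, 9), nrow = 2,
                   dimnames = list(c("SNP_1", "SNP_2"),
                                   c("s1.CEL", "s2.CEL")))
toySense <- matrix(c(12, 11, 8, NA), nrow = 2,
                   dimnames = dimnames(toyAnti))
toyA <- strandMean(toyAnti, toySense)
print(toyA)

# With real data the call would presumably be, e.g.:
# refA <- strandMean(antisenseThetaA(snprma250ref),
#                    senseThetaA(snprma250ref))
```

As in the sapply() version, a cell that is NA on one strand takes the value of the other strand, and a cell that is NA on both strands becomes NaN.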
# Total CopyNumber TCN(sumLog), see cnat_4_algorithm_whitepaper.pdf, page 9
TCN.sL <- (samA - rowMeans(refA)) + (samB - rowMeans(refB))
# real copy number is: cn = 2^(2^cn) ?? (or 2^(cn+1) ??)
cn.sL <- 2^(2^TCN.sL)
head(cn.sL)
#              CEU_NA06993_STY.CEL CEU_NA06994_STY.CEL CEU_NA07022_STY.CEL
#SNP_A-1780271            1.801377            3.034645            2.314986
#SNP_A-1780274            2.017805            2.494345            2.370112
#SNP_A-1780277            1.558268            2.446690            2.983195
#SNP_A-1780278            1.879762            1.859002            1.697422
#SNP_A-1780283            2.064631            1.639300            1.912674
#SNP_A-1780290            2.142572            2.738094            2.029215
# or alternatively: cn = 2^cnA + 2^cnB ??
cn <- 2^(samA - rowMeans(refA)) + 2^(samB - rowMeans(refB))
head(cn)
#              CEU_NA06993_STY.CEL CEU_NA06994_STY.CEL CEU_NA07022_STY.CEL
#SNP_A-1780271            1.859447            2.786287            2.369363
#SNP_A-1780274            2.160573            2.315243            2.271201
#SNP_A-1780277            3.203198            2.453341            2.773667
#SNP_A-1780278            1.908932            1.990323            1.748716
#SNP_A-1780283            2.046767            1.691257            1.937375
#SNP_A-1780290            2.098621            2.416547            2.020832
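
To make the difference between the candidate transformations concrete, here is a small sketch on made-up per-allele log2 ratios. It assumes the thetas are on the log2 scale, as the subtraction of reference from sample above implies. The values and the column names are purely illustrative, and the sketch does not settle which transformation matches the CNAT4 definition.

```r
# Made-up per-allele log2 ratios (sample minus reference)
a <- c(equal = 0, doubled = 1, halved = -1)   # allele A
b <- c(equal = 0, doubled = 1, halved = -1)   # allele B

# TCN(sumLog) = log2(SamA/RefA) + log2(SamB/RefB)
tcn <- a + b

# The three candidate back-transformations from the code above
candidates <- cbind(
  nested    = 2^(2^tcn),     # cn = 2^(2^TCN)
  shifted   = 2^(tcn + 1),   # cn = 2^(TCN+1)
  perAllele = 2^a + 2^b      # cn = 2^cnA + 2^cnB
)
print(candidates)
```

All three candidates return 2 when sample and reference intensities are equal, but they diverge as soon as the log ratios move away from zero: when both alleles are doubled they give 16, 8 and 4, respectively.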
- Is this computation correct?
- Is this a valid way to compute the copy numbers?
- Are there any alternatives to compute the copy numbers using R packages?
Thank you in advance
Best regards
Christian
==============================================
Christian Stratowa, PhD
Boehringer Ingelheim Austria
Dept NCE Lead Discovery - Bioinformatics
Dr. Boehringergasse 5-11
A-1121 Vienna, Austria
Tel.: ++43-1-80105-2470
Fax: ++43-1-80105-2782
email: christian.stratowa at vie.boehringer-ingelheim.com