Question

duplicateCorrelation

Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

Dear Devin,

There are a couple of problems. Firstly, you've told us that your replicates are 112 spots apart, but you haven't told limma this. So the software is assuming that the replicates are side-by-side, which is the default. You need instead:

    > cor <- duplicateCorrelation(MA, design, ndups=3, spacing=112)

Secondly, two arrays is pretty minimal to estimate duplicate correlations. The help page for duplicateCorrelation says:

    For this function to return statistically useful results, there must be
    at least two more arrays than the number of coefficients to be estimated,
    i.e., two more than the column rank of 'design'.

Hence you need at least 3 arrays to have confidence in your results, whereas you have only two.

If you want to check that duplicateCorrelation() is getting the right input, the best way is to check that your replicates really are at the spacing you think they are. Your data files (ScanArray?) almost certainly contain a gene ID column. Let's assume this column is called "ID". Use

    > RG <- read.maimages(..., annotation="ID")

Then

    > unwrapdups(MA$genes$ID, ndups=3, spacing=112)

is a matrix which should have three identical columns. Does it?

Best wishes
Gordon

>[BioC] duplicateCorrelation
>Devin Scannell scannedr at tcd.ie
>Fri Nov 18 02:03:07 CET 2005
>
>Hi,
>
>this is not a very interesting question but it has given me enough
>trouble to get me to mail the list so I hope somebody has time to
>reply.
>
>I have several two-colour arrays to analyze. Each probe is present
>three times on each chip and they are spaced 112 spots apart (not my
>decision). The consensus correlation returned by duplicateCorrelation
>is typically around zero which is surprising since the spots are close
>together and the data looks good in MA plots (even before
>normalization). A histogram of the individual correlations
>(cor$all.correlations from duplicateCorrelation) supports the
>conclusion that the within-chip replicates are poorly correlated.
>
>I am concerned that the numbers that are being handed to
>duplicateCorrelation are incorrect somehow but I am not sure what I am
>doing wrong (code below). I have looked at the code for
>duplicateCorrelation and cannot follow it so I was wondering if anyone
>can suggest a way to verify the correlations it is calculating. Ideally
>I would like to be able to select a specific gene, calculate the
>correlation between replicates myself and verify that this is the same
>as I obtain from duplicateCorrelation.
>
>Thanks in advance,
>Devin
>
>library(limma)
>
>targets <- readTargets()
>
>targets
>   SlideNumber     Name FileName  Cy3  Cy5
>13          13  60H_9:12   13.csv  WT1 60H1
>17          17  60H_12:9   17.csv 60H1  WT1
>
>flag.check <- function(x) as.numeric(x$Flags >= 3)
>RG <- read.maimages(targets$FileName, sep=",", columns=list(Rf="Ch1
>Median",Gf="Ch2 Median",Rb="Ch1 B Median",Gb="Ch2 B Median"),
>wt.fun=flag.check)
>
>RG$genes <- readGAL()
>RG$printer <- getLayout(RG$genes)
>
>RG.bgc <- backgroundCorrect(RG, method="normexp", offset=50)
>MA <- normalizeWithinArrays(RG.bgc, method="loess")
>
>design <- cbind(c(1,-1))
>cor <- duplicateCorrelation(MA, design, ndups=3)
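The two checks described in this answer can be tried on toy data without limma installed. The sketch below is base R only. `spread_dups` is a hypothetical helper written for illustration: it assumes the regular block layout that `ndups` and `spacing` describe (each block of `spacing` probes printed `ndups` times in a row) and is not limma's own code. The toy IDs and the small `spacing` value are made up.

```r
# Toy layout: 2 blocks, each holding `spacing` probes printed `ndups` times.
ndups   <- 3
spacing <- 4                       # stands in for 112 on the real arrays
ids <- c(rep(paste0("g", 1:4), times = ndups),
         rep(paste0("g", 5:8), times = ndups))

# Hypothetical helper (an assumption, not limma code): reshape a vector of
# spots so that each row holds the `ndups` replicates of one probe.
spread_dups <- function(x, ndups, spacing) {
  stopifnot(length(x) %% (ndups * spacing) == 0)
  ngroups <- length(x) / (ndups * spacing)
  a <- array(x, dim = c(spacing, ndups, ngroups))
  a <- aperm(a, c(1, 3, 2))        # move the replicate index last
  dim(a) <- c(spacing * ngroups, ndups)
  a
}

# Check 1: every row should repeat the same ID in all columns.
id_mat    <- spread_dups(ids, ndups, spacing)
layout_ok <- all(id_mat == id_mat[, 1])

# A single stray spot at the front shifts everything and breaks the layout.
ids_bad    <- c("empty", ids[-length(ids)])
bad_mat    <- spread_dups(ids_bad, ndups, spacing)
layout_bad <- all(bad_mat == bad_mat[, 1])

# Check 2: the help page asks for at least rank(design) + 2 arrays.
design        <- cbind(c(1, -1))
min_arrays    <- qr(design)$rank + 2
enough_arrays <- nrow(design) >= min_arrays
```

On real data the same idea applies to `MA$genes$ID` with `ndups=3, spacing=112`: if the ID matrix does not have identical columns, the `spacing` argument does not describe the actual print layout.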

probe limma • 1.2k views

updated 20.2 years ago by Devin Scannell ▴ 20 • written 20.2 years ago by Gordon Smyth 53k

score 0 · Answer 1 · 2005-11-21

Thanks for the helpful reply, Gordon.

> There are a couple of problems. Firstly, you've told us that your
> replicates are 112 spots apart, but you haven't told limma this. So
> the software is assuming that the replicates are side-by-side, which
> is the default. You need instead:
>
> > cor <- duplicateCorrelation(MA, design, ndups=3, spacing=112)

I had tried this but got errors from the unwrapdups() function, so the answer to your question:

> > unwrapdups(MA$genes$ID, ndups=3, spacing=112)
> is a matrix which should have three identical columns. Does it?

is therefore no. I used the alternative (incorrect) syntax because the layout data is in the MA object and I thought the function might identify the replicate probes automatically. Foolish in retrospect.

In any case, it turns out that although all the replicate probes are 112 spots apart, they are interspersed with variable numbers of unused spots, which causes unwrapdups() to choke. Making a new MA object (after background correction and normalization) with the replicate probes side-by-side solves this problem. The consensus correlation is now > 0.5.

> Secondly, two arrays is pretty minimal to estimate duplicate
> correlations.

We have more samples. I wanted to keep the description of the problem as simple as possible.

Thanks for the help.

Best,
Devin
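The reordering described in this reply (dropping unused spots and placing replicate probes side-by-side) might look roughly like the base R sketch below. The IDs, the `NA` markers for unused spots, and the matrix `M` of log-ratios are invented for illustration. The reply does not show the code that was actually used, so this is only one plausible way to do it.

```r
# Toy spot table: three copies of each probe, interspersed with a variable
# number of unused spots (marked NA here; real files may label them otherwise).
ids <- c("g1", "g2", NA, "g1", "g2", "g1", NA, NA, "g2", "g3", "g3", NA, "g3")
M   <- cbind(array1 =  seq_along(ids) / 10,    # made-up log-ratios
             array2 = -seq_along(ids) / 10)

ndups <- 3

# Keep only probes that occur exactly `ndups` times (table() ignores NA).
counts   <- table(ids)
keep_ids <- names(counts)[counts == ndups]
keep     <- which(ids %in% keep_ids)

# Sort by ID, breaking ties by original position, so replicates are adjacent.
ord        <- keep[order(ids[keep], keep)]
ids_sorted <- ids[ord]
M_sorted   <- M[ord, , drop = FALSE]

# Sanity check: with replicates adjacent, each row of this matrix is one probe.
id_mat     <- matrix(ids_sorted, ncol = ndups, byrow = TRUE)
adjacent_ok <- all(id_mat == id_mat[, 1])

# With limma, the analogous step would be to subset the normalized object by
# the same row index (e.g. MA[ord, ]) and then call
#   duplicateCorrelation(MA_sorted, design, ndups = 3, spacing = 1)
# where MA_sorted is a hypothetical name for the reordered object.
```

Sorting after background correction and normalization, as the reply does, matters because print-tip or intensity-dependent normalization relies on the original spot order and layout.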