Hi Helen,

I have analysed an experiment with limma that spans two print runs,
with the layout changed between the runs. The approach I used was to
normalise the print runs separately (since I wanted to use print-tip
normalisation, which is obviously layout dependent), and then combine
the two MA objects.

There are several approaches to combining the print runs, depending
upon the uniqueness of your probes. If all your probes have unique
names then you can use a modified version of the merge function (I
think Gordon was going to implement it, but I made my own version
based on merge.RGList):

    setMethod("merge", c("MAList", "MAList"),
      definition = function(x, y, ...) {
        # Merge MAList y into x, aligning by row names
        # (based on merge.RGList).
        # Assumes every component is a matrix with one row per spot.
        genes1 <- rownames(x$M)
        if (is.null(genes1)) genes1 <- rownames(x$A)
        genes2 <- rownames(y$M)
        if (is.null(genes2)) genes2 <- rownames(y$A)
        if (is.null(genes1) || is.null(genes2))
          stop("Need row names to align on")
        fields1 <- names(x)
        fields2 <- names(y)
        if (!identical(fields1, fields2))
          stop("The two MALists have different elements")
        # For each row of x, find the matching row of y
        ord2 <- match(makeUnique(genes1), makeUnique(genes2))
        for (i in fields1)
          x[[i]] <- cbind(x[[i]], y[[i]][ord2, , drop = FALSE])
        x
      })

You'd call this using

    MA.Combined <- merge(MA.PrintRun1, MA.PrintRun2)

This will merge two MA objects on the row names. However, if you have
some duplicate row names, this method doesn't work. I think the MA
values from the second print run will all be set to NA, which is not
what you want.

I had the situation where ~95% of my probe names were unique, so the
above solution wasn't quite suitable. To get around this, I wrote my
own function to map the array locations of print run 2 onto the
locations of print run 1. This requires you to know how the arrays
were laid out differently. If you can work this out, create a vector
that, like ord2, gives for each row of run 1 the matching row of
run 2:

    mapRun2ToRun1[indexInGALForPrintRun1] = indexInGALForPrintRun2

Then use mapRun2ToRun1 in place of ord2 in the above code, or use

    MA.Combined <- new("MAList", list(
      M = cbind(MA.PrintRun1[["M"]],
                MA.PrintRun2[["M"]][mapRun2ToRun1, ]),
      A = cbind(MA.PrintRun1[["A"]],
                MA.PrintRun2[["A"]][mapRun2ToRun1, ]),
      weights = cbind(MA.PrintRun1[["weights"]],
                      MA.PrintRun2[["weights"]][mapRun2ToRun1, ])))

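To make the direction of the mapping concrete, here is a minimal sketch
in base R on toy matrices rather than real MALists. All names and values
are made up for illustration. It assumes you know, for each row of the
run 2 layout, which run 1 row holds the same spot (the hypothetical
vector run1RowOfRun2Spot), and it inverts that into a vector with the
same orientation as ord2: indexed by run 1 row, holding the run 2 row.

```r
# Toy log-ratios: 4 spots, one array per print run (values are made up
# so that each value identifies the spot it came from).
M.run1 <- matrix(c(1, 2, 3, 4), ncol = 1,
                 dimnames = list(NULL, "run1.array1"))
M.run2 <- matrix(c(3, 1, 4, 2), ncol = 1,
                 dimnames = list(NULL, "run2.array1"))

# Hypothetical layout knowledge: run 2 row j holds the spot that sits
# at run 1 row run1RowOfRun2Spot[j].
run1RowOfRun2Spot <- c(3, 1, 4, 2)

# Invert it: entry i is the run 2 row that matches run 1 row i.
mapRun2ToRun1 <- integer(length(run1RowOfRun2Spot))
mapRun2ToRun1[run1RowOfRun2Spot] <- seq_along(run1RowOfRun2Spot)

# Reorder run 2 into run 1 order, then bind the arrays side by side.
M.combined <- cbind(M.run1, M.run2[mapRun2ToRun1, , drop = FALSE])
```

No probe names are involved here, so duplicate names are not a problem.
The same reordering would be applied to A and weights.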
As to your question on how to cope with duplicate spots on one array
and not the other, that is a little trickier. I have some suggestions,
but I suspect others may be better qualified to answer this part.

I'll assume print run 1 has single spots, print run 2 has duplicate
spots, and that the bottom half of the run 2 array is the (duplicate)
copy of the top half.

First create a dummy MA list that is just run 1 duplicated to look
like run 2, and set all the duplicate values to NA, e.g.

    MA1.duplicate <- rbind(MA1, MA1)
    n <- nrow(MA1$M)
    MA1.duplicate$M[(n + 1):(2 * n), ] <- NA
    MA1.duplicate$A[(n + 1):(2 * n), ] <- NA

Then combine runs 1 and 2, and run dupcor and lmFit. I think this
should work, but I'm not sure of the validity of this approach (anyone
else like to comment?).
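As a rough illustration of the padding step, here is a base R sketch on
plain matrices instead of MALists. The probe names, array names and
values are invented, and it assumes the run 2 rows are already in the
order "top half, then duplicate bottom half".

```r
# Run 1: 3 single spots on 2 arrays.
M.run1 <- matrix(c(0.5, -1.0, 0.2, 0.4, -0.8, 0.1), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("r1a", "r1b")))

# Run 2: the same 3 probes spotted twice (rows 4-6 repeat rows 1-3).
M.run2 <- matrix((1:12) / 10, nrow = 6,
                 dimnames = list(rep(c("g1", "g2", "g3"), 2),
                                 c("r2a", "r2b")))

# Stack run 1 on itself so it has the run 2 shape, then blank out the
# bottom half so the fake duplicates carry no information.
n <- nrow(M.run1)
M.run1.padded <- rbind(M.run1, M.run1)
M.run1.padded[(n + 1):(2 * n), ] <- NA

# Both runs now have 6 rows and can be bound column-wise.
M.combined <- cbind(M.run1.padded, M.run2)
```

The run 1 arrays then contribute one observation per probe and the
run 2 arrays contribute two.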
Finally, with regard to what happens to genes on one array and not the
other, I think it depends on how the data from the different arrays
are merged. If you retain the probe, you could give it the data from
run 1 and set all the run 2 values to NA. In that case you'll still
get an estimate out of limma, but it will be a poorer estimate
compared to the other genes (since it has fewer degrees of freedom).
One comment is that it's possible you could have different sequences
(i.e. regions) from the same gene on the array. If this is the case, I
would treat each sequence variant separately (i.e. as different genes,
since the non-specific hybridisation and binding efficiency probably
vary between the sequences).
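One way to retain probes that appear in only one run is sketched below
in base R on toy matrices. The probe names and values are invented, and
the sketch assumes the probe names are unique within each run, which is
not the case for every array.

```r
# Two runs sharing g2 and g3; g1 is only in run 1, g4 only in run 2.
M.run1 <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("r1a", "r1b")))
M.run2 <- matrix(c(1.1, 1.2, 1.3, 1.4, 1.5, 1.6), nrow = 3,
                 dimnames = list(c("g2", "g3", "g4"), c("r2a", "r2b")))

# Keep every probe seen in either run.
all.ids <- union(rownames(M.run1), rownames(M.run2))

# match() returns NA for a probe absent from a run, and indexing a
# matrix with NA yields a row of NA values for that probe.
M.combined <- cbind(
  M.run1[match(all.ids, rownames(M.run1)), , drop = FALSE],
  M.run2[match(all.ids, rownames(M.run2)), , drop = FALSE])
rownames(M.combined) <- all.ids

# Number of arrays with data for each probe.
n.obs <- rowSums(!is.na(M.combined))
```

Probes present in only one run end up with fewer non-missing values,
which is why their estimates would be less precise.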

Finally, I would suggest that you contact whoever printed your arrays
and find out exactly how the print runs differ, and what was spotted
on the arrays and where it came from. This should then help you decide
what is the best approach for analysing the data.

Cheers
Chris
Dr Chris Wilkinson
Research Officer (Bioinformatics) | Visiting Research Fellow
Child Health Research Institute (CHRI) | Microarray Analysis Group
7th floor, Clarence Rieger Building | Room 121
Women's and Children's Hospital | School of Applied Mathematics
72 King William Rd, North Adelaide, 5006 | The University of Adelaide, 5005
Math's Office (Room 121) Ph: 8303 3714
CHRI Office (CR2 52A) Ph: 8161 6363
Christopher.Wilkinson@adelaide.edu.au
http://mag.maths.adelaide.edu.au/crwilkinson.html

> Can anyone help me with the following please?
>
> I have an experiment using two sets of arrays with different layouts
> and therefore different GAL files that I need to analyse together. A
> previous suggestion on this mailing list was to normalize and then
> combine the log ratios. Can anyone tell me what the code for
> combining 2 MALists is please?
>
> Secondly one of the sets of arrays has the genes printed in
> duplicate and the other set does not. Is there a way I can use
> dupcor.series for the arrays with the duplicates and then combine
> them with the other set of arrays? (or at least take the average -
> without having to manually alter my gpr files)
>
> Failing all this, if I just combined the MALists as they are, will I
> have problems since the genes are in duplicate on some arrays with
> same IDs etc and not others?
>
> Finally....I have been told that the genes are the same on both
> types of slides - whether this means 100% the same or more or less
> the same, I'm not sure. If they are not completely identical how
> will the genes, that are only on one set of arrays, be dealt with?
> i.e. will they be excluded or will they be included in the
> calculations with data only from one set of arrays?
>
> Many thanks,
>
> Helen