Question

limma - arrays with different GAL files


Helen Cattan ▴ 100

@helen-cattan-687

Last seen 9.7 years ago

Hi,

Can anyone help me with the following please?

I have an experiment using two sets of arrays with different layouts, and therefore different GAL files, that I need to analyse together. A previous suggestion on this mailing list was to normalize and then combine the log ratios. Can anyone tell me what the code for combining 2 MALists is please?

Secondly, one of the sets of arrays has the genes printed in duplicate and the other set does not. Is there a way I can use dupcor.series for the arrays with the duplicates and then combine them with the other set of arrays? (Or at least take the average, without having to manually alter my gpr files.)

Failing all this, if I just combined the MALists as they are, will I have problems since the genes are in duplicate on some arrays, with the same IDs etc., and not on others?

Finally, I have been told that the genes are the same on both types of slides. Whether this means 100% the same or more or less the same, I'm not sure. If they are not completely identical, how will the genes that are only on one set of arrays be dealt with? I.e. will they be excluded, or will they be included in the calculations with data only from one set of arrays?

Many thanks,
Helen


updated 20.0 years ago by Christopher Wilkinson ▴ 140 • written 20.0 years ago by Helen Cattan ▴ 100

score 0 · Answer 1 · 2004-04-22

Hi Helen,

I have an experiment that I analysed with limma that spans two print runs, with the layout changed between them. The approach I used was to normalise the print runs separately (since I wanted to use print-tip normalisation, which is obviously layout dependent) and then combine the two MA objects.

There are several approaches to combining the print runs, depending upon the uniqueness of your probes. If all your probes have unique names then you can use a modified version of the merge function (I think Gordon was going to implement it, but I made my own version based on merge.RGList):

    setMethod("merge", c("MAList", "MAList"), definition = function(x, y, ...) {
      # Merge MAList y into x, aligning by row names - based on merge.RGList
      genes1 <- rownames(x$M)
      if (is.null(genes1)) genes1 <- rownames(x$A)
      genes2 <- rownames(y$M)
      if (is.null(genes2)) genes2 <- rownames(y$A)
      if (is.null(genes1) || is.null(genes2)) stop("Need row names to align on")
      fields1 <- names(x)
      fields2 <- names(y)
      if (!identical(fields1, fields2)) stop("The two MALists have different elements")
      # For each row of x, the index of the matching row of y
      ord2 <- match(makeUnique(genes1), makeUnique(genes2))
      # Append the reordered columns of y to each element of x
      for (i in fields1) x[[i]] <- cbind(x[[i]], y[[i]][ord2, , drop = FALSE])
      x
    })

You'd call this using

    MA.Combined <- merge(MA.PrintRun1, MA.PrintRun2)

This will merge two MA objects on the row names. However, if you have some duplicate row names, this method doesn't work. I think the MA values from the second print run will all be set to NA, which is not what you want.

I had the situation where ~95% of my probe names were unique, so the above solution wasn't quite suitable. To get around this, I wrote my own function to map the array locations of print run 2 onto the locations of print run 1. This requires you to know how the arrays were laid out differently.
If you can work this out, create a vector that, for each spot position in print run 1, gives the index of the corresponding spot in print run 2:

    mapRun2ToRun1[indexInGALForPrintRun1] <- indexInGALForPrintRun2

Then use mapRun2ToRun1 in place of ord2 in the above code, or use

    MA.combined <- new("MAList", list(
      M = cbind(MA.PrintRun1[["M"]], MA.PrintRun2[["M"]][mapRun2ToRun1, ]),
      A = cbind(MA.PrintRun1[["A"]], MA.PrintRun2[["A"]][mapRun2ToRun1, ]),
      weights = cbind(MA.PrintRun1[["weights"]], MA.PrintRun2[["weights"]][mapRun2ToRun1, ])
    ))

As to your question on how to cope with duplicate spots on one array and not the other, that is a little trickier. I have some suggestions, but I suspect others may be better qualified to answer this part. I'll assume print run 1 is single spots, print run 2 is duplicate spots, and that the bottom half of the array is the (duplicate) copy of the top half of the array.

First create a dummy MA list that is just run 1 duplicated to look like run 2, and set all the duplicate values to NA. E.g. if the bottom half of the array is the (duplicate) copy of the top half of the array, then

    MA1.duplicate <- rbind(MA1, MA1)
    n <- nrow(MA1$M)
    # Blank out the second copy so it contributes no data
    MA1.duplicate$M[(n + 1):(2 * n), ] <- NA
    MA1.duplicate$A[(n + 1):(2 * n), ] <- NA

Then combine runs 1 and 2, and run dupcor and lmFit. I think this should work, but I'm not sure of the validity of this approach (anyone else like to comment?).

Finally, with regard to what happens to genes on one array and not the other, I think it depends on how the data from the different arrays are merged. If you retain the probe, you could give it the data from run 1 and set all the run 2 values to NA. In that case you'll get an estimate out of limma, but it will be a poorer estimate compared to the other genes (since it has fewer degrees of freedom).

One comment is that it's possible you could have different sequences (i.e. regions) from the same gene on the array.
If this is the case, I would treat each sequence variant separately (i.e. as different genes, since the non-specific hybridisation and binding efficiency probably vary between the sequences).

Finally, I would suggest that you contact whoever printed your arrays and find out exactly how the print runs differ, what was spotted on the arrays, and where it came from. This should then help you decide on the best approach for analysing the data.

Cheers
Chris

Dr Chris Wilkinson
Research Officer (Bioinformatics) | Visiting Research Fellow
Child Health Research Institute (CHRI) | Microarray Analysis Group
7th floor, Clarence Rieger Building | Room 121
Women's and Children's Hospital | School of Applied Mathematics
72 King William Rd, North Adelaide, 5006 | The University of Adelaide, 5005
Math's Office (Room 121) Ph: 8303 3714
CHRI Office (CR2 52A) Ph: 8161 6363
Christopher.Wilkinson@adelaide.edu.au
http://mag.maths.adelaide.edu.au/crwilkinson.html
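
To make the row-name matching in the merge approach a bit more concrete, here is a minimal base R sketch of what match() does when the two print runs do not carry exactly the same probes. It uses plain matrices rather than MAList objects, and all the names and values in it (M1, M2, geneA and so on) are made up for illustration, so treat it as a toy example rather than the actual merge code.

```r
# Toy log-ratio matrices for two print runs; row names are probe IDs.
# Object names and values are hypothetical, for illustration only.
M1 <- matrix(c(0.5, -1.2, 0.3,
               0.7, -0.9, 0.1),
             nrow = 3,
             dimnames = list(c("geneA", "geneB", "geneC"),
                             c("run1.a1", "run1.a2")))
M2 <- matrix(c(2.0, 0.4, -1.0),
             nrow = 3,
             dimnames = list(c("geneC", "geneA", "geneD"), "run2.a1"))

# For each row of run 1, find the matching row of run 2 (NA if absent).
ord2 <- match(rownames(M1), rownames(M2))

# Indexing a matrix row with NA yields a row of NAs, so a probe that is
# missing from run 2 is kept, with missing values for the run 2 arrays.
M.combined <- cbind(M1, M2[ord2, , drop = FALSE])
rownames(M.combined) <- rownames(M1)

print(M.combined)
```

In this sketch geneB, which is only on run 1, stays in the combined matrix with NA for the run 2 array, while geneD, which is only on run 2, is dropped because the alignment is driven by the run 1 row names. If probes unique to run 2 need to be kept as well, the row names of run 1 would have to be extended first, which this sketch does not attempt.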