Subsetting Affybatch objects by gene lists.

0

Entering edit mode

Horswell, Stuart ▴ 30

@horswell-stuart-674

Last seen 9.6 years ago

Hi all, I'm trying to run an analysis on 24 Affymetrix HGu95v2 chips. I've set up, via merge.AffyBatch, an affybatch object containing all 24 arrays. A1 <- read.affybatch("A1.cel") . . . A24 <- read.affybatch("A24.cel") A <- merge.AffyBatch(A1, A2) A <- merge.AffyBatch(A, A3) . . . A<- merge.AffyBatch(A, A24) I then computed MAS5 type Present/Absent calls for each array using mas5calls. A.calls <- mas5calls(A) p.a.A <- exprs(A.calls) What I'd like to do now is remove all of those genes without a single present call across all 24 arrays before normalizing. I can use the p.a.A file to obtain a list of the gene names/affy id tags that I want to remove but I can't figure out how to delete the relavent probe pairs from my affybatch object. In fact that only things I've been able to find on the mailing list archive and/or vignettes are how to subset by array or how to remove chunks from the cdf environment - but this presents me with two problems, first I'm not sure I can get the pattern matching working well enough to identify which entry numbers in the cdf file correspond to the gene list I have, and secondly, people have already commented that this isn't neccessarily a sensible approach for proper analysis anyway. So I'm kind of stumped now! Any help or advice would be most greatfully received, many thanks, Stu

cdf probe cdf probe • 1.5k views

ADD COMMENT • link updated 20.1 years ago by Adaikalavan Ramasamy ★ 1.8k • written 20.1 years ago by Horswell, Stuart ▴ 30

0

Entering edit mode

Aedin Culhane ▴ 310

@aedin-culhane-500

Last seen 9.6 years ago

Hi Stuart ReadAffy() will automatically read all cel files in your directory, or if you wish to select specific ones, ReadAffy(widget=T) is useful. I don't think this is the shortest way, but this should filter your genes: data(affybatch.example) PACalls <- mas5calls(affybatch.example) c<-exprs(PACalls) countA <- function(x,A = 2){ length(which(x=="A")) == A #Count the no of A and check if equal to A } countA will return a result TRUE, and FAlSE. To test the rows satisifying TRUE use apply apply(c, 1, countA, A=0) and then to subselect these use: c[apply(c, 1, countA, A=0),] c[apply(c, 1, countA, A=1),] c[apply(c, 1, countA, A=2),] This will give you lists of genes for which there are allA, 1 A's, 2 A's etc. Equally (which(x=="P")) could be used. Then filter your data using A<-ncol(exprs(data)) fil<-c[apply(c, 1, countA, A=A),] exprs(data)[fil,] Hope this helps Aedin -----Original Message----- From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch]On Behalf Of Horswell, Stuart Sent: 15 March 2004 14:22 To: bioconductor@stat.math.ethz.ch Subject: [BioC] Subsetting Affybatch objects by gene lists. Hi all, I'm trying to run an analysis on 24 Affymetrix HGu95v2 chips. I've set up, via merge.AffyBatch, an affybatch object containing all 24 arrays. A1 <- read.affybatch("A1.cel") . . . A24 <- read.affybatch("A24.cel") A <- merge.AffyBatch(A1, A2) A <- merge.AffyBatch(A, A3) . . . A<- merge.AffyBatch(A, A24) I then computed MAS5 type Present/Absent calls for each array using mas5calls. A.calls <- mas5calls(A) p.a.A <- exprs(A.calls) What I'd like to do now is remove all of those genes without a single present call across all 24 arrays before normalizing. I can use the p.a.A file to obtain a list of the gene names/affy id tags that I want to remove but I can't figure out how to delete the relavent probe pairs from my affybatch object. In fact that only things I've been able to find on the mailing list archive and/or vignettes are how to subset by array or how to remove chunks from the cdf environment - but this presents me with two problems, first I'm not sure I can get the pattern matching working well enough to identify which entry numbers in the cdf file correspond to the gene list I have, and secondly, people have already commented that this isn't neccessarily a sensible approach for proper analysis anyway. So I'm kind of stumped now! Any help or advice would be most greatfully received, many thanks, Stu _______________________________________________ Bioconductor mailing list Bioconductor@stat.math.ethz.ch https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

ADD COMMENT • link 20.1 years ago Aedin Culhane ▴ 310

0

Entering edit mode

Adaikalavan Ramasamy ★ 1.8k

@adaikalavan-ramasamy-675

Last seen 9.6 years ago

> -----Original Message----- > From: bioconductor-bounces@stat.math.ethz.ch > [mailto:bioconductor-bounces@stat.math.ethz.ch]On Behalf Of Horswell, > Stuart > Sent: 15 March 2004 14:22 > To: bioconductor@stat.math.ethz.ch > Subject: [BioC] Subsetting Affybatch objects by gene lists. > > > > Hi all, > > I'm trying to run an analysis on 24 Affymetrix HGu95v2 chips. > > I've set up, via merge.AffyBatch, an affybatch object containing > all 24 arrays. > > A1 <- read.affybatch("A1.cel") > . > . > . > A24 <- read.affybatch("A24.cel") > > A <- merge.AffyBatch(A1, A2) > A <- merge.AffyBatch(A, A3) > . > . > . > A<- merge.AffyBatch(A, A24) > Why are you doing this ? If you have all the cel files in one place, then filenames <- list.files(path="/celfile/directory/", pattern=".cel", full.names=TRUE) raw <- ReadAffy(filenames=filenames) > I then computed MAS5 type Present/Absent calls for each array > using mas5calls. > > A.calls <- mas5calls(A) > p.a.A <- exprs(A.calls) > > What I'd like to do now is remove all of those genes without a > single present call across all 24 arrays before normalizing. > > I can use the p.a.A file to obtain a list of the gene names/affy > id tags that I want to remove but I can't figure out how to > delete the relavent probe pairs from my affybatch object. You mean probesets and not probe pairs ? The dimension of p.a.A is n x 24; where n is the number of probesets in HGu95v2. MAS 5.0 gives one detection call for each probeset (which contain 11-20 probe pairs). If you want to remove all the probesets with 0 Present call, try this : A.expr <- mas5(A) A.expr <- exprs(A.expr) A.calls <- mas5calls(A) A.calls <- exprs(A.calls) presence <- apply( A.calls, 1, function(x) sum( x=="P" ) ) bad <- which( presence == 0 ) length(bad) stopifnot( identical( rownames(A.expr), rownames(A.calls) ) ) stopifnot( identical( colnames(A.expr), colnames(A.calls) ) ) # check if the dataset was permutated somehow if(length(bad) > 0 ){ A.expr.trimmed <- A.expr[ -bad, ] A.call.trimmed <- A.call[ -bad, ] } > In fact that only things I've been able to find on the mailing > list archive and/or vignettes are how to subset by array or how > to remove chunks from the cdf environment - but this presents me The resultant of using exprs() on an Affybatch is a matrix and you can simply proceed with subset. Try class(A.expr) before and after exprs(). > with two problems, first I'm not sure I can get the pattern > matching working well enough to identify which entry numbers in > the cdf file correspond to the gene list I have, and secondly, > people have already commented that this isn't neccessarily a > sensible approach for proper analysis anyway. So I'm kind of stumped now! Better avoid matching whenever possible as it can be slow on large dataset. Better to check if rownames and colnames are identical. You can be more stringent and remove any probesets with less say 3 present calls etc. Or you can try using other normalization method like rma, gcrma etc. > Any help or advice would be most greatfully received, > > many thanks, > > Stu > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor >

ADD COMMENT • link 20.1 years ago Adaikalavan Ramasamy ★ 1.8k

Login before adding your answer.