topTable threshold on p-value and logFC [Re: was design matrix]

0

Entering edit mode

Marcus Davy ▴ 680

@marcus-davy-374

Last seen 11.4 years ago

Additionally to decideTests(), I made a function which is useful for making *any* filter you like. The example provided filters the same as decideTests(). You must correctly specify the columns of interest in the ''filter'' expression argument so some knowledge of limma's data structures is required. ttFilter <- function (filter = "abs(logFC) > 0.69 & abs(t) > 2", fit, sort.by = "t", number = nrow(fit), ...) { tt <- topTable(fit, sort.by = sort.by, number = number, ...) tCols <- colnames(tt) e <- new.env() for (i in tCols) { e[[i]] <- tt[[i]] } toSub <- eval(parse(text = filter), envir = e) if( anyis.na(toSub)) ){ toSub <- toSub[!is.na(toSub)] } return(tt[toSub, ]) } Some Examples; library(limma) set.seed(1) MA <- matrix(rnorm(100, 0,3), nc=4) fit <- lmFit(MA) fit <- eBayes(fit) topTable(fit) # Post filter on |M|>2 ttFilter(filter = "abs(logFC)>2", fit) # |M|>1.4 & abs(t) > 1.8 ttFilter(filter = "abs(logFC)>1.4 & abs(t)>1.8", fit) Marcus On 5/10/07 1:58 PM, "Gordon Smyth" <smyth at="" wehi.edu.au=""> wrote: > I have changed the subject line to something more appropriate. > > In R 2.5.1 and Bioconductor 2.0, the recommended way to do what you > want (select DE genes on the basis of a combination of p-value and > log fold change) was to use decideTests(). In R 2.6.0 and > Bioconductor 2.1, you will find that topTable() now has p-value and > logFC arguments which allow you to do the same thing using topTable(). > > Best wishes > Gordon > >> Date: Wed, 3 Oct 2007 17:31:34 +0100 (BST) >> From: Lev Soinov <lev_embl1 at="" yahoo.co.uk=""> >> Subject: Re: [BioC] design matrix >> To: "James W. MacDonald" <jmacdon at="" med.umich.edu=""> >> Cc: bioconductor at stat.math.ethz.ch >> Message-ID: <412385.24484.qm at web27908.mail.ukl.yahoo.com> >> Content-Type: text/plain >> >> Dear List, >> >> Could you help me with another small issue? >> I usually write out the results of my analysis using the >> write.table function as follows: >> >> Assuming 30000 probes in the dataset: >> data <- ReadAffy() >> eset <- rma(data) >> >> design <- model.matrix(~ -1+factor(c(1,1,1,2,2,3,3,3))) >> colnames(design) <- c("group1", "group2", "group3") >> contrast.matrix <- makeContrasts(group2-group1, group3-group2, >> group3-group1, levels=design) >> >> fit <- lmFit(temp, design) >> fit2 <- contrasts.fit(fit, contrast.matrix) >> fit2 <- eBayes(fit2) >> >> C1<-topTable(fit2, coef=1, number=30000, adjust="BH") >> >> write.table(C1,file="comparison1.txt",append=TRUE,quote=FALSE,sep=" \t",row.na >> mes=TRUE,col.names=FALSE) >> >> C2<-topTable(fit2, coef=2, number=30000, adjust="BH") >> >> write.table(C2,file="comparison2.txt",append=TRUE,quote=FALSE,sep=" \t",row.na >> mes=TRUE,col.names=FALSE) >> >> C3<-topTable(fit2, coef=3, number=30000, adjust="BH") >> >> write.table(C3,file="comparison3.txt",append=TRUE,quote=FALSE,sep=" \t",row.na >> mes=TRUE,col.names=FALSE) >> >> I then use the written out txt files (comparison1.txt, >> comparison2.txt and comparison3.txt) to select significant probes >> on the basis of log2fold change and adjusted p values thresholds, using >> Excel. >> Would you say that this is a correct way to do this and could you >> please recommend me some other, may be more efficient way of >> writing the results of topTable for all 30000 probes out? >> >> With kind regards, >> Lev. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor ___________________________________________________________________ The contents of this e-mail are privileged and/or confid...{{dropped:6}}

• 2.2k views

ADD COMMENT • link 18.3 years ago Marcus Davy ▴ 680

0

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 1 day ago

United States

Hi Marcus -- A few comments below, for what it's worth... Marcus Davy <mdavy at="" hortresearch.co.nz=""> writes: > Additionally to decideTests(), I made a function which is useful for making > *any* filter you like. The example provided filters the same as > decideTests(). > You must correctly specify the columns of interest in the ''filter'' > expression argument so some knowledge of limma's data structures is > required. > > ttFilter <- > function (filter = "abs(logFC) > 0.69 & abs(t) > 2", fit, sort.by = "t", > number = nrow(fit), ...) > { > tt <- topTable(fit, sort.by = sort.by, number = number, ...) from here... > tCols <- colnames(tt) > e <- new.env() > for (i in tCols) { > e[[i]] <- tt[[i]] > } > toSub <- eval(parse(text = filter), envir = e) ... to here copies the data frame returned by topTable into an environment, to be used in eval. However, the 'envir' argument to eval can be a data.frame (!, see the help page for eval), so you could have just toSub <- eval(parse(text=filter), tt) 'with' provides a kind of user-friendly access to this for interactive use toSub <- with(tt, abs(logFC) > 0.69 & abs(t) > 2) > if( anyis.na(toSub)) ){ > toSub <- toSub[!is.na(toSub)] > } > return(tt[toSub, ]) reducing the length of toSub (by deleting the NA's) will likely lead to unexpected recycling of the subscript index, e.g., > df <- data.frame(x=1:3) > df[c(TRUE,FALSE),, drop=FALSE] x 1 1 3 3 Martin > } > > Some Examples; > library(limma) > set.seed(1) > MA <- matrix(rnorm(100, 0,3), nc=4) > fit <- lmFit(MA) > fit <- eBayes(fit) > topTable(fit) > # Post filter on |M|>2 > ttFilter(filter = "abs(logFC)>2", fit) > # |M|>1.4 & abs(t) > 1.8 > ttFilter(filter = "abs(logFC)>1.4 & abs(t)>1.8", fit) > > > Marcus > > On 5/10/07 1:58 PM, "Gordon Smyth" <smyth at="" wehi.edu.au=""> wrote: > >> I have changed the subject line to something more appropriate. >> >> In R 2.5.1 and Bioconductor 2.0, the recommended way to do what you >> want (select DE genes on the basis of a combination of p-value and >> log fold change) was to use decideTests(). In R 2.6.0 and >> Bioconductor 2.1, you will find that topTable() now has p-value and >> logFC arguments which allow you to do the same thing using topTable(). >> >> Best wishes >> Gordon >> >>> Date: Wed, 3 Oct 2007 17:31:34 +0100 (BST) >>> From: Lev Soinov <lev_embl1 at="" yahoo.co.uk=""> >>> Subject: Re: [BioC] design matrix >>> To: "James W. MacDonald" <jmacdon at="" med.umich.edu=""> >>> Cc: bioconductor at stat.math.ethz.ch >>> Message-ID: <412385.24484.qm at web27908.mail.ukl.yahoo.com> >>> Content-Type: text/plain >>> >>> Dear List, >>> >>> Could you help me with another small issue? >>> I usually write out the results of my analysis using the >>> write.table function as follows: >>> >>> Assuming 30000 probes in the dataset: >>> data <- ReadAffy() >>> eset <- rma(data) >>> >>> design <- model.matrix(~ -1+factor(c(1,1,1,2,2,3,3,3))) >>> colnames(design) <- c("group1", "group2", "group3") >>> contrast.matrix <- makeContrasts(group2-group1, group3-group2, >>> group3-group1, levels=design) >>> >>> fit <- lmFit(temp, design) >>> fit2 <- contrasts.fit(fit, contrast.matrix) >>> fit2 <- eBayes(fit2) >>> >>> C1<-topTable(fit2, coef=1, number=30000, adjust="BH") >>> >>> write.table(C1,file="comparison1.txt",append=TRUE,quote=FALSE,sep= "\t",row.na >>> mes=TRUE,col.names=FALSE) >>> >>> C2<-topTable(fit2, coef=2, number=30000, adjust="BH") >>> >>> write.table(C2,file="comparison2.txt",append=TRUE,quote=FALSE,sep= "\t",row.na >>> mes=TRUE,col.names=FALSE) >>> >>> C3<-topTable(fit2, coef=3, number=30000, adjust="BH") >>> >>> write.table(C3,file="comparison3.txt",append=TRUE,quote=FALSE,sep= "\t",row.na >>> mes=TRUE,col.names=FALSE) >>> >>> I then use the written out txt files (comparison1.txt, >>> comparison2.txt and comparison3.txt) to select significant probes >>> on the basis of log2fold change and adjusted p values thresholds, using >>> Excel. >>> Would you say that this is a correct way to do this and could you >>> please recommend me some other, may be more efficient way of >>> writing the results of topTable for all 30000 probes out? >>> >>> With kind regards, >>> Lev. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > ___________________________________________________________________ > The contents of this e-mail are privileged and/or confid...{{dropped:6}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology Shared Resource Director Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (208) 667-2793

ADD COMMENT • link 18.3 years ago Martin Morgan 25k

0

Entering edit mode

Marcus Davy ▴ 680

@marcus-davy-374

Last seen 11.4 years ago

Thanks for the information. Yes you are correct, the code; >> if( anyis.na(toSub)) ){ >> toSub <- toSub[!is.na(toSub)] >> } >> return(tt[toSub, ]) is a bug and needs to be removed because of recycling it will stuff the returned index up. That was a quick hack I added recently without thinking about enough which is incorrect as your have pointed out. Marcus On 8/10/07 2:31 PM, "Martin Morgan" <mtmorgan at="" fhcrc.org=""> wrote: > Hi Marcus -- A few comments below, for what it's worth... > > Marcus Davy <mdavy at="" hortresearch.co.nz=""> writes: > >> Additionally to decideTests(), I made a function which is useful for making >> *any* filter you like. The example provided filters the same as >> decideTests(). >> You must correctly specify the columns of interest in the ''filter'' >> expression argument so some knowledge of limma's data structures is >> required. >> >> ttFilter <- >> function (filter = "abs(logFC) > 0.69 & abs(t) > 2", fit, sort.by = "t", >> number = nrow(fit), ...) >> { >> tt <- topTable(fit, sort.by = sort.by, number = number, ...) > > from here... > >> tCols <- colnames(tt) >> e <- new.env() >> for (i in tCols) { >> e[[i]] <- tt[[i]] >> } >> toSub <- eval(parse(text = filter), envir = e) > > ... to here copies the data frame returned by topTable into an > environment, to be used in eval. However, the 'envir' argument to eval > can be a data.frame (!, see the help page for eval), so you could have > just > > toSub <- eval(parse(text=filter), tt) > > 'with' provides a kind of user-friendly access to this for interactive use > > toSub <- with(tt, abs(logFC) > 0.69 & abs(t) > 2) > >> if( anyis.na(toSub)) ){ >> toSub <- toSub[!is.na(toSub)] >> } >> return(tt[toSub, ]) > > reducing the length of toSub (by deleting the NA's) will likely lead > to unexpected recycling of the subscript index, e.g., > >> df <- data.frame(x=1:3) >> df[c(TRUE,FALSE),, drop=FALSE] > x > 1 1 > 3 3 > > Martin > >> } >> >> Some Examples; >> library(limma) >> set.seed(1) >> MA <- matrix(rnorm(100, 0,3), nc=4) >> fit <- lmFit(MA) >> fit <- eBayes(fit) >> topTable(fit) >> # Post filter on |M|>2 >> ttFilter(filter = "abs(logFC)>2", fit) >> # |M|>1.4 & abs(t) > 1.8 >> ttFilter(filter = "abs(logFC)>1.4 & abs(t)>1.8", fit) >> >> >> Marcus >> >> On 5/10/07 1:58 PM, "Gordon Smyth" <smyth at="" wehi.edu.au=""> wrote: >> >>> I have changed the subject line to something more appropriate. >>> >>> In R 2.5.1 and Bioconductor 2.0, the recommended way to do what you >>> want (select DE genes on the basis of a combination of p-value and >>> log fold change) was to use decideTests(). In R 2.6.0 and >>> Bioconductor 2.1, you will find that topTable() now has p-value and >>> logFC arguments which allow you to do the same thing using topTable(). >>> >>> Best wishes >>> Gordon >>> >>>> Date: Wed, 3 Oct 2007 17:31:34 +0100 (BST) >>>> From: Lev Soinov <lev_embl1 at="" yahoo.co.uk=""> >>>> Subject: Re: [BioC] design matrix >>>> To: "James W. MacDonald" <jmacdon at="" med.umich.edu=""> >>>> Cc: bioconductor at stat.math.ethz.ch >>>> Message-ID: <412385.24484.qm at web27908.mail.ukl.yahoo.com> >>>> Content-Type: text/plain >>>> >>>> Dear List, >>>> >>>> Could you help me with another small issue? >>>> I usually write out the results of my analysis using the >>>> write.table function as follows: >>>> >>>> Assuming 30000 probes in the dataset: >>>> data <- ReadAffy() >>>> eset <- rma(data) >>>> >>>> design <- model.matrix(~ -1+factor(c(1,1,1,2,2,3,3,3))) >>>> colnames(design) <- c("group1", "group2", "group3") >>>> contrast.matrix <- makeContrasts(group2-group1, group3-group2, >>>> group3-group1, levels=design) >>>> >>>> fit <- lmFit(temp, design) >>>> fit2 <- contrasts.fit(fit, contrast.matrix) >>>> fit2 <- eBayes(fit2) >>>> >>>> C1<-topTable(fit2, coef=1, number=30000, adjust="BH") >>>> >>>> write.table(C1,file="comparison1.txt",append=TRUE,quote=FALSE,sep ="\t",row. >>>> na >>>> mes=TRUE,col.names=FALSE) >>>> >>>> C2<-topTable(fit2, coef=2, number=30000, adjust="BH") >>>> >>>> write.table(C2,file="comparison2.txt",append=TRUE,quote=FALSE,sep ="\t",row. >>>> na >>>> mes=TRUE,col.names=FALSE) >>>> >>>> C3<-topTable(fit2, coef=3, number=30000, adjust="BH") >>>> >>>> write.table(C3,file="comparison3.txt",append=TRUE,quote=FALSE,sep ="\t",row. >>>> na >>>> mes=TRUE,col.names=FALSE) >>>> >>>> I then use the written out txt files (comparison1.txt, >>>> comparison2.txt and comparison3.txt) to select significant probes >>>> on the basis of log2fold change and adjusted p values thresholds, using >>>> Excel. >>>> Would you say that this is a correct way to do this and could you >>>> please recommend me some other, may be more efficient way of >>>> writing the results of topTable for all 30000 probes out? >>>> >>>> With kind regards, >>>> Lev. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> ___________________________________________________________________ >> The contents of this e-mail are privileged and/or confid...{{dropped:6}} >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor ___________________________________________________________________ The contents of this e-mail are privileged and/or confid...{{dropped:6}}

ADD COMMENT • link 18.3 years ago Marcus Davy ▴ 680

0

Entering edit mode

Marcus Davy ▴ 680

@marcus-davy-374

Last seen 11.4 years ago

Condensed function for filtering based on Martins suggestions; ttFilter <- function( filter = "abs(logFC) > 0.69 & abs(t) > 2", fit, sort.by="t", number = nrow(fit), ...) { tt <- topTable(fit, sort.by = sort.by, number = number, ...) # Obtain logical from filter toSub <- eval(parse(text=filter), tt) return( tt[toSub, ] ) } Marcus On 8/10/07 2:51 PM, "Marcus Davy" <mdavy at="" hortresearch.co.nz=""> wrote: > Thanks for the information. > > Yes you are correct, the code; > >>> if( anyis.na(toSub)) ){ >>> toSub <- toSub[!is.na(toSub)] >>> } >>> return(tt[toSub, ]) > > is a bug and needs to be removed because of recycling it will stuff the > returned index up. That was a quick hack I added recently without thinking > about enough which is incorrect as your have pointed out. > > > Marcus > > > > On 8/10/07 2:31 PM, "Martin Morgan" <mtmorgan at="" fhcrc.org=""> wrote: > >> Hi Marcus -- A few comments below, for what it's worth... >> >> Marcus Davy <mdavy at="" hortresearch.co.nz=""> writes: >> >>> Additionally to decideTests(), I made a function which is useful for making >>> *any* filter you like. The example provided filters the same as >>> decideTests(). >>> You must correctly specify the columns of interest in the ''filter'' >>> expression argument so some knowledge of limma's data structures is >>> required. >>> >>> ttFilter <- >>> function (filter = "abs(logFC) > 0.69 & abs(t) > 2", fit, sort.by = "t", >>> number = nrow(fit), ...) >>> { >>> tt <- topTable(fit, sort.by = sort.by, number = number, ...) >> >> from here... >> >>> tCols <- colnames(tt) >>> e <- new.env() >>> for (i in tCols) { >>> e[[i]] <- tt[[i]] >>> } >>> toSub <- eval(parse(text = filter), envir = e) >> >> ... to here copies the data frame returned by topTable into an >> environment, to be used in eval. However, the 'envir' argument to eval >> can be a data.frame (!, see the help page for eval), so you could have >> just >> >> toSub <- eval(parse(text=filter), tt) >> >> 'with' provides a kind of user-friendly access to this for interactive use >> >> toSub <- with(tt, abs(logFC) > 0.69 & abs(t) > 2) >> >>> if( anyis.na(toSub)) ){ >>> toSub <- toSub[!is.na(toSub)] >>> } >>> return(tt[toSub, ]) >> >> reducing the length of toSub (by deleting the NA's) will likely lead >> to unexpected recycling of the subscript index, e.g., >> >>> df <- data.frame(x=1:3) >>> df[c(TRUE,FALSE),, drop=FALSE] >> x >> 1 1 >> 3 3 >> >> Martin >> >>> } >>> >>> Some Examples; >>> library(limma) >>> set.seed(1) >>> MA <- matrix(rnorm(100, 0,3), nc=4) >>> fit <- lmFit(MA) >>> fit <- eBayes(fit) >>> topTable(fit) >>> # Post filter on |M|>2 >>> ttFilter(filter = "abs(logFC)>2", fit) >>> # |M|>1.4 & abs(t) > 1.8 >>> ttFilter(filter = "abs(logFC)>1.4 & abs(t)>1.8", fit) >>> >>> >>> Marcus >>> >>> On 5/10/07 1:58 PM, "Gordon Smyth" <smyth at="" wehi.edu.au=""> wrote: >>> >>>> I have changed the subject line to something more appropriate. >>>> >>>> In R 2.5.1 and Bioconductor 2.0, the recommended way to do what you >>>> want (select DE genes on the basis of a combination of p-value and >>>> log fold change) was to use decideTests(). In R 2.6.0 and >>>> Bioconductor 2.1, you will find that topTable() now has p-value and >>>> logFC arguments which allow you to do the same thing using topTable(). >>>> >>>> Best wishes >>>> Gordon >>>> >>>>> Date: Wed, 3 Oct 2007 17:31:34 +0100 (BST) >>>>> From: Lev Soinov <lev_embl1 at="" yahoo.co.uk=""> >>>>> Subject: Re: [BioC] design matrix >>>>> To: "James W. MacDonald" <jmacdon at="" med.umich.edu=""> >>>>> Cc: bioconductor at stat.math.ethz.ch >>>>> Message-ID: <412385.24484.qm at web27908.mail.ukl.yahoo.com> >>>>> Content-Type: text/plain >>>>> >>>>> Dear List, >>>>> >>>>> Could you help me with another small issue? >>>>> I usually write out the results of my analysis using the >>>>> write.table function as follows: >>>>> >>>>> Assuming 30000 probes in the dataset: >>>>> data <- ReadAffy() >>>>> eset <- rma(data) >>>>> >>>>> design <- model.matrix(~ -1+factor(c(1,1,1,2,2,3,3,3))) >>>>> colnames(design) <- c("group1", "group2", "group3") >>>>> contrast.matrix <- makeContrasts(group2-group1, group3-group2, >>>>> group3-group1, levels=design) >>>>> >>>>> fit <- lmFit(temp, design) >>>>> fit2 <- contrasts.fit(fit, contrast.matrix) >>>>> fit2 <- eBayes(fit2) >>>>> >>>>> C1<-topTable(fit2, coef=1, number=30000, adjust="BH") >>>>> >>>>> write.table(C1,file="comparison1.txt",append=TRUE,quote=FALSE,sep="\t" ,row>>>>> . >>>>> na >>>>> mes=TRUE,col.names=FALSE) >>>>> >>>>> C2<-topTable(fit2, coef=2, number=30000, adjust="BH") >>>>> >>>>> write.table(C2,file="comparison2.txt",append=TRUE,quote=FALSE,sep="\t" ,row>>>>> . >>>>> na >>>>> mes=TRUE,col.names=FALSE) >>>>> >>>>> C3<-topTable(fit2, coef=3, number=30000, adjust="BH") >>>>> >>>>> write.table(C3,file="comparison3.txt",append=TRUE,quote=FALSE,sep="\t" ,row>>>>> . >>>>> na >>>>> mes=TRUE,col.names=FALSE) >>>>> >>>>> I then use the written out txt files (comparison1.txt, >>>>> comparison2.txt and comparison3.txt) to select significant probes >>>>> on the basis of log2fold change and adjusted p values thresholds, using >>>>> Excel. >>>>> Would you say that this is a correct way to do this and could you >>>>> please recommend me some other, may be more efficient way of >>>>> writing the results of topTable for all 30000 probes out? >>>>> >>>>> With kind regards, >>>>> Lev. >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >>> ___________________________________________________________________ >>> The contents of this e-mail are privileged and/or confid...{{dropped:6}} >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > ___________________________________________________________________ > The contents of this e-mail are privileged and/or confid...{{dropped:6}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor ___________________________________________________________________ The contents of this e-mail are privileged and/or confid...{{dropped:6}}

ADD COMMENT • link 18.3 years ago Marcus Davy ▴ 680

Login before adding your answer.