Creating a customized cut flow with the affy/limma packages
2
0
Entering edit mode
J.M.Jensen ▴ 20
@jmjensen-5115
Last seen 9.6 years ago
James W. MacDonald <jmacdon at="" ...=""> writes: > > Hi J.M., > > On 2/13/2012 9:35 AM, J.M.Jensen [guest] wrote: > > Dear all, > > I am trying to create a customized cut flow with the affy/limma packages. In order to assign different > cut-off values between different arrays would be to first make one cut-off and then use the remaining > expression set in further analysis. When working with the affy/limma packages, I read and normalize my > data with the justRMA() function, fit a linear model with the limma package and use the decideTests() from > the limma package to select probes with a specified cut-off (e.g. lfc). The problem is that I cannot assign > multiple criteria for the cut flow. For example, if I want to remove all probes from my data set that differ > more than a lfc of 1.2 between two controls, regardless of the expression values in the rest of the samples. > After this cut, I would then be able to use another cut o > ff value, e.g. lfc=1.8 to select probes from different contrasts of the remaining data. I tried looking > into the vennSelect() (affycoretools), because I can use it to select various contras! > ts > > and their intersection, however, I cannot figure how to assign different cut off criteria. > > Any help or thoughts on this would be most appreciated. > > It seems like you want to do two very different things here, but correct > me if I misunderstand. First, it appears that you want to remove all > probes that vary between controls. Here I am assuming that you have one > set of controls, and you want to exclude any probeset that varies > between any two of the controls by 2^1.2 or greater. Second, you want to > select all probes that vary between various contrasts. > > These are very different things, and I don't think there is a > one-size-fits-all solution. The way I understand it, you won't be able > to do anything with the fitted object to filter the first set of > probesets. Instead, you want to use the raw data. > > You could set up a function, say filterControls() to do step 1. > > filterControls <- function(controlvector, eset, filt){ > ## control vector is a numeric vector > ## eset is an ExpressionSet (e.g., from justRMA()) > ## filt is a numeric log fold change to filter on > require(gtools) > comb <- combinations(length(controlvector), 2, controlvector) > indlst <- lapply(1:nrow(comb), function(x) abs(eset[,comb[x,1]] - > eset[,comb[x,2]]) > filt) > ind <- apply(do.call("cbind", indlst), 1, any) > eset <- eset[!ind,] > eset > } > > Then you would just go forward with the subsetted ExpressionSet object, > using decideTests() with whatever lfc you like, and vennSelect() to output. > > Best, > > Jim Hi Jim. Thank you for your quick response. You understand me correct I believe, however, I have some trouble deciphering the solution you suggested. I believe I get the basic idea of probing combinations of the controls and then excluding the combinations which differ with more than 'filt' - if that is indeed the idea. But I do not quite grasp how to determine the controlvector - I have my two control samples loaded in my ExpressionSet (loaded from .CEL files along with my other samples using justRMA()). I have tried to determine the controlvector as the two columns containing my control samples using the exprs() function (e.g. controlvector <- exprs(eset)[,3:4] (column 3 and 4 being the control samples) - but I believe I am missing something here. If you would care to elaborate, I would be very grateful. Best, Jacob > > Best regards, > > J.M. Jensen > > > > -- output of sessionInfo(): > > > > R version 2.13.0 (2011-04-13) > > Platform: i386-pc-mingw32/i386 (32-bit) > > > > locale: > > [1] LC_COLLATE=English_United States.1252 > > [2] LC_CTYPE=English_United States.1252 > > [3] LC_MONETARY=English_United States.1252 > > [4] LC_NUMERIC=C > > [5] LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] grid stats graphics grDevices utils datasets methods > > [8] base > > > > other attached packages: > > [1] venneuler_1.1-0 rJava_0.9-3 affycoretools_1.24.0 > > [4] KEGG.db_2.5.0 gplots_2.10.1 KernSmooth_2.23-4 > > [7] caTools_1.12 bitops_1.0-4.1 gdata_2.8.2 > > [10] gtools_2.6.2 GO.db_2.5.0 hgu133plus2.db_2.5.0 > > [13] org.Hs.eg.db_2.5.0 RSQLite_0.9-4 DBI_0.2-5 > > [16] hgu133plus2cdf_2.8.0 annotate_1.30.1 AnnotationDbi_1.14.1 > > [19] limma_3.8.3 affy_1.30.0 Biobase_2.12.2 > > > > loaded via a namespace (and not attached): > > [1] affyio_1.20.0 annaffy_1.24.0 biomaRt_2.8.1 > > [4] Biostrings_2.20.4 Category_2.18.0 gcrma_2.24.1 > > [7] genefilter_1.34.0 GOstats_2.18.0 graph_1.30.0 > > [10] GSEABase_1.14.0 IRanges_1.10.6 preprocessCore_1.14.0 > > [13] RBGL_1.28.0 RCurl_1.6-10.1 splines_2.13.0 > > [16] survival_2.36-5 tools_2.13.0 XML_3.4-2.2 > > [19] xtable_1.6-0 > > > > -- > > Sent via the guest posting facility at bioconductor.org. > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at ... > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
GO limma ASSIGN GO limma ASSIGN • 943 views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 11 hours ago
United States
Hi JM, On 2/14/2012 6:00 AM, J.M.Jensen wrote: > James W. MacDonald<jmacdon at="" ...=""> writes: > >> Hi J.M., >> >> On 2/13/2012 9:35 AM, J.M.Jensen [guest] wrote: >>> Dear all, >>> I am trying to create a customized cut flow with the affy/limma packages. In > order to assign different >> cut-off values between different arrays would be to first make one cut-off and > then use the remaining >> expression set in further analysis. When working with the affy/limma packages, > I read and normalize my >> data with the justRMA() function, fit a linear model with the limma package > and use the decideTests() from >> the limma package to select probes with a specified cut-off (e.g. lfc). The > problem is that I cannot assign >> multiple criteria for the cut flow. For example, if I want to remove all > probes from my data set that differ >> more than a lfc of 1.2 between two controls, regardless of the expression > values in the rest of the samples. >> After this cut, I would then be able to use another cut o >> ff value, e.g. lfc=1.8 to select probes from different contrasts of the > remaining data. I tried looking >> into the vennSelect() (affycoretools), because I can use it to select various > contras! >> ts >>> and their intersection, however, I cannot figure how to assign different > cut off criteria. >>> Any help or thoughts on this would be most appreciated. >> It seems like you want to do two very different things here, but correct >> me if I misunderstand. First, it appears that you want to remove all >> probes that vary between controls. Here I am assuming that you have one >> set of controls, and you want to exclude any probeset that varies >> between any two of the controls by 2^1.2 or greater. Second, you want to >> select all probes that vary between various contrasts. >> >> These are very different things, and I don't think there is a >> one-size-fits-all solution. The way I understand it, you won't be able >> to do anything with the fitted object to filter the first set of >> probesets. Instead, you want to use the raw data. >> >> You could set up a function, say filterControls() to do step 1. >> >> filterControls<- function(controlvector, eset, filt){ >> ## control vector is a numeric vector >> ## eset is an ExpressionSet (e.g., from justRMA()) >> ## filt is a numeric log fold change to filter on >> require(gtools) >> comb<- combinations(length(controlvector), 2, controlvector) >> indlst<- lapply(1:nrow(comb), function(x) abs(eset[,comb[x,1]] - >> eset[,comb[x,2]])> filt) >> ind<- apply(do.call("cbind", indlst), 1, any) >> eset<- eset[!ind,] >> eset >> } >> >> Then you would just go forward with the subsetted ExpressionSet object, >> using decideTests() with whatever lfc you like, and vennSelect() to output. >> >> Best, >> >> Jim > Hi Jim. > > Thank you for your quick response. You understand me correct I believe, however, > I have some trouble deciphering the solution you suggested. I believe I get the > basic idea of probing combinations of the controls and then excluding the > combinations which differ with more than 'filt' - if that is indeed the idea. > But I do not quite grasp how to determine the controlvector - I have my two > control samples loaded in my ExpressionSet (loaded from .CEL files along with my > other samples using justRMA()). I have tried to determine the controlvector as > the two columns containing my control samples using the exprs() function (e.g. > controlvector<- exprs(eset)[,3:4] (column 3 and 4 being the control samples) - > but I believe I am missing something here. The controlvector just gives the columns of the exprs slot that contain controls. We then use these columns to get all combinations of comparisons (in your case it will just be 3 v 4). So something like filterControls(3:4, eset, 1.2) will do what you requested originally. Best, Jim > > > If you would care to elaborate, I would be very grateful. > > Best, > Jacob > >>> Best regards, >>> J.M. Jensen >>> >>> -- output of sessionInfo(): >>> >>> R version 2.13.0 (2011-04-13) >>> Platform: i386-pc-mingw32/i386 (32-bit) >>> >>> locale: >>> [1] LC_COLLATE=English_United States.1252 >>> [2] LC_CTYPE=English_United States.1252 >>> [3] LC_MONETARY=English_United States.1252 >>> [4] LC_NUMERIC=C >>> [5] LC_TIME=English_United States.1252 >>> >>> attached base packages: >>> [1] grid stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] venneuler_1.1-0 rJava_0.9-3 affycoretools_1.24.0 >>> [4] KEGG.db_2.5.0 gplots_2.10.1 KernSmooth_2.23-4 >>> [7] caTools_1.12 bitops_1.0-4.1 gdata_2.8.2 >>> [10] gtools_2.6.2 GO.db_2.5.0 hgu133plus2.db_2.5.0 >>> [13] org.Hs.eg.db_2.5.0 RSQLite_0.9-4 DBI_0.2-5 >>> [16] hgu133plus2cdf_2.8.0 annotate_1.30.1 AnnotationDbi_1.14.1 >>> [19] limma_3.8.3 affy_1.30.0 Biobase_2.12.2 >>> >>> loaded via a namespace (and not attached): >>> [1] affyio_1.20.0 annaffy_1.24.0 biomaRt_2.8.1 >>> [4] Biostrings_2.20.4 Category_2.18.0 gcrma_2.24.1 >>> [7] genefilter_1.34.0 GOstats_2.18.0 graph_1.30.0 >>> [10] GSEABase_1.14.0 IRanges_1.10.6 preprocessCore_1.14.0 >>> [13] RBGL_1.28.0 RCurl_1.6-10.1 splines_2.13.0 >>> [16] survival_2.36-5 tools_2.13.0 XML_3.4-2.2 >>> [19] xtable_1.6-0 >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at ... >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
J.M.Jensen ▴ 20
@jmjensen-5115
Last seen 9.6 years ago
James W. MacDonald <jmacdon at="" ...=""> writes: > > Hi J.M., > > On 2/13/2012 9:35 AM, J.M.Jensen [guest] wrote: > > Dear all, > > I am trying to create a customized cut flow with the affy/limma packages. In order to assign different > cut-off values between different arrays would be to first make one cut-off and then use the remaining > expression set in further analysis. When working with the affy/limma packages, I read and normalize my > data with the justRMA() function, fit a linear model with the limma package and use the decideTests() from > the limma package to select probes with a specified cut-off (e.g. lfc). The problem is that I cannot assign > multiple criteria for the cut flow. For example, if I want to remove all probes from my data set that differ > more than a lfc of 1.2 between two controls, regardless of the expression values in the rest of the samples. > After this cut, I would then be able to use another cut o > ff value, e.g. lfc=1.8 to select probes from different contrasts of the remaining data. I tried looking > into the vennSelect() (affycoretools), because I can use it to select various contras! > ts > > and their intersection, however, I cannot figure how to assign different cut off criteria. > > Any help or thoughts on this would be most appreciated. > > It seems like you want to do two very different things here, but correct > me if I misunderstand. First, it appears that you want to remove all > probes that vary between controls. Here I am assuming that you have one > set of controls, and you want to exclude any probeset that varies > between any two of the controls by 2^1.2 or greater. Second, you want to > select all probes that vary between various contrasts. > > These are very different things, and I don't think there is a > one-size-fits-all solution. The way I understand it, you won't be able > to do anything with the fitted object to filter the first set of > probesets. Instead, you want to use the raw data. > > You could set up a function, say filterControls() to do step 1. > > filterControls <- function(controlvector, eset, filt){ > ## control vector is a numeric vector > ## eset is an ExpressionSet (e.g., from justRMA()) > ## filt is a numeric log fold change to filter on > require(gtools) > comb <- combinations(length(controlvector), 2, controlvector) > indlst <- lapply(1:nrow(comb), function(x) abs(eset[,comb[x,1]] - > eset[,comb[x,2]]) > filt) > ind <- apply(do.call("cbind", indlst), 1, any) > eset <- eset[!ind,] > eset > } > > Then you would just go forward with the subsetted ExpressionSet object, > using decideTests() with whatever lfc you like, and vennSelect() to output. > > Best, > > Jim > Hi Jim, I believe I might have misunderstood your interpretation after all - It just struck me that this might actually not what I wanted exactly, although it is rather close - What I want in step 1 is to exclude probes that vary with more than e.g. 2^1.2 between two probes - however not just any two control probes - it should still be the same probes compared. Let me give an example: Below are shown the two control samples with the RMA normalized expression values of five probes. A1319_03.CEL A1319_04.CEL 1007_s_at 10.623985 10.823174 1053_at 8.259183 8.720877 117_at 5.854332 6.126637 121_at 7.732116 7.797098 1255_g_at 2.456035 6.298101 The only one probeset that should be excluded is the '1255_g_at', it does not matter that '1007_s_at' in the 'A1319_03.CEL' sample differs from '121_at' in either of the samples. Again, I appreciate your help. Best, Jacob > > Best regards, > > J.M. Jensen > > > > -- output of sessionInfo(): > > > > R version 2.13.0 (2011-04-13) > > Platform: i386-pc-mingw32/i386 (32-bit) > > > > locale: > > [1] LC_COLLATE=English_United States.1252 > > [2] LC_CTYPE=English_United States.1252 > > [3] LC_MONETARY=English_United States.1252 > > [4] LC_NUMERIC=C > > [5] LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] grid stats graphics grDevices utils datasets methods > > [8] base > > > > other attached packages: > > [1] venneuler_1.1-0 rJava_0.9-3 affycoretools_1.24.0 > > [4] KEGG.db_2.5.0 gplots_2.10.1 KernSmooth_2.23-4 > > [7] caTools_1.12 bitops_1.0-4.1 gdata_2.8.2 > > [10] gtools_2.6.2 GO.db_2.5.0 hgu133plus2.db_2.5.0 > > [13] org.Hs.eg.db_2.5.0 RSQLite_0.9-4 DBI_0.2-5 > > [16] hgu133plus2cdf_2.8.0 annotate_1.30.1 AnnotationDbi_1.14.1 > > [19] limma_3.8.3 affy_1.30.0 Biobase_2.12.2 > > > > loaded via a namespace (and not attached): > > [1] affyio_1.20.0 annaffy_1.24.0 biomaRt_2.8.1 > > [4] Biostrings_2.20.4 Category_2.18.0 gcrma_2.24.1 > > [7] genefilter_1.34.0 GOstats_2.18.0 graph_1.30.0 > > [10] GSEABase_1.14.0 IRanges_1.10.6 preprocessCore_1.14.0 > > [13] RBGL_1.28.0 RCurl_1.6-10.1 splines_2.13.0 > > [16] survival_2.36-5 tools_2.13.0 XML_3.4-2.2 > > [19] xtable_1.6-0 > > > > -- > > Sent via the guest posting facility at bioconductor.org. > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at ... > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT

Login before adding your answer.

Traffic: 495 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6