Some Genefilter questions


David K Pritchard ▴ 70

@david-k-pritchard-590

Last seen 9.9 years ago

Robert, from what I remember, two sets of studies have suggested the ~40% expression level. Classic COT curve studies from several decades ago suggested roughly this level. More recently, MPSS (Massively Parallel Signature Sequencing) studies have also suggested this is a reasonable cutoff. Based on these studies I use the same rule of thumb that you do - the median.

David Pritchard

On Thu, 30 Nov 2006, Robert Gentleman wrote: > Hi, > > Lourdusamy A Anbarasu wrote: >> Dear Dr. Robert, >> >> You have mentioned that the filtering on the variability is preferred >> than raw intensity value. I have also read your previous post on this >> issue. For filters based on CV, are there any recommended cut-off values? > > Not really. A widely held, but AFAIK undocumented, belief is that in > any given tissue/cell about 40% of the genome is expressed at any time. > So, I usually choose the median - that is somewhat conservative with > respect to the above cited statistic - but this is a personal > preference. I have not seen any research (and I think it would be hard). > > > best wishes > Robert > >> >> Thanks in advance. >> >> Best regards, >> Anbarasu >> >> On 11/30/06, *Robert Gentleman* <rgentlem at fhcrc.org> wrote: >> >> Hi, >> >> Amy Mikhail wrote: >> > Dear Bioconductors, >> > >> > I am analysing 6 PlasmodiumAnopheles genechips, which have only >> Anopheles >> > mosquito samples hybridised to them (i.e. they are not infected >> > mosquitoes). The 6 chips include 3 replicates, each consisting >> of two >> > time points. The design matrix is as follows: >> > >> >> design >> > M15d M43d >> > [1,] 1 0 >> > [2,] 0 1 >> > [3,] 1 0 >> > [4,] 0 1 >> > [5,] 1 0 >> > [6,] 0 1 >> > >> > >> > I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 >> (in affy). >> > Looking at the (BH) adjusted p values <0.05, this gave me 2, 12, >> 0 and 0 >> > DE genes, respectively... much less than I was expecting.
>> > >> > As this affy chip contains probesets for both mosquito and malaria >> > parasite genes, I am wondering: >> > >> > (a) if it is better to remove all the parasite probesets before >> my analysis; >> >> Yes, if you don't intend to use them, and they are not relevant to >> your analysis. There is no point in doing p-value corrections for tests >> you know are not interesting/relevant a priori. >> >> > >> > (b) if so at what stage I should do this (before or after >> normalisation >> > and background correction, or does it matter?) >> >> After both and prior to analysis - otherwise you are likely to >> need to >> do some serious tweaking of the normalization code. >> >> > >> > (c) how would I filter out these probesets using genefilter (all the >> > parasite affy IDs begin with Pf. - could I use this prefix in the >> affy IDs >> > to filter out the probesets, and if so how?) >> >> you don't need genefilter at all, this is a subseting problem. >> If you had an ExpressionSet you would do something like: >> >> parasites = grep("^Pf", featureNames(myExpressionSet)) >> >> mySubset = myExpressionSet[!parasites,] >> >> > >> > Secondly, I did not add any of the polyA controls to my >> samples. I would >> > like to know: >> > >> > (d) Do any of the bg correct / normalisation methods I tried utilise >> > affymetrix control probesets, and if so, how? >> >> I doubt it. >> >> > >> > (e) Should I also filter out the control sets - again, if so at >> what stage >> > in the analysis and what would be an appropriate code to use? >> > >> >> same place as you filter the parasite genes and pretty much in the >> same way. They are likely to start with AFFX. >> >> > I did try the code for non-specific filtering (on my RMA dataset) >> from pg. 
>> > 232 of the bioconductor monograph, but the reduction in the number of >> > probesets was quite drastic; >> > >> >> f1 <- pOverA(0.25, log2(100)) >> >> f2 <- function(x) (IQR(x) > 0.5) >> >> that is a typo in the text - you probably want to filter out those >> with IQR below the median, not for some fixed value. >> >> >> ff <- filterfun(f1, f2) >> >> selected <- genefilter(Baseage.transformed , ff) >> >> sum(selected) >> > [1] 404 ###(The origninal no. of probesets is 22,726)### >> >> Baseage.sub <- Baseage.transformed[selected, ] >> > >> > Also, I understood from the monograph that "100" was to filter out >> > fluorescence intensities less than this, but I am not clear if >> this is >> > from raw intensities or log2 values? >> >> raw - 100 on the log2 scale is larger than can be represented in the >> image file formats used. And don't do that - it is not a good idea - >> filter on variability. >> >> >> > >> > All the parasite probesets have raw intensities <35 .... so could >> I apply >> > this as a simple filter, and would this have to be on raw (rather >> than >> > normalised data)? >> >> >> Best wishes >> Robert >> >> > >> > Appologies for the long posting... 
>> > >> > Looking forward to any replies, >> > Regards, >> > Amy >> > >> >> sessionInfo() >> > R version 2.4.0 (2006-10-03) >> > i386-pc-mingw32 >> > >> > locale: >> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United >> > States.1252;LC_MONETARY=English_United >> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 >> > >> > attached base packages: >> > [1] "tcltk" "splines" "tools" "methods" "stats" >> > "graphics" "grDevices" "utils" "datasets" "base" >> > >> > other attached packages: >> > plasmodiumanophelescdf tkWidgets DynDoc >> > widgetTools agahomology >> > "1.14.0" " 1.12.0" "1.12.0" >> > "1.10.0" "1.14.2" >> > affyPLM gcrma matchprobes >> > affydata annaffy >> > "1.10.0" "2.6.0" "1.6.0" >> > "1.10.0" "1.6.0" >> > KEGG GO limma >> > geneplotter annotate >> > "1.14.0" "1.14.0" "2.9.1" >> > "1.12.0" "1.12.0" >> > affy affyio genefilter >> > survival Biobase >> > "1.12.0" "1.2.0" "1.12.0 " >> > "2.29" "1.12.0" >> > >> > >> > ------------------------------------------- >> > Amy Mikhail >> > Research student >> > University of Aberdeen >> > Zoology Building >> > Tillydrone Avenue >> > Aberdeen AB24 2TZ >> > Scotland >> > Email: a.mikhail at abdn.ac.uk <mailto:a.mikhail at="" abdn.ac.uk=""> >> > Phone: 00-44-1224-272880 (lab) >> > 00-44-1224-273256 (office) >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor at stat.math.ethz.ch >> <mailto:bioconductor at="" stat.math.ethz.ch=""> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >> > >> >> -- >> Robert Gentleman, PhD >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. 
N, M2-B876 >> PO Box 19024 >> Seattle, Washington 98109-1024 >> 206-667-7700 >> rgentlem at fhcrc.org <mailto:rgentlem at="" fhcrc.org=""> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch <mailto:bioconductor at="" stat.math.ethz.ch=""> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> >> >> -- >> Lourdusamy A Anbarasu >> Dipartimento Medicina Sperimentale e Sanita Pubblica >> Via Scalzino 3 >> 62032 Camerino (MC) > > -- > Robert Gentleman, PhD > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M2-B876 > PO Box 19024 > Seattle, Washington 98109-1024 > 206-667-7700 > rgentlem at fhcrc.org > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
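The median rule of thumb discussed above (keep the probesets whose IQR is above the median IQR, rather than above a fixed value such as 0.5) can be sketched in base R. This is only an illustration: the matrix `expr`, its probeset names and all of its values are invented, and a real analysis would apply the same idea to the normalised log2 expression matrix.

```r
# Toy log2 expression matrix: 6 probesets (rows) x 6 arrays (columns).
# All names and values are invented for illustration.
expr <- rbind(
  ps1 = c(7.0, 7.1, 7.0, 7.2, 7.1, 7.0),
  ps2 = c(5.0, 8.0, 5.2, 8.1, 4.9, 7.9),
  ps3 = c(6.0, 6.0, 6.1, 6.0, 6.1, 6.0),
  ps4 = c(9.0, 11.0, 9.1, 11.2, 8.9, 10.9),
  ps5 = c(4.0, 4.3, 4.1, 4.4, 4.0, 4.2),
  ps6 = c(10.0, 11.0, 10.1, 11.1, 9.9, 10.8)
)

# Per-probeset variability, measured by the interquartile range.
iqr <- apply(expr, 1, IQR)

# Keep the probesets whose IQR exceeds the median IQR, so roughly half
# of them survive regardless of the measurement scale.
keep <- iqr > median(iqr)
filtered <- expr[keep, , drop = FALSE]

rownames(filtered)
```

Because the cutoff is relative to the data, it does not collapse to a handful of probesets the way a fixed threshold can.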

Normalization Survival Cancer cdf genefilter geneplotter affy affydata widgetTools gcrma • 2.0k views

ADD COMMENT • link updated 17.7 years ago by Amy Mikhail ▴ 460 • written 17.7 years ago by David K Pritchard ▴ 70


Jenny Drnevich ★ 2.2k

@jenny-drnevich-382

Last seen 9.9 years ago

Hi Amy,

>Jenny, just wanted to clarify what you said; you reckon if I only want to >remove the foreign species probesets I should do this before >preprocessing, but if I want to remove e.g. absent calls from my own >species probes I should do this after preprocessing. Is this right?

Yes, IMO, at least if you're doing the GCRMA background correction. With the soybean data I've worked with, I've seen very large differences in the GC-based background correction depending on whether the other species' probesets were removed or not. Soybeans might be unusual because about 90% of the soybean probesets seem to be expressed, so that throwing out the non-expressed, non-soy probesets, radically changed the distribution of the values sampled for the background estimation. I created a scenario where 30% of the probesets were non-expressed non-soy, 35% were non-expressed soy, and the remaining 35% were expressed soy. The changes in the background correction after throwing out the non-soy were not as extreme, but still could have a large effect (over 4 FC!!) at low expression levels. I'm not sure which is "right" and which is "wrong", but I tend to agree with Jim that I don't feel comfortable using other species' non-expressed probesets to estimate background or normalization distributions for my target species. However, RMA's background correction wasn't really affected by throwing out the non-soy probesets or not.

>Also, how do I create the character vector of my parasite probesets for >your code?

You said before they all start with "Pf", so you can do something similar to what Robert suggested

>parasites <- grep("Pf", geneNames(yourAffyBatchObject), value=TRUE)

Giving the argument 'value=TRUE' will give you the gene names, instead of their indices. BTW Robert - you had put "^Pf" - was the ^ a typo, or does that indicate 'begins with' rather than 'anywhere'?

>Robert, I tried subsetting after preprocessing but before analysis ...
it >made no difference to the order of probesets, however the numbers changed >slightly (all the probesets had slightly higher adjusted P.values after >removing the parasite probes). See below: > > >Why would the adjusted P values be higher in the second case (number of >parasite probes removed was about 4,000)? This is due to the phenomenon that Claus mentioned - by removing the parasite probes, which have low variation, the average variance across genes will increase, subsequently leading to smaller t-values and larger raw p-values. Even though you are correcting for fewer genes, the change in the variance correction can have a larger effect on the adjusted p-values. Best, Jenny >Regards, >Amy > >--------------------------------------------------------------------- ------ > > > Hi, > > > > It may be worth pointing out that a related question can have a huge > > impact on normalization of certain glass arrays. One of the standard > > protocols on the Agilent 44K human arrays causes several hundred control > > spots to light up extremely brightly in the green channel, but remain > > completely off in the red channel. If you leave these control spots in > > the data set when you normalize between channels (i.e., within arrays), > > every known normalization methods breaks -- in the precise sense that it > > will systematically distort the comparison between the red and green > > channels. If you then model the data incorporating a dye effect, you > > will think that almost every gene exhibits a dye bias. On the other > > hand, if you remove these control spots before normalizing between > > channels, then modeling the dye bias suggest that it rarely exists.... > > > > As for the question originally asked here, I would not expect the > > foreign species probes to break the normalization (unless they somehow > > light up in one group of samples but not in the other). 
So, my own bias > > would be to keep them for background correction and normalization, but > > remove them before the rest of the analysis. > > > > Best, > > Kevin > > > > Jenny Drnevich wrote: > >> Hi Amy, > >> > >> Don't you just love it when you get one response suggesting you do one > >> thing (remove malarial genes after pre-processing) and another response > >> suggesting the opposite? Although I think in this case Robert was > >> suggesting you remove them after pre-processing because it was easier > >> than > >> trying to modify either the normalization code or the cdf environment, > >> which is what Jim pointed out to you. I ran into this same problem with > >> having probesets for other species on the soybean array, which is why I > >> used Ariel's code. I think that if you're using a mixed species array > >> but > >> only put one of the species on it, then you should remove the other > >> species' probesets BEFORE doing the normalization because they really > >> have > >> no bearing on the transcriptome you're trying to measure. On the other > >> hand, if you also want to filter your species' probesets based on > >> presence/absence, minimum cutoff, variation, etc.* , then you should > >> filter > >> these genes AFTER doing the pre-processing because these probesets do > >> contain information about the transcriptome, even if it is just 'not > >> detectably expressed'. > >> > >> Cheers, > >> Jenny > >> > >> * Contrary to Robert, I prefer to filter on presence/absence (using > >> Affy's > >> calls) rather than variability :) I don't know if there is any > >> documentation on which may be "better"... > >> > >------------------------------------------- >Amy Mikhail >Research student >University of Aberdeen >Zoology Building >Tillydrone Avenue >Aberdeen AB24 2TZ >Scotland >Email: a.mikhail at abdn.ac.uk >Phone: 00-44-1224-272880 (lab) > 00-44-1224-273256 (office) Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. 
Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at uiuc.edu
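The difference between "Pf" and "^Pf", and between indices and names, can be checked on a plain character vector. This is a small sketch in base R; the probeset IDs are made up (only the "Pf" and "AFFX" prefixes come from the thread). It also shows why negating the integer indices returned by grep() with `!` selects nothing, and why a logical vector from grepl() is the safer way to drop matches.

```r
# Hypothetical probeset IDs; only the "Pf" and "AFFX" prefixes are from the thread.
ids <- c("Pf.1.1_at", "Ag.2.1_at", "AFFX-BioB-5_at",
         "Ag.PfXYZ.3_s_at", "Pf.7.2_s_at")

anywhere <- grep("Pf", ids)                 # matches "Pf" anywhere in the ID
prefix   <- grep("^Pf", ids)                # matches only IDs that begin with "Pf"
names_pf <- grep("^Pf", ids, value = TRUE)  # the IDs themselves, not indices

# `!` applied to non-zero integer indices gives FALSE for each of them,
# so this subset is empty rather than "everything except the parasites".
bad <- ids[!grep("^Pf", ids)]

# grepl() returns one TRUE/FALSE per ID, which can be negated safely.
is_parasite <- grepl("^Pf", ids)
is_control  <- grepl("^AFFX", ids)
kept <- ids[!(is_parasite | is_control)]
```

Assuming the expression object accepts logical row indices in the usual way, the same logical vector built from its feature names could be used to subset the object itself.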

ADD COMMENT • link 17.7 years ago Jenny Drnevich ★ 2.2k


Jenny Drnevich wrote: > Hi Amy, > > >> Jenny, just wanted to clarify what you said; you reckon if I only want to >> remove the foreign species probesets I should do this before >> preprocessing, but if I want to remove e.g. absent calls from my own >> species probes I should do this after preprocessing. Is this right? > > Yes, IMO, at least if you're doing the GCRMA background correction. With > the soybean data I've worked with, I've seen very large differences in > the GC-based background correction depending on whether the other > species' probesets were removed or not. Soybeans might be unusual I hate to say it, but I have reason to believe that this is not a function of the removal. The version of GCRMA in Bioconductor uses subsampling which can be greatly affected by what you did (but can be achieved in other ways, without reducing the number of probes). It would help to know if similar effects are observed with other methods, particularly either RMA or VSN. > because about 90% of the soybean probesets seem to be expressed, so that > throwing out the non-expressed, non-soy probesets, radically changed the > distribution of the values sampled for the background estimation. I > created a scenario where 30% of the probesets were non-expressed > non-soy, 35% were non-expressed soy, and the remaining 35% were > expressed soy. The changes in the background correction after throwing > out the non-soy were not as extreme, but still could have a large effect > (over 4 FC!!) at low expression levels. I'm not sure which is "right" > and which is "wrong", but I tend to agree with Jim that I don't feel > comfortable using other species' non-expressed probesets to estimate > background or normalization distributions for my target species. > However, RMA's background correction wasn't really affected by throwing > out the non-soy probesets or not. I don't think Jim disagrees on the background - that should be fine. 
The real question is normalization, and well, there are reasons both pro and con. I personally doubt the effect is large enough to warrant the effort in "fixing" it, if that is indeed what is happening. > > >> Also, how do I create the character vector of my parasite probesets for >> your code? > > You said before they all start with "Pf", so you can do something > similar to what Robert suggested > > >parasites <- grep("Pf", geneNames(yourAffyBatchObject), value=TRUE) > > Giving the argument 'value=TRUE' will give you the gene names, instead > of their indices. BTW Robert - you had put "^Pf" - was the ^ a typo, or > does that indicate 'begins with' rather than 'anywhere'? It is begins with, since Amy said they "begin with", and I do not want anywhere. It is kind of important to get this right - not a typo. > > > >> Robert, I tried subsetting after preprocessing but before analysis ... it >> made no difference to the order of probesets, however the numbers changed >> slightly (all the probesets had slightly higher adjusted P.values after >> removing the parasite probes). See below: >> >> >> Why would the adjusted P values be higher in the second case (number of >> parasite probes removed was about 4,000)? > > This is due to the phenomenon that Claus mentioned - by removing the > parasite probes, which have low variation, the average variance across > genes will increase, subsequently leading to smaller t-values and larger > raw p-values. Even though you are correcting for fewer genes, the change > in the variance correction can have a larger effect on the adjusted > p-values. > Using some sort of attenuated p-values should help to alleviate this problem. But this is surprising to me - I would need to think about it more to say anything more informative. 
Robert > Best, > Jenny > >> Regards, >> Amy >> >> ------------------------------------------------------------------- -------- >> >> >> > Hi, >> > >> > It may be worth pointing out that a related question can have a huge >> > impact on normalization of certain glass arrays. One of the standard >> > protocols on the Agilent 44K human arrays causes several hundred >> control >> > spots to light up extremely brightly in the green channel, but remain >> > completely off in the red channel. If you leave these control spots in >> > the data set when you normalize between channels (i.e., within arrays), >> > every known normalization methods breaks -- in the precise sense >> that it >> > will systematically distort the comparison between the red and green >> > channels. If you then model the data incorporating a dye effect, you >> > will think that almost every gene exhibits a dye bias. On the other >> > hand, if you remove these control spots before normalizing between >> > channels, then modeling the dye bias suggest that it rarely exists.... >> > >> > As for the question originally asked here, I would not expect the >> > foreign species probes to break the normalization (unless they somehow >> > light up in one group of samples but not in the other). So, my own bias >> > would be to keep them for background correction and normalization, but >> > remove them before the rest of the analysis. >> > >> > Best, >> > Kevin >> > >> > Jenny Drnevich wrote: >> >> Hi Amy, >> >> >> >> Don't you just love it when you get one response suggesting you do one >> >> thing (remove malarial genes after pre-processing) and another >> response >> >> suggesting the opposite? Although I think in this case Robert was >> >> suggesting you remove them after pre-processing because it was easier >> >> than >> >> trying to modify either the normalization code or the cdf environment, >> >> which is what Jim pointed out to you. 
I ran into this same problem >> with >> >> having probesets for other species on the soybean array, which is >> why I >> >> used Ariel's code. I think that if you're using a mixed species array >> >> but >> >> only put one of the species on it, then you should remove the other >> >> species' probesets BEFORE doing the normalization because they really >> >> have >> >> no bearing on the transcriptome you're trying to measure. On the other >> >> hand, if you also want to filter your species' probesets based on >> >> presence/absence, minimum cutoff, variation, etc.* , then you should >> >> filter >> >> these genes AFTER doing the pre-processing because these probesets do >> >> contain information about the transcriptome, even if it is just 'not >> >> detectably expressed'. >> >> >> >> Cheers, >> >> Jenny >> >> >> >> * Contrary to Robert, I prefer to filter on presence/absence (using >> >> Affy's >> >> calls) rather than variability :) I don't know if there is any >> >> documentation on which may be "better"... >> >> >> >> ------------------------------------------- >> Amy Mikhail >> Research student >> University of Aberdeen >> Zoology Building >> Tillydrone Avenue >> Aberdeen AB24 2TZ >> Scotland >> Email: a.mikhail at abdn.ac.uk >> Phone: 00-44-1224-272880 (lab) >> 00-44-1224-273256 (office) > > Jenny Drnevich, Ph.D. > > Functional Genomics Bioinformatics Specialist > W.M. Keck Center for Comparative and Functional Genomics > Roy J. Carver Biotechnology Center > University of Illinois, Urbana-Champaign > > 330 ERML > 1201 W. Gregory Dr. > Urbana, IL 61801 > USA > > ph: 217-244-7355 > fax: 217-265-5066 > e-mail: drnevich at uiuc.edu > -- Robert Gentleman, PhD Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 PO Box 19024 Seattle, Washington 98109-1024 206-667-7700 rgentlem at fhcrc.org
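One way to see why higher adjusted p-values after removing the parasite probesets is surprising: if the raw p-values of the remaining probesets stayed the same, dropping uninteresting tests with large p-values could only lower their BH-adjusted values. The sketch below uses invented p-values and base R's p.adjust(); it is not Amy's data. Under that assumption, an increase in the adjusted values has to come from a change in the raw p-values themselves.

```r
# Invented raw p-values: four mosquito probesets of interest and four
# parasite probesets that show only background noise.
p_mosquito <- c(0.001, 0.004, 0.02, 0.03)
p_parasite <- c(0.5, 0.6, 0.7, 0.9)

# BH adjustment over all eight tests, then keep the mosquito entries.
adj_all <- p.adjust(c(p_mosquito, p_parasite), method = "BH")[seq_along(p_mosquito)]

# BH adjustment over the four mosquito tests only.
adj_subset <- p.adjust(p_mosquito, method = "BH")

# With the raw p-values held fixed, correcting for fewer tests gives
# smaller adjusted p-values here.
cbind(adj_all, adj_subset)
```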

ADD REPLY • link 17.7 years ago rgentleman ★ 5.5k


Claus Mayer ▴ 340

@claus-mayer-1179

Last seen 9.8 years ago

European Union

Hello,

just to throw in my own bits of wisdom: I am clearly on Robert's side in this argument, i.e. normalise with ALL genes, analyse just the species-specific ones. When you use GCRMA, you have three main steps in the algorithm:

1) Background correction: As Robert points out, the foreign genes should improve this.

2) Quantile normalisation: Obviously the distribution across all probes will change (mainly it will have more mass in the low-intensity range), but that will be the case for all arrays in the same way, as the foreign genes are not expected to change, so I can't see why these extra genes should be harmful.

3) Summarising the probesets: For each gene only the values of all probes corresponding to that gene are used, so this step will not be influenced by additional genes.

For the analysis it's a different matter. Obviously you want to get rid of genes which are not of interest before p-value adjustment for multiple testing, because you will be more conservative than necessary otherwise. There is also a case for not wanting them to be in the limma analysis, I think. The foreign genes will be less variable, as they only show background noise and thus are not affected by biological variability. This will reduce the average variance across all genes, and as limma shrinks individual gene variances towards this average, the denominators in the moderated t-statistics will be reduced too, thus leading to false positives. I am not sure whether it will really make a big difference practically, but theoretically there is certainly an issue here.

Interesting discussion anyway,

Claus

Jenny Drnevich wrote: > Hi Amy, > > Don't you just love it when you get one response suggesting you do one > thing (remove malarial genes after pre-processing) and another response > suggesting the opposite?
Although I think in this case Robert was > suggesting you remove them after pre-processing because it was easier than > trying to modify either the normalization code or the cdf environment, > which is what Jim pointed out to you. I ran into this same problem with > having probesets for other species on the soybean array, which is why I > used Ariel's code. I think that if you're using a mixed species array but > only put one of the species on it, then you should remove the other > species' probesets BEFORE doing the normalization because they really have > no bearing on the transcriptome you're trying to measure. On the other > hand, if you also want to filter your species' probesets based on > presence/absence, minimum cutoff, variation, etc.* , then you should filter > these genes AFTER doing the pre-processing because these probesets do > contain information about the transcriptome, even if it is just 'not > detectably expressed'. > > Cheers, > Jenny > > * Contrary to Robert, I prefer to filter on presence/absence (using Affy's > calls) rather than variability :) I don't know if there is any > documentation on which may be "better"... > > At 05:15 PM 11/29/2006, Robert Gentleman wrote: >> Hi, >> >> Amy Mikhail wrote: >>> Dear Bioconductors, >>> >>> I am annalysing 6 PlasmodiumAnopheles genechips, which have only Anopheles >>> mosquito samples hybridised to them (i.e. they are not infected >>> mosquitoes). The 6 chips include 3 replicates, each consisting of two >>> time points. The design matrix is as follows: >>> >>>> design >>> M15d M43d >>> [1,] 1 0 >>> [2,] 0 1 >>> [3,] 1 0 >>> [4,] 0 1 >>> [5,] 1 0 >>> [6,] 0 1 >>> >>> >>> I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 (in affy). >>> Looking at the (BH) adjusted p values <0.05, this gave me 2, 12, 0 and 0 >>> DE genes, respectively... much less than I was expecting. 
>>> >>> As this affy chip contains probesets for both mosquito and malaria >>> parasite genes, I am wondering: >>> >>> (a) if it is better to remove all the parasite probesets before my >> analysis; >> >> Yes, if you don't intend to use them, and they are not relevant to >> your analysis. There is no point in doing p-value corrections for tests >> you know are not interesting/relevant a priori. >> >>> (b) if so at what stage I should do this (before or after normalisation >>> and background correction, or does it matter?) >> After both and prior to analysis - otherwise you are likely to need to >> do some serious tweaking of the normalization code. >> >>> (c) how would I filter out these probesets using genefilter (all the >>> parasite affy IDs begin with Pf. - could I use this prefix in the affy IDs >>> to filter out the probesets, and if so how?) >> you don't need genefilter at all, this is a subseting problem. >> If you had an ExpressionSet you would do something like: >> >> parasites = grep("^Pf", featureNames(myExpressionSet)) >> >> mySubset = myExpressionSet[!parasites,] >> >>> Secondly, I did not add any of the polyA controls to my samples. I would >>> like to know: >>> >>> (d) Do any of the bg correct / normalisation methods I tried utilise >>> affymetrix control probesets, and if so, how? >> I doubt it. >> >>> (e) Should I also filter out the control sets - again, if so at what stage >>> in the analysis and what would be an appropriate code to use? >>> >> same place as you filter the parasite genes and pretty much in the >> same way. They are likely to start with AFFX. >> >>> I did try the code for non-specific filtering (on my RMA dataset) from pg. >>> 232 of the bioconductor monograph, but the reduction in the number of >>> probesets was quite drastic; >>> >>>> f1 <- pOverA(0.25, log2(100)) >>>> f2 <- function(x) (IQR(x) > 0.5) >> that is a typo in the text - you probably want to filter out those >> with IQR below the median, not for some fixed value. 
>> >>>> ff <- filterfun(f1, f2) >>>> selected <- genefilter(Baseage.transformed, ff) >>>> sum(selected) >>> [1] 404 ###(The origninal no. of probesets is 22,726)### >>>> Baseage.sub <- Baseage.transformed[selected, ] >>> Also, I understood from the monograph that "100" was to filter out >>> fluorescence intensities less than this, but I am not clear if this is >>> from raw intensities or log2 values? >> raw - 100 on the log2 scale is larger than can be represented in the >> image file formats used. And don't do that - it is not a good idea - >> filter on variability. >> >> >>> All the parasite probesets have raw intensities <35 .... so could I apply >>> this as a simple filter, and would this have to be on raw (rather than >>> normalised data)? >> >> Best wishes >> Robert >> >>> Appologies for the long posting... >>> >>> Looking forward to any replies, >>> Regards, >>> Amy >>> >>>> sessionInfo() >>> R version 2.4.0 (2006-10-03) >>> i386-pc-mingw32 >>> >>> locale: >>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United >>> States.1252;LC_MONETARY=English_United >>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 >>> >>> attached base packages: >>> [1] "tcltk" "splines" "tools" "methods" "stats" >>> "graphics" "grDevices" "utils" "datasets" "base" >>> >>> other attached packages: >>> plasmodiumanophelescdf tkWidgets DynDoc >>> widgetTools agahomology >>> "1.14.0" "1.12.0" "1.12.0" >>> "1.10.0" "1.14.2" >>> affyPLM gcrma matchprobes >>> affydata annaffy >>> "1.10.0" "2.6.0" "1.6.0" >>> "1.10.0" "1.6.0" >>> KEGG GO limma >>> geneplotter annotate >>> "1.14.0" "1.14.0" "2.9.1" >>> "1.12.0" "1.12.0" >>> affy affyio genefilter >>> survival Biobase >>> "1.12.0" "1.2.0" "1.12.0" >>> "2.29" "1.12.0" >>> >>> >>> ------------------------------------------- >>> Amy Mikhail >>> Research student >>> University of Aberdeen >>> Zoology Building >>> Tillydrone Avenue >>> Aberdeen AB24 2TZ >>> Scotland >>> Email: a.mikhail at abdn.ac.uk >>> Phone: 
00-44-1224-272880 (lab) >>> 00-44-1224-273256 (office) >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> -- >> Robert Gentleman, PhD >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M2-B876 >> PO Box 19024 >> Seattle, Washington 98109-1024 >> 206-667-7700 >> rgentlem at fhcrc.org >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > Jenny Drnevich, Ph.D. > > Functional Genomics Bioinformatics Specialist > W.M. Keck Center for Comparative and Functional Genomics > Roy J. Carver Biotechnology Center > University of Illinois, Urbana-Champaign > > 330 ERML > 1201 W. Gregory Dr. > Urbana, IL 61801 > USA > > ph: 217-244-7355 > fax: 217-265-5066 > e-mail: drnevich at uiuc.edu > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > -- ********************************************************************** ************* Dr Claus-D. Mayer | http://www.bioss.ac.uk Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk Rowett Research Institute | Telephone: +44 (0) 1224 716652 Aberdeen AB21 9SB, Scotland, UK. | Fax: +44 (0) 1224 715349
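The shrinkage argument can be made concrete with the usual form of the moderated t-statistic, in which each gene's variance is pulled towards a prior variance estimated from all genes on the array. The function below is a hand-written sketch of that formula, not limma's own code, and the prior values are invented: 0.01 stands for a prior dragged down by low-variance foreign probesets, 0.03 for a prior estimated without them. The residual degrees of freedom (4) and the factor sqrt(2/3) follow from a 3-versus-3 design like the one in this thread.

```r
# Moderated t-statistic for one gene, written out by hand.
# s2_prior and df_prior play the role of the prior variance and prior
# degrees of freedom that an empirical Bayes fit estimates from all genes.
moderated_t <- function(beta, s2, df, s2_prior, df_prior, stdev_unscaled) {
  # Posterior variance: weighted average of the prior and gene-wise variance.
  s2_post <- (df_prior * s2_prior + df * s2) / (df_prior + df)
  beta / (sqrt(s2_post) * stdev_unscaled)
}

# One hypothetical mosquito gene: log2 fold change 0.5, residual variance 0.04.
beta <- 0.5
s2 <- 0.04
df <- 4                          # 6 arrays minus 2 fitted group means
stdev_unscaled <- sqrt(1/3 + 1/3)

# Prior estimated with the low-variance foreign probesets included ...
t_with_foreign <- moderated_t(beta, s2, df, s2_prior = 0.01, df_prior = 4,
                              stdev_unscaled = stdev_unscaled)
# ... and with them removed (higher prior variance, invented value).
t_without_foreign <- moderated_t(beta, s2, df, s2_prior = 0.03, df_prior = 4,
                                 stdev_unscaled = stdev_unscaled)

c(with_foreign = t_with_foreign, without_foreign = t_without_foreign)
```

With the same gene-wise data, the lower prior gives the larger t-statistic, which is the direction of the effect described above; how large it is in practice depends on the estimated prior degrees of freedom.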

ADD COMMENT • link 17.7 years ago Claus Mayer ▴ 340


Hi Claus, Claus Mayer wrote: > Hello, > > just to throw in my own bits of wisdom: I am clearly on Robert's side in > this argument, i.e. normalise with ALL genes, analyse just the species > specific ones. When you use GCRMA, you have three main steps in the > algorithm: > > 1)Background correction: As Robert points out, the foreign genes should > improve this > > 2) Quantile Normalisation: Obviously the distribution across all probes > will change (mainly it will have more mass on the low-intensity range), > but that will be the case for all arrays in the same way, as the foreign > genes are not expected to change, so I can't see why these extra genes > should be harmful. This is the only point where Robert and I (and you, for that matter) don't necessarily agree. I agree that the distribution will have greater mass in the low-intensity range, and I also agree that the expression of the foreign genes won't change (since their transcript won't be hybed). However, just because the transcript isn't hybed to the chip doesn't mean that the intensity values of the foreign probes won't vary (possibly widely - without data in hand, we can't know). Rafa has shown that hybing yeast DNA to a human chip will result in some probes lighting up (but AFAIR, he didn't replicate so we don't know the variability of the spurious signal). Throwing a bunch of possibly noisy data into the mix could easily trash any signal you might have for low-expressing genes. Affy data are noisy enough at the low end that I am not completely comfortable with an assumption that the probe intensity values for the foreign genes will be essentially static or at least well behaved. Best, Jim > > 3)Summarizing the Probesets: For each gene only the values of all probes > correspoding to that gene are used, so this step will not be influenced > by additional genes. > > For the analysis its a different thing. 
> Obviously you want to get rid of
> genes which are not of interest before p-value adjustment for multiple
> testing, because you will be more conservative than necessary otherwise.
> There is also a case for not wanting them to be in the limma analysis, I
> think. The foreign genes will be less variable, as they only show
> background noise and thus are not affected by biological variability.
> This will reduce the average variance across all genes, and as limma
> shrinks individual gene variances towards this average, the denominators
> of the moderated t-statistics will be reduced too, thus leading to false
> positives. I am not sure whether it will really make a big difference
> practically, but theoretically there is certainly an issue here.
>
> Interesting discussion anyway,
>
> Claus
>
> Jenny Drnevich wrote:
>
>> Hi Amy,
>>
>> Don't you just love it when you get one response suggesting you do one
>> thing (remove malarial genes after pre-processing) and another response
>> suggesting the opposite? Although I think in this case Robert was
>> suggesting you remove them after pre-processing because it was easier than
>> trying to modify either the normalization code or the cdf environment,
>> which is what Jim pointed out to you. I ran into this same problem with
>> having probesets for other species on the soybean array, which is why I
>> used Ariel's code. I think that if you're using a mixed species array but
>> only put one of the species on it, then you should remove the other
>> species' probesets BEFORE doing the normalization because they really have
>> no bearing on the transcriptome you're trying to measure. On the other
>> hand, if you also want to filter your species' probesets based on
>> presence/absence, minimum cutoff, variation, etc.*, then you should filter
>> these genes AFTER doing the pre-processing because these probesets do
>> contain information about the transcriptome, even if it is just 'not
>> detectably expressed'.
>>
>> Cheers,
>> Jenny
>>
>> * Contrary to Robert, I prefer to filter on presence/absence (using Affy's
>> calls) rather than variability :) I don't know if there is any
>> documentation on which may be "better"...
>>
>> At 05:15 PM 11/29/2006, Robert Gentleman wrote:
>>
>>> Hi,
>>>
>>> Amy Mikhail wrote:
>>>
>>>> Dear Bioconductors,
>>>>
>>>> I am analysing 6 PlasmodiumAnopheles genechips, which have only Anopheles
>>>> mosquito samples hybridised to them (i.e. they are not infected
>>>> mosquitoes). The 6 chips include 3 replicates, each consisting of two
>>>> time points. The design matrix is as follows:
>>>>
>>>> > design
>>>>      M15d M43d
>>>> [1,]    1    0
>>>> [2,]    0    1
>>>> [3,]    1    0
>>>> [4,]    0    1
>>>> [5,]    1    0
>>>> [6,]    0    1
>>>>
>>>> I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 (in affy).
>>>> Looking at the (BH) adjusted p values <0.05, this gave me 2, 12, 0 and 0
>>>> DE genes, respectively... much less than I was expecting.
>>>>
>>>> As this affy chip contains probesets for both mosquito and malaria
>>>> parasite genes, I am wondering:
>>>>
>>>> (a) if it is better to remove all the parasite probesets before my
>>>> analysis;
>>>
>>> Yes, if you don't intend to use them, and they are not relevant to
>>> your analysis. There is no point in doing p-value corrections for tests
>>> you know are not interesting/relevant a priori.
>>>
>>>> (b) if so at what stage I should do this (before or after normalisation
>>>> and background correction, or does it matter?)
>>>
>>> After both and prior to analysis - otherwise you are likely to need to
>>> do some serious tweaking of the normalization code.
>>>
>>>> (c) how would I filter out these probesets using genefilter (all the
>>>> parasite affy IDs begin with Pf. - could I use this prefix in the affy IDs
>>>> to filter out the probesets, and if so how?)
>>>
>>> You don't need genefilter at all; this is a subsetting problem.
>>> If you had an ExpressionSet you would do something like:
>>>
>>>   parasites = grep("^Pf", featureNames(myExpressionSet))
>>>
>>>   mySubset = myExpressionSet[!parasites,]
>>>
>>>> Secondly, I did not add any of the polyA controls to my samples. I would
>>>> like to know:
>>>>
>>>> (d) Do any of the bg correct / normalisation methods I tried utilise
>>>> affymetrix control probesets, and if so, how?
>>>
>>> I doubt it.
>>>
>>>> (e) Should I also filter out the control sets - again, if so at what stage
>>>> in the analysis and what would be an appropriate code to use?
>>>
>>> Same place as you filter the parasite genes and pretty much in the
>>> same way. They are likely to start with AFFX.
>>>
>>>> I did try the code for non-specific filtering (on my RMA dataset) from pg.
>>>> 232 of the bioconductor monograph, but the reduction in the number of
>>>> probesets was quite drastic;
>>>>
>>>> > f1 <- pOverA(0.25, log2(100))
>>>> > f2 <- function(x) (IQR(x) > 0.5)
>>>
>>> That is a typo in the text - you probably want to filter out those
>>> with IQR below the median, not for some fixed value.
>>>
>>>> > ff <- filterfun(f1, f2)
>>>> > selected <- genefilter(Baseage.transformed, ff)
>>>> > sum(selected)
>>>> [1] 404 ###(The original no. of probesets is 22,726)###
>>>> > Baseage.sub <- Baseage.transformed[selected, ]
>>>>
>>>> Also, I understood from the monograph that "100" was to filter out
>>>> fluorescence intensities less than this, but I am not clear if this is
>>>> from raw intensities or log2 values?
>>>
>>> raw - 100 on the log2 scale is larger than can be represented in the
>>> image file formats used. And don't do that - it is not a good idea -
>>> filter on variability.
>>>
>>>> All the parasite probesets have raw intensities <35 .... so could I apply
>>>> this as a simple filter, and would this have to be on raw (rather than
>>>> normalised data)?
>>>
>>> Best wishes
>>> Robert
>>>
>>>> Apologies for the long posting...
>>>>
>>>> Looking forward to any replies,
>>>> Regards,
>>>> Amy
>>>>
>>>> > sessionInfo()
>>>> R version 2.4.0 (2006-10-03)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>>> States.1252;LC_MONETARY=English_United
>>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>>  [1] "tcltk" "splines" "tools" "methods" "stats"
>>>>      "graphics" "grDevices" "utils" "datasets" "base"
>>>>
>>>> other attached packages:
>>>> plasmodiumanophelescdf  tkWidgets  DynDoc    widgetTools  agahomology
>>>>               "1.14.0"   "1.12.0"  "1.12.0"     "1.10.0"     "1.14.2"
>>>>                affyPLM      gcrma  matchprobes  affydata      annaffy
>>>>               "1.10.0"    "2.6.0"     "1.6.0"   "1.10.0"      "1.6.0"
>>>>                   KEGG         GO       limma  geneplotter   annotate
>>>>               "1.14.0"   "1.14.0"     "2.9.1"    "1.12.0"    "1.12.0"
>>>>                   affy     affyio  genefilter    survival     Biobase
>>>>               "1.12.0"    "1.2.0"    "1.12.0"      "2.29"    "1.12.0"

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
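Claus's argument about limma, quoted in the message above, can be made concrete with a little arithmetic. The sketch below is not limma itself; it only applies the weighted-average form of the moderated (posterior) variance, (d0 * s0^2 + d * s^2) / (d0 + d), to made-up numbers. The prior degrees of freedom, both prior variances, the gene variance and the log fold change are all hypothetical values chosen for illustration, not estimates from Amy's data.

```r
# Posterior (moderated) variance: a weighted average of the prior variance
# s0_2 and the gene's own sample variance s2, weighted by degrees of freedom.
moderated_var <- function(s2, df, s0_2, d0) {
  (d0 * s0_2 + df * s2) / (d0 + df)
}

# Hypothetical values, for illustration only.
s2_gene <- 0.05  # sample variance of one mosquito probeset
df_gene <- 4     # residual df: 6 arrays minus 2 coefficients
d0      <- 4     # prior degrees of freedom (assumed)

# Prior variance when estimated from mosquito probesets only, and when
# low-variance parasite probesets pull it down (both numbers are assumed).
s0_native <- 0.10
s0_mixed  <- 0.04

v_native <- moderated_var(s2_gene, df_gene, s0_native, d0)
v_mixed  <- moderated_var(s2_gene, df_gene, s0_mixed, d0)

# Same log fold change, two groups of three arrays each.
logFC       <- 0.5
unscaled_sd <- sqrt(1 / 3 + 1 / 3)
t_native <- logFC / (sqrt(v_native) * unscaled_sd)
t_mixed  <- logFC / (sqrt(v_mixed) * unscaled_sd)

print(c(v_native = v_native, v_mixed = v_mixed))
print(c(t_native = t_native, t_mixed = t_mixed))
```

With these assumed inputs the smaller prior variance gives a smaller denominator and hence a larger moderated t for the same probeset, which is the direction of the effect Claus describes. Whether it matters in practice depends on the real variances, as he notes.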

ADD REPLY • link 17.7 years ago James W. MacDonald 66k
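Robert's advice quoted above, to filter on IQR relative to the median rather than against a fixed value, might look roughly like the following. This is a minimal base R sketch on a simulated matrix that stands in for the expression values of a normalised data set; the matrix, its dimensions and its row names are assumptions, and genefilter is deliberately not used here.

```r
# Stand-in for the log2 expression matrix: rows are probesets, columns arrays.
set.seed(1)
x <- matrix(rnorm(60, mean = 8), nrow = 10,
            dimnames = list(paste0("probe", 1:10), paste0("chip", 1:6)))

# Interquartile range of each probeset across the arrays.
iqr <- apply(x, 1, IQR)

# Keep probesets whose IQR is above the median IQR (roughly the more
# variable half), instead of comparing against a fixed cut-off such as 0.5.
keep <- iqr > median(iqr)
x_filtered <- x[keep, , drop = FALSE]

print(dim(x_filtered))
```

In the genefilter code from the original post, the equivalent change would presumably be to compute the median IQR first and use that value in place of 0.5 inside `f2`.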

Lourdusamy A Anbarasu ▴ 30

@lourdusamy-a-anbarasu-1951

Last seen 9.9 years ago

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20061204/acfa167d/attachment.pl

ADD COMMENT • link 17.7 years ago Lourdusamy A Anbarasu ▴ 30

Amy Mikhail ▴ 460

@amy-mikhail-1317

Last seen 9.9 years ago

Hi all,

Sorry to bring this interesting discussion back to the mundane again for a minute, but when trying to create the "parasites" character vector from my affybatch object, removing the parasite probe sets with Robert's suggested code is giving me an error:

> parasites <- grep("^Pf", geneNames(Baseage.rawdata), value=TRUE)
> parasites
[2965] "Pf.4.138.0_CDS_at" "Pf.4.139.0_CDS_at"
[2967] "Pf.4.14.0_CDS_at"  "Pf.4.140.0_CDS_at"
[2969] "Pf.4.142.0_CDS_at" "Pf.4.144.0_CDS_at"
[2971] "Pf.4.146.0_CDS_at" "Pf.4.147.0_CDS_at"
[2973] "Pf.4.148.0_CDS_at" "Pf.4.149.0_CDS_at"
###etc###

> Mossie.rawsub = Baseage.rawdata[-parasites,]
Error in -parasites : invalid argument to unary operator
In addition: Warning message:
The use of abatch[i,] and abatch[i] is decrepit. Please us abatch[,i] instead.
 in: Baseage.rawdata[-parasites, ]

If I try as the warning message suggests, I get another error:

> Mossie.rawsub <- Baseage.rawdata[-,parasites]
Error: syntax error in "Mossie.rawsub <- Baseage.rawdata[-,"

Or variations on a theme...

> Mossie.rawsub <- Baseage.rawdata[,-parasites]
Error in -parasites : invalid argument to unary operator

I could not find the first error in the bioconductor archives.

Any ideas?

Many thanks,
Amy

-------------------------------------------
Amy Mikhail
Research student
University of Aberdeen
Zoology Building
Tillydrone Avenue
Aberdeen AB24 2TZ
Scotland
Email: a.mikhail at abdn.ac.uk
Phone: 00-44-1224-272880 (lab)
       00-44-1224-273256 (office)

ADD COMMENT • link 17.7 years ago Amy Mikhail ▴ 460
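The "invalid argument to unary operator" error above comes from negating a character vector: `grep(..., value=TRUE)` returns names, and unary minus is only defined for numbers. The base R sketch below reproduces this on a small matrix rather than an AffyBatch; the matrix and its probeset IDs are made up for illustration, and it says nothing about AffyBatch subsetting itself.

```r
# Small stand-in matrix with probeset IDs as row names (IDs are made up).
x <- matrix(1:8, nrow = 4,
            dimnames = list(c("Ag.1_at", "Pf.4.14.0_CDS_at",
                              "Ag.2_at", "Pf.4.140.0_CDS_at"),
                            c("chip1", "chip2")))

ids_chr <- grep("^Pf", rownames(x), value = TRUE)  # character vector of names
ids_int <- grep("^Pf", rownames(x))                # integer positions

# Unary minus is not defined for character vectors, hence the error.
err <- tryCatch(x[-ids_chr, ], error = function(e) conditionMessage(e))
print(err)

# Negative integer positions do drop the matching rows.
by_position <- x[-ids_int, , drop = FALSE]

# A logical mask works too, and is safe even when nothing matches.
by_mask <- x[!grepl("^Pf", rownames(x)), , drop = FALSE]

# Note: `!` applied to integer positions gives all FALSE (non-zero numbers
# are coerced to TRUE first), so x[!ids_int, ] would select no rows at all.
print(rownames(by_position))
```

The logical mask from `grepl()` is usually the least error-prone of the three, since it does not depend on whether the pattern matched anything.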

Hi Amy,

You've got two different problems here.

1) You can't subset an AffyBatch with probeset names because it contains individual probe values. Actually, I don't think you can subset an AffyBatch row-wise at all, period.

2) You can subset an exprSet or ExpressionSet object row-wise using probeset names, but not like you're trying to do. You can't use "-" with a character vector to remove those probesets, but you can use a character vector to keep probesets. There are several ways to do this - here's one:

new.eset <- eset[setdiff(geneNames(Baseage.rawdata), parasites) , ]

Cheers,
Jenny

ADD REPLY • link 17.7 years ago Jenny Drnevich ★ 2.2k
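The keep-by-name idea behind the `setdiff()` call above can be tried out without any Bioconductor objects. The sketch below uses a plain matrix as a stand-in for the expression values of a normalised data set and drops the control probesets at the same time, following Robert's remark that they are likely to start with AFFX. The probeset IDs and the matrix are hypothetical.

```r
# Stand-in for a normalised expression matrix (probeset IDs are made up).
ids <- c("Ag.1_at", "Pf.4.14.0_CDS_at", "AFFX-BioB-5_at",
         "Ag.2_at", "Pf.4.140.0_CDS_at")
x <- matrix(seq_len(10), nrow = 5,
            dimnames = list(ids, c("chip1", "chip2")))

# Probesets to drop: parasite (Pf) and Affymetrix control (AFFX) IDs.
unwanted <- grep("^(Pf|AFFX)", rownames(x), value = TRUE)

# Keep everything that is not in the unwanted set, selecting rows by name.
keep_ids   <- setdiff(rownames(x), unwanted)
x_mosquito <- x[keep_ids, , drop = FALSE]

print(rownames(x_mosquito))
```

Selecting by name keeps the original row order, because `setdiff()` returns the elements of its first argument in the order they appear.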

Oops - forgot to mention in my last post that the only way I've been able to remove probesets from AffyBatch objects was to use Ariel's 'RemoveProbes' function that was referenced previously in this thread.

Jenny

ADD REPLY • link 17.7 years ago Jenny Drnevich ★ 2.2k

Hi Jenny,

Thanks - just realised my mistake; I forgot that I was just trying to create the character vector and the next bit was already in your code...

Cheers,
Amy

ADD REPLY • link 17.7 years ago Amy Mikhail ▴ 460

Hi Jenny, Robert,

Thanks a million, it has worked ... now I can have a look and see if removing the probesets actually made a difference :) - will let you know...

Cheers,
Amy

ADD REPLY • link 17.7 years ago Amy Mikhail ▴ 460
