Question: genefilter kOverA filter by range
0
gravatar for Kristina M Fontanez
5.6 years ago by
Kristina M Fontanez220 wrote:
Dear Bioconductors, I am trying to use the genefilter package to filter a set of Log2fold changes so that I can keep those taxa with Log2fold changes > 3. However, the data itself consists of both positive and negative values, as is the case with log 2 fold comparisons. Example data: OTU Table: [5 taxa and 3 samples] taxa are rows LvS DvS LvD OTU1206 10.3 1.3 9.0 OTU1203 8.3 2.7 5.5 OTU1297 6.8 -0.9 7.7 OTU1338 6.2 -1.4 7.7 OTU1144 7.4 1.6 5.8 I want to create a filter so that the OTUs with Log2 fold changes > magnitude 3 in either the positive or negative direction are kept. However, the documentation for kOverA in the genefilter package implies that you can only input “values you want to exceed”. As the code below is currently written, I am only keeping taxa with a log2 fold change > +3 in any one sample. However, taxa with a log2 fold change of -7 in a particular sample would be left out. I tested whether I was missing any OTUs by looking for the minimum value in the original OTU table (comp) and in the filtered OTU table (LFC3). As you can see the minimum -7.4 log2 fold change value in comp does not exist in the LFC3 object so it was excluded by my flist2 filter. Is there a similar function like kOverA that I can use to get large magnitude changes in both the positive and negative directions? I tried the code: > comp phyloseq-class experiment-level object otu_table() OTU Table: [ 2151 taxa and 3 samples ] tax_table() Taxonomy Table: [ 2151 taxa by 6 taxonomic ranks ] > flist2<-filterfun(kOverA(1,3.0)) > LFC3=filter_taxa(comp,flist2,TRUE) > LFC3 phyloseq-class experiment-level object otu_table() OTU Table: [ 164 taxa and 3 samples ] tax_table() Taxonomy Table: [ 164 taxa by 6 taxonomic ranks ] > min(otu_table(comp)) [1] -7.4 > min(otu_table(LFC3)) [1] -5.5 Thank you, Kristina > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] genefilter_1.44.0 ggplot2_0.9.3.1 scales_0.2.3 phyloseq_1.7.12 loaded via a namespace (and not attached): [1] ade4_1.6-2 annotate_1.40.0 AnnotationDbi_1.24.0 [4] ape_3.0-11 Biobase_2.22.0 BiocGenerics_0.8.0 [7] biom_0.3.11 Biostrings_2.30.1 cluster_1.14.4 [10] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 [13] DESeq2_1.2.8 dichromat_2.0-0 digest_0.6.4 [16] foreach_1.4.1 GenomicRanges_1.14.4 grid_3.0.2 [19] gtable_0.1.2 igraph_0.6.6 IRanges_1.20.6 [22] iterators_1.0.6 labeling_0.2 lattice_0.20-24 [25] locfit_1.5-9.1 MASS_7.3-29 Matrix_1.1-1.1 [28] multtest_2.18.0 munsell_0.4.2 nlme_3.1-113 [31] parallel_3.0.2 permute_0.8-0 plyr_1.8 [34] proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.10.6 [37] RcppArmadillo_0.4.000 reshape2_1.2.2 RJSONIO_1.0-3 [40] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 [43] stringr_0.6.2 survival_2.37-4 tools_3.0.2 [46] vegan_2.0-10 XML_3.95-0.2 xtable_1.7-1 [49] XVector_0.2.0 ------------------------------------------------------------------ Kristina Fontanez, Postdoctoral Fellow fontanez@mit.edu<mailto:fontanez@mit.edu> Massachusetts Institute of Technology Department of Civil and Environmental Engineering 48-120E 15 Vassar Street Cambridge, MA 02139 [[alternative HTML version deleted]]
genefilter • 952 views
ADD COMMENTlink modified 5.6 years ago by James W. MacDonald50k • written 5.6 years ago by Kristina M Fontanez220
Answer: genefilter kOverA filter by range
0
gravatar for James W. MacDonald
5.6 years ago by
United States
James W. MacDonald50k wrote:
Hi Kristina, On 1/23/2014 4:25 PM, Kristina M Fontanez wrote: > Dear Bioconductors, > > I am trying to use the genefilter package to filter a set of Log2fold changes so that I can keep those taxa with Log2fold changes > 3. However, the data itself consists of both positive and negative values, as is the case with log 2 fold comparisons. You don't need the genefilter package to do this, and in fact genefilter is intended for a completely different task. Instead you just need to use simple R commands. filt <- rowSums(abs(comp) > 3) > 1 comp[filt,] Best, Jim > > Example data: > OTU Table: [5 taxa and 3 samples] > taxa are rows > LvS DvS LvD > OTU1206 10.3 1.3 9.0 > OTU1203 8.3 2.7 5.5 > OTU1297 6.8 -0.9 7.7 > OTU1338 6.2 -1.4 7.7 > OTU1144 7.4 1.6 5.8 > > I want to create a filter so that the OTUs with Log2 fold changes > magnitude 3 in either the positive or negative direction are kept. However, the documentation for kOverA in the genefilter package implies that you can only input ?values you want to exceed?. As the code below is currently written, I am only keeping taxa with a log2 fold change > +3 in any one sample. However, taxa with a log2 fold change of -7 in a particular sample would be left out. I tested whether I was missing any OTUs by looking for the minimum value in the original OTU table (comp) and in the filtered OTU table (LFC3). As you can see the minimum -7.4 log2 fold change value in comp does not exist in the LFC3 object so it was excluded by my flist2 filter. > > Is there a similar function like kOverA that I can use to get large magnitude changes in both the positive and negative directions? > > I tried the code: >> comp > phyloseq-class experiment-level object > otu_table() OTU Table: [ 2151 taxa and 3 samples ] > tax_table() Taxonomy Table: [ 2151 taxa by 6 taxonomic ranks ] > >> flist2<-filterfun(kOverA(1,3.0)) >> LFC3=filter_taxa(comp,flist2,TRUE) >> LFC3 > phyloseq-class experiment-level object > otu_table() OTU Table: [ 164 taxa and 3 samples ] > tax_table() Taxonomy Table: [ 164 taxa by 6 taxonomic ranks ] >> min(otu_table(comp)) > [1] -7.4 >> min(otu_table(LFC3)) > [1] -5.5 > > Thank you, > Kristina > >> sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] genefilter_1.44.0 ggplot2_0.9.3.1 scales_0.2.3 phyloseq_1.7.12 > > loaded via a namespace (and not attached): > [1] ade4_1.6-2 annotate_1.40.0 AnnotationDbi_1.24.0 > [4] ape_3.0-11 Biobase_2.22.0 BiocGenerics_0.8.0 > [7] biom_0.3.11 Biostrings_2.30.1 cluster_1.14.4 > [10] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 > [13] DESeq2_1.2.8 dichromat_2.0-0 digest_0.6.4 > [16] foreach_1.4.1 GenomicRanges_1.14.4 grid_3.0.2 > [19] gtable_0.1.2 igraph_0.6.6 IRanges_1.20.6 > [22] iterators_1.0.6 labeling_0.2 lattice_0.20-24 > [25] locfit_1.5-9.1 MASS_7.3-29 Matrix_1.1-1.1 > [28] multtest_2.18.0 munsell_0.4.2 nlme_3.1-113 > [31] parallel_3.0.2 permute_0.8-0 plyr_1.8 > [34] proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.10.6 > [37] RcppArmadillo_0.4.000 reshape2_1.2.2 RJSONIO_1.0-3 > [40] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 > [43] stringr_0.6.2 survival_2.37-4 tools_3.0.2 > [46] vegan_2.0-10 XML_3.95-0.2 xtable_1.7-1 > [49] XVector_0.2.0 > > ------------------------------------------------------------------ > Kristina Fontanez, Postdoctoral Fellow > fontanez at mit.edu<mailto:fontanez at="" mit.edu=""> > Massachusetts Institute of Technology > Department of Civil and Environmental Engineering > 48-120E > 15 Vassar Street > Cambridge, MA 02139 > > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENTlink written 5.6 years ago by James W. MacDonald50k
Hi James, Unfortunately, your proposed solution didn’t work for me. I think it’s because I am working with objects built with the phyloseq package which I am trying to subsequently filter with the genefilter functions. First, a review of the two phyloseq objects from my last post: > comp phyloseq-class experiment-level object otu_table() OTU Table: [ 2151 taxa and 3 samples ] tax_table() Taxonomy Table: [ 2151 taxa by 6 taxonomic ranks ] > LFC3 phyloseq-class experiment-level object otu_table() OTU Table: [ 164 taxa and 3 samples ] tax_table() Taxonomy Table: [ 164 taxa by 6 taxonomic ranks ] Now, your solution: > filtest<-rowSums(abs(otu_table(comp))>3)>1 > filtest[1:5] OTU1206 OTU1203 OTU1297 OTU1338 OTU1144 TRUE TRUE TRUE TRUE TRUE > LFC3true=filter_taxa(comp,filtest,TRUE) > LFC3true phyloseq-class experiment-level object otu_table() OTU Table: [ 66 taxa and 3 samples ] tax_table() Taxonomy Table: [ 66 taxa by 6 taxonomic ranks ] > min(otu_table(comp)) [1] -7.4 > min(otu_table(LFC3true)) [1] -5.5 BUT, as you can see the new LFC3true object is still missing that -7.4 value and now it contains even less taxa than the LFC3 object. If the filter works correctly, I should be getting MORE taxa added to that object. I also tried your solution verbatim but ran into trouble because my phyloseq object can’t be subset in the way you suggested: > newcomp=comp[filtest,] Error in comp[filtest, ] : object of type 'S4' is not subsettable I believe that in order to use the filter_taxa function to subset the phyloseq object, I need a genefilter list object. Pasted below is the information in the phyloseq manual from the filter_taxa object. filter_taxa Filter taxa based on across-sample OTU abundance criteria Description This function is directly analogous to the genefilter function for microarray filtering, but is used for filtering OTUs from phyloseq objects. It applies an arbitrary set of functions — as a function list, for instance, created by filterfun — as across-sample criteria, one OTU at a time. It takes as input a phyloseq object, and returns a logical vector indicating whether or not each OTU passed the criteria. Alternatively, if the "prune" option is set to FALSE, it returns the already-trimmed version of the phyloseq object. Usage Arguments physeq (Required). A phyloseq-class object that you want to trim/filter. flist (Required). A function or list of functions that take a vector of abundance values and return a logical. Some canned useful function types are included in the genefilter-package. prune (Optional). A logical. Default FALSE. If TRUE, then the function returns the pruned phyloseq-class object, rather than the logical vector of taxa that passed the filter. Value A logical vector equal to the number of taxa in physeq. This can be provided directly to prune_taxa as first argument. Alternatively, if prune==TRUE, the pruned phyloseq-class object is returned instead. Thanks, Kristina ------------------------------------------------------------------ Kristina Fontanez, Postdoctoral Fellow fontanez@mit.edu<mailto:fontanez@mit.edu> Massachusetts Institute of Technology Department of Civil and Environmental Engineering 48-120E 15 Vassar Street Cambridge, MA 02139 On Jan 23, 2014, at 4:38 PM, James W. MacDonald <jmacdon@uw.edu<mailto:jmacdon@uw.edu>> wrote: Hi Kristina, On 1/23/2014 4:25 PM, Kristina M Fontanez wrote: Dear Bioconductors, I am trying to use the genefilter package to filter a set of Log2fold changes so that I can keep those taxa with Log2fold changes > 3. However, the data itself consists of both positive and negative values, as is the case with log 2 fold comparisons. You don't need the genefilter package to do this, and in fact genefilter is intended for a completely different task. Instead you just need to use simple R commands. filt <- rowSums(abs(comp) > 3) > 1 comp[filt,] Best, Jim Example data: OTU Table: [5 taxa and 3 samples] taxa are rows LvS DvS LvD OTU1206 10.3 1.3 9.0 OTU1203 8.3 2.7 5.5 OTU1297 6.8 -0.9 7.7 OTU1338 6.2 -1.4 7.7 OTU1144 7.4 1.6 5.8 I want to create a filter so that the OTUs with Log2 fold changes > magnitude 3 in either the positive or negative direction are kept. However, the documentation for kOverA in the genefilter package implies that you can only input “values you want to exceed”. As the code below is currently written, I am only keeping taxa with a log2 fold change > +3 in any one sample. However, taxa with a log2 fold change of -7 in a particular sample would be left out. I tested whether I was missing any OTUs by looking for the minimum value in the original OTU table (comp) and in the filtered OTU table (LFC3). As you can see the minimum -7.4 log2 fold change value in comp does not exist in the LFC3 object so it was excluded by my flist2 filter. Is there a similar function like kOverA that I can use to get large magnitude changes in both the positive and negative directions? I tried the code: comp phyloseq-class experiment-level object otu_table() OTU Table: [ 2151 taxa and 3 samples ] tax_table() Taxonomy Table: [ 2151 taxa by 6 taxonomic ranks ] flist2<-filterfun(kOverA(1,3.0)) LFC3=filter_taxa(comp,flist2,TRUE) LFC3 phyloseq-class experiment-level object otu_table() OTU Table: [ 164 taxa and 3 samples ] tax_table() Taxonomy Table: [ 164 taxa by 6 taxonomic ranks ] min(otu_table(comp)) [1] -7.4 min(otu_table(LFC3)) [1] -5.5 Thank you, Kristina sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] genefilter_1.44.0 ggplot2_0.9.3.1 scales_0.2.3 phyloseq_1.7.12 loaded via a namespace (and not attached): [1] ade4_1.6-2 annotate_1.40.0 AnnotationDbi_1.24.0 [4] ape_3.0-11 Biobase_2.22.0 BiocGenerics_0.8.0 [7] biom_0.3.11 Biostrings_2.30.1 cluster_1.14.4 [10] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 [13] DESeq2_1.2.8 dichromat_2.0-0 digest_0.6.4 [16] foreach_1.4.1 GenomicRanges_1.14.4 grid_3.0.2 [19] gtable_0.1.2 igraph_0.6.6 IRanges_1.20.6 [22] iterators_1.0.6 labeling_0.2 lattice_0.20-24 [25] locfit_1.5-9.1 MASS_7.3-29 Matrix_1.1-1.1 [28] multtest_2.18.0 munsell_0.4.2 nlme_3.1-113 [31] parallel_3.0.2 permute_0.8-0 plyr_1.8 [34] proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.10.6 [37] RcppArmadillo_0.4.000 reshape2_1.2.2 RJSONIO_1.0-3 [40] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 [43] stringr_0.6.2 survival_2.37-4 tools_3.0.2 [46] vegan_2.0-10 XML_3.95-0.2 xtable_1.7-1 [49] XVector_0.2.0 ------------------------------------------------------------------ Kristina Fontanez, Postdoctoral Fellow fontanez@mit.edu<mailto:fontanez@mit.edu><mailto:fontanez@mit.edu> Massachusetts Institute of Technology Department of Civil and Environmental Engineering 48-120E 15 Vassar Street Cambridge, MA 02139 [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]
ADD REPLYlink written 5.6 years ago by Kristina M Fontanez220
Hi Kristina, This part: filtest<-rowSums(abs(otu_table(comp))>3)>1 should actually be filtest<-rowSums(abs(otu_table(comp))>3)>0 returns a logical vector, which you would then use as input to prune_taxa(), not filter_taxa(). I am surprised it worked for you, as internally filter_taxa is doing something like apply(OTU, 1, flist) where flist is supposed to be a function, not a logical vector. Alternatively you could just create a kOverA() version that does what you want: kOverAbsA <- function(k, A, na.rm = TRUE{ function(x) { if(na.rm) x <- x[!is.na(x)] sum(abs(x) > A) >= k } } and then do what you tried before filtered <- filter_taxa(comp, filterfun(kOverAbsA(1, 3)), TRUE) Best, Jim On 1/23/2014 5:15 PM, Kristina M Fontanez wrote: > Hi James, > > Unfortunately, your proposed solution didn?t work for me. I think it?s > because I am working with objects built with the phyloseq package > which I am trying to subsequently filter with the genefilter functions. > > First, a review of the two phyloseq objects from my last post: > > comp > phyloseq-class experiment-level object > otu_table() OTU Table: [ 2151 taxa and 3 samples ] > tax_table() Taxonomy Table: [ 2151 taxa by 6 taxonomic ranks ] > > LFC3 > phyloseq-class experiment-level object > otu_table() OTU Table: [ 164 taxa and 3 samples ] > tax_table() Taxonomy Table: [ 164 taxa by 6 taxonomic ranks ] > > Now, your solution: > > filtest<-rowSums(abs(otu_table(comp))>3)>1 > > filtest[1:5] > OTU1206 OTU1203 OTU1297 OTU1338 OTU1144 > TRUE TRUE TRUE TRUE TRUE > > LFC3true=filter_taxa(comp,filtest,TRUE) > > LFC3true > phyloseq-class experiment-level object > otu_table() OTU Table: [ 66 taxa and 3 samples ] > tax_table() Taxonomy Table: [ 66 taxa by 6 taxonomic ranks ] > > min(otu_table(comp)) > [1] -7.4 > > min(otu_table(LFC3true)) > [1] -5.5 > > BUT, as you can see the new LFC3true object is still missing that -7.4 > value and now it contains even less taxa than the LFC3 object. If the > filter works correctly, I should be getting MORE taxa added to that > object. > > I also tried your solution verbatim but ran into trouble because my > phyloseq object can?t be subset in the way you suggested: > > newcomp=comp[filtest,] > Error in comp[filtest, ] : object of type 'S4' is not subsettable > > I believe that in order to use the filter_taxa function to subset the > phyloseq object, I need a genefilter list object. Pasted below is the > information in the phyloseq manual from the filter_taxa object. > > filter_taxa Filter taxa based on across-sample OTU abundance criteria > > Description > > This function is directly analogous to the genefilter function for > microarray filtering, but is used for filtering OTUs from phyloseq > objects. It applies an arbitrary set of functions ? as a function > list, for instance, created by filterfun ? as across-sample criteria, > one OTU at a time. It takes as input a phyloseq object, and returns a > logical vector indicating whether or not each OTU passed the criteria. > Alternatively, if the "prune" option is set to FALSE, it returns the > already-trimmed version of the phyloseq object. > > Usage > > Arguments > > physeq (Required). A phyloseq-class object that you want to trim/filter. > > flist (Required). A function or list of functions that take a vector > of abundance values and return a logical. Some canned useful function > types are included in the genefilter-package. > > prune (Optional). A logical. Default FALSE. If TRUE, then the function > returns the pruned phyloseq-class object, rather than the logical > vector of taxa that passed the filter. > > Value > > A logical vector equal to the number of taxa in physeq. This can be > provided directly to prune_taxa as first argument. Alternatively, if > prune==TRUE, the pruned phyloseq-class object is returned instead. > > > Thanks, > Kristina > ------------------------------------------------------------------ > Kristina Fontanez, Postdoctoral Fellow > fontanez at mit.edu <mailto:fontanez at="" mit.edu=""> > Massachusetts Institute of Technology > Department of Civil and Environmental Engineering > 48-120E > 15 Vassar Street > Cambridge, MA 02139 > > > > On Jan 23, 2014, at 4:38 PM, James W. MacDonald <jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">> wrote: > >> Hi Kristina, >> >> On 1/23/2014 4:25 PM, Kristina M Fontanez wrote: >>> Dear Bioconductors, >>> >>> I am trying to use the genefilter package to filter a set of >>> Log2fold changes so that I can keep those taxa with Log2fold changes >>> > 3. However, the data itself consists of both positive and negative >>> values, as is the case with log 2 fold comparisons. >> >> You don't need the genefilter package to do this, and in fact >> genefilter is intended for a completely different task. >> >> Instead you just need to use simple R commands. >> >> filt <- rowSums(abs(comp) > 3) > 1 >> comp[filt,] >> >> Best, >> >> Jim >> >> >> >>> >>> Example data: >>> OTU Table: [5 taxa and 3 samples] >>> taxa are rows >>> LvS DvS LvD >>> OTU1206 10.3 1.3 9.0 >>> OTU1203 8.3 2.7 5.5 >>> OTU1297 6.8 -0.9 7.7 >>> OTU1338 6.2 -1.4 7.7 >>> OTU1144 7.4 1.6 5.8 >>> >>> I want to create a filter so that the OTUs with Log2 fold changes > >>> magnitude 3 in either the positive or negative direction are kept. >>> However, the documentation for kOverA in the genefilter package >>> implies that you can only input ?values you want to exceed?. As the >>> code below is currently written, I am only keeping taxa with a log2 >>> fold change > +3 in any one sample. However, taxa with a log2 fold >>> change of -7 in a particular sample would be left out. I tested >>> whether I was missing any OTUs by looking for the minimum value in >>> the original OTU table (comp) and in the filtered OTU table (LFC3). >>> As you can see the minimum -7.4 log2 fold change value in comp does >>> not exist in the LFC3 object so it was excluded by my flist2 filter. >>> >>> Is there a similar function like kOverA that I can use to get large >>> magnitude changes in both the positive and negative directions? >>> >>> I tried the code: >>>> comp >>> phyloseq-class experiment-level object >>> otu_table() OTU Table: [ 2151 taxa and 3 samples ] >>> tax_table() Taxonomy Table: [ 2151 taxa by 6 taxonomic ranks ] >>> >>>> flist2<-filterfun(kOverA(1,3.0)) >>>> LFC3=filter_taxa(comp,flist2,TRUE) >>>> LFC3 >>> phyloseq-class experiment-level object >>> otu_table() OTU Table: [ 164 taxa and 3 samples ] >>> tax_table() Taxonomy Table: [ 164 taxa by 6 taxonomic ranks ] >>>> min(otu_table(comp)) >>> [1] -7.4 >>>> min(otu_table(LFC3)) >>> [1] -5.5 >>> >>> Thank you, >>> Kristina >>> >>>> sessionInfo() >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-apple-darwin10.8.0 (64-bit) >>> >>> locale: >>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] genefilter_1.44.0 ggplot2_0.9.3.1 scales_0.2.3 >>> phyloseq_1.7.12 >>> >>> loaded via a namespace (and not attached): >>> [1] ade4_1.6-2 annotate_1.40.0 AnnotationDbi_1.24.0 >>> [4] ape_3.0-11 Biobase_2.22.0 BiocGenerics_0.8.0 >>> [7] biom_0.3.11 Biostrings_2.30.1 cluster_1.14.4 >>> [10] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 >>> [13] DESeq2_1.2.8 dichromat_2.0-0 digest_0.6.4 >>> [16] foreach_1.4.1 GenomicRanges_1.14.4 grid_3.0.2 >>> [19] gtable_0.1.2 igraph_0.6.6 IRanges_1.20.6 >>> [22] iterators_1.0.6 labeling_0.2 lattice_0.20-24 >>> [25] locfit_1.5-9.1 MASS_7.3-29 Matrix_1.1-1.1 >>> [28] multtest_2.18.0 munsell_0.4.2 nlme_3.1-113 >>> [31] parallel_3.0.2 permute_0.8-0 plyr_1.8 >>> [34] proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.10.6 >>> [37] RcppArmadillo_0.4.000 reshape2_1.2.2 RJSONIO_1.0-3 >>> [40] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 >>> [43] stringr_0.6.2 survival_2.37-4 tools_3.0.2 >>> [46] vegan_2.0-10 XML_3.95-0.2 xtable_1.7-1 >>> [49] XVector_0.2.0 >>> >>> ------------------------------------------------------------------ >>> Kristina Fontanez, Postdoctoral Fellow >>> fontanez at mit.edu <mailto:fontanez at="" mit.edu=""><mailto:fontanez at="" mit.edu=""> >>> Massachusetts Institute of Technology >>> Department of Civil and Environmental Engineering >>> 48-120E >>> 15 Vassar Street >>> Cambridge, MA 02139 >>> >>> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the >>> archives:http://news.gmane.org/gmane.science.biology.informatics.c onductor >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLYlink written 5.6 years ago by James W. MacDonald50k
Hi Jim- Thank you so much for your help. Your first solution worked for me and I’ve now captured the minimum and maximum log2fold changes in the new LFC3truenames object. > filtest<-rowSums(abs(otu_table(comp))>3)>0 > LFC3truenames=prune_taxa(filtest,comp) > LFC3truenames phyloseq-class experiment-level object otu_table() OTU Table: [ 253 taxa and 3 samples ] tax_table() Taxonomy Table: [ 253 taxa by 6 taxonomic ranks ] > min(otu_table(comp)) [1] -7.4 > min(otu_table(LFC3truenames)) [1] -7.4 > max(otu_table(comp)) [1] 10.3 > max(otu_table(LFC3truenames)) [1] 10.3 Thank you! Kristina ------------------------------------------------------------------ Kristina Fontanez, Postdoctoral Fellow fontanez@mit.edu<mailto:fontanez@mit.edu> Massachusetts Institute of Technology Department of Civil and Environmental Engineering 48-120E 15 Vassar Street Cambridge, MA 02139 On Jan 23, 2014, at 5:50 PM, James W. MacDonald <jmacdon@uw.edu<mailto:jmacdon@uw.edu>> wrote: Hi Kristina, This part: filtest<-rowSums(abs(otu_table(comp))>3)>1 should actually be filtest<-rowSums(abs(otu_table(comp))>3)>0 returns a logical vector, which you would then use as input to prune_taxa(), not filter_taxa(). I am surprised it worked for you, as internally filter_taxa is doing something like apply(OTU, 1, flist) where flist is supposed to be a function, not a logical vector. Alternatively you could just create a kOverA() version that does what you want: kOverAbsA <- function(k, A, na.rm = TRUE{ function(x) { if(na.rm) x <- x[!is.na(x)] sum(abs(x) > A) >= k } } and then do what you tried before filtered <- filter_taxa(comp, filterfun(kOverAbsA(1, 3)), TRUE) Best, Jim On 1/23/2014 5:15 PM, Kristina M Fontanez wrote: Hi James, Unfortunately, your proposed solution didn’t work for me. I think it’s because I am working with objects built with the phyloseq package which I am trying to subsequently filter with the genefilter functions. First, a review of the two phyloseq objects from my last post: > comp phyloseq-class experiment-level object otu_table() OTU Table: [ 2151 taxa and 3 samples ] tax_table() Taxonomy Table: [ 2151 taxa by 6 taxonomic ranks ] > LFC3 phyloseq-class experiment-level object otu_table() OTU Table: [ 164 taxa and 3 samples ] tax_table() Taxonomy Table: [ 164 taxa by 6 taxonomic ranks ] Now, your solution: > filtest<-rowSums(abs(otu_table(comp))>3)>1 > filtest[1:5] OTU1206 OTU1203 OTU1297 OTU1338 OTU1144 TRUE TRUE TRUE TRUE TRUE > LFC3true=filter_taxa(comp,filtest,TRUE) > LFC3true phyloseq-class experiment-level object otu_table() OTU Table: [ 66 taxa and 3 samples ] tax_table() Taxonomy Table: [ 66 taxa by 6 taxonomic ranks ] > min(otu_table(comp)) [1] -7.4 > min(otu_table(LFC3true)) [1] -5.5 BUT, as you can see the new LFC3true object is still missing that -7.4 value and now it contains even less taxa than the LFC3 object. If the filter works correctly, I should be getting MORE taxa added to that object. I also tried your solution verbatim but ran into trouble because my phyloseq object can’t be subset in the way you suggested: > newcomp=comp[filtest,] Error in comp[filtest, ] : object of type 'S4' is not subsettable I believe that in order to use the filter_taxa function to subset the phyloseq object, I need a genefilter list object. Pasted below is the information in the phyloseq manual from the filter_taxa object. filter_taxa Filter taxa based on across-sample OTU abundance criteria Description This function is directly analogous to the genefilter function for microarray filtering, but is used for filtering OTUs from phyloseq objects. It applies an arbitrary set of functions — as a function list, for instance, created by filterfun — as across-sample criteria, one OTU at a time. It takes as input a phyloseq object, and returns a logical vector indicating whether or not each OTU passed the criteria. Alternatively, if the "prune" option is set to FALSE, it returns the already-trimmed version of the phyloseq object. Usage Arguments physeq (Required). A phyloseq-class object that you want to trim/filter. flist (Required). A function or list of functions that take a vector of abundance values and return a logical. Some canned useful function types are included in the genefilter-package. prune (Optional). A logical. Default FALSE. If TRUE, then the function returns the pruned phyloseq-class object, rather than the logical vector of taxa that passed the filter. Value A logical vector equal to the number of taxa in physeq. This can be provided directly to prune_taxa as first argument. Alternatively, if prune==TRUE, the pruned phyloseq-class object is returned instead. Thanks, Kristina ------------------------------------------------------------------ Kristina Fontanez, Postdoctoral Fellow fontanez@mit.edu<mailto:fontanez@mit.edu> <mailto:fontanez@mit.edu> Massachusetts Institute of Technology Department of Civil and Environmental Engineering 48-120E 15 Vassar Street Cambridge, MA 02139 On Jan 23, 2014, at 4:38 PM, James W. MacDonald <jmacdon@uw.edu<mailto:jmacdon@uw.edu> <mailto:jmacdon@uw.edu>> wrote: Hi Kristina, On 1/23/2014 4:25 PM, Kristina M Fontanez wrote: Dear Bioconductors, I am trying to use the genefilter package to filter a set of Log2fold changes so that I can keep those taxa with Log2fold changes > 3. However, the data itself consists of both positive and negative values, as is the case with log 2 fold comparisons. You don't need the genefilter package to do this, and in fact genefilter is intended for a completely different task. Instead you just need to use simple R commands. filt <- rowSums(abs(comp) > 3) > 1 comp[filt,] Best, Jim Example data: OTU Table: [5 taxa and 3 samples] taxa are rows LvS DvS LvD OTU1206 10.3 1.3 9.0 OTU1203 8.3 2.7 5.5 OTU1297 6.8 -0.9 7.7 OTU1338 6.2 -1.4 7.7 OTU1144 7.4 1.6 5.8 I want to create a filter so that the OTUs with Log2 fold changes > magnitude 3 in either the positive or negative direction are kept. However, the documentation for kOverA in the genefilter package implies that you can only input “values you want to exceed”. As the code below is currently written, I am only keeping taxa with a log2 fold change > +3 in any one sample. However, taxa with a log2 fold change of -7 in a particular sample would be left out. I tested whether I was missing any OTUs by looking for the minimum value in the original OTU table (comp) and in the filtered OTU table (LFC3). As you can see the minimum -7.4 log2 fold change value in comp does not exist in the LFC3 object so it was excluded by my flist2 filter. Is there a similar function like kOverA that I can use to get large magnitude changes in both the positive and negative directions? I tried the code: comp phyloseq-class experiment-level object otu_table() OTU Table: [ 2151 taxa and 3 samples ] tax_table() Taxonomy Table: [ 2151 taxa by 6 taxonomic ranks ] flist2<-filterfun(kOverA(1,3.0)) LFC3=filter_taxa(comp,flist2,TRUE) LFC3 phyloseq-class experiment-level object otu_table() OTU Table: [ 164 taxa and 3 samples ] tax_table() Taxonomy Table: [ 164 taxa by 6 taxonomic ranks ] min(otu_table(comp)) [1] -7.4 min(otu_table(LFC3)) [1] -5.5 Thank you, Kristina sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] genefilter_1.44.0 ggplot2_0.9.3.1 scales_0.2.3 phyloseq_1.7.12 loaded via a namespace (and not attached): [1] ade4_1.6-2 annotate_1.40.0 AnnotationDbi_1.24.0 [4] ape_3.0-11 Biobase_2.22.0 BiocGenerics_0.8.0 [7] biom_0.3.11 Biostrings_2.30.1 cluster_1.14.4 [10] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7 [13] DESeq2_1.2.8 dichromat_2.0-0 digest_0.6.4 [16] foreach_1.4.1 GenomicRanges_1.14.4 grid_3.0.2 [19] gtable_0.1.2 igraph_0.6.6 IRanges_1.20.6 [22] iterators_1.0.6 labeling_0.2 lattice_0.20-24 [25] locfit_1.5-9.1 MASS_7.3-29 Matrix_1.1-1.1 [28] multtest_2.18.0 munsell_0.4.2 nlme_3.1-113 [31] parallel_3.0.2 permute_0.8-0 plyr_1.8 [34] proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.10.6 [37] RcppArmadillo_0.4.000 reshape2_1.2.2 RJSONIO_1.0-3 [40] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 [43] stringr_0.6.2 survival_2.37-4 tools_3.0.2 [46] vegan_2.0-10 XML_3.95-0.2 xtable_1.7-1 [49] XVector_0.2.0 ------------------------------------------------------------------ Kristina Fontanez, Postdoctoral Fellow fontanez@mit.edu<mailto:fontanez@mit.edu> <mailto:fontanez@mit.edu><mailto:fontanez@mit.edu> Massachusetts Institute of Technology Department of Civil and Environmental Engineering 48-120E 15 Vassar Street Cambridge, MA 02139 [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> <mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives:http://news.gmane.org/gmane.science.biology.inform atics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]
ADD REPLYlink written 5.6 years ago by Kristina M Fontanez220
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 284 users visited in the last hour