Problem with removing duplicated probes of datasets without annotation


Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Dear R helpers,

I'm working with a goat dataset for which no annotation db is available. For this reason, I use the 'genefilter' function instead of 'nsFilter', with ANOVA (p<0.05) (available in the 'genefilter' package). The problem is that the filtered data contain 500 duplicated probes, which I want to remove.

Due to my limited ability, I cannot figure out how to do this. It would be great if I could select, for each set of duplicates, the probe with either the lowest p-value or the highest variance.

Would you please help me with some examples?

Best Regards,
Kaj

-- output of sessionInfo():

R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] Biobase_2.24.0 BiocGenerics_0.10.0 genefilter_1.46.1

loaded via a namespace (and not attached):
 [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7
 [4] GenomeInfoDb_1.0.2 IRanges_1.22.9 RSQLite_0.11.4
 [7] splines_3.1.0 stats4_3.1.0 survival_2.37-7
[10] tcltk_3.1.0 tools_3.1.0 XML_3.98-1.1
[13] xtable_1.7-3

--
Sent via the guest posting facility at bioconductor.org.

Annotation probe • 1.1k views

updated 9.8 years ago by Wolfgang Huber ★ 13k • written 9.8 years ago by Guest User ★ 13k


Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 11 days ago

EMBL European Molecular Biology Laborat…

Kaj,

this is an R question; code like the following would do the job:

    # x: a data.frame with columns 'probeid' and 'pvalue'
    s = split(seq_len(nrow(x)), x$probeid)
    # for each probe ID, the index of the row with the smallest p-value
    uniqueids = sapply(s, function(i) i[which.min(x$pvalue[i])])

And you can replace what's inside the 'which.min(…)' expression with whatever pleases you.

There are plenty of places in vignettes etc. where this type of operation is done. One I happen to be aware of right now is inside the function 'myHeatmap' of the 'Hiiragi2013' package.

Wolfgang Huber
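A minimal worked sketch of the approach above, using a small made-up data frame. The values and the names `x_toy`, `keep`, and `x_unique` are illustrative assumptions, not taken from the original post. It keeps, for each repeated probe ID, the row with the smallest p-value:

```r
# Toy data: probe "a" appears three times, "b" twice, "c" once (made-up values).
x_toy <- data.frame(
  probeid = c("a", "b", "a", "c", "b", "a"),
  pvalue  = c(0.04, 0.01, 0.002, 0.03, 0.02, 0.01),
  stringsAsFactors = FALSE
)

# Group the row indices by probe ID.
s <- split(seq_len(nrow(x_toy)), x_toy$probeid)

# For each group, pick the row index with the smallest p-value.
# vapply() is used instead of sapply() so the result is always an integer vector.
keep <- vapply(s, function(i) i[which.min(x_toy$pvalue[i])], integer(1))

# Subset to one row per probe ID.
x_unique <- x_toy[keep, ]
print(x_unique)
```

Note that the result holds row indices rather than IDs, so it can be used directly for subsetting. To select by another criterion, for example the largest variance, the `which.min(...)` call could be swapped for `which.max(...)` over a per-row variance vector.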

written 9.8 years ago by Wolfgang Huber ★ 13k


Dear Wolfgang,

Thank you very much for your help!! I will try to proceed with these examples ASAP.
Just an additional question: I notice that the result of the 'genefilter' function is a logical object (TRUE, FALSE). It seems that I have to perform the one-way ANOVA myself... Am I mistaken?

Thank you very much,
Kaj
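On the point about the logical result: a TRUE/FALSE filter vector is typically used as a row index, while the p-values themselves have to be computed separately. A minimal base R sketch, assuming a made-up expression matrix `expr` and grouping factor `group` (both hypothetical, as are all values); the 'genefilter' package is not used here:

```r
set.seed(1)

# Made-up expression matrix: 6 probes (rows) x 9 samples (columns), 3 groups.
expr <- matrix(rnorm(6 * 9), nrow = 6,
               dimnames = list(paste0("probe", 1:6), NULL))
group <- factor(rep(c("A", "B", "C"), each = 3))

# Give probe1 a clear group effect so that it passes the filter.
expr[1, group == "C"] <- expr[1, group == "C"] + 10

# One-way ANOVA p-value per row (equal variances assumed).
pvalue <- apply(expr, 1, function(y)
  oneway.test(y ~ group, var.equal = TRUE)$p.value)

# A logical vector, analogous to a filter result: TRUE means "keep this row".
selected <- pvalue < 0.05

# Use the logical vector to subset the rows of the matrix.
expr_filtered <- expr[selected, , drop = FALSE]
```

Keeping `pvalue` as a separate named vector means it stays available for a later deduplication step, which a TRUE/FALSE result alone would not allow.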

written 9.8 years ago by Kaj Chokeshaiusaha ▴ 70


Dear Prof. Huber,

I'm very sorry for my clumsiness. I have noticed the function 'rowFtests' and can now deal with the task properly.
Thank you very much again for your help and generosity.

With Respects,
Kaj
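For readers following the same route, here is a hedged sketch of how per-row F-test p-values might be combined with the deduplication step from the answer. The matrix `expr`, the factor `group`, and the probe-to-ID mapping `id` are all made up for illustration. The code calls `rowFtests` only if 'genefilter' is installed and otherwise falls back to an equivalent, slower base R computation:

```r
set.seed(42)

# Made-up data: 8 probes x 6 samples in 3 groups.
expr <- matrix(rnorm(8 * 6), nrow = 8,
               dimnames = list(paste0("probe", 1:8), NULL))
group <- factor(rep(c("A", "B", "C"), each = 2))

# Hypothetical mapping from probes (rows) to the IDs that are duplicated.
id <- c("g1", "g1", "g2", "g3", "g3", "g3", "g4", "g2")

# Per-row one-way ANOVA p-values.
if (requireNamespace("genefilter", quietly = TRUE)) {
  pvalue <- genefilter::rowFtests(expr, group, var.equal = TRUE)$p.value
} else {
  pvalue <- apply(expr, 1, function(y)
    oneway.test(y ~ group, var.equal = TRUE)$p.value)
}

# Per-row variance, the alternative criterion mentioned in the question.
rowvar <- apply(expr, 1, var)

# Row indices grouped by ID.
s <- split(seq_len(nrow(expr)), id)

# Criterion 1: lowest p-value within each ID.
keep_p <- vapply(s, function(i) i[which.min(pvalue[i])], integer(1))

# Criterion 2: highest variance within each ID.
keep_var <- vapply(s, function(i) i[which.max(rowvar[i])], integer(1))

# One row per ID under either criterion.
expr_unique_p   <- expr[keep_p, , drop = FALSE]
expr_unique_var <- expr[keep_var, , drop = FALSE]
```

The two criteria can select different rows for the same ID, so which one is used is a choice worth stating alongside any downstream results.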

written 9.8 years ago by Kaj Chokeshaiusaha ▴ 70
