Problem with removing duplicated probes of datasets without annotation
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Dear R helpers,

I'm working with a goat dataset for which no annotation db is available. For this reason, I use the 'genefilter' function instead of 'nsFilter' (both from the 'genefilter' package), filtering by ANOVA (p<0.05). The problem is that the filtered data contain 500 duplicated probes, which I want to remove.

Due to my limited ability, I cannot figure out how to do this. It would be great if I could select, for each set of duplicates, the probe with either the lowest p-value or the most variance.

Would you please help me with some examples?

Best Regards,
Kaj

-- output of sessionInfo():

R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] Biobase_2.24.0 BiocGenerics_0.10.0 genefilter_1.46.1

loaded via a namespace (and not attached):
 [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7
 [4] GenomeInfoDb_1.0.2 IRanges_1.22.9 RSQLite_0.11.4
 [7] splines_3.1.0 stats4_3.1.0 survival_2.37-7
[10] tcltk_3.1.0 tools_3.1.0 XML_3.98-1.1
[13] xtable_1.7-3

--
Sent via the guest posting facility at bioconductor.org.
Annotation probe • 1.1k views
@wolfgang-huber-3550
Last seen 11 days ago
EMBL European Molecular Biology Laborat…
Kaj,

this is an R question; code like the following would do the job:

    # x: a data.frame with columns 'probeid' and 'pvalue'
    # group the row indices of x by probe id
    s = split(seq_len(nrow(x)), x$probeid)
    # for each probe id, the index of the row with the smallest p-value
    uniqueids = sapply(s, function(i) i[which.min(x$pvalue[i])])

And you can replace what's inside the 'which.min(...)' expression with whatever pleases you.

There are plenty of places in vignettes etc. where this type of operation is done. One I happen to be aware of right now is inside the function 'myHeatmap' of the 'Hiiragi2013' package.

Wolfgang Huber

On 22 Jun 2014, at 10:02, Kaj Chokeshaiusaha [guest] <guest at bioconductor.org> wrote:
> Dear R helpers,
> [...]
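As a minimal sketch of the split-and-select approach from the answer, the following runs it on a small made-up data frame. The values, the extra 'variance' column and the names 'keep_p' and 'keep_v' are illustrative assumptions, not part of the original post.

```r
# Toy stand-in for the filtered results (all values invented for illustration).
x <- data.frame(
  probeid  = c("A", "A", "B", "C", "C", "C"),
  pvalue   = c(0.040, 0.010, 0.030, 0.020, 0.005, 0.045),
  variance = c(1.2, 0.8, 0.5, 2.0, 0.3, 1.1),
  stringsAsFactors = FALSE
)

# Group the row indices of x by probe id.
s <- split(seq_len(nrow(x)), x$probeid)

# For each probe id, keep the row index with the lowest p-value ...
keep_p <- sapply(s, function(i) i[which.min(x$pvalue[i])])
# ... or, alternatively, the row index with the largest variance.
keep_v <- sapply(s, function(i) i[which.max(x$variance[i])])

# One row per probe id under each criterion.
x_by_p <- x[keep_p, ]
x_by_v <- x[keep_v, ]
```

The two criteria can pick different rows for the same probe id, so it is worth deciding on one before filtering. In a real analysis the variance would presumably be computed from the rows of the expression matrix rather than stored in a column as it is here.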
Dear Wolfgang,

Thank you very much for your help! I will try to proceed with these examples as soon as possible.

Just an additional question: I notice that the result of the 'genefilter' function is a logical vector (TRUE, FALSE). It seems that I have to perform the one-way ANOVA myself to obtain the p-values. Am I missing anything?

Thank you very much,
Kaj

2014-06-22 15:49 GMT+07:00 Wolfgang Huber <whuber@embl.de>:
> Kaj
> [...]
Dear Prof. Huber,

I'm very sorry for my clumsiness. I have since found the 'rowFtests' function and can now handle the task properly.

Thank you very much again for your help and generosity.

With Respects,
Kaj

2014-06-22 21:18 GMT+07:00 Kaj Chokeshaiusaha <kaj.chk@gmail.com>:
> Dear Wolfgang,
> [...]
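The follow-up in this thread turns on the difference between a logical keep/discard vector and the per-probe p-values needed to rank duplicates. As a rough sketch, the block below computes a one-way ANOVA p-value for each row of a simulated expression matrix using base R only. The matrix 'mat', the grouping factor 'fac' and the probe ids are invented for illustration. With the 'genefilter' package loaded, rowFtests(mat, fac) should give comparable p-values in its 'p.value' column, assuming its equal-variance setting is used.

```r
set.seed(1)

# Simulated expression matrix: 6 probes x 9 samples in 3 groups (illustrative only).
fac <- factor(rep(c("g1", "g2", "g3"), each = 3))
mat <- matrix(rnorm(6 * 9), nrow = 6)
# Shift probe 2 so that its group means differ.
mat[2, ] <- mat[2, ] + rep(c(0, 3, 6), each = 3)
probeid <- c("A", "A", "B", "B", "C", "C")

# One-way ANOVA p-value per row (classic equal-variance F-test).
pvalue <- apply(mat, 1, function(y) anova(lm(y ~ fac))[["Pr(>F)"]][1])

# A logical vector like the one a filter returns: it says keep or discard
# but carries no ranking that could be used to choose among duplicates.
keep <- pvalue < 0.05

# Resolve duplicates by lowest p-value, one row per probe id.
s <- split(seq_len(nrow(mat)), probeid)
best <- sapply(s, function(i) i[which.min(pvalue[i])])
mat_unique <- mat[best, , drop = FALSE]
rownames(mat_unique) <- names(best)
```

Here the duplicates are resolved over all probes; to combine this with the p<0.05 filter, one option is to subset 'mat', 'pvalue' and 'probeid' by 'keep' first and then apply the same split step.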