limma data frame subsetting problem

Jabez Wilson ▴ 150

@jabez-wilson-1839

Last seen 9.6 years ago

Dear Bioconductors,

I want to do something simple, but I cannot find the solution. I have a limma data frame and I want to select a subset of it based on whether the values in the "G" channel are greater than some threshold, e.g. 5000. As an example I use the swirl data:

    targets <- readTargets("SwirlSample.txt")
    RG <- read.maimages(targets, source="spot")
    > head(RG$G)
           swirl.1   swirl.2    swirl.3    swirl.4
    [1,] 22028.260 19278.770  2727.5600 19930.6500
    [2,] 25613.200 21438.960  2787.0330 25426.5800
    [3,] 22652.390 20386.470  2419.8810 16225.9500
    [4,]  8929.286  6677.619   383.2381   786.9048
    [5,]  8746.476  6576.292   901.0000   468.0476
    [6,] 37010.080 23769.100 23377.9700 28399.0900

I can select the first 20 lines of the df with

    > RG[1:20,]

but I really want to select those lines of the df (and keep the df format intact) where the "G" value in any column is greater than a certain figure (e.g. 5000). However,

    > RG[RG$G>5000,]
    Error in `[.RGList`(RG, RG$G > 5000, ) : (subscript) logical subscript too long

I have no success with subset either:

    > subset(RG, RG$G>5000,)
    Error: Two subscripts required
    > subset(RG$G, RG$G>5000)
    Error in subset.matrix(RG$G, RG$G > 5000) : (subscript) logical subscript too long

Do I need to write a loop to check each column of "G" separately? Or is there a simpler solution?

TIA
Jabez

limma limma • 4.3k views

ADD COMMENT • link 12.6 years ago Jabez Wilson ▴ 150

Axel Klenk ★ 1.0k

@axel-klenk-3224

Last seen 1 day ago

UPF, Barcelona, Spain

Dear Jabez,

try:

    RG[rowMax(RG$G) > 5000, ]

Is that what you want? And BTW, note that you are dealing with an RGList and not a data frame...

Cheers,

 - axel

Axel Klenk
Research Informatician
Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil / Switzerland

In reply to: Jabez Wilson, 05.10.2011 11:15, Subject: [BioC] limma data frame subsetting problem
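To see why `RG[RG$G > 5000, ]` fails while a per-row reduction works, here is a minimal base R sketch. It assumes a plain matrix `G` holding only the six rows printed by `head(RG$G)` in the question; the names `threshold`, `keep_any` and `keep_all` are illustrative and not from the thread, and limma itself is not loaded.

```r
# First six rows of the green channel, copied from head(RG$G) in the question.
G <- matrix(
  c(22028.260, 19278.770,  2727.5600, 19930.6500,
    25613.200, 21438.960,  2787.0330, 25426.5800,
    22652.390, 20386.470,  2419.8810, 16225.9500,
     8929.286,  6677.619,   383.2381,   786.9048,
     8746.476,  6576.292,   901.0000,   468.0476,
    37010.080, 23769.100, 23377.9700, 28399.0900),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, paste0("swirl.", 1:4))
)
threshold <- 5000  # illustrative cutoff taken from the question

# G > threshold is a logical matrix with one value per spot and array,
# but a row subscript needs exactly one value per row.
length(G > threshold)  # number of matrix elements
nrow(G)                # number of rows

# Reduce across columns to obtain one logical value per row.
keep_any <- apply(G, 1, max) > threshold        # at least one array above the cutoff
keep_all <- rowSums(G > threshold) == ncol(G)   # every array above the cutoff

G[keep_any, , drop = FALSE]
G[keep_all, , drop = FALSE]

# With the real object, the same vector would serve as the row index,
# e.g. RG[keep_any, ] (needs limma and the swirl files, so not run here).
```

Whether "any column" or "all columns" is the right criterion depends on the analysis; the question asks for "any", which is what the row maximum tests.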

ADD COMMENT • link 12.6 years ago Axel Klenk ★ 1.0k

Jabez Wilson ▴ 150

@jabez-wilson-1839

Last seen 9.6 years ago

Yes, thanks, Axel, I think that may well do the trick - assuming you meant rowMaxs() ;-)

You're right, it is an RGList, not a DF.

Jab

In reply to: Axel Klenk, Wednesday, 5 October, 2011, 10:42, Subject: Re: [BioC] limma data frame subsetting problem

ADD COMMENT • link 12.6 years ago Jabez Wilson ▴ 150

No, I meant rowMax() from package Biobase, but obviously this functionality has been implemented several times... rowMaxs() from fBasics (if that's the one you've found) is less efficient, but it will hardly make a difference for your use case - as in fortune(98)... :-)

 - axel

In reply to: Jabez Wilson, 05.10.2011 16:22, Subject: Re: [BioC] limma data frame subsetting problem
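As a rough illustration that the various row-maximum implementations are interchangeable for this filter, the sketch below compares two base R formulations on the first four rows from the question. The optional check against `matrixStats::rowMaxs()` is an assumption on my part (that package is not mentioned in the thread) and only runs if it happens to be installed; `Biobase::rowMax()` and `fBasics::rowMaxs()` are not called here.

```r
# First four rows of head(RG$G) from the question, as a plain matrix.
G <- matrix(
  c(22028.260, 19278.770, 2727.5600, 19930.6500,
    25613.200, 21438.960, 2787.0330, 25426.5800,
    22652.390, 20386.470, 2419.8810, 16225.9500,
     8929.286,  6677.619,  383.2381,   786.9048),
  ncol = 4, byrow = TRUE
)

# Variant 1: apply max() over each row.
row_max_apply <- apply(G, 1, max)

# Variant 2: element-wise maximum across the columns with pmax().
cols <- lapply(seq_len(ncol(G)), function(j) G[, j])
row_max_pmax <- do.call(pmax, cols)

# Both give the same per-row maxima.
stopifnot(isTRUE(all.equal(row_max_apply, row_max_pmax)))

# Optional cross-check, only if matrixStats is available (assumption).
if (requireNamespace("matrixStats", quietly = TRUE)) {
  stopifnot(isTRUE(all.equal(unname(matrixStats::rowMaxs(G)), row_max_apply)))
}
```

Any of these vectors can be compared against the cutoff to build the row index, so the choice mostly comes down to which packages are already loaded.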

ADD REPLY • link 12.6 years ago Axel Klenk ★ 1.0k
