Selecting Unique rows in multiple column data frames

Matjaž Hren ▴ 50

@matjaz-hren-1333

Last seen 9.6 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20061106/ bfefc4d9/attachment.pl

updated 17.5 years ago by Gorjanc Gregor ▴ 140 • written 17.5 years ago by Matjaž Hren ▴ 50

Jenny Drnevich ★ 2.2k

@jenny-drnevich-382

Last seen 9.6 years ago

Hi Matjaz,

For option 2, if your data frame is called 'mydata', just do:

    mydata.unique <- mydata[ !duplicated(mydata$ID), ]

This will pull out the first instance of each ID, along with its M value.

Cheers,
Jenny

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML, 1201 W. Gregory Dr., Urbana, IL 61801, USA
ph: 217-244-7355, fax: 217-265-5066, e-mail: drnevich at uiuc.edu

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch On Behalf Of Matjaž Hren
Sent: 06 November 2006 09:06
To: Bioconductor
Subject: [BioC] Selecting Unique rows in multiple column data frames

> Dear list!
>
> I have data frames with 2 columns of normalised microarray data (more than
> 10k rows, custom-made array) with the following layout (not real data):
>
> ID      M
> ID1    -4.60138
> ID2    -3.28832
> ID3     4.83560
> ID4     6.45286
> ID4     6.65235
> ID4     6.38745
> ID4     6.74514
> ID5     4.43995
> ID6    -1.78943
> ID7    -4.00257
> ID8    -4.46327
> ID9    -3.13956
> ID10    2.52233
> ID11   -1.81214
> ID11   -1.78625
> ID11   -1.61214
> ID11   -1.52354
>
> ID is the oligo ID (spot-ID), M is the corresponding M-value.
>
> Only one spot per block is present in replicates (4). Therefore I would
> like to use one of the following 2 options:
>
> 1. Average the M-values in rows that have the same ID and extract the data
>    table with both columns.
> 2. Or, if the first option does not work: extract the rows with unique ID
>    (both columns) and remove the replicates. I tried using "unique" on the
>    ID column, but I couldn't extend its use to more than one column in the
>    data frame.
>
> I used R 2.4.0 and the limma package for normalisation.
>
> Thank you in advance,
>
> Matjaz
>
> Matjaz Hren
> National Institute of Biology
> Department of Plant Physiology and Biotechnology
> SLOVENIA
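
The `duplicated` suggestion can be tried directly on the sample data from the question. This is a minimal sketch: the values are copied from the question, `stringsAsFactors = FALSE` is an added assumption to keep the ID column as plain character strings, and the row counts in the comments refer only to this sample.

```r
# Sample data copied from the question (ID4 and ID11 each have four replicates)
mydata <- data.frame(
  ID = c("ID1", "ID2", "ID3", rep("ID4", 4), "ID5", "ID6", "ID7",
         "ID8", "ID9", "ID10", rep("ID11", 4)),
  M  = c(-4.60138, -3.28832, 4.83560, 6.45286, 6.65235, 6.38745, 6.74514,
         4.43995, -1.78943, -4.00257, -4.46327, -3.13956, 2.52233,
         -1.81214, -1.78625, -1.61214, -1.52354),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each ID,
# so negating it keeps the first row per ID with both columns intact
mydata.unique <- mydata[!duplicated(mydata$ID), ]

nrow(mydata)         # 17 rows in the sample
nrow(mydata.unique)  # 11 rows, one per ID
```

Note that this keeps whichever replicate happens to come first; the other replicate values are discarded rather than averaged.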

written 17.5 years ago by Jenny Drnevich ★ 2.2k

alex lam RI ▴ 310

@alex-lam-ri-1491

Last seen 9.6 years ago

Hi Matjaz,

For option 1, have a look at the help page of the function "aggregate".

I don't understand your option 2. Perhaps I am misreading what you are saying. If you want to select unique rows according to columns 1 and 2, you can create a third column by joining columns 1 and 2:

    Col3 <- paste(YourData$ID, YourData$M, sep="_")
    # logical index: TRUE for the first occurrence of each ID_M combination
    Index <- !duplicated(Col3)
    YourData[Index, ]

But I can't see that any replicates would have identical M values.

Cheers,
Alex

Alex Lam
PhD student
Department of Genetics and Genomics
Roslin Institute (Edinburgh)
Roslin, Midlothian EH25 9PS, Great Britain
Phone +44 131 5274471
Web http://www.roslin.ac.uk
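
For option 1, the `aggregate` function mentioned above could be used roughly as follows. This is only a sketch on a subset of the sample data from the question; the names `YourData` and `averaged` are illustrative, and the `by = list(...)` form is used because it does not depend on the formula interface.

```r
# Subset of the sample data from the question
YourData <- data.frame(
  ID = c("ID3", rep("ID4", 4), "ID5"),
  M  = c(4.83560, 6.45286, 6.65235, 6.38745, 6.74514, 4.43995),
  stringsAsFactors = FALSE
)

# Group the M values by ID and take the mean of each group;
# the result has one row per ID, with the summary in a column named "x"
averaged <- aggregate(YourData$M, by = list(ID = YourData$ID), FUN = mean)

# Rename the summary column so the layout matches the input (ID, M)
names(averaged)[2] <- "M"

print(averaged)
```

The rows of the result are ordered by the grouping values, not by their order in the input, so with string IDs such as ID10 and ID2 the order follows string sorting.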

written 17.5 years ago by alex lam RI ▴ 310

Gorjanc Gregor ▴ 140

@gorjanc-gregor-1198

Last seen 9.6 years ago

Hello Matjaž and hello Alex ;) It was nice to meet you in Goettingen!

> I have data frames with 2 columns of normalised microarray data (more than
> 10k rows, custom-made array) with the following layout (not real data):
>
> ID      M
> ID1    -4.60138
> ID2    -3.28832
>
> ID is the oligo ID (spot-ID), M is the corresponding M-value.
>
> Only one spot per block is present in replicates (4). Therefore I would
> like to use one of the following 2 options:

You can get the rows that have the same ID with (x is a data.frame):

    ## get IDs (as character, in case the column is a factor)
    id <- as.character(x$ID)
    ## unique IDs
    uId <- unique(id)
    ## loop over unique IDs
    for(i in uId) {
      ## do whatever you want with rows that have the same ID
      x[id == i, ]
    }

> 1. Average the M-values in rows that have the same ID and extract the data
>    table with both columns.

If I understand correctly, this should work:

    for(i in uId) {
      ## rows belonging to the current ID
      tmp <- x[id == i, ]
      print(tmp) # or anything else
      ## print the mean, since a bare value inside a for loop is discarded
      print(mean(tmp$M, na.rm=TRUE))
    }

--
With regards, Gregor Gorjanc

University of Ljubljana, Biotechnical Faculty, Zootechnical Department
Groblje 3, SI-1230 Domzale, Slovenia, Europe
PhD student
URI: http://www.bfro.uni-lj.si/MR/ggorjan
mail: gregor.gorjanc <at> bfro.uni-lj.si
tel: +386 (0)1 72 17 861
fax: +386 (0)1 72 17 888

"One must learn by doing the thing; for though you think you know it, you have no certainty until you try." Sophocles ~ 450 B.C.
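
The per-ID loop can also be written without an explicit loop. A minimal sketch using base R's `tapply`, on a subset of the sample data from the question; `x` follows the data frame name used in the post, while `means` and `result` are names introduced here for illustration.

```r
# Subset of the sample data from the question
x <- data.frame(
  ID = c("ID10", rep("ID11", 4)),
  M  = c(2.52233, -1.81214, -1.78625, -1.61214, -1.52354),
  stringsAsFactors = FALSE
)

# tapply() splits M by ID and applies mean() to each group;
# na.rm = TRUE is passed through to mean(), as in the loop version
means <- tapply(x$M, x$ID, mean, na.rm = TRUE)

# Collect the named vector of means back into a two-column data frame
result <- data.frame(ID = names(means), M = as.vector(means),
                     stringsAsFactors = FALSE)

print(result)
```

Compared with the loop, this stores the averages in one object instead of printing them one at a time, which is usually more convenient with the 10k+ rows mentioned in the question.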

written 17.5 years ago by Gorjanc Gregor ▴ 140
