Selecting Unique rows in multiple column data frames
Matjaž Hren ▴ 50
@matjaz-hren-1333
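Dear list!

I have data frames with 2 columns of normalised microarray data (more than 10k rows, custom-made array) with the following layout (not real data):

ID      M
ID1    -4.60138
ID2    -3.28832
ID3     4.83560
ID4     6.45286
ID4     6.65235
ID4     6.38745
ID4     6.74514
ID5     4.43995
ID6    -1.78943
ID7    -4.00257
ID8    -4.46327
ID9    -3.13956
ID10    2.52233
ID11   -1.81214
ID11   -1.78625
ID11   -1.61214
ID11   -1.52354

ID is the oligo ID (spot ID), M is the corresponding M-value.

Only one spot per block is present in replicates (4). Therefore I would like to use one of the following 2 options:

1. Average the M-values in rows that have the same ID and extract the data table with both columns.
2. Or, if the first option does not work: extract the rows with unique ID (both columns) and remove the replicates. I tried using "unique" on the ID column but I couldn't extend its use to more than one column in the data frame.

I used R 2.4.0 and the limma package for normalisation.

Thank you in advance,
Matjaz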
Jenny Drnevich ★ 2.2k
@jenny-drnevich-382
Hi Matjaz,

For option 2, if your data frame is called 'mydata', just do:

    mydata.unique <- mydata[!duplicated(mydata$ID), ]

This will pull out the first instance of each ID, along with its M value.

Cheers,
Jenny
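As a rough illustration of the duplicated() approach above, here is a minimal sketch on a few rows copied from the example in the question. The frame name 'mydata' follows the reply; the choice of rows is only for demonstration.

```r
# Toy data in the layout from the question (a handful of the example rows).
mydata <- data.frame(
  ID = c("ID3", "ID4", "ID4", "ID4", "ID4", "ID5"),
  M  = c(4.83560, 6.45286, 6.65235, 6.38745, 6.74514, 4.43995),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat of an ID after its first appearance,
# so negating it keeps exactly one row (the first) per ID.
mydata.unique <- mydata[!duplicated(mydata$ID), ]
print(mydata.unique)
```

Note that this keeps whichever replicate happens to come first; the other three M-values for a replicated spot are simply discarded, so it may be worth checking that the replicates agree before relying on it.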
alex lam RI ▴ 310
@alex-lam-ri-1491
Hi Matjaz,

For option 1, have a look at the help page of the function "aggregate".

I don't understand your option 2. Perhaps I am misreading what you are saying. If you want to select unique rows according to columns 1 and 2, you can create a third column by joining col1 and col2:

    ## join both columns into a single key
    Col3 <- paste(YourData$ID, YourData$M, sep="_")
    ## TRUE for the first occurrence of each key
    Index <- !duplicated(Col3)
    YourData[Index, ]

But I can't see that any replicates would have identical M values.

Cheers,
Alex

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Matjaž Hren
Sent: 06 November 2006 09:06
To: Bioconductor
Subject: [BioC] Selecting Unique rows in multiple column data frames

Dear list!

I have data frames with 2 columns of normalised microarray data (more than 10k rows, custom-made array) with the following layout (not real data):

ID      M
ID1    -4.60138
ID2    -3.28832
ID3     4.83560
ID4     6.45286
ID4     6.65235
ID4     6.38745
ID4     6.74514
ID5     4.43995
ID6    -1.78943
ID7    -4.00257
ID8    -4.46327
ID9    -3.13956
ID10    2.52233
ID11   -1.81214
ID11   -1.78625
ID11   -1.61214
ID11   -1.52354

ID is the oligo ID (spot ID), M is the corresponding M-value.

Only one spot per block is present in replicates (4). Therefore I would like to use one of the following 2 options:

1. Average the M-values in rows that have the same ID and extract the data table with both columns.
2. Or, if the first option does not work: extract the rows with unique ID (both columns) and remove the replicates. I tried using "unique" on the ID column but I couldn't extend its use to more than one column in the data frame.

I used R 2.4.0 and the limma package for normalisation.

Thank you in advance,
Matjaz
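For what it is worth, the aggregate() suggestion for option 1 could look something like the sketch below. The frame name 'YourData' follows the reply and the rows come from the example in the question; the result names 'averaged' and 'unique_rows' are made up for illustration. The call uses the by= form of aggregate() rather than the formula interface, on the assumption that the formula interface may not be available in an R version as old as 2.4.0.

```r
# A few rows from the example in the question.
YourData <- data.frame(
  ID = c("ID10", "ID11", "ID11", "ID11", "ID11"),
  M  = c(2.52233, -1.81214, -1.78625, -1.61214, -1.52354),
  stringsAsFactors = FALSE
)

# Option 1: one row per ID, with M replaced by the mean of its replicates.
averaged <- aggregate(YourData["M"], by = list(ID = YourData$ID), FUN = mean)
print(averaged)

# Uniqueness over both columns: unique() on a data frame compares whole rows.
# All five rows survive here because no two replicates share an M value.
unique_rows <- unique(YourData)
print(nrow(unique_rows))
```

The second call illustrates the point made in the reply: de-duplicating on ID and M together removes nothing unless two replicates have exactly the same M value.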
@gorjanc-gregor-1198
Hello Matjaž, and hello Alex ;) It was nice to meet you in Goettingen!

> I have data frames with 2 columns of normalised microarray data (more than 10k rows, custom-made array)
> with the following layout (not real data):
> ID M
> ID1 -4.60138
> ID2 -3.28832
> ID is the oligo ID (spot ID), M is the corresponding M-value.
>
> Only one spot per block is present in replicates (4). Therefore I would like to use one of the following 2 options:

You can get the rows that share an ID with the following (x is a data.frame):

    ## get IDs
    id <- x$ID
    ## unique IDs
    uId <- unique(id)
    ## loop over unique IDs
    for (i in uId) {
      ## do whatever you want with the rows that have this ID
      x[id %in% i, ]
    }

> 1. Average the M-values in rows that have the same ID and extract the data table with both columns.

If I understand correctly, this should work:

    for (i in uId) {
      ## rows belonging to the current ID
      tmp <- x[id %in% i, ]
      print(tmp)  # or anything else
      print(mean(tmp$M, na.rm=TRUE))
    }

--
With regards,
Gregor Gorjanc
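The loop above can probably be replaced by a single vectorised call. The following is a minimal sketch using tapply(), which is a swapped-in alternative and not what the reply proposes; 'x' follows the reply, the rows come from the example in the question, and 'm_mean' and 'x_avg' are hypothetical names.

```r
# A few rows from the example in the question.
x <- data.frame(
  ID = c("ID1", "ID4", "ID4", "ID4", "ID4"),
  M  = c(-4.60138, 6.45286, 6.65235, 6.38745, 6.74514),
  stringsAsFactors = FALSE
)

# tapply() splits M by ID and applies mean() to each group in one call.
m_mean <- tapply(x$M, x$ID, mean, na.rm = TRUE)

# Repackage the named result as a two-column data frame.
x_avg <- data.frame(ID = names(m_mean), M = as.vector(m_mean),
                    stringsAsFactors = FALSE)
print(x_avg)
```

With more than 10k rows the difference in speed is unlikely to matter much, but the result arrives as one table with both columns rather than as printed output from each loop iteration.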