imputing missing data for 70mer array platform, need advice
@betty-gilbert-1120
Hello,

If this has been discussed in the archives, my apologies, but I couldn't find it. I am comparing two array CGH datasets: one generated by Nimblegen, which is very complete, and one I generated myself on a 70mer array with over 10,000 elements, with 3-4 replicates for each of the three species for which I have Nimblegen data. I have calculated corrected p-values for the Nimblegen set using multtest and would like to do the same for the 70mer set, but I have issues with missing data. I have already calculated p-values for the 70mer set with the program ACUITY, using t-tests (testing for variance) that filter out or disregard the missing data.

I want to impute the missing data and then compare the corrected p-values, to see how different the results are from those of the filtered dataset.

My question: for a 70mer array with one oligo per open reading frame, which data imputation method is statistically best? I looked over the knn method in the package impute (mostly recommended for expression data) and impute.lowess in the package aCGH, which, from what I can tell, may be optimized for high-density arrays; my apologies if that is not the case.

Does anyone have any recommendations about which imputation method I should try for a 70mer platform? Thank you for your time.

Sincerely,
Betty Gilbert

--
Betty Gilbert
lgilbert at berkeley.edu
Taylor Lab
Plant and Microbial Biology
321 Koshland Hall
U.C. Berkeley
Berkeley, Ca 94720
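For readers weighing the knn option, here is a minimal Python sketch of the idea behind KNN imputation. It is not the code of impute.knn in the R package impute (which also applies missingness thresholds such as rowmax and colmax); the function name knn_impute and the toy matrix are hypothetical. It assumes a genes x arrays matrix of log2 ratios with NaN marking missing spots.

```python
import numpy as np


def knn_impute(x, k=10):
    """Fill NaNs in a genes x arrays matrix from the k most similar genes.

    Simplified sketch: the distance between two genes is the root-mean-square
    difference over the arrays where both are observed, and each missing value
    is replaced by the unweighted mean of that array's values among the k
    nearest genes that were observed on it.
    """
    x = np.asarray(x, dtype=float)
    out = x.copy()
    observed = ~np.isnan(x)
    n_genes = x.shape[0]
    for i in np.where(~observed.all(axis=1))[0]:
        # Distance from gene i to every other gene, over shared observed arrays.
        dists = np.full(n_genes, np.inf)
        for j in range(n_genes):
            shared = observed[i] & observed[j]
            if j != i and shared.any():
                diff = x[i, shared] - x[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        for c in np.where(~observed[i])[0]:
            # Only genes that were observed on array c can donate a value.
            donors = np.where(observed[:, c] & np.isfinite(dists))[0]
            if donors.size == 0:
                continue  # leave NaN: no comparable neighbour exists
            nearest = donors[np.argsort(dists[donors])[:k]]
            out[i, c] = x[nearest, c].mean()
    return out


# Hypothetical toy data: 4 genes x 3 replicate arrays, one missing spot.
log_ratios = np.array([
    [1.0, 2.0, np.nan],
    [1.1, 2.1, 3.1],
    [0.9, 1.9, 2.9],
    [10.0, 10.0, 10.0],
])
print(knn_impute(log_ratios, k=2))
```

With only 3-4 replicate arrays per species, each distance is computed from very few columns, so the choice of neighbours may be noisy; that is worth keeping in mind when comparing imputed and filtered results.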
aCGH CGH multtest impute aCGH oligo
@sean-davis-490
United States
On Wednesday 17 January 2007 19:34, Betty Gilbert wrote:
> Does anyone have any recommendations about which method for imputing
> data I should try for a 70mer platform? Thank you for your time.

A couple of questions:

1) Why are the data "missing"? Is it due to quality of the spot or due to low intensity? These are two related but different situations.

2) Why not use a package like limma, or some other package that can account for missing data and/or downweight questionable values? I don't know about ACUITY, but it sounds like it may be doing something like that.

Sean
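The second suggestion, fitting each gene on whatever spots are present and downweighting questionable ones instead of imputing, can be illustrated with a minimal Python sketch. This is not limma's API: it only computes, per gene, a weighted mean log-ratio and an ordinary (unmoderated) one-sample t-statistic across replicate arrays, skipping NaN spots and any spot with weight 0. limma's lmFit performs a gene-wise weighted fit of this general kind, and eBayes then moderates the variances, which this sketch omits. The function name, the weights and the data are hypothetical.

```python
import numpy as np


def weighted_one_sample_t(y, w=None):
    """Per-gene weighted mean log-ratio and t-statistic, skipping missing spots.

    y: genes x replicate arrays of log2 ratios, NaN where a spot is missing.
    w: optional spot-quality weights of the same shape (0 drops a spot).
    Returns (mean, t, df); t is NaN for genes with fewer than 2 usable spots.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    usable = ~np.isnan(y) & (w > 0)
    w = np.where(usable, w, 0.0)
    y0 = np.where(usable, y, 0.0)  # placeholder zeros carry zero weight
    n = usable.sum(axis=1)
    sw = w.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = (w * y0).sum(axis=1) / sw
        resid = np.where(usable, y0 - mean[:, None], 0.0)
        # Weighted residual variance; Var(mean) = s2 / sum(w) for weights ~ 1/variance.
        s2 = (w * resid ** 2).sum(axis=1) / (n - 1)
        t = mean / np.sqrt(s2 / sw)
    t[n < 2] = np.nan
    df = np.maximum(n - 1, 0)
    return mean, t, df


# Hypothetical data: 3 genes x 4 arrays with missing spots.
y = np.array([
    [1.0, 1.2, np.nan, 0.8],
    [0.1, -0.1, 0.0, np.nan],
    [np.nan, np.nan, 2.0, np.nan],
])
w = np.ones_like(y)
w[0, 3] = 0.0  # e.g. a spot flagged as poor quality
print(weighted_one_sample_t(y))
print(weighted_one_sample_t(y, w))
```

The resulting p-values could then be multiple-testing corrected in the same way as the complete dataset, without any imputed values entering the test.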
Quoting Sean Davis <sdavis2 at mail.nih.gov>:
> 2) Why not use a package like limma, or some other package that can account
> for missing data and/or downweight questionable values? I don't know about
> ACUITY, but it sounds like it may be doing something like that.
>
> Sean

I would second that second point.
We have Acuity here, and while it has proven useful for getting some information very quickly, it is irritatingly inflexible. Quite frankly, we've all gone off it. It appears to be a GeneSpring wannabe that doesn't quite make it, and it's far too expensive for what it is. Using R (and in particular limma) seems like a bit more work at first, but it is well worth it. And free!

Jose

--
Dr. Jose I. de las Heras
Email: J.delasHeras at ed.ac.uk
Phone: +44 (0)131 6513374
Fax: +44 (0)131 6507360
The Wellcome Trust Centre for Cell Biology
Institute for Cell & Molecular Biology
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK