Question

imputing missing data for 70mer array platform, need advice

Betty Gilbert ▴ 50

@betty-gilbert-1120

Hello,

If this has been discussed in the archives, my apologies, but I couldn't find it. I am comparing two array CGH datasets: one generated by Nimblegen, which is very complete, and one that I generated myself on a 70mer array with over 10,000 elements, which has 3-4 replicates for each of the three species I have Nimblegen data for. I have calculated corrected p-values for the Nimblegen set using multtest and would like to do the same for the 70mer set, but I have issues with missing data. For the 70mer set, I have already calculated p-values with the program ACUITY, using t-tests (testing for variance) that filter out or disregard the missing data.

I wanted to compare the corrected p-values after using a method to impute the missing data, to see how different the results are from the filtered dataset.

My question: for a 70mer array with one oligo per open reading frame, what method of data imputation is best statistically? I looked over the knn method in the package impute (mostly recommended for expression data) and impute.lowess in the package aCGH, which may be optimized for high-density arrays from what I can tell, and my apologies if that is not the case.

Does anyone have any recommendations about which method for imputing data I should try for a 70mer platform? Thank you for your time.

Sincerely,
Betty Gilbert

--
Betty Gilbert
lgilbert at berkeley.edu
Taylor Lab
Plant and Microbial Biology
321 Koshland Hall
U.C. Berkeley
Berkeley, CA 94720
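For readers unfamiliar with the knn approach mentioned in the question, the general idea can be sketched in a few lines. This is a simplified illustration in plain Python, not the implementation used by the impute package: the function name `knn_impute`, the choice of `k`, the root-mean-square distance over shared replicates, and the toy log-ratio values are all assumptions made for the example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries using the k most similar rows (probes).

    rows: list of lists, one row per probe, one column per replicate.
    A missing value in column j is replaced by the mean of column j
    over the k nearest rows that have column j observed.
    """

    def distance(a, b):
        # Root-mean-square difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows with column j observed.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no usable neighbour exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy log ratios: 4 probes x 3 replicates, one missing value.
log_ratios = [
    [0.10, 0.20, None],
    [0.12, 0.18, 0.15],
    [0.90, 1.10, 1.00],
    [0.08, 0.22, 0.17],
]
filled = knn_impute(log_ratios, k=2)
```

Note that this borrows information from probes with similar profiles across replicates, which is why the approach is usually discussed for expression data; whether that assumption holds for one oligo per open reading frame in a CGH setting is exactly the open question here.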

aCGH CGH multtest impute aCGH oligo • 1.2k views

updated 17.3 years ago by Sean Davis 21k • written 17.3 years ago by Betty Gilbert ▴ 50

score 0 · Answer 1 · 2007-01-18

On Wednesday 17 January 2007 19:34, Betty Gilbert wrote:
> Hello,
> If this has been discussed in the archives, my apologies but I
> couldn't find it. I am comparing two array CGH datasets, one
> generated by Nimblegen which is very complete and one generated by
> myself on a 70mer array with over 10,000 elements which has 3-4
> replicates for three species I have Nimblegen data for. I have
> calculated corrected pvalues for the nimblegen set using multtest and
> would like to do so for the 70mer set but have issues with missing
> data. I used t-tests, testing for variance, that filter out or
> disregard the missing data for the 70mer set already using the
> program ACUITY to calculate p-values.
>
> I wanted to compare the corrected p-values after using a method to
> impute the missing data to see how different the results are from
> filtered dataset.
>
> My question: For a 70mer array with one oligo per open reading frame
> what method of data imputation is best statistically. I looked over
> the knn method in the package impute (mostly recommended for
> expression data) and impute.lowess in the package aCGH which may be
> optimized for high density arrays from what i can tell and my
> apologies if that is not the case.
>
> Does anyone have any recommendations about which method for imputing
> data I should try for a 70mer platform? Thank you for your time.

A couple of questions:

1) Why are the data "missing"? Is it due to quality of the spot or due to low intensity? These are two related but different situations.

2) Why not use a package like limma, or some other package that can account for missing data and/or downweight questionable values? I don't know about ACUITY, but it sounds like it may be doing something like that.

Sean
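To make the second point concrete, here is a minimal sketch of what "account for missing data and downweight questionable values" can mean for a single probe. It is only an illustration in plain Python, not limma's method (limma fits weighted linear models per probe); the function name `weighted_mean`, the weight of 0.1 for a flagged spot, and the toy values are assumptions chosen for the example.

```python
def weighted_mean(values, weights):
    """Weighted mean of replicate log ratios for one probe.

    Missing values (None) are skipped, and spots flagged as questionable
    can be given a small weight instead of being imputed or discarded.
    Returns None if no usable replicate remains.
    """
    pairs = [(v, w) for v, w in zip(values, weights)
             if v is not None and w > 0]
    total = sum(w for _, w in pairs)
    if total == 0:
        return None
    return sum(v * w for v, w in pairs) / total


# Four replicates: one missing, one flagged as poor quality (weight 0.1).
replicates = [1.0, None, 0.8, 2.0]
spot_weights = [1.0, 1.0, 1.0, 0.1]
estimate = weighted_mean(replicates, spot_weights)
```

With this kind of scheme no value is invented for the missing spot, and the low-quality spot still contributes a little, which is a different trade-off from both filtering and imputation.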