Normalization of a microarray-like data frame and removing rows by % missing data
ALAN SMITH ▴ 40
@alan-smith-1941
Last seen 9.6 years ago
Hello,

I have several questions about normalizing a large matrix of intensity data (21269 rows x 72 columns; non-microarray data).

summary(MYdata)   #### example of data, NOTE many NAs ####

          ID               a                   b                   c
    Min.   :    1   Min.   :    2003   Min.   :    2008   Min.   :    2001
    1st Qu.: 5318   1st Qu.:    4027   1st Qu.:    4155   1st Qu.:    4331
    Median :10635   Median :    7635   Median :    7570   Median :    8006
    Mean   :10635   Mean   :   57586   Mean   :   73246   Mean   :  101309
    3rd Qu.:15952   3rd Qu.:   17191   3rd Qu.:   18076   3rd Qu.:   18843
    Max.   :21269   Max.   :20335320   Max.   :30073282   Max.   :27649912
                    NA's   :   18323   NA's   :   18467   NA's   :   18471

Normalization question

What would be the best way to normalize or preprocess this type of data (80%+ missing)? A log2 transformation gives distributions of similar shape but with different medians.

Currently I normalize like this:

1. Divide each column by (column median / minimum column median).
2. Divide each row by (row median / minimum row median).
3. Repeat steps 1 and 2 one more time.

*The method I am using turns the distribution into a spike shape.*

Is the above method acceptable for future statistical applications? Are there better normalization methods I can use?

NA question

I would like to remove all rows with more than 30% missing values before continuing with normalization, but I cannot figure out how. Is there a way to remove all rows that have, say, more than 30% missing data? If I could count the number of NAs in a row and divide it by the number of columns I would be in good shape, but I can't figure out how to do this. na.omit() is too harsh and removes most of the rows.

Thanks much,
Alan
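For illustration, here is a minimal R sketch of the two steps described in the post: dropping rows with more than 30% missing values, and the column/row median scaling applied twice. The toy values, the helper name scale_by_medians, and the choice to ignore NAs when computing medians are assumptions made for this example, not part of the original data or method. The ID column is assumed to be the first column. Whether the scaling itself is statistically appropriate is a separate question that this sketch does not settle.

```r
# Hypothetical toy data standing in for MYdata: an ID column plus intensities.
MYdata <- data.frame(
  ID = 1:6,
  a  = c(2003,   NA, 4027,   NA, 7635, 17191),
  b  = c(  NA,   NA, 4155,   NA, 7570, 18076),
  c  = c(2001,   NA, 4331, 8006,   NA, 18843),
  d  = c(2500,   NA, 5000,   NA, 9000, 20000)
)

# Intensity values only (assumes the ID column is first).
intensity <- as.matrix(MYdata[, -1])

# is.na() returns a logical matrix; rowMeans() of a logical matrix is the
# fraction of TRUEs, i.e. (number of NAs in the row) / (number of columns).
frac_missing <- rowMeans(is.na(intensity))

# Keep rows with at most 30% missing values.
max_missing <- 0.30
keep        <- frac_missing <= max_missing
filtered    <- MYdata[keep, , drop = FALSE]

# Sketch of the scaling described in the post (hypothetical helper name).
# Assumption: medians are computed with NAs removed.
scale_by_medians <- function(m, n_rounds = 2) {
  for (i in seq_len(n_rounds)) {
    # Divide each column by (column median / minimum column median).
    col_med <- apply(m, 2, median, na.rm = TRUE)
    m <- sweep(m, 2, col_med / min(col_med, na.rm = TRUE), "/")
    # Divide each row by (row median / minimum row median).
    row_med <- apply(m, 1, median, na.rm = TRUE)
    m <- sweep(m, 1, row_med / min(row_med, na.rm = TRUE), "/")
  }
  m
}

scaled <- scale_by_medians(as.matrix(filtered[, -1]))
```

The filter must run before the scaling, because a row that is entirely NA has no median to scale by. Also, since the last operation in each round is the row scaling, every row of the result has the same median. That forces the rows onto a common center, which may contribute to the spike-shaped distribution mentioned above.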
Normalization • 921 views