I am new at using the packages limma, and GEOquery.
I have read the documentation available for both, and I am still a bit lost.
This is what I have done:
I downloaded a GDS data set and stored in my R session using GEOquery (i.e., GDS <- getGEO("GDSxxx"), where xxx is the identifier).
Examined Meta(GDS) entries and can see clear information about its contents (i.e: Table(gds)[1:5, ] for example).
I know there are 44 columns of sampled data with about 54K rows which correspond to genes. About 20 of these columns indicate that they are normal samples, and the rest are not.
I can also read the disease.state information and also in the description which columns are normal and which ones are NOT normal.
I need to identify the differential genes between the normal and abnormal. Therefore I needed to start by creating a design matrix.
I created the design matrix, and ran lmFit, and eBayes. They run fine, no errors.
The p-values of my output after eBayes is too small (in the order of 10^-67), and I am sure I am doing something wrong when getting the data ready for the analysis.
I have tried the following command:
eset <- GDS2eSet(GDS, do.log = TRUE) setting do.log to both TRUE and FALSE. With FALSE, the numbers are around 10^-54, with true around 10^-67. I also used topTreat, and that gives even smaller p-values in the order of 10^110.
These are some of the entries from my source GDS file:
ID_REF IDENTIFIER GSM549121 GSM549102 GSM549104
1 1007_s_at MIR4640 10.8591 1 1.1874 9.33047
2 1053_at RFC2 6.53584 6.33768 6.26199
3 117_at HSPA6 5.72289 5.79694 5.64208
4 121_at PAX8 9.28742 8.38915 8.19872
5 1255_g_at GUCA1A 3.65037 3.77146 3.63181
Any tips are appreciated.