Hi all,
I'm analyzing the microarray data using affymetrix , my data has downloaded from GEO, Series_geo_accession "GSE7700".
They used microarray to look at the targte genes regulated by Yes-associated protein (YAP) in normal breast luminal cell lines and breast cancer cell lines.
I would like to analysis my data using a linear model "limma", but I have some problem about (design matrix) .
Can any one explain to me which one of the following is right ?
The samples were :
GSM187268 MDA MB231 control cell pmig / pRS-IRES-GFP rep1
GSM187269 MDA MB231 control cell pmig / pRS-IRES-GFP rep2
GSM187271 MDA MB231 YAP knock down cell pmig-YAP/pRS-IRES-GFP-YAP rep1
GSM187272 MDA MB231 YAP knock down cell pmig-YAP/pRS-IRES-GFP-YAP rep2
GSM187273 Normal luminal control cell pmig / pRS-IRES-GFP rep1
GSM187274 Normal luminal control cell pmig / pRS-IRES-GFP rep2
GSM187275 Normal luminal control cell pmig / pRS-IRES-GFP rep3
GSM187276 Normal luminal YAP knock down cell pmig-YAP/pRS-IRES-GFP-YAP rep1
GSM187277 Normal luminal YAP knock down cell pmig-YAP/pRS-IRES-GFP-YAP rep2
GSM187278 Normal luminal YAP knock down cell pmig-YAP/pRS-IRES-GFP-YAP rep3
As it shown above ,I think we have 2 type of cell lines (normal & cancer breast cells) and 4 groups of them (cancer control cells, cancer YAP knock down cells. normal control cells and normal YAP knock down cells)
........................................................................................................................................................................................
> groups<-c("CC", "CC", "CT", "CT", "NC", "NC", "NC", "NT", "NT", "NT") > groups<-as.factor(groups) > design<-model.matrix(~groups) > design (Intercept) groupsCT groupsNC groupsNT 1 1 0 0 0 2 1 0 0 0 3 1 1 0 0 4 1 1 0 0 5 1 0 1 0 6 1 0 1 0 7 1 0 1 0 8 1 0 0 1 9 1 0 0 1 10 1 0 0 1 attr(,"assign") [1] 0 1 1 1 attr(,"contrasts") attr(,"contrasts")$groups [1] "contr.treatment"
> fit<-lmFit(data.f, design) > fit<-eBayes(fit) > toptable(fit, coef=2) logFC t P.Value adj.P.Val B 202350_s_at -1.564580 -14.70096 1.327425e-08 0.0005127857 9.043119 202238_s_at -1.282140 -14.05648 2.128833e-08 0.0005127857 8.727672 228450_at -1.505202 -13.68802 2.813639e-08 0.0005127857 8.536645 224895_at -1.590910 -13.21231 4.074283e-08 0.0005569036 8.277724 223395_at 1.852093 12.44041 7.624968e-08 0.0008337903 7.825825 213342_at -1.572131 -11.74764 1.378619e-07 0.0012562665 7.383662 219327_s_at 1.201714 11.42378 1.837460e-07 0.0014351875 7.164063 224894_at -1.752598 -11.18234 2.286805e-07 0.0015628883 6.994676 206067_s_at -1.158799 -10.79151 3.286868e-07 0.0019967725 6.709761 222486_s_at -1.152186 -10.48365 4.408662e-07 0.0022142264 6.475568
.............................................................................................
Or:
> design<-model.matrix(~0+factor(c("CC", "CC", "CT", "CT", "NC", "NC", "NC", "NT", "NT", "NT"))) > colnames(design)<-c("CC","CT","NC","NT") > colnames(design) [1] "CC" "CT" "NC" "NT" > design CC CT NC NT 1 1 0 0 0 2 1 0 0 0 3 0 1 0 0 4 0 1 0 0 5 0 0 1 0 6 0 0 1 0 7 0 0 1 0 8 0 0 0 1 9 0 0 0 1 10 0 0 0 1 attr(,"assign") [1] 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$`factor(c("CC", "CC", "CT", "CT", "NC", "NC", "NC", "NT", "NT", "NT"))` [1] "contr.treatment"
> fit<-lmFit(data.f, design) > fit<-eBayes(fit) > toptable(fit, coef=2) logFC t P.Value adj.P.Val B AFFX-hum_alu_at 13.84261 222.2422 1.563089e-21 2.535286e-18 34.98391 201492_s_at 13.88974 220.7521 1.683748e-21 2.535286e-18 34.95969 1553588_at 13.85768 220.3090 1.721551e-21 2.535286e-18 34.95241 208856_x_at 13.59860 217.8449 1.949441e-21 2.535286e-18 34.91130 201257_x_at 13.60074 217.6522 1.968609e-21 2.535286e-18 34.90803 200095_x_at 13.62670 217.6191 1.971917e-21 2.535286e-18 34.90747 201429_s_at 13.87166 217.0458 2.030260e-21 2.535286e-18 34.89772 1553570_x_at 13.64131 216.8168 2.054088e-21 2.535286e-18 34.89381 200834_s_at 13.49999 216.4295 2.095081e-21 2.535286e-18 34.88717 213084_x_at 13.91086 216.1968 2.120133e-21 2.535286e-18 34.88317
........................................................................................................................................................................................
Which of the above was right ?
were there any significantly differentially expressed genes from the topTable?
Cheers,
Tahani.H
Thanks a lot dear Aaron Lun for your answer . So I'll use the first approach because I really want to detect DE between CT and CC .
Here from the topTable above , is the first column (without a name) mean the names of the genes or ?
because I read it in some papers that the first column is the row
number in the original data!!!
The unnamed "first column" represents the row names of the table. In this case, they're probe names, as the
*_at
suffixes suggest. Clearly they're character strings, not numbers, so whatever papers you're reading don't apply here. If you want to convert from probe names to gene names, you need the relevant annotation package depending on the chip being used, e.g., hgu133a.db. You can thenselect
for symbol or Entrez ID using the probe names as keys.Thanks so much sir,
It's clear to me now .
Regards,
Tahani.H
Hi Aaron Lun
While I was trying to performing the gene set test for KEGG pathways I had a problem with (globaltest):
Also I tried:
where was the wrong? Any advice please?
There's no
globaltest
function in limma.Is there any method for performing the gene set test for KEGG pathways ??
Which function could I use?
?kegga
. Or?mroast
.Thank you .
Dear Aaron Lun :
I'm trying to Visualizing the K-means clustering of the same data above, the optimal number of clusters was about 49.
I use the following code :
where was the wrong?
Cheers.
Tahani.H
Well, the error message explains itself. You're probably trying to stuff too many plots onto a single device with the
mfrow
setting, such that the margins of the plots take up more space than the plotting area itself. You either need to increase with size of the image, or put each plot onto a separate page. Personally, I would wrap the code above in apdf
call without settingpar(mfrow)
to anything:... so each plot gets its own page in the PDF.
Yes I got it . Thank you so much