I have some gene expression data sets. In some of the data sets, I am comparing two age categories, and in some I am comparing multiple age categories. For each dataset, I want to do a two-tailed F test between the age categories to check if the slope of the curve is different from zero, which would indicate an association between expression signal and age.
I've been trying to understand the method here: https://www.biostars.org/p/222041/. Dr. Devon Ryan is so kindly trying to help me. I'm really struggling and I just don't understand specifically where in the code I am meant to change; I didn't want to keep bothering Dr. Ryan, so I emailed Prof. Smyth who said to ask here. The problem is I can get the code to run, but I want to make sure I am using the software correctly.
If someone had the time to glance at the code below and edit the line that I should change, so that the code reads: I input the pData (different age categories; sometimes two, sometimes more than 2 age categories) and gene expression data, and the output is a two-tailed F test with adjusted F test P values that tells me if the slope of the curve differs from 0, indicating a relationship between expression and age.I'm also not sure if I can use the same script for two age categories and more than two age categories.
The gene expression data (Table.tabbed) looks same for both 2-age, and more-than-2-age cases:
Gene Sample1 Sample2 Sample3 Sample4 Sample5 Sample6…. Gene1 5.2 5.05 5.01 4.96 5.31 4.834 Gene2 4.85 4.59 4.91 5.06 4.75 4.86 Gene3 7.68 7.30 7.58 7.64 7.44 7.69
and the pData file looks like this (Two-Age case):
Sample Age Sample1 Age2.5 Sample2 Age2.5 Sample3 Age2.5 Sample4 Age20 Sample5 Age20 Sample6 Age20
pData file: (More than 2 age category case):
Sample Age Sample1 3.0 Sample2 3.0 Sample3 10.0 Sample4 10.0 Sample5 50.0 Sample6 50.0
This is the code that I use for both the two-age, and more-than-two-age category data sets:
#Make an expression data set
expr <-as.matrix(read.table("Table",header=TRUE,sep="\t",row.names=1,as.is=TRUE))minimalSet <-ExpressionSet(assayData=expr) pData <-read.table("pData",row.names=1,header=TRUE,sep="\t") metadata <-data.frame(labelDescription=c("Age"),row.names=c("Age")) phenoData <-new("AnnotatedDataFrame",data=pData,varMetadata=metadata) exampleSet <-ExpressionSet(assayData=expr,phenoData=phenoData,annotation="Mouse430_2")
f <-factor(as.character(exampleSet$Age)) design <-model.matrix(~f) #I think this line is the first problem fit = eBayes(lmFit(exampleSet,design)) topTable(fit) #And I think this line is the second problem
The output (2-age case):
Removing intercept from test coefficients logFC AveExpr t P.Value adj.P.Val Gene1 -19.8 13.2 -19.6 9.3e-09 3.3e-05 Gene2 18.6 14.7 13.8 2.0e-07 3.6e-04 Gene3 13.0 12.4 11.0 1.4e-06 1.6e-03
The output (more than 2 age case):
Removing intercept from test coefficients fAge12.0 fAge23.0 fAge6.0 fAge9.0 AveExpr F PVal AdjPVal Gene1 5.3 10.08 2.4 3.2 2.0 5.7 3.5e-17 1.3e-13 Gene2 4.2 1.4 3.03 2.4 2.3 4.3 1.e-16 1.3e-12 Gene3 1.8 3.6 7.9 9.9 8.2 3.6 1.e-12 1.5e-11
If someone could tell me specifically what I should write/how I should edit the code so that I input the pData (different age categories; sometimes two, sometimes more than 2 age categories) and gene expression data, and the output is a two-tailed F test with adjusted F test P values that tells me if the slope of the curve differs from 0, indicating a relationship between expression and age. I have to stress, I can get the code to run, and I can get it to run if I modify certain things, it's just wanting to make sure that the code i'm running is actually doing what I want it to do. I appreciate it, thank you.