IMPORTANT: I am using a "dummy" dataset to just get my working right and get used to Limma workflows, as pointed out the values are raw fluorescence value while I await the appropriate transformed expression data. The "real" data will come in the same txt format but with the normalised and transformed data.
I have put explicit questions highlighted in yellow.
1) I have some expression data from a protein microarray in txt format:
>head(exprs[,1:5])
RF0001 RF0002 RF0005 RF0006 RF0008
COL4A3BP 4244 9961 8018 4867 3819
IRAK4 626 27775 8451 14370 336
BCL7C 998 1110 563 842 285
EIF4EBP3 763 2150 963 1748 805
PHGDH 2202 8624 4421 3264 5085
PTPN2 590 972 387 1718 679
2) And have constructed some corresponding phenotypic data:
> head(pData)
group days card
RF0001 case 8 yes
RF0002 control control_NA na
RF0005 case 8 yes
RF0006 control control_NA na
RF0008 case 7 yes
RF0009 case 19 yes
> summary(pData)
group days card
case :74 control_NA:78 na :78
control:75 10 : 5 no :15
7 : 5 yes:56
11 : 4
22 : 4
28 : 4
(Other) :49
As you can see I have 2 "groups" with 74 cases and 75 controls, but for this analysis I would also like to compare the "card" groups with "na (being the control group) vs yes", "na vs no" and "yes vs no".
3) I add some metadata and create an ExpressionSet
>metadata <- data.frame(labelDescription=c("Case/control status","days of blood draw","cardirus status or control"),row.names=c("case_control", "days", "card"))
>phenoData <- new("AnnotatedDataFrame",data=pData, varMetadata=metadata)
Which works fine:
>focussed_array_set <- ExpressionSet(assayData=exprs,phenoData=phenoData)
> focussed_array_set
ExpressionSet (storageMode: lockedEnvironment)
assayData: 34 features, 149 samples
element names: exprs
protocolData: none
phenoData
sampleNames: RF0001 RF0002 ... RF0650 (149 total)
varLabels: group days card
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
This is where I get a little lost, and I have read the "Limma" user guide and a few tutorials to guide me but would like to make sure I am doing things correctly.
4) I set the factors to the groups in my ExpressionSet, set up my design,
>groups <- factor(focussed_array_set$card)
> head(design)
groupsna groupsno groupsyes
1 0 0 1
2 1 0 0
3 0 0 1
4 1 0 0
5 0 0 1
6 0 0 1
5) I fit the linear model, set a contrast matrix and run the contrasts and a bayes fit.
>fit <- lmFit(focussed_array_set, design)
>contrast.matrix <- makeContrasts(groupsyes-groupsno, groupsno-groupsna, groupsyes-groupsna,
levels=design)
>fit2 <- contrasts.fit(fit, contrast.matrix)
>fit2 <- eBayes(fit2)
Q1) I would like to re-nename the columns in the design matrix using something like:
>colnames(design) <- c("na","no","yes")
But if I try this I later get an error message which is fairly self explanatory but how would I get around this?
"Warning message:
In contrasts.fit(fit, contrast.matrix) :
row names of contrasts don't match col names of coefficients
6) I can then pull out the differentially expressed proteins for each contrast:
topTable(fit2, coef=1, adjust.method ="BH")
topTable(fit2, coef=2, adjust.method ="BH")
topTable(fit2, coef=3, adjust.method ="BH")
> topTable(fit2, coef=1, adjust.method ="BH")
logFC AveExpr t P.Value adj.P.Val B
BOD1L2 1938.9833 1902.443 3.446549 0.0007404505 0.02517532 -4.594686
SCHIP1 -2816.1060 1773.396 -2.234502 0.0269579456 0.42485775 -4.594953
PDGFC -1172.7036 1509.685 -1.989672 0.0484812293 0.42485775 -4.594996
RHBDD1 1729.8881 2693.557 1.767317 0.0792490734 0.42485775 -4.595030
TIGD4 -5225.1845 5548.067 -1.746642 0.0827869724 0.42485775 -4.595033
SLC4A1AP 3523.9500 5021.651 1.583170 0.1155306453 0.42485775 -4.595056
CLUL1 -875.0940 1611.497 -1.564839 0.1197697363 0.42485775 -4.595058
LASP1 581.8988 1160.899 1.561791 0.1204862427 0.42485775 -4.595059
COL4A3BP -3950.4571 8249.201 -1.558048 0.1213710510 0.42485775 -4.595059
IRAK4 6789.8845 9758.685 1.530200 0.1281152297 0.42485775 -4.595063
Q2) I understand that coef=1 will mean the first contrast ("groupsyes - groupsno") which I can get the order from:
> contrast.matrix
Contrasts
Levels groupsyes - groupsno groupsno - groupsna groupsyes - groupsna
groupsna 0 -1 -1
groupsno -1 1 0
groupsyes 1 0 1
But is there a way to explicitly ask for "groupsyes - groupsno" in order to not call the wrong contrast?
- Q3) How can I view and export a table giving the results of each contract (coef=1,2 and 3) as in the "topTable" call but with all of the proteins?
- And how can I extract the actual "expression" values for each protein in order to plot in heat-maps and clustering etc?
Q4) The main general question here is if a) is this analysis achieving what I want it to? b) have I excepted the code in the most efficient way?
No, your analysis isn't right. You've input raw fluescences to limma, whereas the expression data needs to be on the log-scale. (As the help page for lmFit will tell you.)
The solution might be as simple as logging the exprs matrix (with log2), but I want to first check that you don't have negative or zero values. Can you show me the output from this:
?
Thanks Gordon,
Sorry I should have clarified I am not using the right data. I am using the raw values to get my head around the structures etc in Lima. I have edited the post to highlight this.
No, you are misunderstanding. You don't need to wait for the appropriate transformed data. You just make the transformation quickly and easy as part of the analysis in R.
There is no point at all in doing a meaningless analysis. It is just as quick and easy to do the right analysis. And then you will have some chance of judging whether the results you get look correct.
Could you please answer the question I asked you? I can't help you unless you give me the information I need.
Whoops, miss read your comment again sorry, yes sure:
output of original data:
> summary( apply(exprs,2,min) ) Min. 1st Qu. Median Mean 3rd Qu. Max. 5.0 60.0 101.0 127.9 165.0 716.0
Same but with log2 transformed values:
> summary( apply(exprs,2,min) ) Min. 1st Qu. Median Mean 3rd Qu. Max. 2.322 5.907 6.658 6.599 7.366 9.484
So looks like no negative values,
Yes the data make allot more sense once log2 transformed thank you for that.
OK, thanks.
Now a 2nd question. The output you give from
head(pData)
andsummary(pData)
are contradictory. The output fromsummary
says thatpData$card
has 3 possible values: "na", "no" and "yes". However the output from head shows that the first 6 values are all either "carditis" or "control_NA", neither of which belong to 3 possible values above.What gives? It's pretty hard to figure out the structure of your experiment from conflicting information.
I feel like I will be apologising allot if I'm not careful, I will be next time, sorry.
I changed all me phenotypic data half way through writing the post as I realised some of my nomenclature was too long winded so I originally had control_NA which I replaced with na,
I have updated the post to reflect this now hope it makes sense