Question

Significance analysis with the moderated t-test in the limma package in R

meixia1019 • 0

Dear friends,

I am using the limma moderated t-test to analyze my protein intensity data.

I have hundreds of proteins detected in the treatment condition and only a few in the control, quantified with MaxQuant. The dataset looks like this:

noCL1	noCL2	noCl3	CL1	CL2	CL3
0	0	0	0	7448800	132190
254560	137360	0	1,94E+08	7,94E+08	1,37E+08
0	0	0	0	6227600	0
0	0	0	0	1603100	0
0	0	0	0	1529600	0
0	0	0	0	1257600	0
0	0	0	3808000	10646000	1576700
0	0	0	122300	1512100	0
0	0	0	85346	0	0
0	0	0	0	341570	0
0	0	0	0	879120	0
0	0	0	1235100	171310	0

I want to test for significant differences between control and treatment using the moderated t-test, because very few proteins are detected in the control.

Is it possible to compare the treatment intensities to the zeros in the control?

I can run limma successfully on the raw intensity data, but this gives strange results. I want to use limma-voom, but I do not know how to transform the data to log2. In particular, what should I do with the zeros? Should I filter them out?

Here is the code I use:

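library(limma)

# Read the tab-separated intensity table copied to the clipboard
d <- read.table(file = "clipboard", sep = "\t", header = TRUE)
d <- data.matrix(d, rownames.force = NA)
colnames(d) <- c("nocl", "nocl", "nocl", "cl", "cl", "cl")

# Use a factor with explicit levels: with a plain character vector,
# model.matrix() orders the columns alphabetically ("cl" before "nocl"),
# so relabelling them as c("nocl", "cl") would swap the two groups
group <- factor(c(rep("nocl", 3), rep("cl", 3)), levels = c("nocl", "cl"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit1 <- lmFit(d, design)
fit1$coefficients[1:10, ]
cont.matrix <- makeContrasts(a = cl - nocl, levels = design)
print(cont.matrix)
fit2 <- contrasts.fit(fit1, cont.matrix)
fit2 <- eBayes(fit2)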

Here is the result when I run it on the raw data:

logFC AveExpr t P.Value adj.P.Val B
173 -10846021 5423010.5 -2.131776 0.09170832 0.4605321 -4.59512
268 -2164967 1082483.5 -2.129542 0.09195575 0.4605321 -4.59512
224 -1470787 735393.3 -2.128846 0.09203301 0.4605321 -4.59512
163 -3465934 1732966.8 -2.101319 0.09514384 0.4605321 -4.59512

Here is the result when I run it on log2-transformed data. Since log2 of 0 is undefined, I set the zeros to 0 in the log2 dataset:

logFC AveExpr t P.Value adj.P.Val
198 -67.33333 35.66667 -12.764984 6.801732e-05 0.02115339
132 -78.66667 41.33333 -9.739464 2.396826e-04 0.03727064
169 -86.00000 45.00000 -8.177496 5.333209e-04 0.05528760
161 -123.00000 63.50000 -7.135694 9.842751e-04 0.05928887
110 -142.66667 73.33333 -6.986866 1.081028e-03 0.05928887

I only started learning limma yesterday and am really confused about what I can and cannot do.

Thanks a billion for any comments.

Best

Meixia

bioconductor limma • 788 views

updated 5.7 years ago by James W. MacDonald • written 5.7 years ago by meixia1019

score 1 · Answer 1 · 2018-08-02

Analyzing proteomics data is challenging. The zeros are likely a mixture arising from two sources: some proteins are probably below the detection limit of the mass spec, and others are missing for technical reasons (e.g., the protein was probably there but got masked by another mass). Ideally you would handle the two kinds of zeros differently: add a small prior value for the below-detection proteins, and impute the values that are missing for technical reasons. But that is a non-trivial exercise.
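As a rough first look at how the zeros are distributed, one could count them per protein within each group. The sketch below uses three rows copied from the table in the question. The labels and the rule behind them (no zeros, all replicates zero, or only some zero) are illustrative assumptions, not an established way of telling below-detection zeros from technical ones.

```r
# Three rows copied from the table in the question
# (columns: three control replicates, then three treatment replicates)
d <- rbind(
  c(0, 0, 0, 0, 7448800, 132190),
  c(0, 0, 0, 3808000, 10646000, 1576700),
  c(0, 0, 0, 122300, 1512100, 0)
)
group <- c("nocl", "nocl", "nocl", "cl", "cl", "cl")
n_rep <- 3  # replicates per group

# Number of zero intensities per protein within each group
n_zero <- sapply(c("nocl", "cl"), function(g) {
  rowSums(d[, group == g, drop = FALSE] == 0)
})

# Heuristic labels (an assumption made for illustration only)
pattern <- ifelse(n_zero == 0, "complete",
                  ifelse(n_zero == n_rep, "all zero", "partly zero"))
print(pattern)
```

A protein that is zero in every control replicate but measured in the treatment is at least consistent with being below the detection limit in the control, whereas scattered zeros within a group may point to technical dropouts. The counts alone cannot settle which is the case.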

So there are three issues here. First, you want to take logs (probably base 2), both for interpretability and because the intensities almost certainly have a strong right skew. Second, you need to deal with the zeros so that you can take logs (simple enough: just add a small constant to each value before taking logs). Third, you may want to deal with the missing data that are due to technical issues rather than being below the limit of detection.
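The first two points can be sketched in base R as follows. The helper name `log2_offset` and the default offset of 1 are assumptions made for illustration; the size of the constant is a choice that affects the resulting log fold changes, so it should be picked with care.

```r
# Three rows copied from the table in the question
# (columns: three control replicates, then three treatment replicates)
d <- rbind(
  c(0, 0, 0, 0, 7448800, 132190),
  c(0, 0, 0, 3808000, 10646000, 1576700),
  c(0, 0, 0, 122300, 1512100, 0)
)
colnames(d) <- c("noCL1", "noCL2", "noCL3", "CL1", "CL2", "CL3")

# Hypothetical helper: add a small constant, then take log2.
# The default offset of 1 is an arbitrary illustrative value;
# it maps a raw intensity of 0 to log2(1) = 0.
log2_offset <- function(x, offset = 1) {
  stopifnot(is.numeric(x), offset > 0)
  log2(x + offset)
}

logd <- log2_offset(d)
print(round(logd, 2))
```

The resulting matrix `logd` has only finite values and could be passed to `lmFit()` in place of the raw intensities. This does nothing about the third point, the values that are missing for technical reasons.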

But do note that you are jumping into the deep end of the pool with this dataset, and if you are a novice you should seriously consider getting someone experienced with this sort of thing to help.