Significance analysis with the moderated t-test in the limma package in R
meixia1019 • 0
@meixia1019-16760
Last seen 5.7 years ago

Dear friends, 

I am using the limma moderated t-test to analyse my protein intensity data.

I have hundreds of proteins detected in the treatment condition and only a few in the control, quantified with MaxQuant. The dataset looks like this:

noCL1 noCL2 noCL3 CL1 CL2 CL3
0 0 0 0 7448800 132190
254560 137360 0 1,94E+08 7,94E+08 1,37E+08
0 0 0 0 6227600 0
0 0 0 0 1603100 0
0 0 0 0 1529600 0
0 0 0 0 1257600 0
0 0 0 3808000 10646000 1576700
0 0 0 122300 1512100 0
0 0 0 85346 0 0
0 0 0 0 341570 0
0 0 0 0 879120 0
0 0 0 1235100 171310 0

I want to test for significant differences between control and treatment using the moderated t-test, because very few proteins are detected in the control.

Is it possible to compare the treatment intensities against zeros in the control?

I can run limma successfully on the raw intensity data, but this gives strange results. I want to use limma-voom, but I do not know how to transform the data to log2. In particular, what should I do with the zeros? Should I filter them out?

Here is the code I use:

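library(limma)

# Read the tab-separated intensity table from the clipboard.
# Note: if the pasted values use decimal commas (e.g. 1,94E+08), add dec = ","
# so that they are read as numbers rather than text.
d <- read.table(file = "clipboard", sep = "\t", header = TRUE)
d <- data.matrix(d)
colnames(d) <- c("nocl", "nocl", "nocl", "cl", "cl", "cl")
# Fix the level order explicitly. With a plain character vector, model.matrix()
# sorts the levels alphabetically ("cl" before "nocl"), so naming the columns
# c("nocl", "cl") would swap the groups and flip the sign of the contrast.
group <- factor(c(rep("nocl", 3), rep("cl", 3)), levels = c("nocl", "cl"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
fit1 <- lmFit(d, design)
fit1$coefficients[1:10, ]
# Treatment minus control
cont.matrix <- makeContrasts(a = cl - nocl, levels = design)
print(cont.matrix)
fit2 <- contrasts.fit(fit1, cont.matrix)
fit2 <- eBayes(fit2)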

Here is the result when I run it on the raw data:

 logFC   AveExpr         t    P.Value adj.P.Val        B
173 -10846021 5423010.5 -2.131776 0.09170832 0.4605321 -4.59512
268  -2164967 1082483.5 -2.129542 0.09195575 0.4605321 -4.59512
224  -1470787  735393.3 -2.128846 0.09203301 0.4605321 -4.59512
163  -3465934 1732966.8 -2.101319 0.09514384 0.4605321 -4.59512

Here is the result when I run it on log2 data (log2 of 0 is undefined, so I set the zeros in the control to 0 in the log2 dataset):

 logFC  AveExpr          t      P.Value  adj.P.Val
198  -67.33333 35.66667 -12.764984 6.801732e-05 0.02115339
132  -78.66667 41.33333  -9.739464 2.396826e-04 0.03727064
169  -86.00000 45.00000  -8.177496 5.333209e-04 0.05528760
161 -123.00000 63.50000  -7.135694 9.842751e-04 0.05928887
110 -142.66667 73.33333  -6.986866 1.081028e-03 0.05928887

I only started learning limma yesterday and am really confused about what I can and cannot do.

Thanks a billion if you have any comments.

Best 

Meixia

bioconductor limma • 788 views
@james-w-macdonald-5106
Last seen 8 hours ago
United States

Analyzing proteomics data is challenging. There is likely a mixture of zeros with two different causes: some proteins are probably below the detection limit of the mass spec, and other proteins are missing for technical reasons (e.g., the proteins were probably there, but were masked by another mass). Ideally you would deal with the zeros in two separate ways: add a small prior value for the below-detection proteins, and impute the proteins that are missing for technical reasons. But that is a non-trivial exercise.
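
As a rough illustration of that distinction (a heuristic sketch only, not an established method), one could label a zero as "below detection" when the protein is zero in every replicate of a group, and as "technical" when other replicates of the same group do show signal. The helper name `classify_zeros` and the rule itself are assumptions made up for this example; the toy values are two rows from the table in the question.

```r
# Hypothetical helper: label each entry as "observed", "technical" or
# "below_detection".
# Assumption: a zero is "technical" if at least one other replicate in the
# same group has a non-zero intensity; otherwise it is "below_detection".
classify_zeros <- function(x, group) {
  stopifnot(ncol(x) == length(group))
  out <- matrix("observed", nrow(x), ncol(x), dimnames = dimnames(x))
  for (g in unique(group)) {
    cols <- which(group == g)
    # TRUE for proteins with signal in at least one replicate of this group
    any_signal <- rowSums(x[, cols, drop = FALSE] > 0) > 0
    for (j in cols) {
      is_zero <- x[, j] == 0
      out[is_zero & any_signal, j] <- "technical"
      out[is_zero & !any_signal, j] <- "below_detection"
    }
  }
  out
}

# Two rows from the table in the question
x <- rbind(
  c(0, 0, 0, 0, 7448800, 132190),
  c(0, 0, 0, 3808000, 10646000, 1576700)
)
colnames(x) <- c("noCL1", "noCL2", "noCL3", "CL1", "CL2", "CL3")
group <- c("nocl", "nocl", "nocl", "cl", "cl", "cl")

zero_type <- classify_zeros(x, group)
print(zero_type)
```

With only three replicates per group a rule like this is fragile, so it is best treated as a way to inspect the data rather than as a decision procedure.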

So there are three issues here. First, you want to take logs (probably base 2), both for interpretability and because you almost certainly have a strong right skew. Second, you want to deal with the zeros so you can take logs (simple enough: just add a small constant to each value before taking logs). Third, you may want to deal with the missing data that are due to technical issues rather than being below the limit of detection.
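
The first two points can be sketched as follows. This is a minimal example under stated assumptions, not a recommended pipeline: the offset of 1 is an arbitrary choice (it directly affects the size of the log fold changes), the intensities are the twelve rows shown in the question with decimal commas converted to decimal points, and the third point (technically missing values) is not handled at all.

```r
library(limma)

# Intensities from the table in the question (noCL1-3, then CL1-3)
d <- rbind(
  c(0, 0, 0, 0, 7448800, 132190),
  c(254560, 137360, 0, 1.94e8, 7.94e8, 1.37e8),
  c(0, 0, 0, 0, 6227600, 0),
  c(0, 0, 0, 0, 1603100, 0),
  c(0, 0, 0, 0, 1529600, 0),
  c(0, 0, 0, 0, 1257600, 0),
  c(0, 0, 0, 3808000, 10646000, 1576700),
  c(0, 0, 0, 122300, 1512100, 0),
  c(0, 0, 0, 85346, 0, 0),
  c(0, 0, 0, 0, 341570, 0),
  c(0, 0, 0, 0, 879120, 0),
  c(0, 0, 0, 1235100, 171310, 0)
)
colnames(d) <- c("noCL1", "noCL2", "noCL3", "CL1", "CL2", "CL3")

# Assumption: an arbitrary small offset so that log2() is defined for zeros
offset <- 1
y <- log2(d + offset)

# Explicit level order, so the design columns are nocl then cl
group <- factor(c(rep("nocl", 3), rep("cl", 3)), levels = c("nocl", "cl"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit <- lmFit(y, design)
cont <- makeContrasts(cl - nocl, levels = design)  # treatment minus control
fit2 <- eBayes(contrasts.fit(fit, cont))

tt <- topTable(fit2, number = Inf)
print(tt)
```

Here a positive logFC means higher intensity in the treated samples. Because most of the control values are just the offset, the fold changes mostly reflect the choice of offset, which is one reason the zeros deserve more careful treatment.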

But do note that you are jumping into the deep end of the pool with this dataset, and if you are a novice you should seriously consider getting someone experienced with this sort of thing to help.
