Good afternoon.
I'm incredibly new to processing data in R but I'll try to explain to the best of my ability.
I have a proteinGroups file from MaxQuant. I have filtered it for sparse signals as well as decoys, contaminants, and reverse signals. On top of this, I have made a separate set that is just my intensities. In this case, we are looking at a mutant vs wt organism for differential protein expression. These both have 4 replicates.
Once my data was filtered, I made a df (1014x18) containing just my intensities. In this df, I have 4 columns of mutant and 4 columns of wt intensities. I converted this df into a matrix and ran justvsn() for a log2-like transformation and normalization.
mat <- as.matrix(prot2)
matvsn <- justvsn(mat)
rownames(matvsn) <- prot2$Protein.IDs
matvsn <- matvsn[, -1]  # Just removing the NA column associated with protein IDs.
sessionInfo( )
After this, I tend to struggle with finding out what to do next.
I was told to use limma and was given a handful of resources including the limma vignette, this video (https://www.youtube.com/watch?v=Hg1abiNlPE4), and some online guides.
Where I struggle is finding out what to use for my model. Right now, I have:
design<-model.matrix(~matvsn, data=prot2)
I can almost guarantee this is wrong but I don't know what to do. If somebody could please explain what I need to do to create a linear model and get p-values for differential expression analysis, I would be extremely grateful.
Thanks.
There was a column associated with protein IDs. I turned that into row names and then removed the column where it was present due to the resulting NAs.
I have not tried the factor thing yet. As I said, I'm pretty new to this so the thought didn't cross my mind. Will do that now.
Your normalization code appears to be nonsense then. Coercing a general data.frame to matrix and running vsn on it will give nonsense. You must isolate the intensity columns into a numerical matrix of 8 columns before you can do anything.
It would be a good idea to spend more time reading the documentation before going any further so you can get a firmer idea of the basic concepts. You don't have any of the basic components required for a DE analysis yet, so you're not yet at the first line of the limma quick start guide or the first line of the video.
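To illustrate the point about isolating the intensity columns, here is a minimal sketch in base R. The toy `prot2` table and the `Intensity.*` column names are assumptions made up for the example (your MaxQuant column names may differ). The key fact is that as.matrix() on a data.frame that still contains a character column such as Protein.IDs returns a character matrix, so the ID column has to be left out before converting.

```r
# Hypothetical stand-in for the filtered proteinGroups table (prot2 in the post).
# The column names below are assumptions, not taken from the real file.
set.seed(1)
sample_names <- c(paste0("Intensity.mut", 1:4), paste0("Intensity.wt", 1:4))
intens <- matrix(round(rlnorm(5 * 8, meanlog = 14)), nrow = 5,
                 dimnames = list(NULL, sample_names))
prot2 <- data.frame(Protein.IDs = paste0("P", 1:5), intens,
                    stringsAsFactors = FALSE)

# Select the intensity columns by name so the ID column never enters the matrix
intensity_cols <- grep("^Intensity\\.", colnames(prot2), value = TRUE)
mat <- as.matrix(prot2[, intensity_cols])
rownames(mat) <- prot2$Protein.IDs

# Sanity checks before normalizing: numeric, and exactly 8 sample columns
stopifnot(is.numeric(mat), ncol(mat) == 8)

# The normalization step from the post would then be run on this matrix:
# matvsn <- vsn::justvsn(mat)
```

The justvsn() call is left commented out because the toy table is far too small for it; on the real 8-column matrix it would take the place of the original call.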
Unless you meant setting stringsAsFactors for my data.frame? In which case, that had already been done when I imported the file.
I had already isolated intensities into a matrix and ran justvsn on said matrix beforehand.
I'm at a matrix of 1014x8 now. 4 columns representing mutant and 4 representing wt.
No, that's a different thing and not actually useful here. Your data.frame contains annotation for the proteins. To do a DE analysis you have to annotate the samples.
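As a rough sketch of what annotating the samples could look like: build a factor with one entry per column of the intensity matrix, then a design matrix from that factor. This assumes the 8 columns are ordered as 4 mutant followed by 4 wt, which the post does not state, so adjust the order to match your matrix. The names `group` and `design` are just illustrative.

```r
# Assumed column order: 4 mutant replicates, then 4 wild-type replicates.
# Putting "wt" first in levels makes wild type the reference group.
group <- factor(rep(c("mut", "wt"), each = 4), levels = c("wt", "mut"))

# One row per sample. The intercept column is the wt mean, and the
# "groupmut" column is the mutant-minus-wt difference for each protein.
design <- model.matrix(~ group)
print(design)
```

Note that the design is built from the sample annotation only, not from the expression matrix or the protein table.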
OK, that's good then. Your code above shows something else however.
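For a 1014x8 normalized matrix like the one described, the limma steps might look roughly like the sketch below. The matrix here is simulated noise standing in for `matvsn`, the column order (4 mutant, then 4 wt) is an assumption, and the object names are illustrative. It uses the standard lmFit(), eBayes() and topTable() functions from limma.

```r
library(limma)

set.seed(42)
# Simulated stand-in for the 1014 x 8 normalized matrix (matvsn in the post)
matvsn <- matrix(rnorm(1014 * 8, mean = 20, sd = 1), nrow = 1014,
                 dimnames = list(paste0("P", 1:1014),
                                 c(paste0("mut", 1:4), paste0("wt", 1:4))))

# Sample annotation: one entry per column, wt as the reference level
group <- factor(rep(c("mut", "wt"), each = 4), levels = c("wt", "mut"))
design <- model.matrix(~ group)

# Fit a linear model per protein, then moderate the variances
fit <- lmFit(matvsn, design)
fit <- eBayes(fit)

# Table for the mutant-vs-wt coefficient, all proteins, sorted by evidence
res <- topTable(fit, coef = "groupmut", number = Inf)
head(res)
```

The P.Value and adj.P.Val columns of `res` hold the raw and multiple-testing-adjusted p-values, and logFC is the mutant-minus-wt difference on the normalized scale.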