Question

In need of linear modeling help for differential expression (proteomics)

Chris • 0

Good afternoon.
I'm incredibly new to processing data in R but I'll try to explain to the best of my ability.

I have a proteinGroups file from MaxQuant. I have filtered it for sparse signals as well as decoys, contaminants, and reverse signals. On top of this, I have made a separate set that is just my intensities. In this case, we are looking at a mutant vs wt organism for differential protein expression. These both have 4 replicates.

Once my data was filtered, I made a df (1014x18) containing just my intensities. In this df, I have 4 columns of mutant and 4 columns of wt intensities. I converted this df into a matrix and ran the command justvsn() for a log2 transformation and normalization.


mat<-as.matrix(prot2)
matvsn<-justvsn(mat)

rownames(matvsn)<-prot2$Protein.IDs

matvsn<-matvsn[,-1] #Just removing the NA column associated with protein IDs.



sessionInfo( )

After this, I tend to struggle with finding out what to do next.
I was told to use limma and was given a handful of resources including the limma vignette, this video (https://www.youtube.com/watch?v=Hg1abiNlPE4), and some online guides.

Where I struggle is finding out what to use for my model. Right now, I have:


design<-model.matrix(~matvsn, data=prot2)

I can almost guarantee this is wrong but I don't know what to do. If somebody could please explain what I need to do to create a linear model and get p-values for differential expression analysis, I would be extremely grateful.

Thanks.

Proteomics limma vsn • 1.5k views

ADD COMMENT • link updated 3.6 years ago by Gordon Smyth 52k • written 3.6 years ago by Chris • 0

score 0 · Answer 1 · 2021-08-25

Gordon Smyth 52k

WEHI, Melbourne, Australia

I made a df (1014x18) containing just my intensities.

You have 18 columns of intensities? Or did you mean to say that the df has 8 columns?

matvsn<-matvsn[,-1] #Just removing the NA column associated with protein IDs.

That doesn't make sense. According to your code, there is no column associated with protein IDs. The protein IDs were row.names, not a column.

Where I struggle is finding out what to use for my model

Well, you obviously can't do any DE analysis until you specify which columns are mutant and which are wt. Have you done that? Have you created a factor (of length 8 with two levels) that distinguishes mutant from wt?
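A minimal sketch of such a factor and the design matrix built from it, in base R. The column order (four wt columns followed by four mutant columns) is an assumption made for illustration; it has to match the actual column order of the intensity matrix.

```r
# Assumed column order: 4 wt samples followed by 4 mutant samples.
# Adjust this to match the real column order of the intensity matrix.
group <- factor(rep(c("wt", "mutant"), each = 4), levels = c("wt", "mutant"))

# Design matrix: an intercept (wt mean) plus a mutant-vs-wt coefficient
design <- model.matrix(~ group)
colnames(design)
```

With wt as the first level, the second column (groupmutant) represents the difference between mutant and wt on the transformed scale.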

ADD COMMENT • link 3.6 years ago Gordon Smyth 52k

Some are pooled data and others are QC. Only 8 of these are wt and mutant.

There was a column associated with protein IDs. I turned that into row names and then removed the column where it was present due to the resulting NAs.

I have not tried the factor thing yet. As I said, I'm pretty new to this so the thought didn't cross my mind. Will do that now.

ADD REPLY • link 3.6 years ago Chris • 0

Some are pooled data and others are QC. Only 8 of these are wt and mutant

Your normalization code appears to be nonsense then. Coercing a general data.frame to matrix and running vsn on it will give nonsense. You must isolate the intensity columns into a numerical matrix of 8 columns before you can do anything.
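As a rough illustration of isolating the intensity columns, the sketch below builds a toy table and keeps only the eight wt/mutant columns. Every column name here (Intensity.wt_1, Intensity.pool, and so on) is hypothetical and stands in for whatever the real proteinGroups export contains.

```r
# Toy stand-in for the filtered proteinGroups table; all column names
# here are hypothetical and should be replaced with the real ones.
set.seed(1)
prot2 <- data.frame(
  Protein.IDs = paste0("P", 1:5),
  matrix(rlnorm(5 * 10, meanlog = 20), nrow = 5,
         dimnames = list(NULL, c(paste0("Intensity.wt_", 1:4),
                                 paste0("Intensity.mut_", 1:4),
                                 "Intensity.pool", "Intensity.QC")))
)

# Keep only the 8 wt/mutant intensity columns, selected by name
keep <- grep("^Intensity\\.(wt|mut)_", colnames(prot2), value = TRUE)
mat <- as.matrix(prot2[, keep])
rownames(mat) <- as.character(prot2$Protein.IDs)

# Sanity checks before any normalization step (e.g. vsn::justvsn(mat))
stopifnot(is.numeric(mat), ncol(mat) == 8)
```

Selecting columns by name before calling as.matrix() keeps the protein ID column and the pooled/QC columns out of the matrix, so the result is purely numeric.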

It would be a good idea to spend more time reading the documentation before going any further so you can get a firmer idea of the basic concepts. You don't have any of the basic components required for a DE analysis yet, so you're not yet at the first line of the limma quick start guide or the first line of the video.

ADD REPLY • link 3.6 years ago Gordon Smyth 52k

Unless you meant running stringsasfactors for my data.frame? In which case, that had already been done when I imported the file.

I had already isolated intensities into a matrix and ran justvsn on said matrix beforehand.

I'm at a matrix of 1014x8 now. 4 columns representing mutant and 4 representing wt.

ADD REPLY • link 3.6 years ago Chris • 0

Unless you meant running stringsasfactors for my data.frame

No, that's a different thing and not actually useful here. Your data.frame contains annotation for the proteins. To do a DE analysis you have to annotate the samples.

I'm at a matrix of 1014x8 now

OK, that's good then. Your code above shows something else however.
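For a 1014x8 matrix of that kind, the remaining limma steps might look roughly like the sketch below. The matrix is simulated, and the sample names and column order (4 wt, then 4 mutant) are assumptions; the real input would be the normalized intensity matrix with its own sample annotation.

```r
library(limma)  # Bioconductor package

# Simulated stand-in for the normalized 1014 x 8 matrix (log2 scale);
# the real input would be the vsn-normalized intensities.
set.seed(1)
matvsn <- matrix(rnorm(1014 * 8, mean = 25), nrow = 1014,
                 dimnames = list(paste0("P", 1:1014),
                                 c(paste0("wt_", 1:4), paste0("mut_", 1:4))))

# Sample annotation: assumed to match the column order above
group <- factor(rep(c("wt", "mutant"), each = 4), levels = c("wt", "mutant"))
design <- model.matrix(~ group)

fit <- lmFit(matvsn, design)   # fit one linear model per protein
fit <- eBayes(fit)             # empirical Bayes moderated t-statistics
res <- topTable(fit, coef = "groupmutant", number = Inf)
head(res[, c("logFC", "P.Value", "adj.P.Val")])
```

Here res holds one row per protein, with the mutant-vs-wt log fold change, the raw p-value, and the p-value adjusted for multiple testing.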

ADD REPLY • link 3.6 years ago Gordon Smyth 52k