Question: I get nothing in my up and down regulated genes
Nemo (India) wrote, 15 months ago:

Here is my cleaned data. I could not post the dput output here:

https://gist.github.com/anonymous/1f8788a5f0f3c40e55995d5c303970c6
 

I am trying to find up- and down-regulated genes from LFQ intensities using limma:

design <- model.matrix(~c(rep(1,2),rep(0,2)))
fit <- lmFit(data, design)
fit2 <- eBayes(fit)
myt <- topTable(fit2, coef=2, n=Inf)

The lists of up- and down-regulated genes are empty because no adj.P.Val is smaller than 0.05, and I don't know what criterion to use for selection.

Where am I making a mistake?

 

 

Laurent Gatto (United Kingdom) wrote, 15 months ago:

Your data contains a lot of 0 values (about 25%), which is arguably a bit suspicious.

Then, you need to log-transform your data, or probably better use data <- log2(data + 1), to get the logFC right (limma computes it as a difference of group means, not as a ratio, so the values must already be on the log scale). If you do this, you will identify proteins that have a presence/absence pattern, relating back to my first point. With 25% of missing values, it is not unexpected to get such a pattern by chance.
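
A minimal sketch of these two points, using a made-up four-protein matrix rather than the data from the gist (the column names and all values are assumptions for illustration). It measures the share of zeros and shows that, after log2(data + 1), the log fold change is a plain difference of group means.

```r
# Toy LFQ-style intensity matrix (hypothetical values, not the poster's data).
data <- matrix(
  c(1000, 1200,    0,    0,
    5000, 4800, 5100, 4900,
       0,  300,  250,    0,
    2000, 2100, 8000, 8400),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("prot", 1:4),
                  c("ctrl_1", "ctrl_2", "treat_1", "treat_2"))
)

# Fraction of exact zeros in the whole matrix.
zero_fraction <- mean(data == 0)

# Log-transform; the +1 offset keeps zeros finite (log2(0 + 1) = 0).
log_data <- log2(data + 1)

# On the log2 scale, logFC is a difference of group means (treated - control).
logFC <- rowMeans(log_data[, 3:4]) - rowMeans(log_data[, 1:2])

print(zero_fraction)
print(round(logFC, 2))
```

In this toy example prot1 is present in both controls and absent in both treated samples, so it gets a very large negative logFC. That is the presence/absence pattern described above, and with only two replicates per group it can easily arise by chance.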


@Laurent Gatto That is right. I guess the zeros for some proteins come from the fact that I analysed a few groups of samples together, so a protein might not be found in all samples of one group but could still have intensities in another group.

I read somewhere that someone discarded proteins with zeros in 50% or more of the samples. With my 4 samples, that means discarding a protein if there is no intensity in 2 or more samples. However, I am not sure how well this rule holds here, because we have 4 samples, 2 control and 2 treated, which means that a single intensity value out of 4, in a treated sample, might be OK. No?

That is why I only removed the genes that had no intensities in any sample.
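
For illustration, the filtering rules discussed here can be sketched in base R. The matrix, the group labels and the threshold min_per_group are all assumptions made up for this example, not values taken from the real data.

```r
# Toy intensity matrix (hypothetical values); columns 1-2 control, 3-4 treated.
data <- matrix(
  c(1000, 1200,    0,    0,
       0,    0,    0,  400,
       0,    0,    0,    0,
    2000, 2100, 8000, 8400),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("prot", 1:4),
                  c("ctrl_1", "ctrl_2", "treat_1", "treat_2"))
)
group <- factor(c("ctrl", "ctrl", "treat", "treat"))

# Number of non-zero intensities per protein within each group
# (rows = proteins, columns = groups).
n_detected <- sapply(levels(group), function(g) {
  rowSums(data[, group == g, drop = FALSE] > 0)
})

# Rule A: drop only proteins with no intensity in any sample.
keep_any <- rowSums(data > 0) > 0

# Rule B (stricter, assumed threshold): keep proteins detected in
# every replicate of at least one group.
min_per_group <- 2
keep_group <- apply(n_detected >= min_per_group, 1, any)

print(n_detected)
print(data[keep_group, , drop = FALSE])
```

Rule A keeps prot2, which has a single intensity out of four, while Rule B drops it. Rule B still keeps prot1, which is seen in both controls and in neither treated sample, so a per-group rule does not remove presence/absence patterns.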

Do you have any suggestions?

 

Reply written 15 months ago by Nemo

The number of zeros in your data is concerning. Debating the number of allowed 0s is not going to help, because filtering is not going to fix your issue. You should probably reassess your data processing strategy in light of this problem.

Reply written 15 months ago by Laurent Gatto

@Laurent Gatto I accepted your answer and I appreciate your help. I found where those zeros were coming from and I solved the issue.

However, I have two questions that are off topic here, but since you seem to know proteomics I wanted to ask them anyway. For a label-free quantification, I used MaxQuant and identified many proteins. However, the gene names are missing for some proteins. How do you handle this when you want to do pathway analysis using IPA?

The other question: when you do pathway analysis using IPA, do you use the LFQ intensities of all control samples (biological replicates) and all treated samples (biological replicates), or do you take the average per group and then perform the pathway analysis?

Reply written 15 months ago by Nemo

I am not familiar with IPA, so can't comment on that aspect.

I am not sure what leads to the absence of gene names. Where do the other ones come from? An online query, the protein fasta file, ...? I guess that tracking the provenance of that information will give a clue about the absence of some gene names.

Reply written 15 months ago by Laurent Gatto
Steve Lianoglou (Genentech) wrote, 15 months ago:

What type of data is this? Why so many 0s?

It's also (obviously) not log transformed. Whatever data you've got, you'll most likely need to normalize it somehow (the how depends on the type of data), and this data will have to be passed into the lmFit function on the log2 scale.
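
As a hedged sketch of what passing log2-scale data to lmFit could look like: the intensities below are simulated, and the sample names and group labels are assumptions for illustration. The design is built from an explicit factor so that coefficient 2 is unambiguously treated minus control. No normalization step is shown, since the right method depends on the type of data.

```r
library(limma)  # Bioconductor package; must be installed separately

set.seed(1)
# Simulated intensities for 100 proteins (hypothetical, not the poster's data).
raw <- matrix(
  2^rnorm(400, mean = 20, sd = 1), nrow = 100,
  dimnames = list(paste0("prot", 1:100),
                  c("ctrl_1", "ctrl_2", "treat_1", "treat_2"))
)

# lmFit expects values on the log2 scale.
log_data <- log2(raw)

# Explicit group factor; "ctrl" is the reference level.
group  <- factor(c("ctrl", "ctrl", "treat", "treat"),
                 levels = c("ctrl", "treat"))
design <- model.matrix(~ group)  # column 2 ("grouptreat") is treat - ctrl

fit <- eBayes(lmFit(log_data, design))
tab <- topTable(fit, coef = 2, number = Inf)
head(tab)
```

Because the simulated data contain no real group effect, few or no proteins should pass an adj.P.Val cutoff here; the point is only the order of the steps (log2 first, then lmFit, eBayes and topTable).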


@Steve Lianoglou They are LFQ intensities from proteomics. If you look at the column names, you will see that I have two control and two treated samples (biological replicates).

Why so many zeros? To be honest, I don't know. When a protein is found by mass in a sample, an intensity is calculated; if it does not show up in another sample, there is no intensity and the value is zero. What I can do is remove the genes that have intensities in less than 50% of the samples. I don't know whether it is a good idea, from a scientific point of view, to remove all genes that have at least one zero, because, as I explained above, a protein might appear in a sample of one group but not in the other.

Reply written 15 months ago by Nemo