Question: I get no up- or down-regulated genes from limma
Nemo80 (India) wrote, 2.1 years ago:

Here is my cleaned data. I could not post the dput output here:

https://gist.github.com/anonymous/1f8788a5f0f3c40e55995d5c303970c6


I am trying to find up- and down-regulated genes based on LFQ intensities using limma:

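library(limma)
# group indicator: 1 for the first two samples, 0 for the last two
design <- model.matrix(~c(rep(1,2),rep(0,2)))
fit <- lmFit(data, design)
fit2 <- eBayes(fit)
# coef = 2: mean of samples coded 1 minus mean of samples coded 0;
# number = Inf returns the full table for all proteins
myt <- topTable(fit2, coef=2, number=Inf)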



The list of significant genes is empty because none of the proteins has an adj.P.Val below 0.05, and I don't know what other criterion to use.

Where am I making a mistake?

Laurent Gatto (United Kingdom) wrote, 2.1 years ago:

Your data contains a lot of 0 values (about 25%), which is arguably a bit suspicious.

Then, you need to log-transform your data, probably best as data <- log2(data + 1), to get the logFC right: limma's logFC is the difference of the mean values per group, which is a log ratio only when the data are on the log scale. If you do this, you will identify proteins that have a presence/absence pattern, relating back to my first point. With 25% of missing values, it is not unexpected to get such a pattern by chance.
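A minimal base-R sketch of the two points above. It uses a small hypothetical matrix `lfq` with made-up values standing in for the LFQ table; this is not the poster's data. The sketch measures the fraction of zeros, applies `log2(data + 1)`, and computes the per-protein difference of group means on the log scale. For a two-group design without missing values, that difference is what limma reports as logFC. A protein quantified in one group and 0 in the other gets a very large logFC purely from the presence/absence pattern.

```r
# Hypothetical toy LFQ matrix: 2 control and 2 treated samples,
# 0 means "no intensity reported" (values are made up for illustration)
lfq <- matrix(c(2.0e7, 2.2e7, 2.1e7, 1.9e7,   # prot1: quantified everywhere
                1.5e6, 1.7e6, 0,     0,       # prot2: control only
                0,     3.0e5, 0,     2.8e5),  # prot3: scattered zeros
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("prot", 1:3),
                              c("ctrl1", "ctrl2", "trt1", "trt2")))

# Fraction of zero entries in the whole matrix
zero_frac <- mean(lfq == 0)

# log2 with a pseudo-count of 1, so zeros become 0 instead of -Inf
log_lfq <- log2(lfq + 1)

# Difference of group means on the log2 scale = log2 fold change
ctrl <- c("ctrl1", "ctrl2")
trt  <- c("trt1", "trt2")
logfc <- rowMeans(log_lfq[, trt]) - rowMeans(log_lfq[, ctrl])

print(zero_frac)
print(round(logfc, 2))
```

Here prot2 ends up with a logFC of about -20.6 only because it is absent from the treated samples. prot3, whose zeros are spread over both groups, has a logFC near zero but a huge within-group spread.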

@Laurent Gatto you are right. I guess the zeros for some proteins come from the fact that I analysed a few groups of samples together. So a protein might not be found in all samples of one group but could have intensities in another group.

I read somewhere that someone kept only proteins with fewer than 50% zero values. With my 4 samples, that means that if 2 or more samples have no intensity, I discard the protein. However, I am not sure how well this assumption holds, because we have only 4 samples: 2 control and 2 treated. If a protein has only one intensity value out of 4, in a treated sample, it might still be OK, no?

That is why I only removed the genes that had no intensity in any of the samples.
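The filtering rules discussed in this comment can be written as row-wise masks. The sketch below uses another hypothetical toy matrix with illustrative names and values. `keep_any` drops proteins with no intensity at all. `keep_half` keeps proteins with fewer than 50% zeros. `keep_group` is an assumed group-aware variant, not taken from the thread, that keeps proteins with at least `min_valid` values in at least one group. None of these masks says anything about why the zeros are there.

```r
# Hypothetical toy LFQ matrix (0 = no intensity): 2 control, 2 treated
lfq <- matrix(c(5e6, 6e6, 5.5e6, 4e6,   # prot1: quantified everywhere
                0,   0,   0,     0,     # prot2: never quantified
                7e5, 0,   0,     0,     # prot3: a single value
                8e5, 9e5, 0,     0),    # prot4: control only
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("prot", 1:4),
                              c("ctrl1", "ctrl2", "trt1", "trt2")))
groups <- factor(c("ctrl", "ctrl", "trt", "trt"))

# Filter 1: drop proteins with no intensity in any sample
keep_any <- rowSums(lfq > 0) > 0

# Filter 2: keep proteins with fewer than 50% zeros (at most 1 zero of 4)
keep_half <- rowMeans(lfq == 0) < 0.5

# Filter 3 (hypothetical group-aware rule): at least `min_valid`
# non-zero values in at least one group
min_valid <- 2
valid_per_group <- sapply(levels(groups), function(g)
  rowSums(lfq[, groups == g, drop = FALSE] > 0))
keep_group <- apply(valid_per_group >= min_valid, 1, any)

print(data.frame(keep_any, keep_half, keep_group))
```

prot4 (present only in the controls) is dropped by the 50% rule but kept by the group-aware rule. That is exactly the kind of protein that then shows a presence/absence logFC.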

Do you have any suggestions?

The number of zeros in your data is concerning. Debating the number of allowed 0s is not going to help, because filtering is not going to fix your issue. You should probably assess your data processing strategy in the light of this problem.

@Laurent Gatto I accepted your answer and I appreciate your help. I found where those zeros were coming from and solved the issue.

However, I have two questions that are off topic here, but you seem to know proteomics, so I wanted to ask in case you know. I did a label-free quantification with MaxQuant and identified many proteins. However, the gene names are missing for some proteins. How do you handle this when you want to do pathway analysis using IPA?

The other question: for pathway analysis with IPA, do you use the LFQ intensities of all control samples (biological replicates) and all treated samples (biological replicates), or do you average the replicates first and then perform the pathway analysis?

I am not familiar with IPA, so I can't comment on that aspect.

I am not sure what leads to the absence of gene names. Where do the other ones come from? An online query, the protein FASTA file, ...? I guess that tracking the provenance of that information will give a clue about why some gene names are absent.
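If the gene names come from the protein FASTA file, a quick check is to parse the headers directly. This sketch assumes UniProtKB-style headers. In that format the gene name sits in a `GN=` field, and the field is left out for entries without a recorded gene name. Both headers below are illustrative, and the second one is made up. A protein whose header has no `GN=` field cannot get a gene name from this source.

```r
# Illustrative UniProt-style FASTA headers; the second is hypothetical
# and, like UniProtKB entries without a gene name, has no GN= field
headers <- c(
  ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2",
  ">tr|XXXXXX|XXXXXX_HUMAN Uncharacterized protein OS=Homo sapiens OX=9606 PE=4 SV=1"
)

# Accession: the field between the first and second "|"
accession <- sub("^>[a-z]{2}\\|([^|]+)\\|.*$", "\\1", headers)

# Gene name: the token after "GN=", or NA when the field is absent
has_gn <- grepl("GN=", headers, fixed = TRUE)
gene <- ifelse(has_gn, sub("^.*GN=([^ ]+).*$", "\\1", headers), NA_character_)

print(data.frame(accession, gene, has_gn))
```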

Steve Lianoglou (Denali) wrote, 2.1 years ago:

What type of data is this? Why so many 0s?

It's also (obviously) not log-transformed. Whatever data you've got, you'll most likely need to normalize it somehow (how depends on the type of data), and it will have to be passed to the lmFit function on the log2 scale.
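Here is a sketch of what this could look like. It assumes, as the poster explains in the reply below, that a 0 means the protein was not quantified in that sample. Under that assumption, zeros are set to NA rather than given a pseudo-count, the data are log2-transformed, and each sample is median-centred. Median centring is only one possible normalisation, picked here for illustration. The matrix is hypothetical. A matrix like `norm_lfq` could then be passed to `lmFit`, which accepts NA values.

```r
# Hypothetical toy LFQ matrix (made-up values); 0 = not quantified
lfq <- matrix(c(2.0e7, 2.4e7, 1.0e7, 1.2e7,
                4.0e6, 5.0e6, 2.0e6, 0,
                8.0e5, 1.0e6, 4.0e5, 5.0e5),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("prot", 1:3),
                              c("ctrl1", "ctrl2", "trt1", "trt2")))

# Assumption: a 0 means "not quantified", so treat it as missing (NA)
lfq[lfq == 0] <- NA

# Move to the log2 scale (NA stays NA)
log_lfq <- log2(lfq)

# Median normalisation: shift each sample so all sample medians are equal
col_med <- apply(log_lfq, 2, median, na.rm = TRUE)
norm_lfq <- sweep(log_lfq, 2, col_med - median(col_med))

print(round(norm_lfq, 2))
```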

@Steve Lianoglou these are LFQ intensities from proteomics. If you look at the column names, you will see that I have two control and two treated samples (biological replicates).

Why so many zeros? To be honest, I don't know. As I understand it, when a protein is identified by mass in a sample, an intensity is calculated for it; if it does not show up in another sample, there is no intensity and the value is 0. What I could do is remove the genes that have intensities in fewer than 50% of the samples. I don't know whether it is a good idea to remove all genes that have at least one zero; from a scientific point of view I really can't tell, because, as I explained above, a protein might be present in the samples of one group but not in those of the other!
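To see how often the situation described here actually occurs, the per-group counts of non-zero values can be cross-tabulated. The sketch below uses a hypothetical toy matrix. Proteins listed in `one_group_only` are the presence/absence cases that produce very large logFCs once the data are log-transformed with a pseudo-count.

```r
# Hypothetical toy LFQ matrix (made-up values); 0 = no intensity
lfq <- matrix(c(3e6, 4e6, 3.5e6, 2.5e6,   # prot1: everywhere
                9e5, 8e5, 0,     0,       # prot2: control only
                0,   0,   6e5,   7e5,     # prot3: treated only
                5e5, 0,   0,     4e5),    # prot4: one value per group
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("prot", 1:4),
                              c("ctrl1", "ctrl2", "trt1", "trt2")))
group <- c("ctrl", "ctrl", "trt", "trt")

# Number of samples with an intensity, per protein and group
n_ctrl <- rowSums(lfq[, group == "ctrl"] > 0)
n_trt  <- rowSums(lfq[, group == "trt"] > 0)

# Cross-tabulate: rows = non-zero controls, columns = non-zero treated
print(table(control = n_ctrl, treated = n_trt))

# Proteins quantified in one group but completely absent from the other
one_group_only <- names(which((n_ctrl == 0) != (n_trt == 0)))
print(one_group_only)
```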