Heatmap of a subset of genes
giroudpaul ▴ 40
@giroudpaul-10031
Last seen 21 months ago
France

Hi.

I am trying to reproduce this kind of figure (chemokines differentially regulated in the five different cell types, shown on top):

I would like to do the same, first with just the M1 and M2 conditions, and eventually with more. I would also like to be able to choose the genes shown in this heatmap (if you have advice on how to do this, I'll take it!).

So for now I have :

1. the AffyBatch from the CEL files
2. the annotated ExpressionSet normalized with rma()
3. the MArrayLM from lmFit(data.rma, design)
4. the MArrayLM from eBayes(data.fit) with pairwise comparisons (contrasts)

From which one should I build the matrix for heatmap()? And how?

I want to focus on genes coding for plasma membrane proteins. Is there a way to do this?

heatmap • 3.3k views
chris86 ▴ 390
@chris86-8408
Last seen 21 months ago
UCL, United Kingdom

So you want to work with the normalised expression matrix. I normally start with a data frame, actually, because it is easier to sort and subset. You can convert between the two using as.data.frame() and as.matrix(), but it may be easier just to write the matrix to a file and read it in again, which is what I do after neqc() normalisation.

I always have an annotation file (read in as a data frame) whose rows are in the same order as the columns of the matrix or data frame. It can be used to add annotation to the heatmap, such as the M1 and M2 conditions you describe. I prefer the aheatmap function from the NMF package: it is easier to use than heatmap.2, and I think the output looks better. People normally select a subset of genes to plot, such as the most variable genes or those found differentially expressed by limma.

You can easily subset the genes, which should be the row names of your data frame. I would do subsetteddf <- subset(newdf, row.names(newdf) %in% listofgenes) to get a subsetted data frame. Then you can convert it back to a matrix for plotting.
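As a rough illustration of the subsetting step, here is a minimal base R sketch that works directly on a matrix. The toy matrix, the gene symbols, and the name not.found are made up for the example; it assumes the row names hold the same kind of identifier as listofgenes (on a real array they may be probe IDs that first need mapping to symbols).

```r
# Toy stand-in for the normalised expression matrix (e.g. exprs(data.rma)).
# Values and gene names are invented for illustration only.
expr <- matrix(c(7.1, 7.3, 9.8, 9.9,
                 5.2, 5.0, 5.1, 5.3,
                 6.4, 6.6, 8.9, 9.1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("CCL2", "ACTB", "CXCL10"),
                               c("M1_1", "M1_2", "M2_1", "M2_2")))

# Genes we would like to show; CCL5 is deliberately absent from the toy matrix
listofgenes <- c("CXCL10", "CCL2", "CCL5")

# Report requested genes that are not in the matrix instead of failing on them
not.found <- setdiff(listofgenes, rownames(expr))
if (length(not.found) > 0) {
  message("Not in the matrix: ", paste(not.found, collapse = ", "))
}

# Keep only the genes that exist, in the order they were requested;
# drop = FALSE keeps a matrix even if a single gene is left
keep <- intersect(listofgenes, rownames(expr))
sub.mat <- expr[keep, , drop = FALSE]
```

Indexing by name keeps the rows in the order of listofgenes, whereas the %in% form keeps them in the original row order. Either is fine if the heatmap clusters the rows anyway.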

0
Entering edit mode

So correct me if I'm wrong:

data.rma <- rma(data)
data.fit <- lmFit(data.rma, design)
data.fit.eb <- eBayes(data.fit)

Then, using topTable, I extract the expression values for M1 and M2, do the subset, and that's it?

However, if I do this, I get expression values. Is it OK to plot these, or do I have to transform them into something else (log? But it won't be a logFC, since I am not comparing to another condition)?

0
Entering edit mode

You don't want to extract any expression values or fold changes from limma. Get the expression values as normalised intensities from whatever normalisation function you use. Then get the genes that are DE from limma separately. Then select those genes in the data frame and run aheatmap, heatmap.2, or whatever you have.
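To make the order of operations concrete, here is a hedged sketch of that workflow. It runs on simulated data rather than real CEL files, so expr stands in for exprs(data.rma). The group labels, the cutoffs (adjusted p < 0.05, absolute log2 fold change > 1), and the use of base heatmap() are assumptions chosen for the example, not recommendations from this thread.

```r
# Sketch only: simulated log2 intensities stand in for exprs(data.rma).
library(limma)

set.seed(1)
samples <- c("M1_1", "M1_2", "M1_3", "M2_1", "M2_2", "M2_3")
expr <- matrix(rnorm(200 * 6, mean = 8, sd = 0.3), nrow = 200,
               dimnames = list(paste0("gene", 1:200), samples))
group <- factor(rep(c("M1", "M2"), each = 3))

# Make the first 10 genes higher in M2 so there is something to detect
expr[1:10, group == "M2"] <- expr[1:10, group == "M2"] + 3

# Step 1: limma is used only to decide WHICH genes to show
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
fit <- lmFit(expr, design)
contrast <- makeContrasts(M2 - M1, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contrast))
de <- topTable(fit2, coef = 1, number = Inf, p.value = 0.05, lfc = 1)
de.genes <- rownames(de)

# Step 2: the heatmap shows the normalised expression of those genes,
# taken from the expression matrix, not from the limma result table
heat.mat <- expr[de.genes, , drop = FALSE]

# Base heatmap() is used here to avoid extra dependencies;
# aheatmap or heatmap.2 take the same matrix
side.cols <- unname(c(M1 = "grey30", M2 = "orange")[as.character(group)])
heatmap(heat.mat, scale = "row", ColSideColors = side.cols)
```

The point is that the fitted limma object never reaches the plotting function; it only supplies the row names used to subset the expression matrix.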

0
Entering edit mode

Yeah, thanks. So I extracted the rows of exprs(data.rma) corresponding to the genes I identified with limma.

I tried heatmap, heatmap.2 and aheatmap, and I also find that aheatmap is the easiest and looks better than the other two.

Two questions though. Should I scale the data if I only take upregulated genes? Using scale = "row" makes one condition (the upregulated one) all red, with Z-scores of about +1.5, and the other all white, with Z-scores of about -1.5. Doesn't that make it confusing? These genes are not downregulated in the second condition, they are just less expressed. Or is that the same thing? On the other hand, plotting the normalized expression values may not be the most informative choice, right?

Also, how do you control the row label size? It seems that cexRow does nothing.

1
Entering edit mode

I don't think there is a single right way to handle the scaling. I scale each row, but I think the results should be similar either way. I have never found the row label size to be a problem with aheatmap; there may be more options in the development version on GitHub. If not, you will have to edit the code yourself.
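On the scaling question, a small base R sketch may help show why row scaling makes a two-group heatmap look so binary. The numbers are invented; the assumption is that scale = "row" amounts to row-wise Z-scores (centre on the row mean, divide by the row standard deviation), which is what base heatmap() does.

```r
# Two invented genes: one changes by 1 log2 unit between groups, one by 4
expr <- rbind(
  small_change = c(8, 8, 8,  9,  9,  9),
  large_change = c(8, 8, 8, 12, 12, 12)
)
colnames(expr) <- c(paste0("M1_", 1:3), paste0("M2_", 1:3))

# Row-wise Z-scores: scale() works on columns, hence the double transpose
z <- t(scale(t(expr)))

# Alternative: centre each row only, so the units stay log2 differences
centred <- expr - rowMeans(expr)

print(round(z, 2))   # both rows come out as about -0.91 / +0.91
print(centred)       # -0.5 / +0.5 for small_change, -2 / +2 for large_change
```

After Z-scoring, both genes look identical, so the colours only say which group is higher, not by how much. Centring without dividing by the standard deviation keeps the size of the difference visible, at the cost of letting high-variance genes dominate the colour range.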

0
Entering edit mode

Yes, the devel version (0.22 or higher) is way better than the Bioconductor one (0.20).

However, I had a hard time installing it on Windows... (you need to install Rtools first)

caroline • 0
@caroline-7721
Last seen 3.5 years ago

The chemokine family of proteins has a broad, diverse functional repertoire. However, its structural variation is narrow. Chemokines are small (8-10 kDa), secreted, single polypeptide chains 70-100 residues long. Across the family, the proteins have 20-95% amino acid sequence identity (including conserved cysteine residues), and new members continue to be identified at a rapid pace.