Question

Averaging over biological replicates with DESeq2 for heatmap plotting


chighfi • 0


I have a question about averaging biological replicates. The code below plots data for each sample. When and how do I combine my biological replicates for plotting? I would like to combine over the CONDITION column and have tried many ways. I thought you might have an answer.

colData(rld)

DataFrame with 6 rows and 10 columns
          sampleName  fileName     LINE EXPOSURE CONDITION   TISSUE       REP
            <factor>  <factor> <factor> <factor>  <factor> <factor> <integer>
A1H_Acute  A1H_Acute A1H_Acute      CSB    Acute   Cocaine        H         1
A2H_Acute  A2H_Acute A2H_Acute      CSB    Acute   Cocaine        H         2
A3H_Acute  A3H_Acute A3H_Acute      CSB    Acute   Cocaine        H         3
B1H_Acute  B1H_Acute B1H_Acute      CSB    Acute Sucrose_C        H         1
B2H_Acute  B2H_Acute B2H_Acute      CSB    Acute Sucrose_C        H         2
B3H_Acute  B3H_Acute B3H_Acute      CSB    Acute Sucrose_C        H         3
               SEX individual        sizeFactor
          <factor>   <factor>         <numeric>
A1H_Acute        M         AM  1.23895646591537
A2H_Acute        M         AM 0.709636373005609
A3H_Acute        M         AM  1.39159832544129
B1H_Acute        M         BM 0.738832280319489
B2H_Acute        M         BM 0.908432365721923
B3H_Acute        M         BM  1.24898796150053

library(DESeq2)
library(pheatmap)

dds <- DESeqDataSetFromMatrix(countData = AcuteCountsMheadCO,
                              colData = AcuteSampleTable1MheadCO,
                              design = ~ CONDITION)
myTest <- DESeq(dds)
rld <- rlog(myTest, blind = FALSE)

# Indices of the 20 genes with the highest mean normalized count
select <- order(rowMeans(counts(myTest, normalized = TRUE)),
                decreasing = TRUE)[1:20]
df <- as.data.frame(colData(myTest)[, c("CONDITION", "TISSUE")])
pheatmap(assay(rld)[select, ], cluster_rows = FALSE, show_rownames = FALSE,
         cluster_cols = FALSE, annotation_col = df)

deseq2 pheatmap average • 1.8k views

Updated 6.4 years ago by James W. MacDonald • written 6.4 years ago by chighfi

score 0 · Answer 1 · 2018-10-03

There is usually no profit in doing a heatmap after combining replicates unless you have far more samples than that. If you had maybe 20 different groups with about 6 replicates per group, it might make sense to use the mean expression values, because in that scenario you might want to show broad differences between groups without confusing the issue with all those columns. But in your case you will have a 6-column heatmap, and using all the samples will allow people to see how similar the samples are within each group. Collapsing that to a 2-column heatmap (a) is boring and (b) obscures information that people may want to see.
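If you do end up in the many-groups situation and want group means, one way to get them is to average the columns of the transformed matrix within each level of the grouping factor. The following is only a sketch in base R on a made-up matrix: `mat` and `condition` stand in for `assay(rld)` and `colData(rld)$CONDITION`, and `avg_by_condition` is a hypothetical helper name, not a DESeq2 function.

```r
# Toy stand-in for assay(rld): 4 genes x 6 samples (values are made up).
set.seed(1)
mat <- matrix(rnorm(24, mean = 8), nrow = 4,
              dimnames = list(paste0("gene", 1:4),
                              c("A1H", "A2H", "A3H", "B1H", "B2H", "B3H")))
# Toy stand-in for colData(rld)$CONDITION.
condition <- factor(rep(c("Cocaine", "Sucrose_C"), each = 3))

# Hypothetical helper: mean of each row within each group of columns.
avg_by_condition <- function(m, groups) {
  groups <- droplevels(as.factor(groups))
  out <- sapply(levels(groups), function(g) {
    rowMeans(m[, groups == g, drop = FALSE])
  })
  # sapply() returns a plain vector when m has a single row, so force a matrix.
  matrix(out, nrow = nrow(m), dimnames = list(rownames(m), levels(groups)))
}

avg <- avg_by_condition(mat, condition)

# With the objects from the question this would look something like:
#   avg <- avg_by_condition(assay(rld)[select, ], colData(rld)$CONDITION)
#   pheatmap(avg, cluster_rows = FALSE, cluster_cols = FALSE)
```

Averaging on the rlog scale (rather than averaging counts and transforming afterwards) is an assumption here; it keeps the values on the same scale as the per-sample heatmap.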

Also, unless you are really trying to show the most highly expressed genes, your code doesn't make sense to me. In this scenario a heatmap is usually intended to show something about the set of differentially expressed genes, rather than the most highly expressed (which are probably just housekeeping genes that aren't even changing expression).
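To pick rows by differential expression instead, you could rank genes by adjusted p-value and take the top ones. A rough sketch, again in base R on invented numbers: `res` stands in for `as.data.frame(results(myTest))`, `top_de_genes` is a hypothetical helper, and the 0.05 cutoff and the limit of 20 genes are arbitrary choices.

```r
# Toy stand-in for as.data.frame(results(myTest)); values are invented.
res <- data.frame(
  log2FoldChange = c(2.1, -0.1, -1.8, 0.3, 1.2),
  padj = c(0.001, 0.8, 0.0004, NA, 0.03),
  row.names = paste0("gene", 1:5)
)

# Hypothetical helper: names of up to n genes with padj below alpha,
# ordered from most to least significant. Genes with NA padj are dropped.
top_de_genes <- function(res, alpha = 0.05, n = 20) {
  keep <- res[!is.na(res$padj) & res$padj < alpha, , drop = FALSE]
  keep <- keep[order(keep$padj), , drop = FALSE]
  head(rownames(keep), n)
}

select_de <- top_de_genes(res)

# With the objects from the question this would look something like:
#   res <- as.data.frame(results(myTest))
#   pheatmap(assay(rld)[top_de_genes(res), ], scale = "row",
#            cluster_cols = FALSE, annotation_col = df)
```

Row scaling (`scale = "row"`) is a common choice for this kind of plot because it shows each gene's change relative to its own mean, but whether it suits your figure is a judgment call.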