Question

plotCounts to visualize variation of specific genes across conditions


Laia ▴ 10


Hi Michael,

Now that I have been able to load my full matrix of samples and have run the LRT for each factor in my design, I would like to see in which conditions my genes of interest are up- or downregulated.

I am doing this manually by plotting specific genes with plotCounts. But sometimes I see odd outliers that were not visible in the initial PCA of all the samples.

Here is an example: in this case, sample BN has very high counts for GeneX compared with the other red replicates in the same condition.

I am using the dds object to plot this:

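library(DESeq2)
library(ggplot2)
library(ggrepel)  # provides geom_text_repel()

# Per-sample counts for one gene, returned as a data frame instead of a plot
d <- plotCounts(dds, gene = "GeneX", intgroup = c("FactorSH", "FactorSTt"), returnData = TRUE)
ggplot(d, aes(x = FactorSH, y = count, color = FactorSTt)) +
  geom_point(position = position_jitter(width = 0.1, height = 0)) +
  geom_text_repel(aes(label = rownames(d)), max.overlaps = 5) +
  theme_bw() + ggtitle("GeneX") + theme(plot.title = element_text(hjust = 0.5))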

If I get a significant LRT result for this gene but the plot looks like this, should I discard that result? Or am I just missing a normalization step needed to visualize these counts?

Thank you.

Laia

Tags: DESeq2, plotCounts, vst

Updated 2.7 years ago by Michael Love 43k • written 2.7 years ago by Laia ▴ 10


I found out that BN is indeed an outlier (by plotting a hierarchical clustering heatmap), so I think I'll have to remove this sample. But why wasn't this obvious in the PCA plot?

Also, I see that I have 4 samples (belonging to the same sequencing "batch4") that give lower counts in general compared to the other samples.

When I try to correct for batch by adding it to the model (~batch + factorSH + factorSTt + Interaction), I get an error because the batches do not fully overlap across conditions. Batches correspond to different sequencing days (samples that had to be sequenced again after a technical issue).

So I tried a different approach, ComBat_seq, to get a new adjusted count matrix. But then some clusters that were expected in the PCA (and that appeared initially) disappear after correction.

Is it possible that ComBat_seq is being too harsh? Can this be adjusted? In fact, I can only see a batch effect for batch4. Can this one be "prioritized" for correction?

Thank you.

Laia

Reply by Laia ▴ 10, 2.7 years ago


If the sample is an outlier in only a small number of genes, that won't show up in PC1 or PC2. Also, DESeq2 picks the top genes by variance for the PCA plot; you can change how many are used with the ntop argument of plotPCA.
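As a rough illustration of the point about gene selection and later components, the sketch below runs plotPCA with the default top 500 genes and with all genes, then uses prcomp to look beyond PC2. It uses a simulated dataset from makeExampleDESeqDataSet as a stand-in for the real dds (an assumption), so the grouping column is called condition rather than FactorSH or FactorSTt.

```r
library(DESeq2)

# Simulated stand-in for the real dataset (assumption: 12 samples, 2 conditions)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# plotPCA works on transformed data, so variance-stabilize first
vsd <- vst(dds, blind = FALSE)

# Default behavior: only the 500 most variable genes are used
pca_top <- plotPCA(vsd, intgroup = "condition", ntop = 500, returnData = TRUE)

# Use every gene instead
pca_all <- plotPCA(vsd, intgroup = "condition", ntop = nrow(vsd), returnData = TRUE)

# An outlier driven by a few genes may only appear in later components,
# so compute the PCA directly and inspect PC3 and PC4 as well
pcs <- prcomp(t(assay(vsd)))
head(pcs$x[, 1:4])
```

With real data, plotting pcs$x[, 3] against pcs$x[, 4] and labeling the points by sample name is one way to check whether a sample such as BN separates on a component that the default plot does not show.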

For batch correction, we recommend estimating surrogate variables (svaseq) or "factors of unwanted variation" (RUVSeq). Both have examples in the rnaseqGene workflow:

https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#removing-hidden-batch-effects
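A minimal sketch of the svaseq route, adapted from the pattern in the linked workflow. The dataset is simulated with makeExampleDESeqDataSet (an assumption), the variable of interest is the simulated condition factor, and n.sv = 2 is an arbitrary choice for illustration. With the design from this thread, the final formula would presumably include the surrogate variables ahead of the two factors and their interaction.

```r
library(DESeq2)
library(sva)

# Simulated stand-in for the real dataset (assumption)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds <- estimateSizeFactors(dds)

# svaseq takes normalized counts; drop genes with very low average signal
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# Full model holds the variable(s) of interest, null model only the intercept
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))

# Estimate two surrogate variables (the number is an illustrative assumption)
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

# Add the surrogate variables to the sample table and to the design
ddssva <- dds
ddssva$SV1 <- svseq$sv[, 1]
ddssva$SV2 <- svseq$sv[, 2]
design(ddssva) <- ~ SV1 + SV2 + condition
```

After this, DESeq() would be run on ddssva as usual. Plotting each surrogate variable against the known batch labels is a quick way to see whether it captures something like the batch4 effect.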

Reply by Michael Love 43k, 2.7 years ago

Michael Love · Answer 1 · 2022-10-27


swbarnes2 ★ 1.4k


You can remove genes like that by filtering your dds object with something like the following, which keeps only genes where at least 5 samples have at least 10 counts:

keep <- rowSums(counts(dds) >= 10) >= 5
dds <- dds[keep,]

That sample might behave better after this kind of filtering.

Updated 2.7 years ago by Michael Love 43k • written 2.7 years ago by swbarnes2 ★ 1.4k


Yes, I recommend this type of filtering when you have outliers like this. It should go before the call to DESeq().
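To make the ordering concrete, here is a small sketch with the filter applied first and the LRT run afterwards. It uses a simulated dataset from makeExampleDESeqDataSet (an assumption), so the reduced formula is simply ~ 1. With the real multi-factor design, the reduced formula would instead drop whichever term is being tested.

```r
library(DESeq2)

# Simulated stand-in for the real dataset (assumption: design is ~ condition)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
n_before <- nrow(dds)

# Keep genes where at least 5 samples have at least 10 counts
keep <- rowSums(counts(dds) >= 10) >= 5
dds <- dds[keep, ]

# Run the likelihood ratio test only after filtering
dds <- DESeq(dds, test = "LRT", reduced = ~ 1)
res <- results(dds)

# How many genes were dropped by the filter
n_before - nrow(dds)
```

The thresholds of 10 counts and 5 samples are the ones suggested above. A common rule of thumb is to set the sample threshold near the size of the smallest group.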

Reply by Michael Love 43k, 2.7 years ago