Question

Time series RNA-seq without conditions

Dunois • 0

@f7ec0822

I have some general questions about doing a time series RNA-seq experiment.

The RNA is from a non-model organism. The experiment has no conditions for comparison (so just the "control" condition, essentially). Three biological replicates of around 500 individuals each were set up, and about 20-30 individuals (per replicate) were taken and sacrificed for sequencing every four hours for two days (thirteen time points in total).

So the data looks like this (with expression quantified for each Sample below):

Sample    Time    Replicate
t0_r1     00      01
t0_r2     00      02
t0_r3     00      03
.         .       .
.         .       .
.         .       .
t48_r1    48      01
t48_r2    48      02
t48_r3    48      03

The main objective of the study is to identify genes that are expressed in a circadian manner. I have decided to use MetaCycle to identify the "circadian genes". This analysis seems fairly straightforward. However, I looked at the PCA plot for this data, and the samples are not well separated (and the replicates do not cluster together). Is this to be expected, given that these samples are not from very "different" conditions? Should I attempt to impose some degree of separation upon the data by capturing this variation with latent variables (e.g., using RUVSeq)?

My other question would be: what other analyses can I perform on this dataset?

For instance, would it make any sense to perform an all-vs.-all differential expression analysis and identify significantly differentially expressed genes shared between all pairwise comparisons?

I am a bit stumped and I would be very grateful for some tips and/or pointers (publications included).

DESeq2 • 463 views

updated 15 months ago by ATpoint ★ 4.0k • written 15 months ago by Dunois • 0

score 1 · Answer 1 · 2023-01-03

I personally do not use RUVSeq for this, since it is unclear how to choose k and not really feasible to automate that choice, yet k has a strong impact on the number of genes you will call as oscillating (at least in my hands).

Pairwise DE does not apply here: circadian analysis assumes that expression follows a cosinor-like pattern, so you need a framework that tests for that, and pairwise DE does not. In theory (I think) circadian genes should be a subset of what you get when running something like the LRT in DESeq2, except that circadian oscillation additionally requires a good fit to the theoretical sin/cos curve.

If batch effects or unwanted variation are a problem, try limma (voom or trend) with a cosinor model (see the limma paper on design matrices for details) together with arrayWeights() to downweight rather than filter out outlier samples. The latter helped me a lot in a project in which we had many time points, together with different tissues and genotypes, and unwanted variation could not be regressed out as it was sometimes nested with other covariates. To the best of my knowledge, this down-weighting is only available in limma; none of the established circadian tools do it.
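To make the cosinor idea concrete, here is a minimal Python sketch of the underlying test for a single gene. It is not limma, DESeq2 or MetaCycle, just plain least squares with NumPy. The function name `cosinor_fit`, the fixed 24 h period and the simulated expression values are all assumptions for illustration. The time points mirror the design in the question (every 4 h from 0 to 48 h, three replicates).

```python
import numpy as np

def cosinor_fit(t, y, period=24.0):
    """Fit y = m + a*cos(w*t) + b*sin(w*t) by least squares (hypothetical helper)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 2.0 * np.pi / period  # angular frequency for the assumed period

    # Design matrix: intercept plus one cos and one sin column.
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # F statistic comparing the cosinor model to an intercept-only model.
    rss_full = float(np.sum((y - X @ beta) ** 2))
    rss_null = float(np.sum((y - y.mean()) ** 2))
    df1, df2 = 2, len(y) - 3
    f_stat = ((rss_null - rss_full) / df1) / (rss_full / df2)

    # a*cos(wt) + b*sin(wt) = A*cos(wt - phi), with phi = atan2(b, a).
    amplitude = float(np.hypot(beta[1], beta[2]))
    peak_time = float((np.arctan2(beta[2], beta[1]) / w) % period)
    return {"mesor": float(beta[0]), "amplitude": amplitude,
            "peak_time": peak_time, "f_stat": float(f_stat), "df": (df1, df2)}

# Simulated (made-up) data: 13 time points x 3 replicates, as in the question.
rng = np.random.default_rng(0)
times = np.repeat(np.arange(0, 52, 4), 3)
w = 2.0 * np.pi / 24.0
rhythmic = 10 + 2 * np.cos(w * (times - 6)) + rng.normal(0, 0.5, times.size)
flat = 10 + rng.normal(0, 0.5, times.size)

fit_rhythmic = cosinor_fit(times, rhythmic)
fit_flat = cosinor_fit(times, flat)
print(fit_rhythmic)
print(fit_flat)
```

A p-value would come from the F distribution with the returned degrees of freedom, for example via `scipy.stats.f.sf(f_stat, 2, n - 3)`. In limma the same cos and sin columns would simply go into the design matrix, and the test would be on both coefficients jointly.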