Please accept my apologize in advance for such a lengthy question. I have a data set of 200 RNAseq samples from an experiment on two treatments(trtR/trtNR) and the sampling is done at 7 different time-points. I have different number of replicates: for trtR two replicates and for trtNR 5 replicates for each time-points (some time number of replicate is more).
I want to employ a generalized linear model that allows me to quantify the genes that are differentially expressed at each time point and also across all time points. Then a likelihood ratio test to identify differentially expressed genes. I learned from this question https://www.biostars.org/p/133304/ that if I want to have linear affect of time on gene expression, I need to introduce time as a numeric covariate rather than a factor. Therefore, here is my model:
>Time=as.numeric(sampleTable3$Timepoint) >Trt_type=sampleTable3$Treatment_type >coldata <- data.frame(Time, Treatment_type) >rownames(coldata) <- colnames(txi$counts) >ddsTxi_time <- DESeqDataSetFromTximport(txi, colData=coldata, design=~ Time + Trt_type:Time + Trt_type) >colData(ddsTxi_time)$Time<-factor(colData(ddsTxi_time)$Time, levels=c("0", "4", "24", "72", "336", "1008","2352")) >ddsTxi_time= ddsTxi_time[rowSums(counts(ddsTxi_time))>1,] >dds_time <- DESeq(ddsTxi_time, test="LRT", reduced = ~ 1) >res_time <- results(dds_time) >colData(dds_time) DataFrame with 63 rows and 3 columns Time2 Treatment_Response2 replaceable <numeric> <factor> <logical> F02242_NR_72h_23 72 NR FALSE F02245_NR_14w_23 2352 NR FALSE ... ... ... ... F02382_R_14w_29 2352 R TRUE F02384_R_72h_29 72 R FALSE >resultsNames(dds_time)  "Intercept" "Time"  "Trt_typeR_vs_NR" "Time.Trt_type.R" mcols(res_time5)$description  "mean of normalized counts for all samples"  "log2 fold change (MLE): Time.Trt_type.R"  "standard error: Time.Trt_type.R"  "LRT statistic: '~ Time + Trt_type:Time + Trt_type' vs '~ 1'"  "LRT p-value: '~ Time + Trt_type:Time + Trt_type' vs '~ 1'"  "BH adjusted p-values"
1: Is this model correct or should I introduce treatment type as a numeric covariate as well?
2: The L2FC is coming from Time.Treatment typeR wich it is the l2fc from the genes with lowest p-value over time effect in treatmentR. Correct?
For having the effect of time on Trt_type.NR (l2fc and p-value), I did:
is this correct? which this one is just giving me the l2fc in this level and not the p-values. How can I get the genes that are uniquely differentially expressed in trtNR with their corresponding p-value (or should I run the analysis two times separately for each treatment and then check the shared/unique DEGs).
The second part of my question is about removing the outlier, as you see below, I have many outliers.
>summary(res_time) out of 42823 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 373, 0.87% LFC < 0 (down) : 376, 0.88% outliers  : 112, 0.26% low counts  : 11622, 27% (mean count < 1)  see 'cooksCutoff' argument of ?results  see 'independentFiltering' argument of ?results
Should I remove this outliers from my result before further analysis?
I could not access ?results because of some memory issue. Can I also define the "cooksCutoff" from this command?
maxCooks <- apply(assays(dds_time4)[["cooks"]], 1, max).
I am using the last update of DEseq2.
Many thanks in advance for your reply.