Dear,
This is a follow-up on a previous question (to be found here: C: Adjusting for procedural effect and deciding on statistical test )
I am trying to analyze an RNA-seq dataset with edgeR. I have 44 samples, coming from two controls (both the full organ, namely a plant root) and five different cell types. I have 6 samples for each sample type except one of the controls, for which I have 8 samples. Half of the samples are treated; the other half are not. We are interested in the effect of this treatment.
To get the individual cell types, we had to sort our protoplasts. This means that all the cell type samples are sorted. We also sorted one of the controls (the one with 8 samples), while we did not sort the other control. We hope to be able to use these controls to correct for the sorting effect when analyzing our data for treatment-induced effects with edgeR. To summarize, this is our experimental design:
tissue <- c(rep("control_wholeRoot", 6), rep("control_sorted", 8), rep(c("type1", "type2", "type3", "type4", "type5"), each = 6))
group <- factor(paste0(tissue, ".", sorting, ".", treatment))
This design is used to construct the fit object from the digital gene expression list (DGEList) that I made from my counts table.
#make the DGEList
#filter out lowly expressed genes
#fit the data
I then make contrasts to compare the treated vs the non-treated sample of each sample type (each of the two controls and each of the 5 cell types). This gives 7 lists of all genes with the FC in the specific sample due to the treatment.
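As a minimal sketch of what these seven contrasts look like (this is not our actual script: the order of treated and untreated samples within each sample type, the 'sorting' vector, and the helper names are assumed for illustration, and the edgeR fit itself is left out), the contrast vectors can be built in base R from a no-intercept design:

```r
# Assumed sample layout; the real 'sorting' and 'treatment' vectors are not shown in the post.
tissue <- c(rep("control_wholeRoot", 6), rep("control_sorted", 8),
            rep(c("type1", "type2", "type3", "type4", "type5"), each = 6))
sorting <- ifelse(tissue == "control_wholeRoot", "unsorted", "sorted")
treatment <- c(rep(c("untreated", "treated"), each = 3),          # whole-root control
               rep(c("untreated", "treated"), each = 4),          # sorted control
               rep(rep(c("untreated", "treated"), each = 3), 5))  # five cell types
group <- factor(paste0(tissue, ".", sorting, ".", treatment))

# One column per group, no intercept, so each coefficient is a group mean.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One contrast per sample type: treated minus untreated within that type.
contrasts <- sapply(unique(tissue), function(tis) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  in_tissue <- startsWith(names(v), paste0(tis, "."))
  v[in_tissue & endsWith(names(v), ".treated")] <- 1
  v[in_tissue & endsWith(names(v), ".untreated")] <- -1
  v
})
print(contrasts)
```

Each column of this matrix could then be passed as the contrast argument to glmQLFTest or glmLRT, given a fit made with the same design.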
We want to check whether including 'sorting' in our design matrix actually does anything. To check this, we reran the script, but without the 'sorting' in 'group'. We then made a scatterplot of FC of the genes in the unsorted control vs the FC of the same genes in the sorted control from both the resulting datasets.
We expected the scatterplot made from the dataset with 'sorting' in 'group' to show points that fall closer to the identity line (and to give a larger R squared) than the scatterplot made from the dataset without 'sorting' in 'group'.
However, we got two graphs that are exactly the same. Are we missing something here and should we do a different check or is it indeed bad that the two graphs are the same?
Any help would be highly appreciated,
Eline Verbon and Ronnie de Jonge
For future reference, replies to existing answers should be added using "add comment", rather than using "add your answer", unless you're actually answering your own question.
Anyway, there is nothing extra you need to do to correct for the sorting effect when testing for the treatment effect. This is because the sorting effect is implicitly encoded in the 'tissue' factor. When you compare between treated and untreated samples within each tissue, you are already accounting for the sorting effect; any given level of 'tissue' is either sorted or it isn't, and there are no levels that have both sorted and unsorted samples.
I don't really understand what you're trying to do with the sorting comparisons. What exactly are you correcting for? You don't have unsorted cell type groups, so there's nothing to correct here. There is only one sorting effect you can measure, and that's between the sorted and unsorted control samples. Even if you did calculate the sorting-induced log-fold change - so what?
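To make the nesting concrete, here is a small base-R sketch (the treatment order within each tissue and the 'sorting' vector are assumptions, since the original vectors are not shown). It checks that every tissue has a single sorting status, and that the design matrices with and without 'sorting' in the group labels span the same column space. Adding 'sorting' only relabels the groups, which is why the two versions of the analysis give identical fold changes and identical scatterplots:

```r
# Assumed sample layout; only 'tissue' is taken from the post.
tissue <- c(rep("control_wholeRoot", 6), rep("control_sorted", 8),
            rep(c("type1", "type2", "type3", "type4", "type5"), each = 6))
sorting <- ifelse(tissue == "control_wholeRoot", "unsorted", "sorted")
treatment <- c(rep(c("untreated", "treated"), each = 3),
               rep(c("untreated", "treated"), each = 4),
               rep(rep(c("untreated", "treated"), each = 3), 5))

# Each tissue appears with exactly one sorting status: sorting is nested in tissue.
print(table(tissue, sorting))
nested <- all(rowSums(table(tissue, sorting) > 0) == 1)

# Group labels with and without the sorting term.
group_with    <- factor(paste0(tissue, ".", sorting, ".", treatment))
group_without <- factor(paste0(tissue, ".", treatment))

X_with    <- model.matrix(~ 0 + group_with)
X_without <- model.matrix(~ 0 + group_without)

# If stacking the two designs adds no rank, they span the same column space,
# i.e. both parameterise the same 14 group means (7 tissues x 2 treatments).
rank_joint <- qr(cbind(X_with, X_without))$rank
print(c(with = ncol(X_with), without = ncol(X_without), joint = rank_joint))
```

A sorting term would only change the fit if some tissue had both sorted and unsorted samples, which is not the case in this design.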
As for the low-depth sample: if it is a clear outlier on an MDS plot, I would definitely throw it out.