I am analyzing an RNA-Seq dataset using the edgeR package and have a question about filtering with filterByExpr, which keeps genes based on a variable column of the sample metadata. I previously worked with a dataset with only one timepoint (High dose vs. Control) and ran filterByExpr on the Treatment column. I am now working with a new dataset that has the same Treatment column but spans 3 timepoints (see the example below). My question is: should I filter on the Treatment column or on the Treatment_Timepoint column? I assume the Treatment column is the right one, since this is the core of the experiment. Please advise.
dput(Sample.info)
#>           Donor Treatment Timepoint Treatment_Timepoint
#> Sample.1     P1   Control       6hr         Control_6hr
#> Sample.2     P2   Control       6hr         Control_6hr
#> Sample.3     P3   Control       6hr         Control_6hr
#> Sample.4     P4   Control       6hr         Control_6hr
#> Sample.5     P1      High       6hr            High_6hr
#> Sample.6     P2      High       6hr            High_6hr
#> Sample.7     P3      High       6hr            High_6hr
#> Sample.8     P4      High       6hr            High_6hr
#> Sample.9     P1   Control      24hr        Control_24hr
#> Sample.10    P2   Control      24hr        Control_24hr
#> Sample.11    P3   Control      24hr        Control_24hr
#> Sample.12    P4   Control      24hr        Control_24hr
#> Sample.13    P1      High      24hr           High_24hr
#> Sample.14    P2      High      24hr           High_24hr
#> Sample.15    P3      High      24hr           High_24hr
#> Sample.16    P4      High      24hr           High_24hr
#> Sample.17    P1   Control      48hr        Control_48hr
#> Sample.18    P2   Control      48hr        Control_48hr
#> Sample.19    P3   Control      48hr        Control_48hr
#> Sample.20    P4   Control      48hr        Control_48hr
#> Sample.21    P1      High      48hr           High_48hr
#> Sample.22    P2      High      48hr           High_48hr
#> Sample.23    P3      High      48hr           High_48hr
#> Sample.24    P4      High      48hr           High_48hr
Thank you in advance.
Gordon Smyth, thank you very much.
Then I would use something like the below:
Your code is correct. It is equivalent to what I suggested, just somewhat longer and more complicated. Why not use the group argument, which saves you having to create an extra design matrix?
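For readers following along, here is a minimal sketch of the two equivalent calls being discussed, assuming the sample table from the question. The count matrix is simulated purely for illustration, and the object names (counts, y, keep.group, keep.design) are hypothetical, not the poster's actual code.

```r
library(edgeR)

# Rebuild the sample table from the question:
# 4 donors x 2 treatments x 3 timepoints = 24 samples
Sample.info <- data.frame(
  Donor     = rep(paste0("P", 1:4), times = 6),
  Treatment = rep(rep(c("Control", "High"), each = 4), times = 3),
  Timepoint = rep(c("6hr", "24hr", "48hr"), each = 8)
)
Sample.info$Treatment_Timepoint <-
  paste(Sample.info$Treatment, Sample.info$Timepoint, sep = "_")
rownames(Sample.info) <- paste0("Sample.", 1:24)

# Simulated counts, for illustration only (not real data)
set.seed(1)
counts <- matrix(
  rnbinom(2000 * 24, mu = 20, size = 2), nrow = 2000,
  dimnames = list(paste0("Gene", 1:2000), rownames(Sample.info))
)
y <- DGEList(counts = counts, samples = Sample.info)

# Option 1: pass the combined factor through the group argument
keep.group <- filterByExpr(y, group = Sample.info$Treatment_Timepoint)

# Option 2: build a one-way design matrix from the same factor
design <- model.matrix(~ 0 + Treatment_Timepoint, data = Sample.info)
keep.design <- filterByExpr(y, design = design)

# The two logical vectors are expected to agree for this one-way layout
all(keep.group == keep.design)

# Apply the filter and recompute library sizes
y <- y[keep.group, , keep.lib.sizes = FALSE]
```

With a single combined factor, the group form is shorter and avoids constructing a design matrix only for filtering.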
Gordon Smyth, noted, thank you. I will write it as you suggested.
Gordon Smyth, I have a follow-up question. Say I am working with a multi-variable experiment and perform the statistical analysis on each variable column separately; in the above case, comparing Treatment (High vs. Control) and Timepoint (24hr vs. 6hr and 48hr vs. 6hr). At times there are more variables, depending on the experiment, which leads to a complex set-up. In this scenario, what should my filterByExpr column be based on? In the above case, I know the Treatment variable plays a crucial role, with the different incubation times forming the basis of the experiment. To avoid confusion, is it a good idea to simply use the rowSums function (below) if I am unsure about the right experimental conditions, or does it not change much? Sometimes I use public RNA-seq datasets from GEO for validation studies. For filtering purposes, though, filterByExpr is my function of choice.
Let's take another example, comparing Septic patients vs. Healthy Controls. These Septic patients are classified into severe, which are again sub-classified by outcome status, which are Non-Recovered. From this data, I am primarily interested in comparing the transcriptomic signatures of Septic patients vs. Healthy Controls, and then proceeding to further levels of analysis involving Outcome Status. Would my filterByExpr column be the Septic and Healthy column?
You enter the whole design matrix to filterByExpr(). You don't choose which experimental conditions to use.
Gordon Smyth, meaning something like the below?
Something like what? I advised you to use "all treatment factors" and use the "whole design matrix" but you've done the opposite, omitting the design matrix entirely.
Reading your previous comment, you're making this trickier than it actually is. In reality, there's nothing to think about. You don't have to decide which treatments to use, you don't use different filtering for different contrasts, you just input the complete design matrix to filterByExpr, same as you use for lmFit. The only change you might make for filtering purposes is to remove a blocking variable from the design matrix.
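As a rough sketch of that advice, assuming the sample table from the original question: Donor is treated here as the blocking variable, which is my reading of the layout rather than something stated in the thread, and the counts are simulated for illustration only. The names design.fit and design.filter are hypothetical.

```r
library(edgeR)

# Sample table from the question: 4 donors x 2 treatments x 3 timepoints
Sample.info <- data.frame(
  Donor     = rep(paste0("P", 1:4), times = 6),
  Treatment = rep(rep(c("Control", "High"), each = 4), times = 3),
  Timepoint = rep(c("6hr", "24hr", "48hr"), each = 8)
)
Sample.info$Treatment_Timepoint <-
  paste(Sample.info$Treatment, Sample.info$Timepoint, sep = "_")
rownames(Sample.info) <- paste0("Sample.", 1:24)

# Simulated counts, for illustration only (not real data)
set.seed(1)
counts <- matrix(
  rnbinom(2000 * 24, mu = 20, size = 2), nrow = 2000,
  dimnames = list(paste0("Gene", 1:2000), rownames(Sample.info))
)
y <- DGEList(counts = counts, samples = Sample.info)

# Design used for model fitting: Donor as a blocking factor (assumption)
# plus all treatment-by-timepoint groups
design.fit <- model.matrix(~ Donor + Treatment_Timepoint, data = Sample.info)

# Design used for filtering: same treatment factors, blocking variable removed
design.filter <- model.matrix(~ Treatment_Timepoint, data = Sample.info)

# One filtering step for the whole experiment, independent of which
# contrasts are tested later
keep <- filterByExpr(y, design = design.filter)
y <- y[keep, , keep.lib.sizes = FALSE]
```

The same filtered object then feeds every downstream comparison (Treatment, Timepoint, or both), so there is no per-contrast filtering decision to make.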
Hi Gordon Smyth, apologies for the confusion. I did use group=Treatment_Timepoint for the data I was working on earlier; I just had an additional question regarding multi-condition experiments. Thank you very much for the input.