I'm currently struggling with my first RNA-seq analysis. The experimental design is quite complex, so it's hard for me to adapt any existing tutorial to it. The experiment delivered samples from multiple patients at different time points, and with different treatment doses. For all patients I have a sample for time point 0, called screening. For some patients I also have additional time points, e.g. day 10, day 20, day 28 etc. For most of the patients I have a sample for the end-of-treatment time point. None of the samples have replicates.
After reading a lot about different approaches to analysis without replicates, I ended up with the edgeR manual (chapter 3.5) and tried to adapt it to my experiment. What I have done so far (besides mapping the reads and gathering the counts):
I created a design matrix which looks like this:
Please follow the link for better readability: https://pastebin.com/P8nP7GCX
As you can see, it lists all my samples for each patient, together with a top rank / low rank variable and the responder status. Unfortunately I have no information about the remission state of the patients, so the responder classification was made by splitting at the median overall survival. The top / low rank is simply a ranking of the patients by overall survival time. I'm not sure whether this is a good way to group the samples, but so far I couldn't think of another one.
Then I created my counts matrix, with the counts in the correct order:
matrix <- readDGE(files)
After that i created the DGEList object:
y <- DGEList(counts=matrix)
filtergroup <- c(1,1,1,1,1,1,1,2,1,2,2,2,1,1,1,1,1,1,1,1,2,2,2,2,2,2,1,1,1,1)
keep <- filterByExpr(y, group=filtergroup)
y <- y[keep, , keep.lib.sizes=FALSE]
Please note that I used the responder / non-responder classification as the grouping for filtering.
Then I created a contrast and ran the analysis:
# Note: the contrast is computed as Scr.LowRank minus Scr.TopRank,
# so a positive logFC means higher expression in the low-rank samples.
contrasts <- makeContrasts(ScrTopRankVsScrLowRank = Scr.LowRank-Scr.TopRank, levels = design)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, contrast=contrasts[,"ScrTopRankVsScrLowRank"])
topTags(qlf)
So basically I compared all of my top-ranked samples from the screening time point (4) with the lower-ranked ones (9).
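To spell out what this contrast does, here is a minimal sketch with a made-up sample table (it is not my real design, which is in the pastebin link). It assumes a single group factor that combines time point and rank, and a design without intercept, so that each column corresponds to one group such as Scr.LowRank. The `samples` table and the hand-built `contrast` vector are only for illustration.

```r
# Hypothetical sample table (NOT the real data): time point and survival rank
samples <- data.frame(
  timepoint = c("Scr", "Scr", "Scr", "EOT", "EOT", "D10"),
  rank      = c("TopRank", "LowRank", "LowRank", "TopRank", "LowRank", "LowRank")
)

# One combined group factor, e.g. "Scr.TopRank"
group <- factor(paste(samples$timepoint, samples$rank, sep = "."))

# Design without intercept: one column per group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast vector written out by hand: Scr.LowRank minus Scr.TopRank
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["Scr.LowRank"] <- 1
contrast["Scr.TopRank"] <- -1

print(design)
print(contrast)
```

With a design of this shape, every sample belongs to exactly one column, and the contrast only involves the two screening groups; all other time points get a weight of 0.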
As the next step I want to create a heatmap of the normalized counts of the top DE genes. Since the result set is quite large, I was looking for a cutoff to reduce it. The problem is that all approaches I've found so far use the FDR as the cutoff, and the FDR happens to be 1 for all of my genes.
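To make clear what I have in mind for the heatmap, here is a minimal sketch on simulated data (the counts, the p-values and the cutoff of 20 genes are all made up). Instead of an FDR cutoff it ranks the genes by raw p-value, keeps the top N, and plots row-scaled log-CPM values. On the real data I would presumably take the gene names from `topTags(qlf, n = 20)` and the log-CPM values from `cpm(y, log = TRUE)`; the hand-written log-CPM below is only a simplified stand-in so the example runs without edgeR.

```r
set.seed(1)

# Simulated stand-in data (hypothetical): 200 genes x 13 screening samples
n_genes   <- 200
n_samples <- 13
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("S", seq_len(n_samples)))
)

# Stand-in for the per-gene p-values of the test (e.g. qlf$table$PValue)
pvals <- setNames(runif(n_genes), rownames(counts))

# Simplified log2 counts-per-million with a small offset to avoid log(0)
lib_size <- colSums(counts)
logcpm <- log2(t(t(counts + 0.5) / (lib_size + 1)) * 1e6)

# Rank by raw p-value and keep the top N genes (no FDR cutoff)
top_n     <- 20
top_genes <- names(sort(pvals))[seq_len(top_n)]

# Row-wise z-scores so that genes with different expression levels are comparable
z <- t(scale(t(logcpm[top_genes, ])))

# Rows are already scaled, so switch off heatmap's own scaling
heatmap(z, scale = "none")
```

Picking a fixed number of top genes only decides what is shown in the plot; it does not make those genes significant.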
So my questions are now:
Is the general approach I'm following suitable for this experimental design?
How can I reduce the list of DE genes, or shouldn't I do this at all?
I'm also very open to suggestions about which contrasts to analyze, as I'm not a biologist but a medical student who is really struggling with this analysis.
If needed, I can provide some screenshots of the sample table.
Thanks in advance for your help!