Hello,
I was given a file that contains only FPKM values from an RNA-seq experience for genes that have been measured at different time points. I read Best DEG tool for datasets with FPKM counts? and Differential expression of RNA-seq data using limma and voom() posts about performing a differential gene expression analysis when only having FPKM values. However, as not being an expert in using the limma package, I would like to know how can I analyze such data knowing that we have a different time point please.
Here is a couple of lines from my file:
tracking_id Total_FPKM 1hour 3hour 6hour ctrl gene_id gene Log2(FPKM)
XLOC_000003 0.0912937 0.0542082 0.070862 0.0912937 0.0348845 XLOC_000003 LOC100132062 -3.4533409
XLOC_000005 0.4593880 0.3969370 0.413311 0.4582030 0.4593880 XLOC_000005 - -1.1222149
XLOC_000009 3.3084000 3.2661400 3.308400 3.2586100 3.1813200 XLOC_000009 LOC643837 1.7261337
So, the first step is to create the design. (The guide on 9.6.1 explains how to perform analysis on different time points and so I tried to adapt it):
filename <- read.table("filename.csv", header = T, sep = "\t", stringsAsFactors = F , check.names = F, encoding = "UTF-8")
colnames_log2fpkm <- c("Log2(1hour)", "Log2(3hour)", "Log2(6hour)", "Log2(ctrl)")
log2fpkm <- c("Log2(1hour)"=log2(filename[,3]), "Log2(3hour)"=log2(filename[,4]), "Log2(6hour)"=log2(filename[,5]), "Log2(ctrl)"=log2(filename[,6]))
filename <- cbind(filename, setNames( lapply(colnames_log2fpkm, function(x) x=log2fpkm), colnames_log2fpkm) )
filename <- filename[!is.infinite(rowSums(filename[,c(9,10,11,12)])),]
targetframe <- data.frame("Column"=c("filename[,12]", "filename[,10]", "filename[,11]", "filename[,12]"), "Targets"=c("ctrl", "1hour", "3hour", "6hour"))
lev <- c("ctrl", "1hour", "3hour", "6hour")
f <- factor(targetframe$Targets, levels=lev)
design <- model.matrix(~0+f)
colnames(design) <- lev
> design
ctrl 1hour 3hour 6hour
1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`f`
[1] "contr.treatment"
At this point, I am not sure how to proceed... so any help would be appreciated.
sessioninfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] limma_3.38.3
loaded via a namespace (and not attached):
[1] BiocManager_1.30.4 compiler_3.5.1 htmltools_0.3.6 tools_3.5.1 yaml_2.2.0
[6] Rcpp_1.0.0 rmarkdown_1.11 knitr_1.21 digest_0.6.18 xfun_0.4
[11] evaluate_0.12
Thanks in advance.
Dear Aaron,
Thanks for your kind answer. I will try to follow your directives. Also, could you please tell me if my design is properly formatted ?
I will come back later with the results that I have obtained.
Thanks
Hello again @Aaron Lun,
I tried following your suggested answer but I am getting an error when using the eBayes function:
which means that my data don't have any replicates. What can I do in this case ? This is my code:
```{r}
filename <- read.table("filename.csv", header = T, sep = "\t", stringsAsFactors = F , check.names = F, encoding = "UTF-8")
filename["Log2(ctrl)"] <- log2(filename$ctrl)
filename["Log2(1hour)"] <- log2(filename$`1hour`)
filename["Log2(3hour)"] <- log2(filename$`3hour`)
filename["Log2(6hour)"] <- log2(filename$`6hour`)
filename <- filename[!is.infinite(rowSums(filename[,c(9,10,11,12)])),]
rownames(filename) <- filename[,1]
filename <- filename[,c(9,10,11,12)]
targetframe <- data.frame("Column"=c("filename[,1]", "filename[,2]", "filename[,3]", "filename[,4]"), "Group"=c("ctrl", "1hour", "3hour", "6hour"), "Time"=c(0, 1, 3, 6))
lev <- c("ctrl", "1hour", "3hour", "6hour")
X <- ns(targetframe$Time, df=2)
Group <- factor(targetframe$Group, levels = lev)
design <- model.matrix(~Group*X)
fit <- lmFit(filename, design)
fit <- eBayes(fit)
```
And when I execute lmFit function, it gives me this:
```{r}
```
Thanks in advance
You would do well to understand the code before you try to use it. The scenario in 9.6.2 is not the same as your experimental design, it is not a copy-and-paste situation. Clue:
X
andGroup
are redundant.Hello @Aarun Lun,
Thank you for your last message and sorry for the late reply as I am busy working on other projects. I was able to get past my issue by simply doing:
So, after running eBayes, I obtain the fit table which has lines as following:
And there are lots of columns. You mentioned I should drop the spline coefficient (columns 1 to 3) using topTable with the coeff argument. But at that point, I am really unsure about what to do next. Also, shouldn't I focus only on the p-values ? Why should I drop the spline coefficients ? Sorry for the trouble and thank you in advance.
One question at a time.
@Aaron Lun
I was requested to do a heatmap of the log2FC for the DEG. I know it is not meaningful at all but they need to have some kind of lead. They have a list of genes that seems interesting to them and they want to verify them experimentally. So, my question is:
1. If I understood you correctly, the spline coefficients corresponds to log-fold changes but which column? Is it coefficients..Intercept ? or coefficients.X1 ? or coefficients.X2 ?
Thank you again and Merry Christmas.
No spline coefficient corresponds to a simple log-fold change. What do you expect? If the trend is parabolic, what kind of log-fold change would you be able to define to describe it?
If you must have a log-fold change, refit the model with a linear time relationship and report the log-fold change per day with the p-value from the model fitted with the spline. This allows you to have your cake (in terms of a p-value that accounts for non-linear trends) and eat it too (in terms of having a single metric that describes, roughly, the average direction and magnitude of the change). If you want more detail, you will need to examine the trend directly, e.g., by plotting the log-expression versus time for genes of interest.
Hello again Aaron Lun,
Thank you for your last comment. I think I got a glimpse of what you are trying to say. Putting it differently, I will use the fitted model obtained previously by taking out the P-value column because it conserves the general trend of my spline, right ?
But I want to get something clear first. Did you mean to report the log-fold change per hour instead of per day ?
Kindly, I ask you to help me out on this one seeing how more knowledgeable you are. I have no experience in using Limma, let alone all it using these types of data with only FPKM. I would learn a lot from you to be honest.
As you already figured out, I am expecting to have a dataframe with gene names, log2FC and their p-values of the control vs treatment at the end of my analysis. The ctrl is at time 0 of the treatment and the rest are at different timing like I mentioned in my original post.
Can you please guide me on how to achieve such results ?
Thank you a lot.
sort.by="none"
intopTable
in analyses performed with both models; take the p-value and FDR from the spline model; and use them to replace the same fields in the linear model. This gives you the desired table of results, as long as you remember that the p-value and log-fold change are derived from different models.Thank you for your reply.
This is what I was able to write:
[1] "logFC", "AveExpr", "t", "P.Value", "adj.P.Val", "B"
[1] "coefficients", "df.residual", "sigma", "stdev.unscaled", "Amean", "s2.post", "t", "df.total",, "p.value",, "lods", "F", "F.p.value"
Now, I need to replace the columns from the spline model in the linear model but before that, I noticed that the Log2FC does not change so the only values that need to be replaced are the pval and the FDR:
Is this the correct way ?
Thank you in advance.
Your code is so wrong that it's difficult to describe all the errors.
So, here's the correct code instead:
Thank you Aaron Lun...
I guess I really still have a long way to go. Sorry for the messed up code. Today, I was able to learn new things thanks to you. This solved my issue.
Best regards,
John
Bump ... I need help on this please