Dear All,
I would like to ask a rather "beginner" question about an initial comparison of DE lists that I obtained from limma for two microarray datasets. Although the platform is the same (Agilent), the comparisons differ somewhat because of the time point: in one dataset I compared bystander samples vs. controls at 4 hours, whereas in the other I performed the same comparison at 30 min. The same cell type (IMR-90 human lung fibroblasts) was used in both experiments. I understand that, because of the different time points, general comparisons might be inappropriate, but I'm highly interested in finding common DE genes between the two time points, which could indicate interesting patterns or groups of genes:
Thus, for a start, could I compare the DE probe sets (i.e. those with adjusted p-value < 0.05) in a Venn diagram? Or could I also compare the final gene symbols, in case I miss different DE probe sets in the two datasets that are annotated to the same gene symbol?
Finally, would a scatter plot also be helpful for this purpose? And if my understanding is correct, should I use the logFCs or the t-statistics of the common probe sets/gene symbols?
Thank you,
Konstantinos
Yes, and yes. Also, use "Add comment" to respond to answers; don't post a new answer.
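A minimal sketch of the two overlaps discussed above (probe level and gene-symbol level), in base R. The data frames `tt_4h` and `tt_30min` are hypothetical stand-ins for limma `topTable()` output; the `ProbeID` and `Symbol` column names and all values are assumptions made up for illustration, so adapt them to your own annotation.

```r
# Hypothetical stand-ins for two limma topTable() results (made-up values).
tt_4h <- data.frame(
  ProbeID   = c("P1", "P2", "P3", "P4"),
  Symbol    = c("GENEA", "GENEB", "GENEC", "GENED"),
  adj.P.Val = c(0.01, 0.20, 0.03, 0.04),
  stringsAsFactors = FALSE
)
tt_30min <- data.frame(
  ProbeID   = c("P1", "P2", "P5", "P4"),
  Symbol    = c("GENEA", "GENEB", "GENEC", "GENED"),
  adj.P.Val = c(0.02, 0.01, 0.04, 0.30),
  stringsAsFactors = FALSE
)

# Keep the rows that pass the adjusted p-value cutoff in each comparison.
de_4h    <- tt_4h[tt_4h$adj.P.Val < 0.05, ]
de_30min <- tt_30min[tt_30min$adj.P.Val < 0.05, ]

# Overlap at the probe level and at the gene-symbol level.
common_probes  <- intersect(de_4h$ProbeID, de_30min$ProbeID)
common_symbols <- intersect(de_4h$Symbol, de_30min$Symbol)

print(common_probes)
print(common_symbols)
```

In this toy example GENEC is DE in both comparisons but through different probes (P3 and P5), so it only shows up in the symbol-level overlap, which is the situation the question was worried about.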
Dear Aaron,
Please excuse me for returning to this matter, but I would like to ask two specific questions about interpreting the resulting scatter plot. My data frame below has the logFCs for the common probe sets, with the gene symbols as row names:
Then I first used:
# Also there is a relatively high correlation:
But how could I add a fitted line to make the plot more interpretable? Or can I already state from the plot (link below) that there is an obvious correlation between my two vectors of logFCs for the two comparisons? (Just to point out: in both comparisons the common probe sets are all up-regulated, which is interesting for further investigation.)
Also, the link to the figure of the scatterplot:
https://www.dropbox.com/s/3bkz4ltgu78pni6/Rplot.png?dl=0
Thank you,
Konstantinos
It's generally more informative to make the logFC-logFC plot using all features, rather than just those that are in the intersection. If you restrict the plot to genes that are DE in both comparisons, you'll generally be selecting for points in the corners of the plot; this can result in some spuriously large correlations. Anyway, as to adding a line, I'll point you in one direction; try using lm to perform a linear regression, and then supply the coefficients to abline.
Aaron, thank you again for your recommendation. I wrongly thought at first that plotting only the common DE probe sets would be the most interesting. So you suggest using all the DE genes, not just those common to both lists? One of the two DE lists is considerably bigger (~400 genes vs. ~60 genes); would the plot still be informative?
Moreover, regarding the lm function, do you mean something like:
But doesn't lm usually take a predictor and a dependent variable in the linear model?
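For reference, a minimal sketch of the lm/abline suggestion, assuming the logFCs for all features sit in one data frame. The column names `logFC_30min` and `logFC_4h` are hypothetical and the values are simulated, not taken from the datasets in this thread. It also illustrates the last point: lm does take a response and a predictor, written as a formula with the response on the left of `~`.

```r
set.seed(1)

# Simulated logFCs for every probe on the array (made-up values).
n <- 500
logfc <- data.frame(logFC_30min = rnorm(n))
logfc$logFC_4h <- 0.5 * logfc$logFC_30min + rnorm(n, sd = 0.5)

# Response on the left of ~, predictor on the right.
fit <- lm(logFC_4h ~ logFC_30min, data = logfc)

# Scatter plot of all features, then the fitted line from the coefficients.
plot(logfc$logFC_30min, logfc$logFC_4h,
     xlab = "logFC (30 min)", ylab = "logFC (4 h)",
     pch = 16, cex = 0.5)
abline(coef = coef(fit), col = "red")

# Pearson correlation between the two logFC vectors.
r <- cor(logfc$logFC_30min, logfc$logFC_4h)
print(coef(fit))
print(r)
```

Which time point goes on which axis is an arbitrary choice here; swapping them changes the fitted slope but not the correlation.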