Enter the body of text here Hello,
I am trying to do secondary research on published data that lists only the fold change in genes between 2 stages for 2:3 different tissue types without any further analysis. I am trying to apply the protocol here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4934518/pdf/f1000research-5-9996.pdf
in order to connect the differentially expressed genes to pathways, filter what is relevant and further analyse the network.
I appreciate any advice on how to estimate the dispersion and continue the analysis, starting from the log change and not gene counts. I understand that edgeR from documentation specify that quantification of change is not important, and the ratio of change is what the analysis is based on. I doubt a silly work around of starting with arbitrary count for each gene and multiply for each group by the fold change would be any good.
Thank you in advance.
Code should be placed in three backticks as shown below
# include your problematic code here with any corresponding output
xpressStage2<-data.frame(in[2:(in[["Nr_samples"]]+1)], row.names=in$"GeneName")
xpressStage3<-data.frame(in[(in[["Nr_samples"]]+2):length(in)], row.names=in$"GeneName")
names(xpressStage2) = c( "Whole_larvae", "Brain", "Muscle")
names(xpressStage3) = c( "Whole_larvae", "Brain", "Muscle")
#data frame with raw read counts from normal and tumor cells, here is the difference, this code is for read #counts, and I am trying to apply it for fold change per gene per group
xpress<- data.frame(xpressStage3, xpressStage2)
# DGEList object containing read counts, information about the samples and genes
dge <- DGEList(counts=xpress)
dge <- calcNormFactors(dge)
Tissue <- factor(c(colnames(xpressStage3), colnames(xpressStage2)))
Stage <- factor(c(rep("_3",length(xpressStage3)), rep("_2",length(xpressStage2))))
design <- model.matrix(~Tissue+Stage)
rownames(design) <- colnames(dge)
# estimate dispersion parameter for generalized linear model (glm), the code run fine, but make the majority # of genes as insignificant
dge <- estimateDisp(dge, design, robust=TRUE)
# fit glm
fit <- glmFit(dge, design)
# apply likelihoodratio test
lrt <- glmLRT(fit)
cat ("lrt coefficients \n\n")
head(lrt$coefficients)
# extract depth-normalized read counts (counts-per-million)
cpm <- cpm(dge)
normReads <- data.frame(cpm)
diffExpression <- as.data.frame(topTags(lrt, n=length(knime.in$"GeneName"))$table)
diffExpression <- diffExpression[sort(rownames(diffExpression)),]
out <- data.frame(cbind(normReads, diffExpression), row.names=rownames(diffExpression))
# The output for Stage_3 , Down 33, NotSig 3080, Up 36
cat ("summary \n\n")
summary(decideTests(lrt))
please also include the results of running the following in an R session
sessionInfo( )
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.34.0 limma_3.48.0
loaded via a namespace (and not attached):
[1] compiler_4.1.0 Rcpp_1.0.6 splines_4.1.0 grid_4.1.0
[5] locfit_1.5-9.4 statmod_1.4.36 lattice_0.20-44
No, not submitted in any database, and not sure if it is LogFC, they only mentioned fold change above the threshold 1.5. The highest value is 269. They put the downregulated in tables, and the upregulated in other tables. I added negative sign to the downregulated to merge.