Enter the body of text here Hello,
I am trying to do secondary research on published data that lists only the fold change in genes between 2 stages for 2:3 different tissue types without any further analysis. I am trying to apply the protocol here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4934518/pdf/f1000research-5-9996.pdf
in order to connect the differentially expressed genes to pathways, filter what is relevant and further analyse the network.
I appreciate any advice on how to estimate the dispersion and continue the analysis, starting from the log change and not gene counts. I understand that edgeR from documentation specify that quantification of change is not important, and the ratio of change is what the analysis is based on. I doubt a silly work around of starting with arbitrary count for each gene and multiply for each group by the fold change would be any good.
Thank you in advance.
Code should be placed in three backticks as shown below
# include your problematic code here with any corresponding output xpressStage2<-data.frame(in[2:(in[["Nr_samples"]]+1)], row.names=in$"GeneName") xpressStage3<-data.frame(in[(in[["Nr_samples"]]+2):length(in)], row.names=in$"GeneName") names(xpressStage2) = c( "Whole_larvae", "Brain", "Muscle") names(xpressStage3) = c( "Whole_larvae", "Brain", "Muscle") #data frame with raw read counts from normal and tumor cells, here is the difference, this code is for read #counts, and I am trying to apply it for fold change per gene per group xpress<- data.frame(xpressStage3, xpressStage2) # DGEList object containing read counts, information about the samples and genes dge <- DGEList(counts=xpress) dge <- calcNormFactors(dge) Tissue <- factor(c(colnames(xpressStage3), colnames(xpressStage2))) Stage <- factor(c(rep("_3",length(xpressStage3)), rep("_2",length(xpressStage2)))) design <- model.matrix(~Tissue+Stage) rownames(design) <- colnames(dge) # estimate dispersion parameter for generalized linear model (glm), the code run fine, but make the majority # of genes as insignificant dge <- estimateDisp(dge, design, robust=TRUE) # fit glm fit <- glmFit(dge, design) # apply likelihoodratio test lrt <- glmLRT(fit) cat ("lrt coefficients \n\n") head(lrt$coefficients) # extract depth-normalized read counts (counts-per-million) cpm <- cpm(dge) normReads <- data.frame(cpm) diffExpression <- as.data.frame(topTags(lrt, n=length(knime.in$"GeneName"))$table) diffExpression <- diffExpression[sort(rownames(diffExpression)),] out <- data.frame(cbind(normReads, diffExpression), row.names=rownames(diffExpression)) # The output for Stage_3 , Down 33, NotSig 3080, Up 36 cat ("summary \n\n") summary(decideTests(lrt))
please also include the results of running the following in an R session
R version 4.1.0 (2021-05-18) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042) Matrix products: default locale:  LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252  LC_MONETARY=English_United Kingdom.1252  LC_NUMERIC=C  LC_TIME=English_United Kingdom.1252 attached base packages:  stats graphics grDevices utils datasets methods base other attached packages:  edgeR_3.34.0 limma_3.48.0 loaded via a namespace (and not attached):  compiler_4.1.0 Rcpp_1.0.6 splines_4.1.0 grid_4.1.0  locfit_1.5-9.4 statmod_1.4.36 lattice_0.20-44