No counts, just fold change data
3
0
Entering edit mode
Manal • 0
@2cec7878
Last seen 5 weeks ago
United Kingdom

Enter the body of text here Hello,

I am trying to do secondary research on published data that lists only the fold change in genes between 2 stages for 2:3 different tissue types without any further analysis. I am trying to apply the protocol here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4934518/pdf/f1000research-5-9996.pdf

in order to connect the differentially expressed genes to pathways, filter what is relevant and further analyse the network.

I appreciate any advice on how to estimate the dispersion and continue the analysis, starting from the log change and not gene counts. I understand that edgeR from documentation specify that quantification of change is not important, and the ratio of change is what the analysis is based on. I doubt a silly work around of starting with arbitrary count for each gene and multiply for each group by the fold change would be any good.

Code should be placed in three backticks as shown below


# include your problematic code here with any corresponding output

xpressStage2<-data.frame(in[2:(in[["Nr_samples"]]+1)], row.names=in$"GeneName") xpressStage3<-data.frame(in[(in[["Nr_samples"]]+2):length(in)], row.names=in$"GeneName")
names(xpressStage2) = c( "Whole_larvae", "Brain", "Muscle")
names(xpressStage3) = c( "Whole_larvae", "Brain", "Muscle")

#data frame with raw read counts from normal and tumor cells, here is the difference, this code is for read #counts, and I am trying to apply it for fold change per gene per group
xpress<- data.frame(xpressStage3, xpressStage2)
# DGEList object containing read counts, information about the samples and genes
dge <- DGEList(counts=xpress)
dge <- calcNormFactors(dge)

Tissue <- factor(c(colnames(xpressStage3), colnames(xpressStage2)))
Stage <- factor(c(rep("_3",length(xpressStage3)), rep("_2",length(xpressStage2))))

design <- model.matrix(~Tissue+Stage)

rownames(design) <- colnames(dge)

# estimate dispersion parameter for generalized linear model (glm), the code run fine, but make the majority # of genes as insignificant
dge <- estimateDisp(dge, design, robust=TRUE)

# fit glm
fit <- glmFit(dge, design)
# apply likelihoodratio test
lrt <- glmLRT(fit)
cat ("lrt  coefficients \n\n")
head(lrt$coefficients) # extract depth-normalized read counts (counts-per-million) cpm <- cpm(dge) normReads <- data.frame(cpm) diffExpression <- as.data.frame(topTags(lrt, n=length(knime.in$"GeneName"))\$table)
diffExpression <- diffExpression[sort(rownames(diffExpression)),]

#        The output for Stage_3 , Down        33, NotSig    3080, Up          36
cat ("summary \n\n")
summary(decideTests(lrt))


please also include the results of running the following in an R session

sessionInfo( )

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_3.34.0 limma_3.48.0

loaded via a namespace (and not attached):
[1] compiler_4.1.0  Rcpp_1.0.6      splines_4.1.0   grid_4.1.0
[5] locfit_1.5-9.4  statmod_1.4.36  lattice_0.20-44

EdgeR Estimation Dispersion • 205 views
1
Entering edit mode
@james-w-macdonald-5106
Last seen 11 minutes ago
United States

If all you have are the logFC values from a publication, then you won't be able to do anything. Did they not submit the data to a public repository (GEO or similar)?

0
Entering edit mode

No, not submitted in any database, and not sure if it is LogFC, they only mentioned fold change above the threshold 1.5. The highest value is 269. They put the downregulated in tables, and the upregulated in other tables. I added negative sign to the downregulated to merge.

1
Entering edit mode
swbarnes2 ▴ 800
@swbarnes2-14086
Last seen 6 hours ago
San Diego

in order to connect the differentially expressed genes to pathways, filter what is relevant and further analyse the network.

I think you can do this, but that is downstream of edgeR. EdgeR would get you fold changes, but since you have them, you probably don't need EdgeR.

You want pathway analysis tools.

0
Entering edit mode

Thank you for that. I did now. I am just hesitant if the values logFC logCPM LR PValue FDR coming out from EdgeR are the valuable, but the pathways linked seem relevant.

1
Entering edit mode
@steve-lianoglou-2771
Last seen 53 minutes ago
Denali

Assuming you've got logFC's for the (mostly) the same genes among your comparisons, and also that these stats are from all of the genes under test (ie. they aren't preselected to be just the significant ones) you can use a gene set enrichment analysis that is based on ranks – in this case, you would rank your genes based on their logFC.

You could then use a package like fgsea, or the limma::cameraPR() function, too (check the help in ?limma::cameraPR)