Search
Question: Ballgown up and down regulated genes
0
2.5 years ago by
psbpedrobarbosa0 wrote:

Hi,

I would like to know if it's possible to distinguish between up and down regulated feature after running the stattest function in the ballgown package. In the github page refers to this "For two- and multi-group comparisons, a significant result indicates that the feature is differentially expressed in at least one of the groups", but i'm interested to know in which of the groups the feature is over or under expressed. The displayed fold change values are always positive, so i'm wondering if it's possible to extract this kind of information with this package.

Furthermore, FDR values seem weird, as in all the features with pvalue < 0.05 there isn't any with qvalue < 0.05 too. Why does this happen? Does it has something to do with the statistical model employed in ballgown?

Number of significant features:

> sum(bgresults_c1c3$pval < 0.05, na.rm = TRUE) [1] 2759 Subset with signigicant features: > subset <- subset(bgresults_c1c3, bgresults_c1c3$pval<0.05, na.rm=TRUE)

Histogram of qvalues:

> hist(subset\$qval, main='Ballgown q-values: c1-c3 comparison', col="grey", xlab='Range of q-values for the more significant transcripts')

Pedro Barbosa

modified 15 months ago by kapeelc0 • written 2.5 years ago by psbpedrobarbosa0
1
2.5 years ago by
Alyssa Frazee190
San Francisco, CA, USA
Alyssa Frazee190 wrote:

Are you doing two-group or multi-group comparisons?

If you are doing two-group comparisons, you can use the getFC option in stattest to get the estimated fold change, which indicates direction. (fold change < 1 indicates downregulation, fold change > 1 indicates upregulation, and the reference group is determined by whatever category from your group variable the lm function in R would consider the reference category. Generally this is the category that comes first alphabetically. If you used 0/1 labels, the reference category will be category 0, so FC > 1 means upregulated in the 1 group.)

If you are doing multigroup comparisons, there is not an automated way to identify which category is causing the statistical significance. It might be useful to make your own plots for this, if it would be useful. (This situation is similar to ANOVA - a significant one-way ANOVA result does not give any information about which sample mean is driving the statistically significant result).

As for the p-value and q-value distribution, it generally depends on how strong of a signal there is in your data, and having p-values less than 0.05 does not mean there will necessarily be small q-values in the data set after FDR correction (especially if there isn't a really strong signal in your data set). The statistical model is well-understood and is based on nested linear model comparisons, so it should perform accurately, but if you find problems please let us know. You can read more about the statistical model in our paper: http://biorxiv.org/content/early/2014/09/05/003665 -- the paper also contains several examples that show the statistical model to be performing reasonably in data sets with known signal.

I'm doing two-group comparisons. Thanks about the fold change clarification. I expected fold changes to be outputted as the log2 function (as edgeR, DESeq2) so i would have negative values for the down regulated features.

I did read the paper, but biology is my background, so i've been in trouble to understand what nested llinear models mean in this context. What i observe is when i do multi-group comparisons the range of q-values is more acceptable, but i'm not interested on this type of results. Since the signal might not be strong, should i rely in counting methods rather than estimated abundances ?

Another issue relates to the gene expression calculation. Sometimes i get this following error, am i doing something wrong ?

> bgresults_genes <- stattest(gown=bg,  meas="FPKM",covariate="group", feature="gene", getFC = TRUE)
Error in solve.default(t(mod) %*% mod) :
system is computationally singular: reciprocal condition number = 5.61945e-17

Hi Alyssa,

I just have started to use Ballgown and R. I have some questions for you:

1) How is it possible to make multiple comparisons (I have 4 different treatments);

2) How does the program identify the replicates?

3) how can I use the  ≥tFC in stattest?

Thank you,

0
15 months ago by
kapeelc0
kapeelc0 wrote:

Hi,

I am having similar issue with regard to p-value and q-value distribution. I am doing a pariwise comparison. I do not see any genes differentially expression for q value < 0.05. I ran the same comparison using DESeq2 and Cuffdiff I see 1865(padj < 0.05) and 1429(q < 0.05) respectively. I followed the protocol listed for ballgown in the paper,  below are the p value distribution and q value distribution for the comparison.

Any clues why the q value so high.

Thanks

Kapeel