Code to address a batch effect in total number of reads per sample for DEG analysis using Ballgown
amy16 • 0
@amy16-14425
Last seen 3.1 years ago

Hi,

I am hoping that someone might be able to give me some advice on adjusting or controlling for batch effects in RNA-Seq data using Ballgown?

I have RNA-seq data in which the replicates of each sample were sequenced in different batches. I followed the new Tuxedo pipeline for my preliminary analysis and found a difference in the total number of paired-end reads between the two sequencing batches (batch 1: 27 million reads; batch 2: 20 million reads). I therefore think this difference could reflect a batch effect in my dataset.

Is it possible to control for batch effects within the Ballgown pipeline using the stattest function?

Any advice on this would be gratefully appreciated!

Many thanks,

Amy


Hi everyone,

Also, to add to my question above: will this difference in the number of reads cause a batch effect when estimating the transcripts for each sample? I am asking because I used the stringtie merge function to merge the transcripts from all samples, and then used that merged file to estimate the transcripts for each sample.

-Amy

@james-w-macdonald-5106
Last seen 17 hours ago
United States

Just because you have different library sizes doesn't mean that you have a batch effect. It's common to have different library sizes, and that will be accounted for during the analysis. You should do either MDS or PCA plots in order to see if there are any batch effects, however.

As for fitting a model with a batch effect (if needed), you can specify any model matrix you like. See the vignette (which you should have already read, BTW), particularly the Differential expression analysis section.
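
As a rough sketch of what specifying your own model matrices can look like: the sample sheet below (six samples, two conditions, two batches) is entirely made up for illustration, and `bg` is a hypothetical ballgown object, so the `stattest` calls are left as comments. The `mod`/`mod0` and `adjustvars` arguments are the ones described in the vignette's differential expression section; check `?stattest` for your installed version.

```r
# Hypothetical sample sheet: 6 samples, 2 conditions, 2 sequencing batches.
# Replace with your own design; rows must be in the same order as the samples.
pheno <- data.frame(
  id    = paste0("S", 1:6),
  group = factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt")),
  batch = factor(c("b1", "b2", "b2", "b1", "b2", "b2"))
)

# Full model: the condition of interest plus batch as an adjustment variable.
mod <- model.matrix(~ group + batch, data = pheno)

# Null model: batch only (everything except the condition being tested).
mod0 <- model.matrix(~ batch, data = pheno)

# The full design must be full rank, otherwise batch and group are confounded.
stopifnot(qr(mod)$rank == ncol(mod))

# With a ballgown object `bg` (hypothetical name) in the same sample order:
# library(ballgown)
# res <- stattest(bg, feature = "transcript", meas = "FPKM",
#                 mod = mod, mod0 = mod0)
#
# Alternatively, store the sample sheet in the object and name the columns:
# pData(bg) <- pheno
# res <- stattest(bg, feature = "transcript", meas = "FPKM",
#                 covariate = "group", adjustvars = "batch")
```

Note that this only works if batch is not perfectly confounded with the condition. If every control sample were in one batch and every treated sample in the other, the two effects could not be separated by any model.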


Could you please suggest how I should use the transcript estimate data (the output after stringtie merge) to make a PCA plot?

-Amy


The usual thing to do is to use prcomp. Do note that you need to transpose your data matrix first, as prcomp expects samples (observations) in rows and variables (here, transcripts) in columns.
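
A minimal sketch of that, using simulated data in place of real transcript estimates: the matrix `expr`, the sample names, and the `batch` labels are all invented for illustration. With real data, `expr` would be a transcripts-by-samples matrix of expression estimates such as FPKM values.

```r
set.seed(1)

# Hypothetical expression matrix: 200 transcripts (rows) x 6 samples (columns).
expr <- matrix(rexp(200 * 6, rate = 0.1), nrow = 200,
               dimnames = list(paste0("tx", 1:200), paste0("S", 1:6)))
# Hypothetical batch labels, one per sample (column).
batch <- factor(c("b1", "b2", "b2", "b1", "b2", "b2"))

# Log-transform so a few highly expressed transcripts do not dominate.
log_expr <- log2(expr + 1)

# Drop transcripts with zero variance across samples.
keep <- apply(log_expr, 1, var) > 0

# Transpose: prcomp wants samples in rows, transcripts in columns.
pca <- prcomp(t(log_expr[keep, ]), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each principal component.
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Sample scores on the first two components, coloured by batch.
if (interactive()) {
  plot(pca$x[, 1], pca$x[, 2], col = as.integer(batch), pch = 19,
       xlab = sprintf("PC1 (%.1f%%)", pct[1]),
       ylab = sprintf("PC2 (%.1f%%)", pct[2]))
  text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)
}
```

If samples separate by batch rather than by condition along the first components, that suggests a batch effect worth adjusting for. With only a handful of samples, though, such a plot is at best a rough guide.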


Hi James,
Thank you for your suggestion on PCA. I am not a stats person and I am not able to understand the eigenvectors in the PCA output.
Below is the PCA output for the transcript estimate data for one sample (B1C1 was sequenced in batch 1; B1C2 and B1C3 were sequenced in batch 2):
Eigenvectors:
        F1      F2      F3
B1C1  0.569   0.822  -0.007
B1C2  0.582  -0.397   0.710
B1C3  0.581  -0.408  -0.704
Any advice on this would be gratefully appreciated!
Many Thanks
-Amy


Hi Amy,

It's a real problem if you are trying to do sophisticated statistical analysis when you are, by your own admission, not a stats person. If you are really planning to do the analysis, then you need to become more familiar with what you are doing, which will require quite a bit of reading. Just asking for advice on a support site like this isn't really going to help.


-AA