Question

Code to address batch effect in total number of reads per sample for DEG analysis using Ballgown


amy16 • 0


Hi,

I am hoping that someone might be able to give me some advice on adjusting or controlling for batch effects in RNA-Seq data using Ballgown?

I have RNA-seq data in which the replicates of the same sample were sequenced in different batches. I followed the new Tuxedo pipeline for my preliminary analysis and found a difference in the total number of PE reads between the two sequencing batches (batch 1: 27 million reads; batch 2: 20 million reads). I therefore think this difference could be due to a batch effect in my dataset.

Is it possible to control for batch effects within the Ballgown pipeline using the stattest function?

Any advice on this would be greatly appreciated!

Many thanks,

Amy

differential expression analysis ballgown r batch effects total number of reads • 1.3k views

Written 5.6 years ago by amy16 • updated 5.6 years ago by James W. MacDonald

Hi everyone,

Also, to add to my question above: will this difference in the number of reads cause a batch effect when estimating the transcripts for each sample? I am asking because I used the stringtie merge function to merge the transcripts from all samples, and then used that merged file to estimate the transcripts for each sample.

-Amy

Reply by amy16, 5.6 years ago

score 1 · Answer 1 · 2018-10-01


James W. MacDonald 65k


Just because you have different library sizes doesn't mean that you have a batch effect. It's common to have different library sizes, and that will be accounted for during the analysis. You should do either MDS or PCA plots in order to see if there are any batch effects, however.
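As a rough illustration of why a difference in library size alone is not a batch effect, the sketch below simulates two samples with identical relative abundances but the two sequencing depths mentioned in the question, then rescales each to counts per million. All names and the simulated abundances are assumptions made for illustration; Ballgown itself works with measures such as FPKM, which are already scaled by the number of mapped fragments, and `stattest` has a `libadjust` argument for library-size adjustment (see its help page).

```r
# Simulated counts: same underlying expression, different sequencing depth.
set.seed(1)
n_tx <- 500
prop <- rgamma(n_tx, shape = 0.5)     # hypothetical relative abundances
prop <- prop / sum(prop)
depth <- c(B1 = 27e6, B2 = 20e6)      # library sizes quoted in the question

# One column of Poisson counts per sample.
counts <- sapply(depth, function(d) rpois(n_tx, lambda = d * prop))

colSums(counts)                       # raw totals differ between the two samples

# Scale each sample (column) to counts per million.
cpm <- t(t(counts) / colSums(counts)) * 1e6
colSums(cpm)                          # both columns now sum to one million

# After scaling, the two samples are nearly identical on the log scale.
cor(log2(cpm[, "B1"] + 1), log2(cpm[, "B2"] + 1))
```

A real batch effect would show up as differences that remain after this kind of scaling, which is what the MDS or PCA plot is meant to reveal.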

As for fitting a model with a batch effect (if needed), you can specify any model matrix you like. See the vignette (which you should have already read, BTW), particularly the Differential expression analysis section.
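A minimal sketch of what specifying such model matrices could look like is given below. The sample sheet, its column names, and the balanced design are hypothetical; the `stattest` call is shown only as a comment because it needs a real ballgown object whose `pData` rows match the sample sheet.

```r
# Hypothetical sample sheet: two conditions, each sequenced in both batches.
pheno <- data.frame(
  id    = paste0("S", 1:8),
  group = factor(rep(c("control", "treated"), each = 4)),
  batch = factor(rep(c("b1", "b2"), times = 4))
)

# Full model: batch plus the covariate of interest (group).
mod <- model.matrix(~ batch + group, data = pheno)
# Null model: batch only, without the covariate of interest.
mod0 <- model.matrix(~ batch, data = pheno)

# Batch must not be confounded with group, otherwise the full model loses rank.
stopifnot(qr(mod)$rank == ncol(mod))

# With a ballgown object `bg` (assumed, not created here), the matrices
# would be passed along these lines:
# results <- ballgown::stattest(bg, feature = "transcript", meas = "FPKM",
#                               mod = mod, mod0 = mod0)
# A shorter route for simple designs is the adjustvars argument:
# results <- ballgown::stattest(bg, feature = "transcript", meas = "FPKM",
#                               covariate = "group", adjustvars = "batch")
```

If every sample of one condition sits in a single batch, the rank check fails and the batch term cannot be separated from the condition effect, whatever software is used.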

Answer by James W. MacDonald, 5.6 years ago

Thanks for the reply.

Could you please suggest how I should use the transcript estimate data (output from stringtie merge) to make a PCA plot?

-Amy

Reply by amy16, 5.6 years ago

The usual thing to do is to use prcomp. Do note that you need to transpose your data matrix first, as prcomp expects samples (observations) in rows and variables (here, transcripts) in columns.
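A small sketch of that transposition step, using a simulated matrix in place of real transcript estimates (the matrix, sample names, and batch labels are all made up for illustration):

```r
# Hypothetical expression matrix: transcripts in rows, samples in columns.
set.seed(42)
n_tx <- 200
fpkm <- matrix(rexp(n_tx * 6, rate = 0.1), nrow = n_tx,
               dimnames = list(paste0("tx", seq_len(n_tx)),
                               paste0("S", 1:6)))
batch <- factor(c("b1", "b1", "b1", "b2", "b2", "b2"))  # assumed batch labels

# Log-transform, then transpose so that each sample becomes a row.
log_expr <- log2(fpkm + 1)
pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)

dim(pca$x)        # one row of component scores per sample
rownames(pca$x)   # the sample names

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)
```

The sample scores can then be plotted, for example with `plot(pca$x[, 1], pca$x[, 2], col = as.integer(batch), pch = 19)`, to see whether samples group by batch rather than by condition.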

Reply by James W. MacDonald, 5.6 years ago

Hi James,
Thank you for your suggestion on PCA. I am not a stats person and I am not able to understand the eigenvectors in the PCA output.
Below is the PCA output for the transcript estimate data for one sample (B1C1 was sequenced in batch 1; B1C2 and B1C3 were sequenced in batch 2):
Eigenvectors:
       F1      F2      F3
B1C1   0.569   0.822  -0.007
B1C2   0.582  -0.397   0.710
B1C3   0.581  -0.408  -0.704
Any advice on this would be greatly appreciated!
Many thanks,
-Amy

Reply by amy16, 5.6 years ago
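For readers trying to interpret a table like the one above: it has one row per sample, which suggests (this is an assumption about how the PCA was run) that the samples were treated as variables, i.e. the matrix was not transposed. In that layout the first eigenvector mostly reflects the expression profile shared by all samples, so its entries sit close to 1/sqrt(3), about 0.577, as F1 does. The later eigenvectors contrast samples with each other; F2 in the posted output separates B1C1 from B1C2 and B1C3, which matches the batch split, although with a single sample in batch 1 this cannot be told apart from ordinary sample-to-sample variation. The sketch below reproduces the pattern in F1 with simulated data; all values are made up.

```r
# Three simulated samples that share one expression profile plus noise.
set.seed(7)
n_tx <- 1000
shared <- rnorm(n_tx, mean = 5, sd = 2)   # profile common to all samples
expr <- cbind(B1C1 = shared + rnorm(n_tx, sd = 0.5),
              B1C2 = shared + rnorm(n_tx, sd = 0.5),
              B1C3 = shared + rnorm(n_tx, sd = 0.5))

# Samples left in columns (not transposed), so they act as the variables.
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

round(pca$rotation, 3)     # eigenvectors, one row per sample
abs(pca$rotation[, 1])     # first eigenvector: entries close to 1/sqrt(3)
1 / sqrt(3)
```

The sign of each eigenvector is arbitrary, so only the absolute values and the contrasts between samples carry meaning.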


Hi Amy,

It's a real problem if you are trying to do sophisticated statistical analysis when you are, by your own admission, not a stats person. If you are really planning to do the analysis, then you need to become more familiar with what you are doing, which will require quite a bit of reading. Just asking for advice on a support site like this isn't really going to help.

The other alternative is to find a local statistician who does know about this stuff, and have that person help you. Which is what I would recommend.

Reply by James W. MacDonald, 5.6 years ago

Thanks for your advice.

-AA

Reply by amy16, 5.6 years ago