Question

scRNA-seq: a question about the names of the columns in a Seurat object

Bogdan

Palo Alto, CA, USA

Dear all,

could you please advise on the following? I am running Seurat on the dataset published in:

https://www.ncbi.nlm.nih.gov/pubmed/27565351

https://github.com/broadinstitute/BipolarCell2016

In the article, six bipolar cell datasets are combined in a single matrix, and the columns are labelled:

Bipolar1_barcode1, ...., Bipolar1_barcodeXYZ,

Bipolar2_barcode1, ...., Bipolar2_barcodeXYZ,

Bipolar3_barcode1, ...., Bipolar3_barcodeXYZ,

Bipolar4_barcode1, ...., Bipolar4_barcodeXYZ,

Bipolar5_barcode1, ...., Bipolar5_barcodeXYZ,

Bipolar6_barcode1, ...., Bipolar6_barcodeXYZ,

Should I understand that, to include multiple experiments in the same matrix for analysis with Seurat, we simply need to label the columns according to the scheme ExperimentA_Barcode, ..., ExperimentX_Barcode?
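A minimal sketch of how such prefixed column names can be turned back into a per-cell sample label (for example, to use as batch metadata). It is written in Python purely for illustration, assumes the sample name itself contains no underscore (it splits at the first "_"), and uses made-up barcodes. As far as I know, Seurat's CreateSeuratObject() does the same thing through its names.field and names.delim arguments (defaults 1 and "_"), which set each cell's initial identity (orig.ident) from its column name; check the documentation for your Seurat version.

```python
from collections import Counter

def split_column_names(colnames, sep="_"):
    """Split 'Sample_barcode' names at the FIRST separator into (sample, barcode).

    Assumes the sample prefix contains no separator character.
    """
    pairs = []
    for name in colnames:
        sample, found, barcode = name.partition(sep)
        if not found:
            raise ValueError(f"no {sep!r} in column name {name!r}")
        pairs.append((sample, barcode))
    return pairs

# Hypothetical column names in the Bipolar<N>_<barcode> style.
cols = ["Bipolar1_AAACCTGA", "Bipolar1_AAACGGGT", "Bipolar2_AAACCTGA"]
pairs = split_column_names(cols)

# Count cells per sample, e.g. as a quick sanity check on the labels.
cells_per_sample = Counter(sample for sample, _ in pairs)
print(cells_per_sample)  # Counter({'Bipolar1': 2, 'Bipolar2': 1})
```

Note that the barcode AAACCTGA occurs in both samples; only the prefix keeps the two cells apart.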

Thanks a lot,

-- bogdan


Steve Lianoglou · Answer 1 · 2018-12-12


United States

For better or for worse, Seurat isn't a Bioconductor package, so this board is technically not the right/best place to get help on using it.

That having been said:

You are likely seeing the <ExperimentN>_<barcodeY> column names because the same barcodes are reused across samples/experiments (where "barcode" is the cell barcode from a 10x-like experiment, or perhaps the ID of a well in some other type of experiment). So, if you have count matrices from different experiments, they may have only the <barcodeY> column names, in which case you will have to prefix them with something unique to each experiment.
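A minimal Python sketch of that prefixing step, assuming each experiment's column names are available as a plain list of barcodes. The experiment IDs and barcodes are hypothetical. In R this is typically a paste0() on colnames() before binding the matrices, and Seurat's merge() also has an add.cell.ids argument intended for this; check the documentation of the version you use.

```python
def prefix_barcodes(barcodes_by_experiment, sep="_"):
    """Prefix every barcode with its experiment ID so that column names stay
    unique once the per-experiment count matrices are placed side by side."""
    combined = [
        f"{experiment}{sep}{barcode}"
        for experiment, barcodes in barcodes_by_experiment.items()
        for barcode in barcodes
    ]
    if len(set(combined)) != len(combined):
        # Remaining duplicates usually mean a barcode repeats within one experiment.
        raise ValueError("column names are not unique even after prefixing")
    return combined

# Hypothetical input: barcode AAACCTGA occurs in both experiments.
barcodes = {
    "Bipolar1": ["AAACCTGA", "AAACGGGT"],
    "Bipolar2": ["AAACCTGA", "TTTGGTCA"],
}
raw = [bc for bcs in barcodes.values() for bc in bcs]
print(len(raw), len(set(raw)))  # 4 3: the unprefixed names collide
print(prefix_barcodes(barcodes))
```

Prefixing only makes the names unique; it does nothing about batch effects between the experiments.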
Simply stacking count matrices from different experiments together can be problematic because of batch effects.
- The upcoming Seurat v3 release has an improved method, and documentation on how you might correct for these batch effects, which you can read about here.
- If you want to stay in the Bioconductor universe, take a look at Aaron Lun's simpleSingleCell workflows. In particular, there is a vignette that describes how to deal with these batch effects using a mutual nearest neighbors approach.
- The Macosko lab has a preprint that uses yet another method, called LIGER.


Thank you, Steve! Yes, I am using both pipelines: 1) Seurat and 2) the simpleSingleCell workflow.

On batch effects, if I may ask a follow-up question: I have noted three strategies.

a. a strategy where the samples from multiple experiments are concatenated into one large matrix (as I described above).

For batch correction, one may then apply the ComBat function from the sva package to that matrix, as in this workshop:

https://ucdavis-bioinformatics-training.github.io/2017_2018-single-cell-RNA-sequencing-Workshop-UCD_UCB_UCSF/day2/scRNA_Workshop-PART3.html

b. another strategy uses CCA (canonical correlation analysis), as recently published:

https://satijalab.org/seurat/immune_alignment.html

c. MNN-based correction, as presented at the link you provided:

https://bioconductor.org/packages/devel/workflows/vignettes/simpleSingleCell/inst/doc/work-5-mnn.html

Would one strategy work better than the others? What would you advise? Thanks!

Reply by Bogdan


I would advise you to reference the relevant literature ;-)

The MNN paper compares its method against ComBat and shows it to be superior, and the Seurat preprint in turn claims its method is superior to MNN.
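To make the ComBat side of that comparison concrete: ComBat models a batch effect on each gene as a shift in mean (location) and a change in variance (scale). The Python sketch below applies only a naive version of that location/scale adjustment to a single gene, mapping every batch onto the pooled mean and standard deviation. It is not ComBat: it omits the empirical Bayes shrinkage across genes and any covariates, and it assumes log-normalised, roughly continuous expression values rather than raw counts. The data are made up.

```python
from statistics import mean, pstdev

def location_scale_adjust(values, batches):
    """Naive per-batch location/scale adjustment for ONE gene (not ComBat).

    Each batch is standardised with its own mean and SD, then mapped onto
    the pooled mean and SD of all cells. No empirical Bayes shrinkage.
    """
    grand_mean, grand_sd = mean(values), pstdev(values)
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    stats = {b: (mean(vs), pstdev(vs)) for b, vs in groups.items()}
    adjusted = []
    for v, b in zip(values, batches):
        mu, sd = stats[b]
        z = (v - mu) / sd if sd > 0 else 0.0  # constant batch -> pooled mean
        adjusted.append(grand_mean + z * grand_sd)
    return adjusted

# Hypothetical log-expression of one gene: batch B is shifted up by 10.
expr = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch = ["A", "A", "A", "B", "B", "B"]
print([round(x, 2) for x in location_scale_adjust(expr, batch)])
```

Because the adjustment is applied per batch regardless of which cells a batch contains, it implicitly assumes every batch has the same cell-type composition; avoiding that assumption was one of the motivations for the MNN approach.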

If it were me, I'd likely ignore ComBat and take my time with MNN, LIGER, and Seurat v3 to see how they compare to each other. Each has its own set of parameters that you should spend some time playing with to understand how they affect the results.
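As an example of such a parameter, the core step of MNN correction is to find pairs of cells, one from each batch, that are among each other's k nearest neighbours, so the choice of k directly changes which pairs are found. Below is a minimal brute-force Python sketch of only that pair-finding step, using Euclidean distance on made-up 2-D coordinates. The published method also cosine-normalises the expression values and computes correction vectors from these pairs, which this sketch does not attempt.

```python
from math import dist

def k_nearest(point, others, k):
    """Indices of the k points in `others` closest to `point` (Euclidean)."""
    order = sorted(range(len(others)), key=lambda j: dist(point, others[j]))
    return set(order[:k])

def mutual_nearest_neighbours(batch1, batch2, k):
    """Return sorted (i, j) pairs such that batch2[j] is among the k nearest
    neighbours of batch1[i] within batch2, AND batch1[i] is among the k
    nearest neighbours of batch2[j] within batch1."""
    nn12 = [k_nearest(p, batch2, k) for p in batch1]
    nn21 = [k_nearest(q, batch1, k) for q in batch2]
    return sorted((i, j) for i in range(len(batch1)) for j in nn12[i] if i in nn21[j])

# Hypothetical cells: batch2 is batch1 shifted by 0.5, plus one cell at (9, 9)
# that has no counterpart in batch1.
batch1 = [(0.0, 0.0), (0.0, 1.0)]
batch2 = [(0.5, 0.0), (0.5, 1.0), (9.0, 9.0)]
print(mutual_nearest_neighbours(batch1, batch2, k=1))  # [(0, 0), (1, 1)]
print(mutual_nearest_neighbours(batch1, batch2, k=2))
```

The cell at (9, 9) never appears in a pair, because no batch1 cell counts it among its nearest neighbours; a larger k produces more (and looser) pairs among the shared cells.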

(Note that I've updated the original answer to add a reference to LIGER as a third approach to dataset integration.)