scRNA-seq: a question about the names of the columns in a Seurat object
Bogdan ▴ 620
@bogdan-2367
Palo Alto, CA, USA

Dear all,

Could you please advise on the following? I am running the Seurat package on the dataset published in:

https://www.ncbi.nlm.nih.gov/pubmed/27565351

https://github.com/broadinstitute/BipolarCell2016

In the article, six datasets of bipolar cells are combined in a single matrix, and the columns are labelled:

Bipolar1_barcode1, ...., Bipolar1_barcodeXYZ,

Bipolar2_barcode1, ...., Bipolar2_barcodeXYZ,

Bipolar3_barcode1, ...., Bipolar3_barcodeXYZ,

Bipolar4_barcode1, ...., Bipolar4_barcodeXYZ,

Bipolar5_barcode1, ...., Bipolar5_barcodeXYZ,

Bipolar6_barcode1, ...., Bipolar6_barcodeXYZ,

Am I right to understand that, when we want to include multiple experiments in the same matrix for analysis with Seurat, we just need to label the columns according to the scheme ExperimentA_Barcode, ...., ExperimentX_Barcode?

thanks a lot,

-- bogdan

scRNA-seq SEURAT • 834 views
@steve-lianoglou-2771
Denali

For better or for worse, Seurat isn't a Bioconductor package, so this board is technically not the right/best place to get help on using it.

That having been said:

  1. You are likely seeing the <ExperimentN>_<barcodeY> column names because the same barcodes are reused across samples/experiments (where the barcode is the cell barcode from a 10x-like experiment, or perhaps the ID of a well in some other type of experiment). So, if you have count matrices from different experiments, their columns may be named with just <barcodeY>, in which case you will have to prefix each name with something unique to the experiment.

  2. Simply concatenating count matrices from different experiments can be problematic due to batch effects.
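
To make point 1 concrete, here is a minimal sketch in plain Python (not Seurat code) of prefixing each barcode with its experiment label before combining matrices. The barcodes, gene names, counts, and the helper `merge_with_prefix` are all made up for illustration, and a real workflow would use sparse matrices rather than nested dicts.

```python
# Toy count matrices stored as {barcode: {gene: count}}.
# All barcodes, genes, and counts are illustrative; note that the barcode
# "AAACCTG" occurs in both experiments, which is the collision to avoid.
experiments = {
    "Bipolar1": {
        "AAACCTG": {"Vsx2": 3, "Grm6": 0},
        "TTTGGCA": {"Vsx2": 1, "Grm6": 2},
    },
    "Bipolar2": {
        "AAACCTG": {"Vsx2": 0, "Grm6": 5},
    },
}


def merge_with_prefix(experiments, sep="_"):
    """Combine matrices, renaming each column to <experiment><sep><barcode>."""
    merged = {}
    for experiment, cells in experiments.items():
        for barcode, counts in cells.items():
            name = f"{experiment}{sep}{barcode}"
            if name in merged:
                # Guard against two columns ending up with the same name.
                raise ValueError(f"duplicate column name: {name}")
            merged[name] = dict(counts)
    return merged


merged = merge_with_prefix(experiments)
print(sorted(merged))
# ['Bipolar1_AAACCTG', 'Bipolar1_TTTGGCA', 'Bipolar2_AAACCTG']
```

Without the prefix, the two "AAACCTG" columns would be indistinguishable once the matrices are combined.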


Thank you, Steve! Yes, I am using both pipelines: 1) Seurat and 2) the workflow based on simpleSingleCell.

On the subject of batch effects, if I may add a question: I have noted the following strategies:

a. Concatenate the samples from multiple experiments into one large matrix (as I have described above).

For batch correction, one may then apply the ComBat function from the sva package to the matrix.

https://ucdavis-bioinformatics-training.github.io/2017_2018-single-cell-RNA-sequencing-Workshop-UCD_UCB_UCSF/day2/scRNA_Workshop-PART3.html

b. Use CCA (canonical correlation analysis), as recently published:

https://satijalab.org/seurat/immune_alignment.html

c. Use MNN-based correction, as presented at the link you've provided:

https://bioconductor.org/packages/devel/workflows/vignettes/simpleSingleCell/inst/doc/work-5-mnn.html

Would one strategy work better than the others? What would you advise? Thanks!


I would advise you to reference the relevant literature ;-)

The MNN paper does a comparison against ComBat and shows their method to be superior, and the Seurat preprint claims their method to be superior to MNN.

If it were me, I'd likely ignore ComBat and take my time with MNN, LIGER, and Seurat v3 to see how they compare to each other. Each has its own set of parameters you should spend some time playing with to understand how they affect the results.

(Note that I've updated the original answer to add reference to LIGER as a 3rd approach to tackle dataset integration)
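
Whichever of these methods you try, it will need to know which experiment each cell came from. With the <ExperimentN>_<barcodeY> naming scheme, that label can be recovered from the column names. Below is a minimal sketch in plain Python, assuming the experiment label itself contains no underscore; the column names and the helper `batch_of` are hypothetical and not part of any of the packages mentioned.

```python
from collections import Counter


def batch_of(column_name, sep="_"):
    """Return the experiment label: the text before the first separator."""
    label, found, _barcode = column_name.partition(sep)
    if not found:
        raise ValueError(f"no {sep!r} in column name: {column_name}")
    return label


# Illustrative column names following the <ExperimentN>_<barcodeY> scheme.
columns = ["Bipolar1_AAACCTG", "Bipolar1_TTTGGCA", "Bipolar2_AAACCTG"]

# One batch label per cell, in the same order as the matrix columns.
batches = [batch_of(c) for c in columns]
print(batches)           # ['Bipolar1', 'Bipolar1', 'Bipolar2']
print(Counter(batches))  # number of cells per experiment
```

A quick tally of cells per experiment like this is also a cheap sanity check before running any batch correction.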


Thanks a lot, Steve! That is very helpful and informative!
