Raw RNA-seq data: column names are sequences
Anne

Hello,

Sorry for the very basic question: I have raw RNA-seq count data. The row names are genes and the column names are short sequences (e.g., AAACCTGCAATCTACG.1). Aren't these supposed to be sample names? What is this file format called? (I couldn't find anything online, but I don't know what to search for.) My ultimate goal is a count matrix of genes vs. samples.

Thanks a lot for any help on this.

ATpoint

Looks like cellular barcodes. Is this single-cell data?
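If this is 10x Genomics Chromium data (an assumption; the thread does not name the platform), each column name would be a 16-nt cell barcode. Cell Ranger writes barcodes like `AAACCTGCAATCTACG-1`, and R's default name checking (`make.names()`, used by `read.csv()` and `data.frame()`) rewrites `-` as `.`, which may explain the `.1` suffix. A minimal sketch that splits such a name into barcode and suffix; `parse_barcode` is a hypothetical helper, not part of any library:

```python
import re

# Assumption: 10x Chromium-style barcodes (16 nt, v2/v3 chemistry) followed by
# a numeric suffix; the "-" separator may have been rewritten to "." by R.
BARCODE_RE = re.compile(r"^([ACGT]{16})[.-](\d+)$")


def parse_barcode(column_name: str) -> tuple[str, int]:
    """Split a column name like 'AAACCTGCAATCTACG.1' into (barcode, suffix)."""
    match = BARCODE_RE.match(column_name)
    if match is None:
        raise ValueError(f"not a recognised cell barcode: {column_name!r}")
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(parse_barcode("AAACCTGCAATCTACG.1"))  # ('AAACCTGCAATCTACG', 1)
```

With Cell Ranger, the numeric suffix typically identifies the GEM well/library, so the same 16-nt sequence with different suffixes can appear when several libraries have been combined.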


Yes, it is single-cell data. That makes sense, so each barcode corresponds to a sample (all barcodes in the file are unique). Thank you!!


The sequences do not correspond to samples; each one is the identity of a single droplet, which (you hope) holds the read counts from one and only one cell.
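Since each column is a droplet, a gene-by-sample matrix can only be obtained by summing the droplet columns that belong to the same sample (often called pseudobulk aggregation). This requires knowing which sample each barcode came from, and it should be done after QC has removed empty droplets and doublets. A minimal NumPy sketch; the `pseudobulk` function and the `sample_of` mapping are hypothetical and stand in for whatever sample assignment your experiment provides:

```python
import numpy as np


def pseudobulk(counts: np.ndarray, barcodes: list[str], sample_of: dict[str, str]):
    """Sum droplet columns per sample.

    counts:    genes x droplets count matrix (columns ordered as `barcodes`)
    sample_of: hypothetical mapping barcode -> sample label
    Returns (sorted sample labels, genes x samples matrix).
    """
    labels = sorted({sample_of[bc] for bc in barcodes})
    index = {label: j for j, label in enumerate(labels)}
    out = np.zeros((counts.shape[0], len(labels)), dtype=counts.dtype)
    for col, bc in enumerate(barcodes):
        # Add this droplet's counts to the column of the sample it belongs to.
        out[:, index[sample_of[bc]]] += counts[:, col]
    return labels, out


if __name__ == "__main__":
    counts = np.array([[1, 0, 2],
                       [3, 4, 0]])  # 2 genes x 3 droplets (made-up numbers)
    barcodes = ["AAACCTGCAATCTACG.1", "AAACCTGAGTGTTGAA.1", "AAACGGGTCTGCTTGC.2"]
    sample_of = {barcodes[0]: "A", barcodes[1]: "A", barcodes[2]: "B"}
    labels, matrix = pseudobulk(counts, barcodes, sample_of)
    print(labels)           # ['A', 'B']
    print(matrix.tolist())  # [[1, 2], [7, 0]]
```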


Be sure to run the standard QC on this count matrix, e.g. following https://bioconductor.org/books/release/OSCA/. As Steve Lianoglou says, the columns are droplets, not samples. Ideally each droplet captured a single cell, but a droplet can also be empty or contain a doublet/multiplet, and those need to be removed. Damaged and poor-quality cells also need to be removed. OSCA will guide you through the essential steps.
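A rough illustration of per-cell QC: compute library size, number of detected genes and percentage of mitochondrial counts for each droplet, then flag droplets that are outliers by more than 3 median absolute deviations (MADs), similar in spirit to the adaptive thresholds described in OSCA. This is a NumPy sketch, not the Bioconductor `scuttle` implementation; it assumes a genes x droplets matrix and gene symbols where mitochondrial genes start with `MT-`/`mt-`, and it does not detect empty droplets or doublets, which need the dedicated methods covered in OSCA.

```python
import numpy as np

MAD_SCALE = 1.4826  # makes the MAD comparable to the SD for normal data


def is_outlier(values, nmads: float = 3.0, log: bool = False,
               lower: bool = True, higher: bool = True) -> np.ndarray:
    """Flag values more than `nmads` scaled MADs away from the median."""
    x = np.asarray(values, dtype=float)
    if log:
        x = np.log1p(x)  # log1p avoids -inf for zero-count droplets
    med = np.median(x)
    mad = MAD_SCALE * np.median(np.abs(x - med))
    flag = np.zeros(x.shape, dtype=bool)
    if lower:
        flag |= x < med - nmads * mad
    if higher:
        flag |= x > med + nmads * mad
    return flag


def qc_flags(counts: np.ndarray, genes: list[str]) -> np.ndarray:
    """Boolean mask of droplets to discard (input is genes x droplets)."""
    lib_size = counts.sum(axis=0)
    detected = (counts > 0).sum(axis=0)
    # Assumption: mitochondrial genes are named "MT-..." (human) or "mt-..." (mouse).
    is_mito = np.array([g.upper().startswith("MT-") for g in genes])
    mito_pct = 100 * counts[is_mito].sum(axis=0) / np.maximum(lib_size, 1)
    return (is_outlier(lib_size, log=True, higher=False)
            | is_outlier(detected, log=True, higher=False)
            | is_outlier(mito_pct, lower=False))


if __name__ == "__main__":
    genes = ["GeneA", "GeneB", "MT-CO1"]  # made-up toy data
    counts = np.array([[50, 45, 60, 52, 1, 20],
                       [50, 55, 40, 48, 0, 20],
                       [5, 4, 6, 5, 0, 80]])
    flags = qc_flags(counts, genes)
    print(flags.tolist())  # [False, False, False, False, True, True]
    filtered = counts[:, ~flags]  # keep only droplets that pass QC
```

In the toy data, droplet 5 is flagged for a tiny library and droplet 6 for a high mitochondrial fraction; on real data the thresholds should be checked with diagnostic plots as OSCA recommends.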
