I need to do a differential expression analysis of T-cell receptor (TCR) clonotypes. Simplified, TCRs are special proteins (part of the immune system) that can recognize a wide variety of other proteins. To make this broad recognition possible, the gene encoding the TCR has variable regions that differ between T-cells; each distinct variant is called a clonotype.
I have RNA-Seq data from a 300 bp paired-end MiSeq run: four different samples with three biological replicates each, 12 samples in total. Every read contains the variable region of the TCR and a UMI barcode at the 5' end. Using the standard pipeline for TCR analysis (MIGEC), I got a count matrix where columns are samples and rows are clonotypes. In my opinion, this is very similar to a normal RNA-Seq count matrix, where rows are genes.
Unfortunately, the sequencing didn't go very well, so there are large differences in depth between samples. In addition, a few dominant clonotypes are highly abundant across all samples, while other clonotypes are very rare, with zero counts in almost all samples. Overall, I have 615 clonotypes. To get rid of those "zero" clonotypes, I did a standard rowSums thresholding:
dds[rowSums(counts(dds)) >= 10, ]
but only 60 clonotypes remain! With a threshold of 5, 134 clonotypes remain.
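To make the filtering step concrete, here is a minimal sketch on a toy count matrix. The values are made up and are not my data; `counts_toy` and `kept_at` are just illustrative names, and `counts_toy` stands in for `counts(dds)` so that the example runs in base R without DESeq2.

```r
# Toy count matrix (made-up values, NOT the real data):
# 6 clonotypes (rows) x 4 samples (columns)
counts_toy <- matrix(
  c(120, 95, 210, 160,   # dominant clonotype, row sum 585
      0,  0,   3,   0,   # rare clonotype,     row sum 3
      4,  2,   0,   1,   # low clonotype,      row sum 7
      0,  0,   0,   0,   # never observed,     row sum 0
     10,  8,  15,  12,   # moderate clonotype, row sum 45
      1,  0,   0,   2),  # rare clonotype,     row sum 3
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("clonotype", 1:6), paste0("sample", 1:4))
)

# Count how many clonotypes survive a given total-count threshold
kept_at <- function(m, threshold) sum(rowSums(m) >= threshold)

kept_10 <- kept_at(counts_toy, 10)
kept_5  <- kept_at(counts_toy, 5)

# The same subsetting as in my call above, applied to the toy matrix
filtered <- counts_toy[rowSums(counts_toy) >= 10, , drop = FALSE]
```

Because the filter uses the sum across all samples, a clonotype seen at moderate depth in a single deeply sequenced sample can pass while one seen at low counts in several shallow samples fails, which matters given the depth differences described above.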
My question is whether this type of data is suitable for analysis with DESeq2.
To see my existing results, you can download the R Markdown HTML report: https://owncloud.cesnet.cz/index.php/s/UtWukFacNR6kD3Y
Thank you in advance for any help!