Detection of cell markers from single cell RNA-seq with sc2marker

Li, Ronghui; Zimmer-Bensch, Geraldine Marion (Thesis advisor); Costa, Ivan G. (Thesis advisor)

Aachen : RWTH Aachen University (2022)
Dissertation / PhD Thesis

Dissertation, RWTH Aachen University, 2022


Advances in single-cell RNA sequencing (scRNA-seq) methods have revolutionized biomedical research by allowing the transcriptomes of millions of cells to be studied at the same time. This has helped to uncover molecular processes that drive cell differentiation and complex diseases. Many computational methods have been proposed for scRNA-seq data analysis, including unsupervised analysis to characterize novel and disease-specific cell sub-populations such as rare stromal and immune cells that cannot be detected by other methods. The delineation of small panels with marker genes that characterize such sub-populations is of particular importance for further molecular characterization and validation of the detected cells. For example, flow cytometry can be used to physically isolate cells and quantify cell populations or the expression of markers for both research and clinical applications. However, flow cytometry requires a small panel of antibodies (fewer than 50) that target previously characterized cell surface proteins that can be used as markers for cell types of interest. Multiplex immunohistochemistry (IHC) imaging allows protein abundance to be measured at a cellular level in tissue cross-sections, which allows cell identification in a spatial context. IHC can also be performed on small panels of markers with IHC-compatible antibodies. However, there is a lack of computational methods that exploit the high gene coverage of scRNA-seq to delineate cell markers for novel cell subtypes, as detected by cluster analysis of scRNA-seq data, for further characterization with antibody-based flow cytometry or IHC imaging. Here, I propose sc2marker, which uses a non-parametric feature selection method based on maximum margin to search for marker genes in clustered scRNA-seq data. sc2marker considers the distance of true positive and true negative cells to the optimal threshold (maximum margin) to score the best marker genes, an aspect that is not explored by competing methods.
sc2marker has databases that contain markers with antibodies tailored for particular applications, including IHC and flow cytometry. These databases support the feature selection task because feature selection can be restricted to the gene spaces related to these proteins. I evaluated the capability of sc2marker and competing methods to detect known flow cytometry and imaging markers of well-characterized immune cells. For this, I used five publicly available scRNA-seq data sets of immune cells and evaluated the performance of these methods in ranking known marker genes. I also used sc2marker with a novel scRNA-seq data set of mouse bone marrow stromal cells, which was subsequently validated with a flow analysis. Altogether, the results showed that sc2marker is a useful computational approach to detect cell-type-specific markers in scRNA-seq data.
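
The maximum-margin idea described in the abstract can be illustrated with a minimal Python sketch. It is not sc2marker's actual implementation or scoring formula, which the abstract does not specify. The function name `margin_score`, the use of sensitivity plus specificity to pick the threshold, and the toy expression values are all assumptions made for illustration. For one gene, the sketch picks an expression threshold that best separates cells in the target cluster from all other cells, then scores the gene by how far the correctly classified cells lie from that threshold. It only handles markers that are more highly expressed inside the cluster.

```python
import numpy as np

def margin_score(expr, in_cluster):
    """Toy margin-based score for one gene (hypothetical, for illustration only).

    expr       : expression of the gene in each cell
    in_cluster : True for cells in the target cluster, False otherwise
    Returns (threshold, score); (None, 0.0) if the gene is constant.
    """
    expr = np.asarray(expr, dtype=float)
    in_cluster = np.asarray(in_cluster, dtype=bool)
    n_pos, n_neg = in_cluster.sum(), (~in_cluster).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need cells both inside and outside the cluster")

    values = np.unique(expr)  # sorted distinct expression values
    if values.size < 2:
        return None, 0.0

    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = (values[:-1] + values[1:]) / 2.0

    # Assumption: choose the threshold maximizing sensitivity + specificity.
    best_t, best_acc = None, -1.0
    for t in candidates:
        sens = np.sum(in_cluster & (expr > t)) / n_pos
        spec = np.sum(~in_cluster & (expr <= t)) / n_neg
        if sens + spec > best_acc:
            best_t, best_acc = t, sens + spec

    # Distance of true positives and true negatives to the chosen threshold,
    # each normalized by its class size so misclassified cells lower the score.
    tp = in_cluster & (expr > best_t)
    tn = ~in_cluster & (expr <= best_t)
    score = np.sum(expr[tp] - best_t) / n_pos + np.sum(best_t - expr[tn]) / n_neg
    return float(best_t), float(score)

# Toy data: 4 cells outside the cluster, 3 cells inside it.
labels = [False, False, False, False, True, True, True]
clean_gene = [0, 0, 1, 1, 5, 6, 7]   # separates the cluster well
noisy_gene = [0, 5, 1, 6, 0, 5, 1]   # does not separate the cluster

for name, gene in [("clean_gene", clean_gene), ("noisy_gene", noisy_gene)]:
    print(name, margin_score(gene, labels))
```

Ranking genes by such a score favors genes whose expression is far from the decision threshold on both sides, rather than genes that merely classify cells correctly. Restricting the candidate genes to those with available antibodies, as the abstract describes, would amount to filtering the gene list before scoring.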


  • Department of Biology [160000]
  • Neuroepigenetics Teaching and Research Unit [164620]