Publishing my First Paper
Barcode Selection and Layout Optimization in Spatial Transcriptomics
I’m thrilled to announce that my recent paper, “Barcode Selection and Layout Optimization in Spatial Transcriptomics,” has been published in the proceedings of the 22nd International Symposium on Experimental Algorithms (SEA 2024), which took place in beautiful Vienna, Austria.
Even more exciting, the paper was honored with the Best Paper Award at the conference! This achievement marks a major milestone in my professional journey, as the work is primarily derived from the results of my bachelor’s thesis. The research was guided by my excellent supervisors, Prof. Dr. Matthias Müller-Hannemann and Dr. Steffen Schüler , from the Data Structures and Efficient Algorithms group at Martin Luther University Halle-Wittenberg. I was also fortunate to collaborate with brilliant researchers Antonia Schmidt and Robert Mank, whose contributions were instrumental to our success.
The Problem: Spatial Transcriptomics and Barcode Layout Optimization
Our paper addresses a unique and challenging optimization problem in the context of spatial transcriptomics, an exciting field that enables high-resolution spatial mapping of gene expression. At the heart of this problem is the selection and layout of barcodes—short DNA sequences that act as unique identifiers within a two-dimensional microarray. The arrays in question are enormous, with typical dimensions of 768×1024, which means over 786,000 barcodes need to be placed optimally.
The goal is to assign barcodes to positions on the microarray in a way that minimizes a position-dependent cost function. In essence, we need to solve a very specialized version of the quadratic assignment problem (QAP), which is known for its computational complexity. Not only is the general QAP difficult, but we also demonstrated that the barcode layout problem is MaxSNP-hard, meaning it’s also hard to approximate.
Approach and Results: Heuristics and CUDA for Faster Runtime
Given the complexity of the problem, exact methods like Integer Linear Programming (ILP) are infeasible for large instances—think solving for over 786,000 variables! While ILP can theoretically yield optimal results, it’s only applicable to very small-scale cases. To tackle realistic, large-scale instances, we developed a set of heuristics, including:
- A sorting-based algorithm
- A greedy algorithm
- 2-OPT-based local search
- A genetic algorithm
Exploiting Barcode Surplus and Biochemical Constraints
One particularly interesting aspect of our research was the surplus of barcodes. Since the pool of possible barcodes is much larger than the number actually needed, we explored the impact of selecting barcodes from this larger set. By incorporating surplus barcodes, we were able to significantly improve the layout quality, though this came with a slight increase in runtime.
Another noteworthy discovery was that biochemical constraints, which limit the barcode design space, were actually beneficial for the overall layout cost. These constraints, rather than hindering the process, helped reduce complexity in the layout optimization, leading to better solutions.
Conclusion and Looking Ahead
The journey from my bachelor’s thesis to an award-winning paper has been both challenging and rewarding. I’m deeply grateful for the guidance and support from my supervisors and collaborators, as well as the opportunity to present this work at SEA 2024 in Vienna. The results of our research show promising advances in solving large-scale combinatorial optimization problems, and I’m excited about future applications in spatial transcriptomics and beyond.