Topics for MSc and BSc theses (MEP/BEP)
Curious to know how we cluster time course data using string matching algorithms?
Interested in exploring representation learning on graphs?
The projects below are just examples. Please get in touch to learn more!
Temporal 3-way clustering to mine response patterns over time and across samples (e.g. patients, single-cells)
Pathways underlying a biological process of interest can be identified by grouping genes that show coherent expression patterns within certain time frames. Computationally, we define this task as a 2-way clustering (biclustering) problem, which is NP-hard in the general case but becomes tractable for time courses, since the temporal dimension imposes an order and biological processes occur over contiguous time points. The grouping can also be extended across samples (e.g. patients, single-cells), leading to a temporal 3-way clustering problem. When samples are associated with different types (labels, e.g. sensitive and resistant), we are further interested in identifying which patterns are present exclusively in one type and which are shared by multiple types. Goal: develop an approach to analyze the results of a particular temporal 3-way clustering algorithm applied to (genes, time points, samples) data cuboids and identify clusters exclusive to samples with specific labels. Application: detect biological processes or pathways that are disrupted in resistant versus sensitive cancer cell lines.
multi-way clustering, 3-way clustering, biclustering, time course data, gene expression data
Figure: Joana Gonçalves
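To make the goal of the 3-way clustering project more concrete, here is a minimal sketch of the label-exclusivity step. It is an illustration only, not the method to be developed in the project. The cluster representation (gene set, time interval, sample set), the function names and the `min_fraction` threshold are all assumptions made for this example, and clusters are assumed to be non-empty.

```python
from collections import Counter

def label_profile(cluster_samples, sample_labels):
    """Count how many samples of each label a cluster contains."""
    return Counter(sample_labels[s] for s in cluster_samples)

def classify_cluster(cluster_samples, sample_labels, min_fraction=1.0):
    """Return the label a cluster is exclusive to, or 'shared'.

    A cluster is called exclusive to a label when at least `min_fraction`
    of its samples carry that label (1.0 means all of them).
    `min_fraction` is a hypothetical parameter chosen for this sketch.
    """
    counts = label_profile(cluster_samples, sample_labels)
    label, n = counts.most_common(1)[0]
    return label if n / len(cluster_samples) >= min_fraction else "shared"

# Hypothetical toy input: each cluster is (genes, (first, last) time point, samples).
sample_labels = {
    "s1": "sensitive", "s2": "sensitive",
    "s3": "resistant", "s4": "resistant",
}
triclusters = [
    ({"g1", "g2"}, (0, 2), {"s1", "s2"}),
    ({"g3", "g4"}, (1, 3), {"s3", "s4"}),
    ({"g5", "g6"}, (2, 4), {"s1", "s3", "s4"}),
]

results = [classify_cluster(samples, sample_labels) for _, _, samples in triclusters]
print(results)
```

Lowering `min_fraction` below 1.0 would tolerate a few samples of another label, which may be useful when sample labels are noisy. A real analysis would also need to assess whether the observed exclusivity is statistically significant.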
Representation learning from directed graphs applied to regulatory network analysis
Representation learning is the machine learning task of automatically discovering features or representations of the data, which can then be used for modelling, clustering and classification. Goal: (i) investigate approaches for learning structural encodings of directed networks/graphs that can be easily exploited by machine learning models, and (ii) assess their performance against commonly used measures based on a limited, predefined set of network features. Application: analysis of regulatory data, e.g. transcription factor-target binding and eQTL data.
network representation learning, deep learning, graph theory, TF-target interaction data, eQTL data
Figure: Mark Heimann
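As a rough illustration of what a learned encoding of a directed graph can look like, the sketch below factorizes the adjacency matrix with a truncated SVD, so that every node gets one vector for its role as a source (regulator) and another for its role as a target. This is a deliberately simple stand-in for the approaches the project would investigate. The function name, the toy edge list and the embedding dimension are assumptions made for this example.

```python
import numpy as np

def directed_embeddings(edges, n_nodes, dim=2):
    """Embed nodes via truncated SVD of the directed adjacency matrix.

    Returns (source, target): row i of `source` encodes node i through its
    outgoing edges, row i of `target` encodes it through its incoming edges.
    """
    A = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        A[u, v] = 1.0  # edge u -> v, e.g. regulator u binds target v
    U, s, Vt = np.linalg.svd(A)
    scale = np.sqrt(s[:dim])  # split singular values evenly between both sides
    return U[:, :dim] * scale, Vt[:dim].T * scale

# Hypothetical toy network: nodes 0 and 1 regulate nodes 2, 3 and 4.
edges = [(0, 2), (0, 3), (0, 4), (1, 3), (1, 4)]
source, target = directed_embeddings(edges, n_nodes=5)

# source @ target.T approximates the adjacency matrix.
approx = source @ target.T
print(np.round(approx, 2))
```

Keeping separate source and target vectors preserves edge direction, which a single symmetric embedding would lose. Nodes without outgoing edges end up with a near-zero source vector, which is one reason richer methods are worth exploring.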
Machine learning of molecular features contributing to gene essentiality in cancer cell lines
Gene essentiality screens disrupt large numbers of genes, one at a time, to detect those that are essential for cell survival. In particular, genes deemed essential for the survival of cancer cells are potential targets for new targeted therapies. This project involves the application of supervised machine learning techniques to identify molecular features inherent to cancer cell lines that change the sensitivity of cancer cells to the disruption of certain target genes.
supervised learning, feature selection, gene perturbation data, cell line molecular profile data
Figure: Tsherniak et al. (2017) Defining a Cancer Dependency Map.
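One simple starting point for relating molecular features to gene essentiality is a filter-style feature ranking, sketched below: each feature is scored by its Pearson correlation with the sensitivity of cell lines to disruption of one target gene. This is only a baseline illustration, not the method of the project. The feature names and all data values are made up, and the sketch assumes that no feature is constant across cell lines.

```python
import numpy as np

def rank_features(X, y, names):
    """Rank features by absolute Pearson correlation with the response y.

    X: (cell lines x features) matrix, y: one score per cell line.
    Returns (name, correlation) pairs, strongest association first.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    order = np.argsort(-np.abs(r))
    return [(names[i], float(r[i])) for i in order]

# Hypothetical toy data: 6 cell lines x 3 molecular features.
X = np.array([
    [1.0, 0.2, 5.0],
    [2.0, 0.1, 3.0],
    [3.0, 0.4, 4.0],
    [4.0, 0.3, 6.0],
    [5.0, 0.5, 2.0],
    [6.0, 0.2, 5.0],
])
# Made-up dependency score that depends only on the first feature.
y = -2.0 * X[:, 0] + 1.0

ranking = rank_features(X, y, ["expr_geneA", "cn_geneB", "mut_geneC"])
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

Univariate ranking ignores interactions between features, so a supervised model combined with a proper feature selection scheme would be the natural next step.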