Biology now generates huge amounts of data. In this talk we touch on some mathematical problems, inspired by the study of real genomic data, that require combinatorics, graph theory and formal language theory. All examples are taken from our own bioinformatics work in recent years. The first problem concerns the number of longer missing strings (of length K+i, i>=1) taken away by the absence of one or more K-strings. The exact solution may be obtained by using the Goulden-Jackson cluster method in combinatorics together with a special kind of formal language, namely the factorizable language. The second problem consists in explaining the fine structure observed in one-dimensional K-string histograms of some randomized genomes. The third problem is the uniqueness of reconstructing a protein sequence from its constituent K-peptides. This last problem has a natural connection with the number of Eulerian loops in a graph. To tell whether a protein sequence has a unique reconstruction at a given K, the factorizable language again comes to our help.
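To make the first problem concrete, the following is a minimal Python sketch that counts how many strings of length K+i are taken away when a given set of K-strings is absent. It is not the Goulden-Jackson cluster method mentioned in the abstract; it is a plain dynamic-programming count over the last K-1 letters, useful only as a numerical cross-check on small cases. The function name `count_missing` and its parameters are hypothetical, chosen for this illustration.

```python
from itertools import product


def count_missing(alphabet, forbidden, n):
    """Count strings of length n that contain at least one forbidden K-string.

    alphabet  -- string of distinct letters, e.g. "ACGT"
    forbidden -- set of absent K-strings, all of the same length K
    n         -- length of the longer strings (n = K + i)
    """
    k = len(next(iter(forbidden)))
    assert all(len(w) == k for w in forbidden), "all forbidden words need length K"
    if n < k:
        return 0  # strings shorter than K cannot contain a K-string

    # State: the last K-1 letters of a string that avoids every forbidden word.
    counts = {"".join(t): 1 for t in product(alphabet, repeat=k - 1)}

    # Each step appends one letter; after n-(k-1) steps the strings have length n.
    for _ in range(n - k + 1):
        nxt = {}
        for suffix, c in counts.items():
            for a in alphabet:
                word = suffix + a
                if word in forbidden:
                    continue  # this extension would create a forbidden K-string
                key = word[1:]
                nxt[key] = nxt.get(key, 0) + c
        counts = nxt

    avoiding = sum(counts.values())
    # Strings taken away = all strings minus those avoiding the forbidden set.
    return len(alphabet) ** n - avoiding


if __name__ == "__main__":
    for word in ("ACGT", "AAAA"):
        print(word, [count_missing("ACGT", {word}, 4 + i) for i in range(4)])
```

Running the sketch on a non-overlapping word such as `ACGT` and a self-overlapping word such as `AAAA` shows that the number of longer strings taken away depends on the overlap structure of the absent K-string, which is the dependence the cluster method accounts for exactly.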
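Graduated from the Department of Physics and Mathematics, Kharkov State University, Ukraine (1959); Research Professor at the Institute of Theoretical Physics, Chinese Academy of Sciences (1978-2005), serving as Deputy Director (1984-1987) and Director (1990-1994); currently Professor at Fudan University (2005-); Member of the Chinese Academy of Sciences (1980); Fellow of the Third World Academy of Sciences (1995).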
Research interests: bioinformatics and computational biology based on real genomic data.