Travelling Salesman Problem

The traveling salesman problem consists of a salesman and a set of cities. The salesman has to visit each one of the cities starting from a certain one (e.g. the hometown) and returning to the same city. The challenge of the problem is that the traveling salesman wants to minimize the total length of the trip. The steps include initialization of a population of individuals and representing each possible paths i,e sequences of cities as chromosomes. The encoding used in the chromosomal representation is a simple integer encoding and we perform genetic operations like crossover, mutation and selection on this population using an appropriate fitness function.

Genealogy GWAS

Genome-Wide Association Studies have been known to test for associations between thousands of Single Nucleotide Polymorphisms(SNPs) and disease phenotypes. GWAS studies aim to identify associations of genotypes and phenotypes by testing for the differences in allele frequencies between individuals that are ancestrally similar but genetically different. Though GWAS has worked particularly well for common diseases it still has some pitfalls when dealing with rare diseases. Often rare diseases need to have large effect sizes to be correctly identified by GWAS studies so rare variants causing rare diseases tend to be associated with low values of statistical power. In addition to this, GWAS is also susceptible to the problem of multiple testing in which we need to account for the number of independent SNPs being tested in the study. In this study we try to improve upon the conventional GWAS method by simulating genomes of a well studied population of individuals living in the Lac-Saint-Jean Region of Quebec. The simulations are generated using an extended version of the msprime software that leverages the BALSAC French-Canadian genealogy data of more than 3 million individuals of Quebec

Deep Forests

We use subnet structures in the form of tree blocks and aggregate them in layers to perform deep learning on the dataset. The advantages of using this method is that we can select any functional set to use as nodes in the trees. Genetic approaches to the optimization or training of these structures could also significantly reduce overfitting in these models. The model works well specially to replicate complex functions after tuning the tree structure and the hyperparameters. It is promising as it captures the idea of preserving diversity in the population of individuals and helps lower the error more, without falling into premature convergence.

Machine Learning

I code a variety of machine learning algorihtms in Python for classification, regression tasks. I use libraries like sklearn, Numpy, pandas etc to do the same.