PyCon CZ 2015

Using Django for Genome Engineering

Anastassiya Zidkova, Filip Sedlak
This talk is for people interested in seeing how their programming knowledge can be applied to biology research, particularly genomics. Several technologies now make it possible to change genetic information, and among them CRISPR is the most efficient and the safest. CRISPR carries a sequence of 20 letters and searches for a match in DNA (think of DNA as a very long string). When it finds a match, it cuts the DNA at that site. This ability can be used for targeted genome editing, in which specific regions of interest are changed. To change DNA at the right place, scientists need to design a sequence that matches the target DNA and does not match anywhere else in the genome. Since this is biology, things are not perfect: a sequence may still match a site even if a few letters differ. Once the candidate sequences are identified, the most computationally intensive step is finding every place in the whole genome where a candidate could at least partially match. Our tool for designing such sequences is written in Python, using Django and Berkeley DB. For a single analysis, we run tens of millions of queries against a database of 700 million sites in the genome.
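The partial-match search described above can be illustrated with a small sketch. It is not the authors' implementation: it swaps in a common seed-and-verify technique based on the pigeonhole principle (if a 20-letter site differs from the guide in at most k positions, at least one of k + 1 non-overlapping chunks must match exactly), and a plain in-memory Python dict stands in for the Berkeley DB store. The names `seed_bounds`, `build_seed_index`, and `find_off_targets` are hypothetical, and the sketch treats the genome as a plain string, ignoring the biological details a real design tool would have to handle.

```python
from collections import defaultdict

GUIDE_LEN = 20  # sequence length given in the abstract


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def seed_bounds(length: int, n: int):
    """Split range(length) into n contiguous, nearly equal (lo, hi) chunks."""
    step, extra = divmod(length, n)
    bounds, lo = [], 0
    for i in range(n):
        hi = lo + step + (1 if i < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds


def build_seed_index(genome: str, site_len: int, max_mm: int):
    """Index every genome window by its max_mm + 1 seeds.

    Hypothetical stand-in for a persistent key-value store: keys are
    (seed number, seed text), values are window start positions.
    """
    bounds = seed_bounds(site_len, max_mm + 1)
    index = defaultdict(list)
    for start in range(len(genome) - site_len + 1):
        site = genome[start:start + site_len]
        for i, (lo, hi) in enumerate(bounds):
            index[(i, site[lo:hi])].append(start)
    return index, bounds


def find_off_targets(guide: str, genome: str, index, bounds, max_mm: int):
    """Return (start, mismatch_count) for every window within max_mm mismatches."""
    candidates = set()
    for i, (lo, hi) in enumerate(bounds):
        # .get() does not insert missing keys into the defaultdict
        candidates.update(index.get((i, guide[lo:hi]), ()))
    hits = []
    for start in sorted(candidates):
        # Verify each candidate with a full comparison; seeds only narrow the search
        mm = mismatches(guide, genome[start:start + len(guide)])
        if mm <= max_mm:
            hits.append((start, mm))
    return hits


if __name__ == "__main__":
    guide = "GACTTGCAGTCCATGGATCA"
    # Same guide with two letters changed (positions 3 and 10)
    variant = guide[:3] + "A" + guide[4:10] + "G" + guide[11:]
    genome = "A" * 5 + guide + "C" * 5 + variant + "G" * 5
    index, bounds = build_seed_index(genome, GUIDE_LEN, max_mm=3)
    print(find_off_targets(guide, genome, index, bounds, max_mm=3))
```

Because every candidate is re-checked with a full comparison, the index only has to narrow the search; it never decides a match on its own. This split between an exact-lookup key-value store and a cheap verification step is one plausible reason a database like Berkeley DB suits this workload, though the abstract does not say how the authors structure their queries.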