Routine - Algorithm Noodling

![[GA_CoverImage.png.png]]Genetic algorithms have always been fascinating to me. Since introduced to me early in college, the concept of introducing basic evolutionary concepts to computer science is exciting to consider for various problem spaces. This well-studied optimization approach helps solve some of computer sciences most difficult and uncomputable problems by helping find more optimal approaches to these problems. AI models, scheduling problems and bioinformatics all leverage this approach to iterating on things from what is the best neural net to what is the best optimization for a protein fold to improve a drug or understand how DNA works. I eventually want to explore the use of bioinformatics and GA's but for now, I plan on using it with [[Routine - The scheduler with a new approach]]? > [!INFO] > This project is a work in progress. I will update this as I can with my learning. This is largely unstructured and unfinished # Terminology The first step for me was to get familiar with some common nomenclature in the GA world. A few Google search will reveal some great articles (as well as some general research help from our friends in the LLM world.) - Genetic Algorithm: the creation of a population then splitting the population using a set of decisions to find a more optimal set of children. - Fitness Score: the "meat" of the decision that determines a score for a population set - Cross-over: the opportunity for two (or more) parent's to create a child population with some exchange of the two solution sets - Mutation: a random change to a population set to determine if a more optimal solution set might occur. - Hamming Distance: IDK, still learning but sounds cool - And more I know there are more concepts I'll be delving into but for now, these are the high level. # Hypothesis Genetic algorithms have been the core of my initial research but there are a number of different approaches that can be used. Given the problem space, I'm leaning more towards Genetic algorithms but will be designing the approach in such a way that I can test the performance of each. These will be: | Approach | Solution Quality | Speed | Scalability | Implementation Complexity | | -------- | ---------------- | ----------- | ----------- | ------------------------- | | GA | High | Medium | Good | Medium | | CP | Optimal* | Fast-Medium | Limited | High | | SA | Good | Fast | Excellent | Low | | Quantum | Promising | Variable | Limited | Very High | In my (limited) research the solution quality for each of these approaches seem to have different benefits from speed, scalability and implementation complexity. During the process, I will test each option to see which one does produce a more optimal approach. Given the problem space, I suspect Genetic algorithms will come out on top with the given assumptions: - Under 20 appointments per day - Compute is less of a problem in this particular problem space - Generate multiple options to display testing multiple different approaches - Lower complexity approach. ## Data points to track I want to track a number of different datapoints to see which solution might be more optimal. - Number of questions asked to each person being scheduled (is there an optimal number of questions to ask and do more questions result in a better optimal schedule) - Speed of finding the most optimal scheduling solution - If the set of solutions presented are preferred by the scheduler - If the people being scheduled felt the time selected was acceptable to them - Is this approach faster than a manual approach and by how much # Experimentation I plan to experiment on different approaches and once enough data is collected, I will update various data points here. Some considerations around GA's will be: - Is it possible to have a population size modify itself without crossovers or mutations? - Tracking where the most optimal solution set was produced by