Finding a way to get models into production seems like a universal problem these days. 1.75 billion results as of last count, with countless companies devoted to solving this in one way or another.

The Problem

At Windfall, we used a variety of models built by data scientists to drive our products. The problem was that different people built them in slightly different ways, so there was no standard way of running them. We relied heavily on whoever built a model to run it whenever we needed predictions. We were also exploring tying model predictions directly into our front-end systems.
I often get asked by co-op students at work how they can get started with R. While sites like Kaggle are great for finding lots of datasets and entering competitions to see how many tenths of a point you can extract from your model, my advice to those starting out is to pick a topic or question that actually interests you. It’s a hundred times easier to do an analysis on something that you’ve been pondering than on fifty columns of anonymized, standardized numbers. One example I gave was baby names. I have this feeling that the way people name their babies has changed over the years.
With Trump’s victory, I thought it would be interesting to scope out some Trump-related datasets and get some practice with different machine learning algorithms. I’ve seen some work on generating text with Markov chains, but thought a recurrent neural network might be a little more fun to play with.

Web Scraping

Grabbing a large enough dataset was the first problem to solve. I did find a GitHub repository that had some Trump speeches, but the data was 3 months old as of this post. I decided instead to create a simple scraper to capture all 55 of his speeches from The American Presidency Project.
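
To give a feel for the first step of a scraper like this, here is a minimal sketch of pulling speech links out of an index page. It is not the scraper from the post: the sample HTML, the `/speech/` path fragment, and the `example.org` base URL are all made up for illustration, and the real site's markup would need to be inspected before adapting it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a> tags whose href contains a given fragment."""

    def __init__(self, base_url, must_contain):
        super().__init__()
        self.base_url = base_url
        self.must_contain = must_contain  # hypothetical path fragment to filter on
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.must_contain in href:
            url = urljoin(self.base_url, href)
            # Keep first-seen order and skip duplicates
            if url not in self.links:
                self.links.append(url)


def extract_links(html, base_url, must_contain):
    """Return the de-duplicated list of matching links found in an HTML string."""
    parser = LinkCollector(base_url, must_contain)
    parser.feed(html)
    return parser.links


# Made-up index page standing in for a real listing of speeches
SAMPLE_HTML = """
<ul>
  <li><a href="/speech/1">Speech one</a></li>
  <li><a href="/speech/2">Speech two</a></li>
  <li><a href="/about">About</a></li>
  <li><a href="/speech/1">Speech one (repeat)</a></li>
</ul>
"""

speech_links = extract_links(SAMPLE_HTML, "https://example.org/index", "/speech/")
```

Each collected URL could then be fetched and its text saved to disk, ideally with a short pause between requests to stay polite to the server.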
The purpose of A/B testing is to determine through the use of statistical methods whether an experiment generates enough of a practically significant effect to support implementation.
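
To make the distinction between statistical and practical significance concrete, here is a minimal sketch using a two-proportion z-test on conversion counts. The test choice, the function names, the 5% significance level, and the one-percentage-point minimum lift are all assumptions for illustration, not values taken from any particular experiment.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return (absolute lift, z statistic, two-sided p-value) for B versus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


def should_launch(conv_a, n_a, conv_b, n_b, alpha=0.05, min_lift=0.01):
    """Launch only if the lift is both statistically and practically significant.

    alpha and min_lift are assumed thresholds; pick them before the experiment.
    """
    lift, _, p_value = two_proportion_ztest(conv_a, n_a, conv_b, n_b)
    return p_value < alpha and lift >= min_lift


# 10.0% vs 11.5% conversion on 10,000 users each: significant and large enough
big_effect = should_launch(1000, 10_000, 1150, 10_000)

# 10.00% vs 10.15% on 1,000,000 users each: significant, but too small to matter
tiny_effect = should_launch(100_000, 1_000_000, 101_500, 1_000_000)
```

The second case shows why the practical threshold matters: with enough traffic, almost any difference becomes statistically detectable, but that alone does not justify the cost of implementing the change.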