It’s true, a picture tells a thousand words!
After cleaning and exploring the dataset for my NLP project, I wanted to model my data using both a Random Forest Classifier and a Neural Network Classifier. To prepare the data for these models, I had to use a couple of different methods. After a lot of googling, I thought it would be helpful to describe these methods in one cohesive blog post!
Here I sit, about to complete my 5-month immersive course in Data Science with the Flatiron School this week! Before I leave this course, I want to talk about why I decided to enter this field and change my profession at age 36.
While working through my first modeling project as a Data Scientist, I found that an excellent way to compare my models was to use a ROC Curve! However, I ran into a bit of a glitch: for the first time, I had to create a ROC Curve from a dataset with multiclass predictions instead of binary predictions. I also had to learn how to create a ROC Curve for a Random Forest Classifier. Since it took me an entire afternoon of googling to figure these things out, I thought I would blog about them to hopefully help someone in the future, maybe even you!
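To give a feel for what this looks like, here is a minimal sketch rather than the exact code from my project. It assumes scikit-learn, uses the built-in iris dataset as a stand-in for my data, and takes a one-vs-rest approach: the labels are binarized, and one ROC Curve is computed per class from the Random Forest's `predict_proba` output. The variable names are my own choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Stand-in multiclass dataset (3 classes); an assumption for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# A Random Forest has no decision_function, so the class probabilities
# serve as scores. Shape: (n_samples, n_classes).
y_score = clf.predict_proba(X_test)

# One-vs-rest: turn the labels into one 0/1 column per class.
y_test_bin = label_binarize(y_test, classes=clf.classes_)

roc_auc = {}
for i, label in enumerate(clf.classes_):
    # fpr and tpr are the points of this class's ROC Curve.
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[int(label)] = auc(fpr, tpr)

print(roc_auc)
```

Each `fpr`/`tpr` pair can then be passed to a plotting library such as matplotlib to draw one curve per class on the same axes.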
When looking at a time series and considering fitting an ARIMA model to your data, it's important, as always, to create train/test splits. However, for time series the process is a bit different. Rather than taking a random sample, as you might when fitting a regression model, you'll want to split the data chronologically on its datetime index, so the model trains on earlier observations and is tested on later ones.
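As a quick illustration of a datetime-based split, here is a small sketch assuming pandas and a series with a `DatetimeIndex`. The synthetic daily series and the cutoff date are made up for the example; in practice you would pick a cutoff that suits your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series of 100 observations, for illustration only.
idx = pd.date_range("2020-01-01", periods=100, freq="D")
series = pd.Series(np.arange(100, dtype=float), index=idx)

# Hypothetical cutoff date: everything before it is training data,
# everything on or after it is test data. No shuffling, order is kept.
cutoff = pd.Timestamp("2020-03-21")
train = series[series.index < cutoff]
test = series[series.index >= cutoff]

print(len(train), len(test))
print(train.index.max(), test.index.min())
```

Because the split is chronological, every training observation comes before every test observation, which mirrors how the model would actually be used to forecast.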