By Neha Kanneganti
Importance of Cross-Validation
Cross validation is an important method for assessing a machine learning model's performance. One can't simply train the model on all of the available data and then expect it to work in a real setting. One needs cross validation to estimate the model's accuracy on unseen data and to keep its bias and variance in check.
Cross validation is done through the following steps:
Split the sample dataset into two parts: the training set and the testing set.
Use the training set to train the model.
Use the testing set to test the model and assess its accuracy.
These are the general steps for cross validation. There are also several cross validation techniques that split the sample dataset in different ways to get a more reliable estimate of the model's performance. The specific steps for each technique vary.
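The general steps above can be sketched in Python. The tiny dataset and the majority-class "model" below are made up purely for illustration; a real project would fit an actual classifier in step 2.

```python
import random

# Step 1: split the sample dataset into a training set and a testing set.
samples = [(x, x % 2) for x in range(20)]  # toy (feature, label) pairs
random.Random(1).shuffle(samples)
split = int(len(samples) * 0.8)
train_set, test_set = samples[:split], samples[split:]

# Step 2: use the training set to train the model. Here the simplest possible
# "model" stands in: always predict the most common label in the training set.
labels = [label for _, label in train_set]
majority_class = max(set(labels), key=labels.count)

# Step 3: use the testing set to test the model and assess its accuracy.
correct = sum(majority_class == label for _, label in test_set)
print(f"accuracy: {correct / len(test_set):.2f}")
```

Only the testing set — data the model never saw during training — is used to compute the accuracy, which is the whole point of the split.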
Common Types of Cross-Validation
Hold-out is when one splits the sample dataset into a training set and testing set. The training set is used to train the model, while the testing set is used to assess the model’s accuracy. One of the most common splits is 80% of the sample dataset for the training set and the remaining 20% for the testing set.
This is a straightforward method, since it follows the general steps of cross validation directly. It is well suited to beginner projects and to large sample datasets, especially when one has limited time.
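A minimal sketch of the 80/20 hold-out split is below; the dataset is illustrative, and in practice a library helper such as scikit-learn's train_test_split is the usual choice.

```python
import random

def holdout_split(dataset, test_ratio=0.2, seed=0):
    """Shuffle the sample dataset, then split it into train and test portions."""
    shuffled = dataset[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))  # 80% train by default
    return shuffled[:cut], shuffled[cut:]

dataset = list(range(100))                   # 100 toy samples
train_set, test_set = holdout_split(dataset)
print(len(train_set), len(test_set))         # prints: 80 20
```

Shuffling before splitting matters: if the dataset is ordered (for example, by class), an unshuffled split would give the model a training set that doesn't represent the testing set.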
K-fold cross validation is when one splits the sample dataset into k subsets, called folds. One fold is used as the testing set, while the remaining folds together form the training set. After the model is trained and tested, the process is repeated until each fold has been used as the testing set exactly once.
Since this is essentially the hold-out method repeated k times, with every sample used for testing exactly once, k-fold gives a more reliable accuracy estimate than a single hold-out split, at the cost of training the model k times.
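A minimal sketch of how the k folds can be generated by hand is below (the sample count and k are made up for illustration; scikit-learn's KFold is the usual helper in practice):

```python
def kfold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Distribute samples as evenly as possible across the k folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Everything outside the current fold forms the training set.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# With 10 samples and k = 5, each sample lands in the test set exactly once:
for fold, (train_idx, test_idx) in enumerate(kfold_indices(10, 5)):
    print(f"fold {fold}: test = {test_idx}")  # fold 0: test = [0, 1], and so on
```

In a full run, one would train and score the model once per fold and then average the k accuracy scores into a single estimate.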
Hopefully, you are now familiar with cross validation and have added another useful concept to your machine learning toolkit. Remember that hold-out and k-fold are the most common types of cross validation. I encourage you to research the other types on your own.