스생 Assignment 2

(a)

[x] Split the dataset into 70% train / 30% test. Train models with default model parameters.
[x] Compute accuracy on training and test sets.
[ ] Report results by creating a table with columns training accuracy, test accuracy, and AUC.
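
A minimal sketch of how part (a) could be set up with scikit-learn. The dataset is not named in this checklist, so `load_breast_cancer` is only a stand-in. The four-model list is taken from part (c). The fixed `random_state=0`, the `stratify=y` option, and the `results` table name are assumptions made for reproducibility and illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): swap in the assignment's dataset here.
X, y = load_breast_cancer(return_X_y=True)

# 70% train / 30% test; stratify keeps the class ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Default hyperparameters; random_state is fixed only so reruns match.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUC needs scores for the positive class, not hard labels.
    test_scores = model.predict_proba(X_test)[:, 1]
    rows.append({
        "model": name,
        "training accuracy": accuracy_score(y_train, model.predict(X_train)),
        "test accuracy": accuracy_score(y_test, model.predict(X_test)),
        "AUC": roc_auc_score(y_test, test_scores),
    })

results = pd.DataFrame(rows).set_index("model")
print(results.round(3))
```

The AUC here is computed on the test set. For a multiclass dataset, `roc_auc_score` would need the full probability matrix and a `multi_class` setting instead of the single positive-class column.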

(b)

[ ] Use 10 repeated train/test splits.
[ ] Report the mean and standard deviation of test accuracy for each model.
[ ] Which model has the lowest/highest variance and why?
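
One possible way to run the 10 repeated splits in part (b) is `ShuffleSplit` combined with `cross_val_score`. This is a sketch under the same assumptions as before: the stand-in `load_breast_cancer` dataset, a 70/30 split ratio carried over from part (a), and the hypothetical `summary` table name.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): swap in the assignment's dataset here.
X, y = load_breast_cancer(return_X_y=True)

# 10 independent random 70/30 splits; the same splits are reused for every model.
splits = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

rows = []
for name, model in models.items():
    # One test accuracy per split (the model is refit on each training part).
    scores = cross_val_score(model, X, y, cv=splits, scoring="accuracy")
    rows.append({
        "model": name,
        "mean test accuracy": scores.mean(),
        # ddof=1 gives the sample standard deviation over the 10 splits.
        "std test accuracy": scores.std(ddof=1),
    })

summary = pd.DataFrame(rows).set_index("model").sort_values("std test accuracy")
print(summary.round(4))
```

Because each model's `random_state` is fixed, the spread reported here comes from the changing train/test splits rather than from the model's own randomness. A single unpruned tree typically shows a larger spread than the averaged ensembles, but that should be checked against the actual numbers.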

(c)

Plot train vs test accuracy for increasing model complexity:

[ ] For Decision Tree: vary max_depth
[ ] For Random Forest / Bagging: vary n_estimators
[ ] For AdaBoost: vary n_estimators
[ ] Interpret curves in terms of bias–variance trade-off.
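
The complexity sweeps in part (c) could be sketched as below. The parameter grids (`depths`, `sizes`), the helper `accuracy_curve`, and the stand-in dataset are all assumptions, not values given by the assignment.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): swap in the assignment's dataset here.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

def accuracy_curve(make_model, values):
    """Fit one model per complexity value; return (train, test) accuracy lists."""
    train_acc, test_acc = [], []
    for value in values:
        model = make_model(value).fit(X_train, y_train)
        train_acc.append(model.score(X_train, y_train))
        test_acc.append(model.score(X_test, y_test))
    return train_acc, test_acc

depths = list(range(1, 11))      # hypothetical max_depth grid
sizes = [1, 5, 10, 25, 50, 100]  # hypothetical n_estimators grid

# title -> (x-axis label, grid, factory building a model for one grid value)
sweeps = {
    "Decision Tree": ("max_depth", depths,
                      lambda d: DecisionTreeClassifier(max_depth=d, random_state=0)),
    "Bagging": ("n_estimators", sizes,
                lambda n: BaggingClassifier(n_estimators=n, random_state=0)),
    "Random Forest": ("n_estimators", sizes,
                      lambda n: RandomForestClassifier(n_estimators=n, random_state=0)),
    "AdaBoost": ("n_estimators", sizes,
                 lambda n: AdaBoostClassifier(n_estimators=n, random_state=0)),
}

curves = {}
fig, axes = plt.subplots(1, len(sweeps), figsize=(16, 3.5), sharey=True)
for ax, (title, (param, values, make_model)) in zip(axes, sweeps.items()):
    train_acc, test_acc = accuracy_curve(make_model, values)
    curves[title] = (train_acc, test_acc)
    ax.plot(values, train_acc, marker="o", label="train")
    ax.plot(values, test_acc, marker="o", label="test")
    ax.set_title(title)
    ax.set_xlabel(param)
axes[0].set_ylabel("accuracy")
axes[0].legend()
fig.tight_layout()
plt.show()
```

When reading the curves, a widening gap between train and test accuracy as `max_depth` grows usually points to increasing variance. Test accuracy that levels off as `n_estimators` grows in Bagging or Random Forest is consistent with averaging reducing variance. These are typical patterns, not guaranteed outcomes for every dataset.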

(d)

Compute feature importance measures (mean decrease in impurity).
Compute feature importance measures (permutation importance).
Plot both as bar charts sorted in descending order.
Identify the top 3 most important features for each model:
- [ ] decision tree
- [ ] random forest
- [ ] AdaBoost
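
A rough sketch of part (d) for the three listed models. Mean decrease in impurity is read from scikit-learn's `feature_importances_` attribute, and permutation importance comes from `sklearn.inspection.permutation_importance`. The stand-in dataset, the `n_repeats=10` setting, the choice to permute on the test set, and the `top3` dictionary are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): swap in the assignment's dataset here.
data = load_breast_cancer()
X, y = data.data, data.target
names = np.array(data.feature_names)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

top3 = {}
fig, axes = plt.subplots(len(models), 2, figsize=(14, 12))
for row_axes, (name, model) in zip(axes, models.items()):
    model.fit(X_train, y_train)
    importances = {
        # Impurity-based importance, computed from the training data.
        "MDI": model.feature_importances_,
        # Mean drop in test accuracy when one feature column is shuffled.
        "permutation": permutation_importance(
            model, X_test, y_test, n_repeats=10, random_state=0
        ).importances_mean,
    }
    for ax, (kind, scores) in zip(row_axes, importances.items()):
        order = np.argsort(scores)[::-1]  # indices sorted by descending importance
        ax.bar(names[order], scores[order])
        ax.set_title(f"{name}: {kind}")
        ax.tick_params(axis="x", labelrotation=90)
        top3[(name, kind)] = list(names[order[:3]])

fig.tight_layout()
plt.show()

for (name, kind), features in top3.items():
    print(f"{name} ({kind}): {features}")
```

The two measures can disagree: MDI is derived from training-set splits and tends to favor high-cardinality features, while permutation importance on held-out data can spread credit thinly across correlated features. Reporting the top 3 under both measures makes such differences visible.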