(Python libraries and versions used in this article: Python 3.5, Numpy 1.14, scikit-learn 0.19, matplotlib 2.2)
1. Prepare the dataset
2. Use GridSearchCV to find the optimal parameters
    from sklearn import svm
    from sklearn.metrics import classification_report
    # sklearn.grid_search is deprecated since 0.18; model_selection is its replacement
    from sklearn.model_selection import GridSearchCV

    # Parameters to tune and their candidate values
    parameter_grid = [
        {'kernel': ['linear'], 'C': [1, 10, 50, 600]},
        {'kernel': ['poly'], 'degree': [2, 3]},
        {'kernel': ['rbf'], 'gamma': [0.01, 0.001], 'C': [1, 10, 50, 600]},
    ]

    metrics = ['precision', 'recall_weighted']  # scoring metrics used to rank the candidates

    # train_X, train_y, test_X and test_y come from step 1
    for metric in metrics:
        print("Searching optimal hyperparameters for: {}".format(metric))
        classifier = GridSearchCV(svm.SVC(C=1), parameter_grid, cv=5, scoring=metric)
        classifier.fit(train_X, train_y)

        print("\nScores across the parameter grid:")
        # Print the mean cross-validation score of each parameter combination
        # (cv_results_ replaces the deprecated grid_scores_ attribute)
        results = classifier.cv_results_
        for params, avg_score in zip(results['params'], results['mean_test_score']):
            print('{}: avg_scores: {}'.format(params, round(avg_score, 3)))

        print("\nHighest scoring parameter set: {}".format(classifier.best_params_))
        # predict() uses the estimator refit with the best parameters (refit=True by default)
        y_pred = classifier.predict(test_X)
        print("\nFull performance report:\n {}".format(classification_report(test_y, y_pred)))
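To make it easier to read the score listing that follows, here is a minimal sketch of how a list of sub-grids expands into individual candidates. It uses only the standard library; `expand_grid` is a hypothetical helper written for illustration, not a scikit-learn function (scikit-learn ships its own `ParameterGrid` class for this purpose).

```python
from itertools import product

# Same grid as in the search above
parameter_grid = [
    {'kernel': ['linear'], 'C': [1, 10, 50, 600]},
    {'kernel': ['poly'], 'degree': [2, 3]},
    {'kernel': ['rbf'], 'gamma': [0.01, 0.001], 'C': [1, 10, 50, 600]},
]


def expand_grid(grid):
    """Return every parameter combination described by a list of sub-grids."""
    candidates = []
    for sub_grid in grid:
        keys = sorted(sub_grid)  # fixed key order keeps the result reproducible
        # Cartesian product of the candidate values inside one sub-grid
        for values in product(*(sub_grid[key] for key in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates


candidates = expand_grid(parameter_grid)
for params in candidates:
    print(params)
print("Total candidates:", len(candidates))
```

The three sub-grids contribute 4, 2 and 8 combinations, so each metric evaluates 14 candidates. With cv=5 that means 70 cross-validation fits per metric, plus one final refit on the whole training set.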
Searching optimal hyperparameters for: precision
Scores across the parameter grid:
{'C': 1, 'kernel': 'linear'}: avg_scores: 0.809
{'C': 10, 'kernel': 'linear'}: avg_scores: 0.809
{'C': 50, 'kernel': 'linear'}: avg_scores: 0.809
{'C': 600, 'kernel': 'linear'}: avg_scores: 0.809
{'degree': 2, 'kernel': 'poly'}: avg_scores: 0.859
{'degree': 3, 'kernel': 'poly'}: avg_scores: 0.852
{'C': 1, 'gamma': 0.01, 'kernel': 'rbf'}: avg_scores: 1.0
{'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}: avg_scores: 0.0
{'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}: avg_scores: 0.968
{'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}: avg_scores: 0.855
{'C': 50, 'gamma': 0.01, 'kernel': 'rbf'}: avg_scores: 0.946
{'C': 50, 'gamma': 0.001, 'kernel': 'rbf'}: avg_scores: 0.975
{'C': 600, 'gamma': 0.01, 'kernel': 'rbf'}: avg_scores: 0.948
{'C': 600, 'gamma': 0.001, 'kernel': 'rbf'}: avg_scores: 0.968
Highest scoring parameter set: {'C': 1, 'gamma': 0.01, 'kernel': 'rbf'}
Full performance report:
---- | precision | recall | f1-score | support |
0 | 0.75 | 1.00 | 0.86 | 36 |
1 | 1.00 | 0.69 | 0.82 | 39 |
avg / total | 0.88 | 0.84 | 0.84 | 75 |
Searching optimal hyperparameters for: recall_weighted
Scores across the parameter grid:
{'C': 1, 'kernel': 'linear'}: avg_scores: 0.653
{'C': 10, 'kernel': 'linear'}: avg_scores: 0.653
{'C': 50, 'kernel': 'linear'}: avg_scores: 0.653
{'C': 600, 'kernel': 'linear'}: avg_scores: 0.653
{'degree': 2, 'kernel': 'poly'}: avg_scores: 0.889
{'degree': 3, 'kernel': 'poly'}: avg_scores: 0.884
{'C': 1, 'gamma': 0.01, 'kernel': 'rbf'}: avg_scores: 0.76
{'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}: avg_scores: 0.507
{'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}: avg_scores: 0.907
{'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}: avg_scores: 0.658
{'C': 50, 'gamma': 0.01, 'kernel': 'rbf'}: avg_scores: 0.92
{'C': 50, 'gamma': 0.001, 'kernel': 'rbf'}: avg_scores: 0.72
{'C': 600, 'gamma': 0.01, 'kernel': 'rbf'}: avg_scores: 0.933
{'C': 600, 'gamma': 0.001, 'kernel': 'rbf'}: avg_scores: 0.902
Highest scoring parameter set: {'C': 600, 'gamma': 0.01, 'kernel': 'rbf'}
Full performance report:
---- | precision | recall | f1-score | support |
0 | 1.00 | 0.92 | 0.96 | 36 |
1 | 0.93 | 1.00 | 0.96 | 39 |
avg / total | 0.96 | 0.96 | 0.96 | 75 |
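In this scikit-learn version, the "avg / total" row of a report is the support-weighted average of the per-class rows. The sketch below recomputes that row from the numbers printed in the two reports above; `weighted_average` is a hypothetical helper introduced only for this check.

```python
def weighted_average(values, supports):
    """Average per-class scores, weighting each class by its support."""
    total = sum(supports)
    return sum(value * support for value, support in zip(values, supports)) / total


supports = [36, 39]  # test samples in class 0 and class 1, taken from the reports

# Report produced by the search scored with 'precision'
precision_a = round(weighted_average([0.75, 1.00], supports), 2)
recall_a = round(weighted_average([1.00, 0.69], supports), 2)

# Report produced by the search scored with 'recall_weighted'
precision_b = round(weighted_average([1.00, 0.93], supports), 2)
recall_b = round(weighted_average([0.92, 1.00], supports), 2)

print(precision_a, recall_a)  # 0.88 0.84
print(precision_b, recall_b)  # 0.96 0.96
```

The inputs here are the already rounded per-class values, so in other reports a recomputed average may differ from the printed one in the last digit.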
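1. scikit-learn's GridSearchCV searches for the best parameter combination, but you must supply the candidate parameter values and the scoring metric.
2. The classifier.best_params_ attribute (an attribute, not a function) holds the best parameter combination, so it can be printed directly and reused later.
3. classifier.predict automatically uses the best parameter combination: with the default refit=True, GridSearchCV refits the estimator on the whole training set using those parameters, and predict returns that model's predictions for the test or training set.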
    # Check that the full performance report above really used the best parameter combination
    best_classifier = svm.SVC(C=600, gamma=0.01, kernel='rbf')
    best_classifier.fit(train_X, train_y)
    y_pred = best_classifier.predict(test_X)
    print("\nFull performance report:\n {}".format(classification_report(test_y, y_pred)))
The result is identical to the full performance report above.
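The same check can be written without hard-coding the parameters. The following is a minimal sketch on synthetic data: the dataset from `make_classification` is only a stand-in and is an assumption, not the data used in this article, and the reduced grid is chosen just to keep the run short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; not the article's dataset)
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0)

param_grid = {'kernel': ['rbf'], 'gamma': [0.01, 0.001], 'C': [1, 10, 50, 600]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring='recall_weighted')
search.fit(train_X, train_y)  # refit=True by default, so best_estimator_ is available

# predict() on the search object delegates to the refit best estimator
same_as_best = np.array_equal(search.predict(test_X),
                              search.best_estimator_.predict(test_X))

# Training a fresh SVC by hand with best_params_ should give the same predictions
manual = SVC(**search.best_params_).fit(train_X, train_y)
same_as_manual = np.array_equal(search.predict(test_X), manual.predict(test_X))

print(search.best_params_)
print(same_as_best, same_as_manual)
```

Unpacking `best_params_` into the constructor avoids copying values by hand, and `best_estimator_` gives direct access to the already fitted model when no retraining is wanted.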
1. Python机器学习经典实例 (Python Machine Learning Cookbook), by Prateek Joshi, translated by 陶俊杰 and 陈小莉