I. Introduction
Support Vector Data Description (SVDD) is a one-class classification algorithm that separates target samples from non-target samples. For a detailed description of the algorithm, see:
(1) Tax D M J, Duin R P W. Support vector domain description[J]. Pattern Recognition Letters, 1999, 20(11-13): 1191-1199.
(2) Tax D M J, Duin R P W. Support vector data description[J]. Machine Learning, 2004, 54(1): 45-66.
The libsvm toolbox developed by Prof. Chih-Jen Lin and colleagues at National Taiwan University provides a MATLAB interface for SVDD, in which two key parameters, c and g, directly affect the one-class classification result. Building on this, I use the Whale Optimization Algorithm (WOA) to optimize the parameters of the SVDD implementation in the libsvm toolbox.
For a detailed description of WOA, see:
(1) Mirjalili S, Lewis A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67.
The authors of the algorithm have released their code on MathWorks.
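To make the search procedure concrete, here is a minimal sketch of the WOA update rules from the paper cited above, run on a toy 2-D sphere function. It is not the authors' released code: the variable names, the toy objective, the bounds, and the use of one scalar A and C per agent are all simplifying assumptions made for illustration.

```matlab
% Toy WOA run (illustrative sketch, not the released implementation).
fobj = @(x) sum(x.^2);          % hypothetical stand-in objective
agents = 10; max_iter = 50; dim = 2;
lb = [-5, -5]; ub = [5, 5];
b = 1;                          % spiral shape constant

% Random initial positions inside [lb, ub]
X = repmat(lb, agents, 1) + rand(agents, dim) .* repmat(ub - lb, agents, 1);
best_pos = zeros(1, dim); best_score = inf;
curve = zeros(1, max_iter);     % best score after each iteration

for t = 1:max_iter
    % Clip to the bounds, evaluate, and update the leader
    for i = 1:agents
        X(i, :) = min(max(X(i, :), lb), ub);
        s = fobj(X(i, :));
        if s < best_score
            best_score = s; best_pos = X(i, :);
        end
    end
    a = 2 - 2 * t / max_iter;   % decreases linearly toward 0
    for i = 1:agents
        A = 2 * a * rand - a;   % step coefficient in [-a, a]
        C = 2 * rand;
        l = 2 * rand - 1;       % spiral parameter in [-1, 1]
        if rand < 0.5
            if abs(A) < 1
                ref = best_pos;               % encircle the current leader
            else
                ref = X(randi(agents), :);    % explore around a random agent
            end
            X(i, :) = ref - A * abs(C * ref - X(i, :));
        else
            D = abs(best_pos - X(i, :));      % spiral move toward the leader
            X(i, :) = D * exp(b * l) * cos(2 * pi * l) + best_pos;
        end
    end
    curve(t) = best_score;
end
fprintf('best score %.3g at [%.3g, %.3g]\n', best_score, best_pos(1), best_pos(2));
```

Because the leader is only replaced when a strictly better position is found, `curve` never increases. The result still depends on the random draws, so different runs can end at different positions.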
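Notes: (1) I have renamed the svmtrain and svmpredict functions of the libsvm toolbox to libsvmtrain and libsvmpredict, respectively.
(2) Like other swarm-intelligence optimizers, WOA can get trapped in local optima. If the optimization result looks abnormal, try running it a few more times.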
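As background for the two parameters tuned in the examples that follow, the soft-margin SVDD problem from the Tax and Duin references can be written as below. The symbols (R for the radius, a for the center, ξ for the slack variables, φ for the feature map, N for the number of training samples) follow the usual presentation. Mapping C to the `-c` option and g to the `-g` option with the RBF kernel (`-t 2`) is my reading of libsvm's option convention, not something the post spells out.

```latex
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^2 + C \sum_{i=1}^{N} \xi_i \\
\text{s.t.}\quad & \|\phi(x_i) - a\|^2 \le R^2 + \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,N \\
K(x_i, x_j) &= \phi(x_i)^\top \phi(x_j) = \exp\!\left(-g\,\|x_i - x_j\|^2\right)
\end{aligned}
```

A new sample z is accepted as a target when \|\phi(z) - a\|^2 \le R^2. A larger C penalizes training samples that fall outside the sphere more heavily, and a larger g makes the boundary follow the training data more tightly.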
II. Example 1 (the heart_scale data shipped with the libsvm toolbox)
1. Data description
The data set has 13 attributes and 270 samples: 120 positive and 150 negative. In this example the positive samples form the training set (label 1) and the negative samples form the test set (label -1).
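The post does not show `prepareData`, so the following is only a guess at the split it performs, written against synthetic data of the same shape. With libsvm installed, `[label, inst] = libsvmread('heart_scale')` would supply the real labels and features in place of the two placeholder lines.

```matlab
% Synthetic stand-in with the same shape as heart_scale (270 x 13).
% These two lines are placeholders, not the real data.
label = [ones(120, 1); -ones(150, 1)];
inst  = rand(270, 13);

% Positive samples train the SVDD model; negative samples are held out.
pos = (label == 1);
traindata  = inst(pos, :);
trainlabel = label(pos);      % 120 target samples, all labeled 1
testdata   = inst(~pos, :);
testlabel  = label(~pos);     % 150 non-target samples, all labeled -1
```

With this split the test set contains no target samples, so test accuracy measures only how many non-target samples are rejected.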
2. Main program code
clc
clear all
close all
addpath(genpath(pwd))
global traindata trainlabel   % presumably read inside woa_obj
% heart_scale data
[traindata, testdata, trainlabel, testlabel] = prepareData;
% Parameter setting of WOA
agent = 10;            % Number of search agents
iteration = 20;        % Maximum number of iterations
lb = [10^-3, 2^-4];    % Lower bounds of 'c' and 'g'
ub = [10^0, 2^4];      % Upper bounds of 'c' and 'g'
dim = 2;               % Number of parameters
fobj = @woa_obj;       % Objective function
% Parameter optimization using WOA
[Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
% Train SVDD hypersphere using the optimal parameters
cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
    num2str(Best_pos(1,2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% Test
[predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
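The objective function `woa_obj` is not shown in the post. The sketch below is one plausible body for it, assuming it trains on the global training set and returns the training error rate. The example candidate `params` and the guard that falls back to `NaN` are my own additions so the snippet runs even without libsvm on the path. Note also that, as far as I know, `-s 5` selects SVDD only in libsvm builds that include the SVDD extension; stock libsvm defines `-s` values 0 to 4.

```matlab
% Hypothetical sketch of what woa_obj might compute for one candidate.
params = [0.5, 0.25];   % example candidate [c, g], chosen arbitrarily
cmd = sprintf('-s 5 -t 2 -c %g -g %g -q', params(1), params(2));

if exist('libsvmtrain', 'file') && exist('traindata', 'var')
    % Train on the target samples, then score the same samples
    model = libsvmtrain(trainlabel, traindata, cmd);
    [~, acc, ~] = libsvmpredict(trainlabel, traindata, model);
    score = 1 - acc(1) / 100;   % fraction of training samples rejected
else
    score = NaN;                % libsvm or the data is not available here
end
```

Scoring only the training set would explain why every accuracy line printed during the search has a denominator equal to the training set size.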
Output of the last iterations and the final classification result:
ans =
   19.0000    0.0667
Accuracy = 80% (96/120) (classification)
Accuracy = 66.6667% (80/120) (classification)
Accuracy = 60% (72/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 53.3333% (64/120) (classification)
Accuracy = 54.1667% (65/120) (classification)
Accuracy = 42.5% (51/120) (classification)
Accuracy = 35% (42/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 35% (42/120) (classification)
ans =
   20.0000    0.0667
Accuracy = 100% (150/150) (classification)
As the output shows, the SVDD model built with the optimized parameters reaches 93.33% accuracy on the training set and 100% on the test set.
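None of the ten accuracy lines printed in the final iteration equals 93.33%, so the training figure presumably comes from the best score 0.0667 printed after `ans =`. The short check below assumes, without confirmation from the post, that this score is the training error rate of the best candidate found so far.

```matlab
n_train    = 120;       % training samples in Example 1
best_score = 0.0667;    % second value printed after "ans =" in the last iteration

train_acc = 100 * (1 - best_score);              % accuracy in percent
n_correct = round((1 - best_score) * n_train);   % nearest whole sample count
fprintf('%.2f%% (%d/%d)\n', train_acc, n_correct, n_train);   % prints 93.33% (112/120)
```

Under the same assumption, the value 0.0025 reported in Example 2 corresponds to 399 of 400 training samples, which matches the 99.75% stated there.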
III. Example 2 (industrial process data)
1. Data description
This example uses data from an industrial process. The data set has 10 attributes. The training set has 400 positive samples, and the test set has 80 samples (the first 40 are positive, the last 40 are negative).
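The layout described above fixes the label vectors, and it also allows accuracy to be reported per class rather than as one overall figure. The sketch below builds the labels by hand and defines two helper handles. Whether `data_2.mat` already contains these labels is not stated in the post, and the prediction vector here is a placeholder, not the post's result.

```matlab
% Label vectors implied by the data description
trainlabel = ones(400, 1);
testlabel  = [ones(40, 1); -ones(40, 1)];

% Per-class rates for any prediction vector of +1/-1 values
target_rate  = @(y, yhat) mean(yhat(y == 1) == 1);     % targets accepted
outlier_rate = @(y, yhat) mean(yhat(y == -1) == -1);   % non-targets rejected

% Placeholder predictions (NOT the post's output): accept everything
yhat = ones(80, 1);
fprintf('target acceptance: %.2f, outlier rejection: %.2f\n', ...
    target_rate(testlabel, yhat), outlier_rate(testlabel, yhat));
```

Replacing `yhat` with the `predictlabel` returned by `libsvmpredict` would show whether the misclassified test samples are rejected targets or accepted non-targets.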
2. Main program code
clc
clear all
addpath(genpath(pwd))
global traindata trainlabel   % presumably read inside woa_obj
% Industrial process data (the file is expected to provide traindata,
% trainlabel, testdata and testlabel, which are used below)
load(fullfile('.', 'data', 'data_2.mat'))
% Parameter setting of WOA
agent = 10;            % Number of search agents
iteration = 30;        % Maximum number of iterations
lb = [10^-3, 2^-7];    % Lower bounds of 'c' and 'g'
ub = [10^0, 2^7];      % Upper bounds of 'c' and 'g'
dim = 2;               % Number of parameters
fobj = @woa_obj;       % Objective function
% Parameter optimization using WOA
[Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
% Train SVDD hypersphere using the optimal parameters
cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
    num2str(Best_pos(1,2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% Test
[predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
% Visualize the results
plotResult(testlabel, predictlabel)
Output of the last iteration and the final classification result:
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.5% (398/400) (classification)
ans =
   30.0000    0.0025
Accuracy = 93.75% (75/80) (classification)
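As the output shows, the SVDD model built with the optimized parameters reaches 99.75% accuracy on the training set and 93.75% on the test set.
Visualization of the result: [figure: output of plotResult(testlabel, predictlabel)]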