I. Introduction
Support Vector Data Description (SVDD) is a one-class classification algorithm that separates target samples from non-target (outlier) samples. For a detailed description of the algorithm, see:
(1) Tax D M J, Duin R P W. Support vector domain description[J]. Pattern Recognition Letters, 1999, 20(11-13): 1191-1199.
(2) Tax D M J, Duin R P W. Support vector data description[J]. Machine Learning, 2004, 54(1): 45-66.
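For reference, the standard SVDD formulation from Tax and Duin is summarized below. It looks for the smallest hypersphere (centre $\mathbf{a}$, radius $R$) in kernel feature space that encloses the training data, with slack variables $\xi_i$ for outliers. Mapping libsvm's `-c` option to $C$ and `-g` to the RBF parameter $g$ is an assumption about the SVDD build used here, not something the post states.

$$
\min_{R,\,\mathbf{a},\,\boldsymbol{\xi}}\; R^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad \|\phi(\mathbf{x}_i)-\mathbf{a}\|^2 \le R^2+\xi_i,\;\; \xi_i\ge 0
$$

$$
\max_{\boldsymbol{\alpha}}\; \sum_{i=1}^{n}\alpha_i K(\mathbf{x}_i,\mathbf{x}_i) - \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j)
\quad\text{s.t.}\quad \sum_{i=1}^{n}\alpha_i = 1,\;\; 0\le\alpha_i\le C
$$

$$
\mathbf{a}=\sum_{i=1}^{n}\alpha_i\phi(\mathbf{x}_i),\qquad
\|\phi(\mathbf{z})-\mathbf{a}\|^2 = K(\mathbf{z},\mathbf{z}) - 2\sum_{i=1}^{n}\alpha_i K(\mathbf{x}_i,\mathbf{z}) + \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) \le R^2,
\qquad K(\mathbf{u},\mathbf{v}) = e^{-g\|\mathbf{u}-\mathbf{v}\|^2}
$$

A test point $\mathbf{z}$ is accepted as a target when the last inequality holds. Because $\sum_i\alpha_i = 1$ and $\alpha_i \le C$, the problem is only feasible for $C \ge 1/n$, and for $C \ge 1$ the upper bound never binds (a hard boundary), which is consistent with the upper limit $c = 1$ used in the search ranges of the examples.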
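The libsvm toolbox developed by Prof. Chih-Jen Lin of National Taiwan University and colleagues provides a MATLAB interface for SVDD (through its SVDD extension), in which two key parameters, c and g, directly determine the one-class classification result. Building on this, I introduce the Whale Optimization Algorithm (WOA) to optimize the parameters of the SVDD algorithm in the libsvm toolbox.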
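To make the role of g concrete: libsvm's RBF kernel (`-t 2`) is K(u,v) = exp(-g*||u-v||^2), and an SVDD model accepts a point z when its squared kernel-space distance to the sphere centre is at most R^2. The MATLAB sketch below evaluates that distance on a three-point toy set. The weights `alpha` are hypothetical (uniform, which satisfies sum(alpha) = 1 and 0 <= alpha_i <= C whenever C >= 1/n) rather than the output of a trained model, and all variable names are illustrative. The parameter c does not appear explicitly; it only enters through the bound alpha_i <= C during training.

```matlab
% Sketch: RBF kernel and the SVDD squared distance to the sphere centre
%   d2(z) = K(z,z) - 2*sum_i alpha_i*K(x_i,z) + sum_ij alpha_i*alpha_j*K(x_i,x_j)
% alpha is HYPOTHETICAL (uniform), not taken from a trained libsvm model.
X = [0 0; 1 0; 0 1];                  % toy training points, one per row
g = 0.5;                              % RBF parameter 'g' (gamma in libsvm)
n = size(X, 1);
alpha = ones(n, 1) / n;               % SVDD requires sum(alpha) == 1

% Kernel matrix K(i,j) = exp(-g * ||x_i - x_j||^2)
sq = sum(X.^2, 2);
D2 = repmat(sq, 1, n) + repmat(sq', n, 1) - 2 * (X * X');
K = exp(-g * D2);

% Squared kernel distance of a point z to the centre (K(z,z) = 1 for RBF)
kdist2 = @(z) 1 - 2 * alpha' * exp(-g * sum((X - repmat(z, n, 1)).^2, 2)) ...
              + alpha' * K * alpha;

d2_near = kdist2([0.2 0.2]);          % point inside the data cloud
d2_far  = kdist2([5 5]);              % point far from all training data
```

With a trained model, R^2 would be this distance evaluated at a boundary support vector (0 < alpha_i < C). A larger g makes the kernel more local, so the boundary follows the training data more tightly; this is why the WOA search over g changes the SVDD result so strongly.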
WOA is described in detail in:
(1) Mirjalili S, Lewis A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67.

The authors of the algorithm have released its source code on MathWorks.
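The core WOA update rules can be summarized in a short MATLAB sketch. It is written from the equations in Mirjalili and Lewis (2016), not copied from the authors' MathWorks code, and it minimizes a toy sphere function instead of the SVDD objective; `fobj`, `best_pos`, `best_score` and the other names are illustrative. The coefficient `a` decreases linearly from 2 to 0, so early iterations explore (|A| >= 1, move relative to a random whale) and later iterations exploit (encircle the best solution, or follow the spiral "bubble-net" path with b = 1 and l drawn from [-1, 1] as in the paper).

```matlab
% Minimal WOA sketch (after Mirjalili & Lewis, 2016) on a toy objective.
% fobj is a HYPOTHETICAL stand-in for the post's woa_obj.
fobj = @(x) sum(x.^2);                % toy objective: sphere function
dim = 2; agents = 10; max_iter = 100;
lb = -5 * ones(1, dim); ub = 5 * ones(1, dim);

% Random initial positions inside [lb, ub]
X = repmat(lb, agents, 1) + rand(agents, dim) .* repmat(ub - lb, agents, 1);
best_pos = zeros(1, dim); best_score = inf;

for t = 1:max_iter
    % Clip every whale to the bounds and update the best solution ("prey")
    for i = 1:agents
        X(i, :) = min(max(X(i, :), lb), ub);
        f = fobj(X(i, :));
        if f < best_score
            best_score = f; best_pos = X(i, :);
        end
    end
    a = 2 - t * (2 / max_iter);       % decreases linearly from 2 to 0
    for i = 1:agents
        A = 2 * a * rand() - a;       % |A| >= 1 means exploration
        C = 2 * rand();
        if rand() < 0.5
            if abs(A) < 1             % shrinking encirclement of the best
                D = abs(C * best_pos - X(i, :));
                X(i, :) = best_pos - A * D;
            else                      % search around a random whale
                Xr = X(randi(agents), :);
                D = abs(C * Xr - X(i, :));
                X(i, :) = Xr - A * D;
            end
        else                          % spiral update with b = 1
            l = 2 * rand() - 1;       % l in [-1, 1]
            D = abs(best_pos - X(i, :));
            X(i, :) = D * exp(l) .* cos(2 * pi * l) + best_pos;
        end
    end
end
```

In the post, `WOA(agent, iteration, lb, ub, dim, fobj)` plays this role, with `fobj = @woa_obj` and the two decision variables (c, g).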
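
Notes: (1) I have renamed libsvm's svmtrain and svmpredict functions to libsvmtrain and libsvmpredict, respectively, to avoid name clashes with MATLAB's built-in functions.
(2) Like other swarm-intelligence optimizers, WOA can get trapped in local optima. If the optimization result looks abnormal, try running it several times.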
II. Example 1 (the heart_scale data shipped with the libsvm toolbox)
1. Data description
The dataset has 13 attributes and 270 samples, of which 120 are positive and 150 are negative. In this example the positive samples form the training set (label 1) and the negative samples form the test set (label -1).
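`prepareData` is not listed in the post, so the following is a hypothetical stand-in that performs the split described above on a five-sample toy set. With the real data, the label vector and instance matrix would come from libsvm's reader, e.g. `libsvmread('heart_scale')` (assuming the file is on the MATLAB path). Only the splitting logic is meant to carry over.

```matlab
% HYPOTHETICAL stand-in for prepareData (its source is not shown).
% Real data would be loaded with libsvm's reader, e.g.
%   [label, inst] = libsvmread('heart_scale');
% A tiny toy set replaces heart_scale so the snippet runs on its own.
label = [1; -1; 1; -1; -1];
inst  = [0.1 0.2; 0.9 0.8; 0.2 0.1; 0.7 0.9; 0.8 0.7];

pos = (label == 1);
traindata  = inst(pos, :);            % positive samples -> training set
trainlabel = ones(sum(pos), 1);       % target class labelled 1
testdata   = inst(~pos, :);           % negative samples -> test set
testlabel  = -ones(sum(~pos), 1);     % outliers labelled -1
```

On heart_scale the same logic would yield 120 training rows and 150 test rows, matching the counts in the output below.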

2. Main program
```matlab
clc
clear all
close all
addpath(genpath(pwd))                 % add all subfolders (libsvm, WOA, data) to the path
global traindata trainlabel           % shared with the objective function woa_obj
% heart_scale data
[traindata, testdata, trainlabel, testlabel] = prepareData;
% Parameter settings of WOA
agent = 10;                           % Number of search agents
iteration = 20;                       % Maximum number of iterations
lb = [10^-3, 2^-4];                   % Lower bounds of 'c' and 'g'
ub = [10^0, 2^4];                     % Upper bounds of 'c' and 'g'
dim = 2;                              % Number of parameters
fobj = @woa_obj;                      % Objective function
% Parameter optimization using WOA
[Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
% Train the SVDD hypersphere with the optimal parameters
% (-s 5: SVDD, -t 2: RBF kernel, -c/-g: tuned parameters, -q: quiet mode)
cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
       num2str(Best_pos(1,2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% Test
[predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
```

Results of the last iteration and the final classification result:

```
ans =
   19.0000    0.0667
Accuracy = 80% (96/120) (classification)
Accuracy = 66.6667% (80/120) (classification)
Accuracy = 60% (72/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 53.3333% (64/120) (classification)
Accuracy = 54.1667% (65/120) (classification)
Accuracy = 42.5% (51/120) (classification)
Accuracy = 35% (42/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 35% (42/120) (classification)
ans =
   20.0000    0.0667
Accuracy = 100% (150/150) (classification)
```
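
As the output shows, the SVDD model built with the optimized parameters achieves 93.33% accuracy on the training set and 100% on the test set. The 93.33% figure does not appear among the per-agent lines of the last iteration: each ans line shows the iteration number and the best objective value found so far, and that value (0.0667) matches a training error rate of 1 - 93.33%.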
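The relation between the objective value and the training accuracy can be checked with a few lines of arithmetic. This sketch assumes that `woa_obj` (not shown in the post) returns the training error rate, i.e. one minus the fraction of training samples accepted; the prediction vector is a toy example with 8 of 120 targets rejected, not output from the actual model.

```matlab
% Hypothetical check, ASSUMING woa_obj returns the training error rate
% 1 - (correctly accepted training samples) / (number of training samples).
trainlabel   = ones(120, 1);                  % 120 target samples, label 1
predictlabel = [ones(112, 1); -ones(8, 1)];   % toy prediction: 8 rejected
acc = mean(predictlabel == trainlabel);       % fraction classified correctly
err = 1 - acc;                                % objective value to minimise
fprintf('Accuracy = %g%% (%d/%d), objective = %.4f\n', ...
        100 * acc, sum(predictlabel == trainlabel), numel(trainlabel), err);
```

Under that assumption, 112/120 correct gives an objective of 8/120, which rounds to the 0.0667 printed in the ans lines.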

III. Example 2 (industrial process data)
1. Data description
An industrial process dataset with 10 attributes is used. The training set contains 400 positive samples, and the test set contains 80 samples (the first 40 are positive, the last 40 negative).

2. Main program
```matlab
clc
clear all
addpath(genpath(pwd))                 % add all subfolders (libsvm, WOA, data) to the path
global traindata trainlabel           % shared with the objective function woa_obj
% Industrial process data (expected to provide traindata, trainlabel, testdata, testlabel)
load(fullfile('data', 'data_2.mat'))
% Parameter settings of WOA
agent = 10;                           % Number of search agents
iteration = 30;                       % Maximum number of iterations
lb = [10^-3, 2^-7];                   % Lower bounds of 'c' and 'g'
ub = [10^0, 2^7];                     % Upper bounds of 'c' and 'g'
dim = 2;                              % Number of parameters
fobj = @woa_obj;                      % Objective function
% Parameter optimization using WOA
[Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
% Train the SVDD hypersphere with the optimal parameters
% (-s 5: SVDD, -t 2: RBF kernel, -c/-g: tuned parameters, -q: quiet mode)
cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
       num2str(Best_pos(1,2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% Test
[predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
% Visualize the results
plotResult(testlabel, predictlabel)
```

Results of the last iteration and the final classification result:

```
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.5% (398/400) (classification)
ans =
   30.0000    0.0025
Accuracy = 93.75% (75/80) (classification)
```

As the output shows, the SVDD model built with the optimized parameters achieves 99.75% accuracy on the training set (the best objective value 0.0025 matches a training error of 1 - 99.75%) and 93.75% on the test set.
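A single test accuracy mixes two different kinds of error: targets wrongly rejected and outliers wrongly accepted. The sketch below computes both rates separately for a test set laid out as in this example (40 targets followed by 40 outliers). The predictions are toy values chosen only to give 75/80 correct overall; the post does not report how its 5 misclassifications are split between the two classes.

```matlab
% Hypothetical per-class breakdown for a test set laid out as in Example 2:
% first 40 samples are targets (+1), last 40 are outliers (-1).
testlabel = [ones(40, 1); -ones(40, 1)];
% TOY predictions (not the post's actual results): 2 targets rejected,
% 3 outliers accepted, i.e. 75/80 correct overall.
predictlabel = testlabel;
predictlabel([5 17]) = -1;
predictlabel([50 61 72]) = 1;

isTarget = (testlabel == 1);
tpr = mean(predictlabel(isTarget) == 1);      % target acceptance rate
tnr = mean(predictlabel(~isTarget) == -1);    % outlier rejection rate
overall = mean(predictlabel == testlabel);    % what libsvmpredict reports
```

With the real `predictlabel` from the main program, the same three lines would show whether the errors come mostly from false alarms on normal data or from missed faults.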
The visualization results are shown below:

[Figure: visualization of the test-set classification results from plotResult (image not available)]