vectorbt Study_43: DMA Part 3, Rolling-Window Grid Parameter Selection

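Building on the previous post (vectorbt Study_17: DMA Part 2, Grid Parameter Selection), this post combines rolling windows with a grid search to find the best parameters dynamically, window by window.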

01, Basic configuration

#conda envs:vectorbt_env
import warnings
import vectorbt as vbt
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import pytz
from dateutil.parser import parse
import ipywidgets as widgets
from copy import deepcopy
from tqdm import tqdm
import imageio
from IPython import display
import plotly.graph_objects as go
import itertools
import dateparser
import gc
import math
from tools import dbtools
warnings.filterwarnings("ignore")
pd.set_option('display.max_rows',500)
pd.set_option('display.max_columns',500)
pd.set_option('display.width',1000)

02, Fetching and visualizing market data

a, Time and trading parameters

# Enter your parameters here
seed = 42
symbol = '002594.XSHE'
metric = 'total_return'
start_date = datetime(2020, 1, 1, tzinfo=pytz.utc) # time period for analysis, must be timezone-aware
end_date = datetime(2023,1,1, tzinfo=pytz.utc)
time_buffer = timedelta(days=100) # buffer before to pre-calculate SMA/EMA, best to set to max window
freq = '1D'
vbt.settings.portfolio['init_cash'] = 10000. # initial cash: 10,000
vbt.settings.portfolio['fees'] = 0.0025 # 0.25%
vbt.settings.portfolio['slippage'] = 0.0025 # 0.25%
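To get a feel for what the 0.25% fee and 0.25% slippage settings cost, here is a minimal sketch of one round trip (buy, then sell) under a simplified cost model. The model is an assumption made for illustration, not a description of vectorbt's internals: slippage moves the fill price against the trader, and the fee is a fraction of the traded value. The function name `round_trip_return` is hypothetical.

```python
def round_trip_return(buy_price, sell_price, fees=0.0025, slippage=0.0025):
    """Net return of one buy-then-sell round trip under a simplified cost model."""
    # Assumption: slippage worsens the fill price on both sides of the trade.
    buy_fill = buy_price * (1 + slippage)
    sell_fill = sell_price * (1 - slippage)
    # Assumption: the fee is charged as a fraction of the traded value.
    cost_per_share = buy_fill * (1 + fees)
    proceeds_per_share = sell_fill * (1 - fees)
    return proceeds_per_share / cost_per_share - 1


flat = round_trip_return(100.0, 100.0)
print(f"net return when the price does not move: {flat:.4%}")
```

Under these assumptions a round trip costs roughly 1% of the position, so a crossover strategy that trades often has to earn that back before it shows any profit.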

b, Fetching market data and building the date mask

# Download data with time buffer
cols = ['Open', 'High', 'Low', 'Close', 'Volume']
# ohlcv_wbuf = vbt.YFData.download(symbol, start=start_date-time_buffer, end=end_date).get(cols)
ohlcv_wbuf=dbtools.MySQLData.download(symbol).get() # query through the author's own helper class
assert not ohlcv_wbuf.empty # fail early if the query returned no rows
ohlcv_wbuf = ohlcv_wbuf.astype(np.float64)
print("origin ohlcv_wbuf size:",ohlcv_wbuf.shape)
print(ohlcv_wbuf.columns)
# Create a copy of data without time buffer
wobuf_mask = (ohlcv_wbuf.index >= start_date) & (ohlcv_wbuf.index <= end_date) # mask without buffer
ohlcv = ohlcv_wbuf.loc[wobuf_mask, :]
print("wobuf_mask ohlcv size:",ohlcv.shape)
# Plot the OHLC data
ohlcv.vbt.ohlcv.plot().show_svg() # draw the candlestick chart
# remove show_svg() to display interactive chart!
origin ohlcv_wbuf size: (978, 5)
Index(['Open', 'High', 'Low', 'Close', 'Volume'], dtype='object')
wobuf_mask ohlcv size: (728, 5)

[Figure: candlestick (OHLCV) chart of the selected price data]
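The comment on start_date says the dates must be timezone-aware. The reason shows up in the mask `ohlcv_wbuf.index >= start_date`: the index printed further down is UTC-aware, and Python refuses to order a naive datetime against an aware one. A small standard-library sketch of that rule (it does not touch pandas or the author's dbtools):

```python
from datetime import datetime, timezone

aware_bar = datetime(2020, 1, 2, tzinfo=timezone.utc)
naive_start = datetime(2020, 1, 1)                       # no tzinfo
aware_start = datetime(2020, 1, 1, tzinfo=timezone.utc)  # same instant, tz-aware

try:
    aware_bar >= naive_start
    naive_comparison_failed = False
except TypeError:
    # Ordering comparisons between naive and aware datetimes raise TypeError.
    naive_comparison_failed = True

print(naive_comparison_failed)
print(aware_bar >= aware_start)
```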

20, Rolling-window splitting of the price data

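Notes:
01, A training-to-validation ratio of 3:1 or 2:1 corresponds to a window_len to set_lens ratio of 4:1 (or 3:1). If the window is too long, it carries too much history and cannot react to the latest market conditions; if it is too short, the parameters jump from window to window, which looks a lot like overfitting.
02, Intuitively the validation sets should line up end to end, but in practice that is not optimal: a validation set that is too short may never trigger a signal, which leaves stretches with no trades.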
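As a quick illustration of the first note, the sketch below turns a training-to-validation ratio into window_len and set_lens values. The helper `window_sizes` and its argument names are hypothetical; the numbers (80-bar unit, optional 40-bar warm-up allowance) mirror the settings used in this post.

```python
def window_sizes(bar_days, train_bars, verify_bars, warmup_days=0):
    """Return (window_len, set_len) in bars for one rolling split."""
    set_len = bar_days * verify_bars
    window_len = bar_days * (train_bars + verify_bars) + warmup_days
    return window_len, set_len


for train_bars in (2, 3):
    window_len, set_len = window_sizes(80, train_bars, 1)
    # A 2:1 split gives window_len:set_len = 3:1, a 3:1 split gives 4:1.
    print(train_bars, window_len, set_len, window_len / set_len)
```

Adding a warm-up allowance lengthens only the training part, for example `window_sizes(80, 2, 1, warmup_days=40)` keeps the validation length at 80 bars.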

a, Parameter settings and preview

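# Rolling-period parameters and a rough visual check
start_end_days=((end_date-start_date).days*5/7) # approximate trading days (5 out of every 7 calendar days)
bar_days= 80 # unit length, in days, for the training and validation sets
test_bar_num=2 # training-set length, in units of bar_days
verify_bar_num=1 # validation-set length, in units of bar_days
verify_overlap=0 # overlap between consecutive validation sets, in units of bar_days
pre_test_days=40 # extra days added to the training set, because part of it is used up computing the indicators
# n must be chosen so that the validation sets line up end to end:
# => (n-1)*(verify_bar_num-verify_overlap)+(verify_bar_num+test_bar_num)=start_end_days/bar_days
# => n=(start_end_days/bar_days-test_bar_num-verify_overlap)/(verify_bar_num-verify_overlap)
calc_n=(start_end_days/bar_days-test_bar_num-verify_overlap)/(verify_bar_num-verify_overlap)
split_kwargs = dict(
    n=int(calc_n),
    window_len=int(bar_days*(test_bar_num+verify_bar_num)+pre_test_days),
    set_lens=(int(bar_days*verify_bar_num),),
    left_to_right=False
) # with the values above: 7 windows of 280 bars each, the last 80 bars of each reserved for validation
# Choose n so that the validation sets are contiguous and do not repeat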
pf_kwargs = dict(
direction='both', # long and short
freq='d'
)
windows = np.arange(10, 50)
def roll_in_and_out_samples(price, **kwargs):
    return price.vbt.rolling_split(**kwargs)
price=ohlcv['Close']
# Check on a single column: the orange validation ranges should be contiguous and non-repeating
roll_in_and_out_samples(price, **split_kwargs, plot=True, trace_names=['in-sample', 'out-sample']).show_svg()
# Take a rough look at the data
(in_price, in_indexes), (out_price, out_indexes) = roll_in_and_out_samples(price, **split_kwargs)
print(in_price.shape, len(in_indexes)) # in-sample
print(out_price.shape, len(out_indexes)) # out-sample
print(in_price.columns)
print(in_price[0:3])
# This part only prints the data to check that it matches expectations.
def simulate_all_params(price, windows, **kwargs):
    fast_ma, slow_ma = vbt.MA.run_combs(price, windows, r=2, short_names=['fast', 'slow'])
    entries = fast_ma.ma_crossed_above(slow_ma)
    exits = fast_ma.ma_crossed_below(slow_ma)
    pf = vbt.Portfolio.from_signals(price, entries, exits, **kwargs)
    return pf.sharpe_ratio()
# Simulate all params for in-sample ranges
in_sharpe = simulate_all_params(in_price, windows, **pf_kwargs)
print(in_sharpe[:3])

[Figure: in-sample and out-sample ranges of each rolling split]

(200, 7) 7
(80, 7) 7
Int64Index([0, 1, 2, 3, 4, 5, 6], dtype='int64', name='split_idx')
split_idx 0 1 2 3 4 5 6
0 48.17 56.98 81.93 175.29 169.00 223.97 310.26
1 48.04 56.98 82.92 177.97 164.51 227.50 311.99
2 48.28 58.00 82.18 173.24 169.07 241.23 306.78
fast_window slow_window split_idx
10 11 0 -0.354158
1 1.117491
2 0.551415
Name: sharpe_ratio, dtype: float64
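The output above shows 7 splits with 200 in-sample and 80 out-sample bars each. Whether the out-sample ranges really line up end to end can be checked with plain arithmetic. The sketch below assumes the window starts are spread evenly over the 728 available bars and rounded to whole bars; that placement rule is an assumption made for illustration, not a statement about vectorbt's internals, and `split_ranges` is a hypothetical helper.

```python
def split_ranges(total_len, window_len, set_len, n):
    """Return (in_start, out_start, out_end) bar positions for n evenly spaced windows."""
    last_start = total_len - window_len
    ranges = []
    for i in range(n):
        # Assumption: starts are evenly spaced and rounded to the nearest bar.
        start = round(i * last_start / (n - 1)) if n > 1 else 0
        out_start = start + window_len - set_len  # the last set_len bars are out-sample
        ranges.append((start, out_start, start + window_len))
    return ranges


ranges = split_ranges(total_len=728, window_len=280, set_len=80, n=7)
# Gap between the end of one out-sample range and the start of the next one.
gaps = [nxt[1] - cur[2] for cur, nxt in zip(ranges, ranges[1:])]
print(ranges)
print(gaps)
```

A negative gap means two consecutive validation ranges overlap by that many bars; a positive one means bars are skipped. Under the even-spacing assumption the ranges here overlap by 5 or 6 bars each, because 7 windows of 280 bars with 80-bar validation sets would need 760 bars to tile exactly and only 728 are available.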

b, Splitting the price data with the rolling-window parameters

(in_price, in_indexes), (out_price, out_indexes) = roll_in_and_out_samples(price, **split_kwargs)
print(in_price.shape, len(in_indexes)) # in-sample
print(out_price.shape, len(out_indexes)) # out-sample
print(in_indexes[0:3])
print("###################")
print(in_indexes[0][0])
print(in_indexes[1][0])
print(in_indexes[0][25:27])
(200, 7) 7
(80, 7) 7
[DatetimeIndex(['2020-01-02 00:00:00+00:00', '2020-01-03 00:00:00+00:00', '2020-01-06 00:00:00+00:00', '2020-01-07 00:00:00+00:00', '2020-01-08 00:00:00+00:00', '2020-01-09 00:00:00+00:00', '2020-01-10 00:00:00+00:00', '2020-01-13 00:00:00+00:00', '2020-01-14 00:00:00+00:00', '2020-01-15 00:00:00+00:00',
...
'2020-10-20 00:00:00+00:00', '2020-10-21 00:00:00+00:00', '2020-10-22 00:00:00+00:00', '2020-10-23 00:00:00+00:00', '2020-10-26 00:00:00+00:00', '2020-10-27 00:00:00+00:00', '2020-10-28 00:00:00+00:00', '2020-10-29 00:00:00+00:00', '2020-10-30 00:00:00+00:00', '2020-11-02 00:00:00+00:00'], dtype='datetime64[ns, UTC]', name='split_0', length=200, freq=None), DatetimeIndex(['2020-04-27 00:00:00+00:00', '2020-04-28 00:00:00+00:00', '2020-04-29 00:00:00+00:00', '2020-04-30 00:00:00+00:00', '2020-05-06 00:00:00+00:00', '2020-05-07 00:00:00+00:00', '2020-05-08 00:00:00+00:00', '2020-05-11 00:00:00+00:00', '2020-05-12 00:00:00+00:00', '2020-05-13 00:00:00+00:00',
...
'2021-02-03 00:00:00+00:00', '2021-02-04 00:00:00+00:00', '2021-02-05 00:00:00+00:00', '2021-02-08 00:00:00+00:00', '2021-02-09 00:00:00+00:00', '2021-02-10 00:00:00+00:00', '2021-02-18 00:00:00+00:00', '2021-02-19 00:00:00+00:00', '2021-02-22 00:00:00+00:00', '2021-02-23 00:00:00+00:00'], dtype='datetime64[ns, UTC]', name='split_1', length=200, freq=None), DatetimeIndex(['2020-08-14 00:00:00+00:00', '2020-08-17 00:00:00+00:00', '2020-08-18 00:00:00+00:00', '2020-08-19 00:00:00+00:00', '2020-08-20 00:00:00+00:00', '2020-08-21 00:00:00+00:00', '2020-08-24 00:00:00+00:00', '2020-08-25 00:00:00+00:00', '2020-08-26 00:00:00+00:00', '2020-08-27 00:00:00+00:00',
...
'2021-05-31 00:00:00+00:00', '2021-06-01 00:00:00+00:00', '2021-06-02 00:00:00+00:00', '2021-06-03 00:00:00+00:00', '2021-06-04 00:00:00+00:00', '2021-06-07 00:00:00+00:00', '2021-06-08 00:00:00+00:00', '2021-06-09 00:00:00+00:00', '2021-06-10 00:00:00+00:00', '2021-06-11 00:00:00+00:00'], dtype='datetime64[ns, UTC]', name='split_2', length=200, freq=None)]
###################
2020-01-02 00:00:00+00:00
2020-04-27 00:00:00+00:00
DatetimeIndex(['2020-02-14 00:00:00+00:00', '2020-02-17 00:00:00+00:00'], dtype='datetime64[ns, UTC]', name='split_0', freq=None)

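21, Computing performance over the rolling windows

a, Buy-and-hold performance

Performance of the underlying instrument itself over each range.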

def simulate_holding(price, **kwargs):
    pf = vbt.Portfolio.from_holding(price, **kwargs)
    return pf.sharpe_ratio()
in_hold_sharpe = simulate_holding(in_price, **pf_kwargs)
print(in_hold_sharpe.head(5))
out_hold_sharpe = simulate_holding(out_price, **pf_kwargs)
print(out_hold_sharpe.head(5))
split_idx
0 3.604669
1 3.897711
2 2.890238
3 1.095362
4 1.425303
Name: sharpe_ratio, dtype: float64
split_idx
0 1.849248
1 1.152267
2 1.266940
3 -0.093093
4 1.274854
Name: sharpe_ratio, dtype: float64

b, Grid-parameter performance (training and validation sets)

def simulate_all_params(price, windows, **kwargs):
    fast_ma, slow_ma = vbt.MA.run_combs(price, windows, r=2, short_names=['fast', 'slow'])
    entries = fast_ma.ma_crossed_above(slow_ma)
    exits = fast_ma.ma_crossed_below(slow_ma)
    pf = vbt.Portfolio.from_signals(price, entries, exits, **kwargs)
    return pf.sharpe_ratio()
# Simulate all params for in-sample ranges
in_sharpe = simulate_all_params(in_price, windows, **pf_kwargs)
print(in_sharpe.shape)
print(in_sharpe)
# Simulate all params for out-sample ranges
out_sharpe = simulate_all_params(out_price, windows, **pf_kwargs)
print(out_sharpe)
(5460,)
fast_window slow_window split_idx
10 11 0 -0.354158
1 1.117491
2 0.551415
3 0.336980
4 -0.918363
...
48 49 2 -0.758895
3 -0.629667
4 -0.100832
5 -1.404637
6 -0.398260
Name: sharpe_ratio, Length: 5460, dtype: float64
fast_window slow_window split_idx
10 11 0 1.827234
1 -1.103760
2 -2.128081
3 -1.757578
4 1.088042
...
48 49 2 inf
3 1.676608
4 -3.392528
5 3.175129
6 -2.545182
Name: sharpe_ratio, Length: 5460, dtype: float64
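The length 5460 printed above follows directly from the grid: with r=2, the 40 window values are paired up without repetition, and each pair is evaluated on each of the 7 splits. A quick check with the standard library (it only counts combinations and does not call vectorbt):

```python
from itertools import combinations
from math import comb

windows = range(10, 50)                  # same values as np.arange(10, 50)
pairs = list(combinations(windows, 2))   # (fast, slow) pairs with fast < slow
n_splits = 7                             # number of rolling windows in this post

print(len(pairs), len(pairs) * n_splits)
print(pairs[0], pairs[-1])
```

The first and last pairs match the (10, 11) and (48, 49) entries at the top and bottom of the printed Series.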

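c, Applying the best training-set parameters to the validation set

Rough approach:
01, For each split_idx, find the parameter combination with the best performance (Sharpe ratio) via idxmax. Each result is a three-level index tuple: fast_window, slow_window, split_idx.
02, Group by split_idx to obtain the best parameters for each split_idx, that is, the best parameters for each rolling window.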

def get_best_index(performance, higher_better=True):
    if higher_better:
        return performance[performance.groupby('split_idx').idxmax()].index
    return performance[performance.groupby('split_idx').idxmin()].index
in_best_index = get_best_index(in_sharpe)
print(in_best_index[:5])
def get_best_params(best_index, level_name):
    return best_index.get_level_values(level_name).to_numpy()
in_best_fast_windows = get_best_params(in_best_index, 'fast_window')
in_best_slow_windows = get_best_params(in_best_index, 'slow_window')
in_best_window_pairs = np.array(list(zip(in_best_fast_windows, in_best_slow_windows)))
print(in_best_window_pairs[:5][:])
pd.DataFrame(in_best_window_pairs, columns=['fast_window', 'slow_window']).vbt.plot().show_svg()
MultiIndex([(40, 44, 0),
(12, 13, 1),
(10, 13, 2),
(10, 40, 3),
(12, 37, 4)],
names=['fast_window', 'slow_window', 'split_idx'])
[[40 44]
[12 13]
[10 13]
[10 40]
[12 37]]

[Figure: best fast_window and slow_window per split]
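The group-then-idxmax step is easier to follow on a tiny example. The sketch below builds a made-up Sharpe-ratio Series with the same three index levels (the values are invented for illustration, not taken from the backtest) and picks the best pair per split with pandas.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(10, 20, 0), (10, 20, 1), (10, 30, 0), (10, 30, 1), (20, 30, 0), (20, 30, 1)],
    names=["fast_window", "slow_window", "split_idx"],
)
# Invented values, for illustration only.
sharpe = pd.Series([0.5, -0.2, 1.2, 0.1, 0.3, 0.9], index=index, name="sharpe_ratio")

# For each split, idxmax returns the full (fast, slow, split) label of the best row.
best_labels = sharpe.groupby(level="split_idx").idxmax()
best_index = sharpe.loc[best_labels.tolist()].index

best_fast = best_index.get_level_values("fast_window").tolist()
best_slow = best_index.get_level_values("slow_window").tolist()
print(best_index.tolist())
print(best_fast, best_slow)
```

Passing the labels as a list to `.loc` is a deliberate choice here: it keeps the row selection explicit regardless of how a given pandas version treats a Series of tuples as a key.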

Apply the best parameters found in each rolling window to the validation set and collect the performance statistics.

def simulate_best_params(price, best_fast_windows, best_slow_windows, **kwargs):
    fast_ma = vbt.MA.run(price, window=best_fast_windows, per_column=True)
    slow_ma = vbt.MA.run(price, window=best_slow_windows, per_column=True)
    entries = fast_ma.ma_crossed_above(slow_ma)
    exits = fast_ma.ma_crossed_below(slow_ma)
    pf = vbt.Portfolio.from_signals(price, entries, exits, **kwargs)
    return pf.sharpe_ratio()
# Use best params from in-sample ranges and simulate them for out-sample ranges
out_test_sharpe = simulate_best_params(out_price, in_best_fast_windows, in_best_slow_windows, **pf_kwargs)
print(out_test_sharpe.head(5))
ma_window ma_window split_idx
40 44 0 -0.863821
12 13 1 0.441460
10 13 2 -0.895217
40 3 3.233424
12 37 4 2.764636
Name: sharpe_ratio, dtype: float64
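simulate_all_params and simulate_best_params both turn two moving averages into entry and exit signals. As a rough, library-free illustration of the idea, the sketch below computes simple moving averages and flags the bars where the fast one crosses the slow one. The crossover rule used here (at or below on the previous bar, strictly above on the current one) is one common definition and may differ in edge cases from vectorbt's ma_crossed_above; the helper names and the price list are hypothetical.

```python
def sma(values, window):
    """Simple moving average; None until a full window of values is available."""
    out = [None] * len(values)
    for i in range(window - 1, len(values)):
        out[i] = sum(values[i - window + 1 : i + 1]) / window
    return out


def crossed_above(a, b):
    """True on bars where a moves from at-or-below b to strictly above b."""
    signals = [False] * len(a)
    for i in range(1, len(a)):
        if None in (a[i - 1], b[i - 1], a[i], b[i]):
            continue  # still warming up, no signal possible
        signals[i] = a[i - 1] <= b[i - 1] and a[i] > b[i]
    return signals


prices = [10, 9, 8, 7, 8, 9, 10, 11, 10, 9, 8, 7]
fast, slow = sma(prices, 2), sma(prices, 3)
entries = crossed_above(fast, slow)   # fast crosses above slow
exits = crossed_above(slow, fast)     # fast crosses below slow
entry_bars = [i for i, flag in enumerate(entries) if flag]
exit_bars = [i for i, flag in enumerate(exits) if flag]
print(entry_bars, exit_bars)
```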

22, Summary plot of the Sharpe ratios

cv_results_df = pd.DataFrame({
    'in_sample_hold': in_hold_sharpe.values,
    'in_sample_median': in_sharpe.groupby('split_idx').median().values,
    'in_sample_best': in_sharpe[in_best_index].values,
    'out_sample_hold': out_hold_sharpe.values,
    'out_sample_median': out_sharpe.groupby('split_idx').median().values,
    'out_sample_test': out_test_sharpe.values
})
color_schema = vbt.settings['plotting']['color_schema']
# Blue = in-sample, orange = out-sample; solid = hold, dashed = median, dotted = best/test
cv_results_df.vbt.plot(
    trace_kwargs=[
        dict(line_color=color_schema['blue']),
        dict(line_color=color_schema['blue'], line_dash='dash'),
        dict(line_color=color_schema['blue'], line_dash='dot'),
        dict(line_color=color_schema['orange']),
        dict(line_color=color_schema['orange'], line_dash='dash'),
        dict(line_color=color_schema['orange'], line_dash='dot')
    ]
).show_svg()

[Figure: in-sample and out-sample Sharpe ratios per split: hold, median, and best/test]

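Points to note:

Blue lines (in-sample): the normal ordering from top to bottom is dotted, solid, dashed.

Orange lines (out-sample)

Solid vs. solid
These show the buy-and-hold performance over the training and validation periods. It is better when both fall on the same side of the zero axis (both rising or both falling), which indicates a stable, continuing market.

Dashed vs. dashed
Each mainly follows the solid line of its own color; beyond that there should be no necessary relationship between the two.

Dotted vs. dotted
The blue dotted line sits above the orange one. Blue is the best result within the training set; orange is the result of applying those training-set optimal parameters to the validation set, so it is most likely lower than the blue one.

Bias introduced by the different lengths of the training and validation sets
The training set is usually 2 to 3 times as long as the validation set (or more). In a one-sided market (say, a rising one) the training-set solid line captures more of the move, so the blue line most likely lies above the orange one.
In a falling market the opposite holds: because it covers a longer period, the blue line most likely lies below the orange one.

Note (2024-06): in this case the y-axis is the Sharpe ratio, not the return, so the height of a data point does not reflect the return. The conclusions above therefore need some care and are not entirely accurate.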
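The caveat about the y-axis can be made concrete: a series can have the higher Sharpe ratio and still the lower total return. The sketch below uses two invented return series and a simplified Sharpe ratio (mean over sample standard deviation, zero risk-free rate, no annualization); both the data and the simplification are assumptions for illustration.

```python
from math import prod
from statistics import mean, stdev


def simple_sharpe(returns):
    """Per-period Sharpe ratio with a zero risk-free rate and no annualization."""
    return mean(returns) / stdev(returns)


def total_return(returns):
    """Compounded return over the whole series."""
    return prod(1 + r for r in returns) - 1


steady = [0.001, 0.002, 0.001, 0.002]    # small, very stable gains
volatile = [0.10, -0.05, 0.08, -0.04]    # larger but erratic gains

for name, series in (("steady", steady), ("volatile", volatile)):
    print(name, round(simple_sharpe(series), 3), round(total_return(series), 4))
```

Here the steady series wins on Sharpe ratio while the volatile one wins on total return, which is why a higher point on the Sharpe chart does not by itself mean a higher return.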

23, Visualizing the rolling backtest returns

[Figure: returns of the rolling backtest]

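As the chart shows, the overall result is not very good. Because the parameters roll forward with the data, one would expect better returns than with fixed parameters, but that is not what happens.
The most likely cause is the warm-up period of the technical indicators; the next post fixes this.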
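The warm-up effect can be put into numbers. In simulate_best_params the moving averages are computed on out_price alone, so the slow average has no value until it has seen a full window of bars. Assuming a moving average of window w needs w bars before its first value (the usual convention, stated here as an assumption), the sketch below counts how many of the 80 out-sample bars can carry a slow-average value for the first five best pairs printed earlier. The helper `valid_ma_bars` is hypothetical.

```python
out_len = 80  # out-sample bars per split, from the shapes printed earlier
best_pairs = [(40, 44), (12, 13), (10, 13), (10, 40), (12, 37)]  # first five best pairs


def valid_ma_bars(length, window):
    """Number of bars for which a moving average of the given window is defined."""
    return max(0, length - window + 1)


usable_bars = [valid_ma_bars(out_len, slow) for _, slow in best_pairs]
for (fast, slow), usable in zip(best_pairs, usable_bars):
    print(f"fast={fast:2d} slow={slow:2d} usable bars={usable:2d} ({usable / out_len:.0%})")
```

With a slow window of 44, fewer than half of the validation bars can produce a signal at all, which fits the suspicion that warm-up is what hurts the out-sample results.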

/posts/quant/63a4531d/
Author
思想的巨人
Published on
2024-06-18
License
CC BY-NC-SA 4.0
