vectorbt Study Notes 53: DMA, Part 13: Validating Parameter Selection and Adaptive Periods

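Some of my earlier conclusions were taken for granted, so this post re-checks them.

Conclusions:
01, Among the 4 parameter selection methods,
my guess was that in_test_best_index_basic would be best;
in practice in_test_best_index_nb_coord comes out slightly ahead.

02, For the adaptive period calculation,
my guess was that the adaptive period (computed here via a Fourier transform) would be best;
in practice fixed_60 is best. Essentially, once the adaptive period fails to come out on top, the approach is largely shown to be ineffective.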

01, Base data format

Data format in the CSV

Start,End,Period,Start Value,End Value,Total Return [%],Benchmark Return [%],Max Gross Exposure [%],Total Fees Paid,Max Drawdown [%],Max Drawdown Duration,Total Trades,Total Closed Trades,Total Open Trades,Open Trade PnL,Win Rate [%],Best Trade [%],Worst Trade [%],Avg Winning Trade [%],Avg Losing Trade [%],Avg Winning Trade Duration,Avg Losing Trade Duration,Profit Factor,Expectancy,Sharpe Ratio,Calmar Ratio,Omega Ratio,Sortino Ratio,choose_method,file_name,symbol
0,79,80 days,10000.00,10134.98,1.35,6.77,100.00,91.46,12.38,54 days 14:24:00,1.80,1.80,0.00,0.00,30.00,1.87,-2.59,7.81,-4.94,11 days 06:00:00,16 days 15:00:00,inf,-12.60,-0.67,0.65,0.85,-0.55,in_test_best_index_basic,92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2/002230XSHE_20200101_20240801_bar80_filter1.md,002230.XSHE
0,79,80 days,10000.00,10482.11,4.82,-0.01,100.00,51.21,10.04,52 days 04:48:00,1.00,1.00,0.00,0.00,40.00,4.83,4.83,18.41,-4.22,20 days 00:00:00,32 days 08:00:00,inf,482.11,0.22,5.98,1.20,1.28,in_test_best_index_nb_coord,92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2/002230XSHE_20200101_20240801_bar80_filter1.md,002230.XSHE
0,79,80 days,10000.00,10358.56,3.59,4.13,100.00,69.45,10.67,54 days 08:00:00,1.33,1.33,0.00,0.00,41.67,2.68,-0.44,7.57,-4.30,20 days 20:00:00,17 days 18:00:00,inf,125.02,-0.12,1.29,0.96,0.14,in_test_best_index_nb_mean,92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2/002230XSHE_20200101_20240801_bar80_filter1.md,002230.XSHE
0,79,80 days,10000.00,10160.66,1.61,4.13,100.00,76.51,12.49,54 days 08:00:00,1.50,1.50,0.00,0.00,38.89,2.48,-1.77,7.13,-5.17,19 days 20:00:00,16 days 12:00:00,inf,52.32,-0.40,0.97,0.89,-0.16,in_test_best_index_nb_median,92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2/002230XSHE_20200101_20240801_bar80_filter1.md,002230.XSHE
0,99,100 days,10000.00,10258.43,2.58,2.67,100.00,54.29,13.10,69 days 06:00:00,1.50,0.75,0.75,699.01,0.00,-5.89,-5.89,,-5.89,,19 days 08:00:00,0.00,-587.44,0.07,1.86,0.99,0.42,in_test_best_index_basic,92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2/002180XSHE_20200101_20240801_bar100_filter1.md,002180.XSHE
0,99,100 days,10000.00,10335.80,3.36,4.75,100.00,95.13,12.27,65 days 19:12:00,2.20,1.60,0.60,574.26,13.33,-2.35,-4.66,6.93,-4.63,11 days 12:00:00,15 days 19:12:00,0.95,-333.04,0.27,2.01,1.07,0.69,in_test_best_index_nb_coord,92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2/002180XSHE_20200101_20240801_bar100_filter1.md,002180.XSHE
0,99,100 days,10000.00,10329.60,3.30,2.67,100.00,54.63,12.47,69 days 06:00:00,1.50,0.75,0.75,702.18,0.00,-4.98,-4.98,,-4.98,,20 days 00:00:00,0.00,-496.76,0.16,1.93,1.01,0.53,in_test_best_index_nb_mean,92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2/002180XSHE_20200101_20240801_bar100_filter1.md,002180.XSHE
0,99,100 days,10000.00,10302.37,3.02,2.67,100.00,54.50,12.71,69 days 06:00:00,1.50,0.75,0.75,700.97,0.00,-5.33,-5.33,,-5.33,,20 days 08:00:00,0.00,-531.47,0.13,1.90,1.00,0.49,in_test_best_index_nb_median,92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2/002180XSHE_20200101_20240801_bar100_filter1.md,002180.XSHE
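
The duration fields in this format (Period, Max Drawdown Duration, and so on) are plain strings in the CSV. Below is a minimal sketch of loading such rows and converting the durations; it uses two rows cut down from the sample above, keeps only a handful of the columns, and reads from an in-memory string instead of the real file, all of which are simplifications made for illustration.

```python
import io

import pandas as pd

# Two rows taken from the sample above, with most columns dropped for brevity.
sample_csv = """Period,End Value,Max Drawdown Duration,choose_method,symbol
80 days,10134.98,54 days 14:24:00,in_test_best_index_basic,002230.XSHE
80 days,10482.11,52 days 04:48:00,in_test_best_index_nb_coord,002230.XSHE
"""

df = pd.read_csv(io.StringIO(sample_csv))

# Duration columns arrive as strings; convert them to Timedelta so they can be compared.
for col in ["Period", "Max Drawdown Duration"]:
    df[col] = pd.to_timedelta(df[col])

print(df.dtypes)
print(df[["symbol", "choose_method", "End Value", "Max Drawdown Duration"]])
```

After the conversion the duration columns can be sorted or averaged like any other numeric column.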

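02, Comparing the 4 parameter selection methods

A crosstab that counts how often each selection method lands at each rank.
It is mainly used to check how the 4 common parameter selection methods compare.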

import pandas as pd

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# Step 1: read the data
data_path = '/home/john/git/repo_quant/myvectorbt/92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2_rst.csv'
df = pd.read_csv(data_path)
# Step 2: clean the data (drop rows that have no Start Value)
df = df.dropna(subset=['Start Value'])
# Step 3: sort and group
df['End Value'] = pd.to_numeric(df['End Value'], errors='coerce')
df_sorted = df.sort_values(by=['symbol', 'End Value'], ascending=[True, False])
# Rank the methods within each symbol by End Value (dense ranking, so ties share a rank)
df_sorted['rank'] = df_sorted.groupby('symbol')['End Value'].rank(method='dense', ascending=False)
# Step 4: build the crosstab
result_table = pd.crosstab(index=df_sorted['rank'], columns=df_sorted['choose_method'])
# Print the result table
print(result_table)
choose_method  in_test_best_index_basic  in_test_best_index_nb_coord  in_test_best_index_nb_mean  in_test_best_index_nb_median
rank
1.0                                  19                           26                          20                            21
2.0                                  21                           12                          30                            22
3.0                                  20                           21                          18                            22
4.0                                  20                           21                          12                            15

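How to read the table: the column names are the selection methods.
Column 1, row 1: among the 4 selection methods applied to the same symbol, in_test_best_index_basic was the best one 19 times.
Column 1, row 2: among the 4 selection methods applied to the same symbol, in_test_best_index_basic was the second best 21 times.
Note: because rank("dense") is used, ties are possible (tied first, tied second, or tied third). So the theoretical expectation that every row has the same sum, with the whole table adding up to (number of symbols) * 4, does not hold. In addition, some symbols have no result for some selection methods. Both effects make the rank-1 total larger than the rank-4 total.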
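
The effect of dense ranking on the row sums can be reproduced on a toy table. The sketch below uses made-up data (symbols A and B, shortened method names, invented End Values): symbol A has a tie for first place and symbol B is missing one method.

```python
import pandas as pd

# Hypothetical toy data: A has a tie for first place, B lacks the nb_median row.
toy = pd.DataFrame({
    "symbol": ["A", "A", "A", "A", "B", "B", "B"],
    "choose_method": ["basic", "nb_coord", "nb_mean", "nb_median",
                      "basic", "nb_coord", "nb_mean"],
    "End Value": [10500.0, 10500.0, 10200.0, 10100.0,
                  10300.0, 10400.0, 10100.0],
})

# Dense ranking: tied values share a rank and the next rank is not skipped.
toy["rank"] = toy.groupby("symbol")["End Value"].rank(method="dense", ascending=False)

table = pd.crosstab(index=toy["rank"], columns=toy["choose_method"])
row_sums = table.sum(axis=1)

print(table)
print(row_sums)  # rank 1 appears 3 times, ranks 2 and 3 twice, rank 4 never
```

With only two symbols, rank 1 already appears three times and rank 4 not at all, which is the same skew toward the top ranks seen in the real table.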

Judging by the results, in_test_best_index_nb_coord does best, but the margin can hardly be called significant.
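
One rough way to put a number on "not significant" is a chi-square goodness-of-fit check on the rank-1 counts against an even split. This is only a sketch and not part of the original analysis: it treats each rank-1 count as an independent observation, which ties from dense ranking violate, and it compares against the standard 5% critical value for 3 degrees of freedom (about 7.815).

```python
# Rank-1 counts per selection method, copied from the crosstab above.
rank1_counts = {
    "in_test_best_index_basic": 19,
    "in_test_best_index_nb_coord": 26,
    "in_test_best_index_nb_mean": 20,
    "in_test_best_index_nb_median": 21,
}

total = sum(rank1_counts.values())
expected = total / len(rank1_counts)  # count per method if all methods were equally good

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((obs - expected) ** 2 / expected for obs in rank1_counts.values())

CRITICAL_95_DF3 = 7.815  # 5% critical value of chi-square with 3 degrees of freedom

print(f"chi2 = {chi2:.3f}, critical value = {CRITICAL_95_DF3}")
print("significant at 5%" if chi2 > CRITICAL_95_DF3 else "not significant at 5%")
```

The statistic comes out far below the critical value, so under these assumptions the rank-1 counts are compatible with all four methods being equally good.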

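03, Comparing adaptive and fixed periods

A rolling backtest needs a train/predict setup: a ratio and a base period.
For example, 80 days at 2:1
means a training set of 80*2=160 days
and a prediction set of 80*1=80 days (this can be read as how often the parameters are reset).
If the period is too long, the true parameters may be learned, but the reaction to the market becomes too sluggish, and parameters that were learned correctly may no longer fit current conditions.
If it is too short, the true parameters may not be learned at all and a good-looking result is merely overfitted; the split also produces too many validation segments, and compounding their returns tends to favor overfitting.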

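My hypothesis was that a train/predict period computed adaptively for each symbol beats a fixed period. If that holds, the ideal adaptive-period backtest should outperform every fixed-period setup.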

import pandas as pd

# Read and merge the data
files = {
    'adaptive': '../92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2_rst.csv',
    'fixed_40': '../92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2_cycle40_rst.csv',
    'fixed_60': '../92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2_cycle60_rst.csv',
    'fixed_80': '../92reportConfig/01_05dma_06rollingFilterGridParamExitFuncV2_cycle80_rst.csv',
}
dfs = []
for key, file in files.items():
    df = pd.read_csv(file)
    # Keep a single selection method so that the period setting is the only variable;
    # .copy() avoids a SettingWithCopyWarning when the 'source' column is added
    df = df[df['choose_method'] == 'in_test_best_index_basic'].copy()
    df['source'] = key
    dfs.append(df)
# ignore_index gives the merged frame a unique index
combined_df = pd.concat(dfs, ignore_index=True)
# print(combined_df.head(3))
# Rank the period settings within each symbol by End Value
combined_df['rank'] = combined_df.groupby(['symbol'])['End Value'].rank(method='dense', ascending=False)
# print(combined_df.head(3))
# Build the crosstab
crosstab_result = pd.crosstab(index=combined_df['rank'], columns=combined_df['source'])
# Print the result
print(crosstab_result)
source  adaptive  fixed_40  fixed_60  fixed_80
rank
1.0           21        23        28        21
2.0           25        23        22        21
3.0           21        28        22        17
4.0           13         6         9        20

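I expected the adaptive period to come out best, but in practice that does not seem to be the case.
In the table above fixed_60 performs best, which does not match the initial guess.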
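
The crosstab can also be condensed into one number per column, the count-weighted mean rank (lower is better). This is a rough summary rather than part of the original analysis, since dense ranking allows ties; the counts are copied from the table above.

```python
import pandas as pd

# Counts copied from the crosstab above (rows are ranks 1 to 4).
crosstab_result = pd.DataFrame(
    {
        "adaptive": [21, 25, 21, 13],
        "fixed_40": [23, 23, 28, 6],
        "fixed_60": [28, 22, 22, 9],
        "fixed_80": [21, 21, 17, 20],
    },
    index=pd.Index([1.0, 2.0, 3.0, 4.0], name="rank"),
)

ranks = crosstab_result.index.to_numpy()

# Weighted sum of ranks per column, divided by the number of symbols in that column.
weighted_sum = pd.Series(ranks @ crosstab_result.to_numpy(), index=crosstab_result.columns)
mean_rank = weighted_sum / crosstab_result.sum()

print(mean_rank.round(3).sort_values())
```

On this measure fixed_60 has the lowest mean rank and the adaptive setting sits in the middle of the fixed ones, which agrees with the reading of the rank-1 row.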

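That said, this result alone does not prove the guess wrong.
The reasons are:
01, My best-period function (Fourier transform) may be flawed, so the period it computes is not necessarily the true best period.
02, The best period was computed from 202001-202301 data, while the actual test period is 2020-202408, so the two do not line up exactly, which also introduces bias. This part of the error may be hard to remove, since the best period parameter for an upcoming stretch of time cannot be known in advance.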
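
For reference, one common way to estimate a dominant cycle with a Fourier transform is to take the frequency with the largest spectral power and invert it. The sketch below is a generic version of that idea on a synthetic series; it is not the best-period function used for the results above, which is not shown in this post, and the name dominant_period is hypothetical.

```python
import numpy as np


def dominant_period(values):
    """Estimate the dominant cycle length (in bars) from the FFT power spectrum.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(values, dtype=float)
    x = x - x.mean()                         # remove the constant (zero-frequency) component
    power = np.abs(np.fft.rfft(x)) ** 2      # power at each non-negative frequency
    freqs = np.fft.rfftfreq(x.size, d=1.0)   # frequencies in cycles per bar
    k = 1 + int(np.argmax(power[1:]))        # strongest bin, skipping frequency zero
    return 1.0 / freqs[k]


# Synthetic series: a strong 60-bar cycle plus a weaker 15-bar cycle.
t = np.arange(600)
synthetic = np.sin(2 * np.pi * t / 60) + 0.3 * np.sin(2 * np.pi * t / 15)

period = dominant_period(synthetic)
print(round(period, 2))
```

On real price data the spectrum is far noisier than in this synthetic case, and the frequency resolution is limited by the length of the sample, which is one way such an estimate can miss the true best period.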

Author: 思想的巨人
Published: 2024-08-11
License: CC BY-NC-SA 4.0