Since mid-April, 堡主 has been gradually scraping sold-character data from 天龙畅易阁 (the official TLBB character trading site). The database has been exported to an Excel sheet, which registered users of this site can download below; more detailed data will require re-crawling the individual sale pages.
Attachment: 畅易阁4万条成交角色数据.xlsx

堡主 plans to download all of these sale pages locally and then run the analysis. For now, here is a list of factors that may influence price:
- Historical peak equipment score
- Gem advancement score
- Level
- Server
- Whether the character has 重楼 shoulders, jade, rings, necklace, or armor
- Whether it has a 30-point 附体 pet
- 神鼎
- 武意 level
- Pierce damage and pierce reduction
- HP
- Hit rating
- Elemental attack
- Resistance reduction
- Resistance-reduction floor
- Whether it has a 九黎妖虎
- Number of fashion outfits
- Number of mounts
- Holdings of rare fashion or mounts
An alternative approach is to split the price into two parts: a "hardware" part and an "appearance" part. The hardware part takes the character's raw stats as input (力, 灵, 体, 定, 身, external attack, internal attack, defense, pierce, elemental attack, and so on); the appearance part contributes an additional price increment, and the two parts are summed.
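The two-part idea above can be sketched as a two-branch network. This is only an illustration of the structure, not the code actually used in this post: the input dimensions and branch sizes are made-up placeholders, and the two branches' scalar outputs are summed to form the final price.

```python
import tensorflow as tf

# Two-branch sketch: one branch prices the "hardware" stats, the other
# prices cosmetics, and the two contributions are added together.
# Feature widths (10 stats, 3 cosmetic counts) are illustrative only.
stats_in = tf.keras.Input(shape=(10,), name="stats")      # 力/灵/体/定/身, attack, defense, ...
looks_in = tf.keras.Input(shape=(3,), name="appearance")  # fashion / weapon-skin / mount counts

stats_price = tf.keras.layers.Dense(64, activation="relu")(stats_in)
stats_price = tf.keras.layers.Dense(1)(stats_price)       # hardware part of the price

looks_price = tf.keras.layers.Dense(16, activation="relu")(looks_in)
looks_price = tf.keras.layers.Dense(1)(looks_price)       # appearance increment

total = tf.keras.layers.Add()([stats_price, looks_price])  # summed price
model = tf.keras.Model([stats_in, looks_in], total)
model.compile(optimizer="adam", loss="mse")
```

One design note: because the two branches only interact through addition, the appearance increment stays interpretable as a separate quantity, which is exactly what the split is meant to achieve.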
Started crawling today. 40K pages at one page per second comes to roughly 40,000 seconds, so no rush.
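The download loop can be sketched roughly as below. The URL pattern here is a placeholder, not the real 畅易阁 endpoint, and the one-second pause is what makes the full crawl take about 40,000 seconds.

```python
import time
import urllib.request

# Minimal polite-crawl sketch; url_fmt is a hypothetical placeholder.
def fetch_pages(ids, delay=1.0, url_fmt="https://example.com/role/{}"):
    for cid in ids:
        with urllib.request.urlopen(url_fmt.format(cid)) as resp:
            html = resp.read().decode("utf-8")
        with open(f"./html/{cid}.html", "w", encoding="utf-8") as f:
            f.write(html)
        time.sleep(delay)  # one page per second

# At one page per second, ~45k pages take roughly half a day:
est_hours = 44975 * 1.0 / 3600
```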
In total, 23.2 GB of pages were collected, covering 44,975 records.

Environment: Python 3.7, TensorFlow 2.5

Price fitting
First, try the 8 features 堡主 considers most important [equipment score, gem score, 30-point 附体, 武意 level, pierce damage, pierce reduction, HP, main elemental attack] with a simple fully connected network of two hidden layers. The network's output is the price to be fitted (log-transformed).
```python
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss=tf.keras.losses.mean_squared_error)
history = model.fit(x_train_scaled, y_train,
                    validation_data=(x_valid_scaled, y_valid),
                    epochs=100, batch_size=128,
                    callbacks=[cp_callback])
```
Training for 100 epochs gives the loss curve shown below (converging to around 0.2).

Adding two more features [重楼 shoulders, hit rating] and training the same network, the loss converges to 0.17, so the correlation is strong; the 重楼 shoulders in particular, where a single shoulder piece corresponds to several thousand on the price.

Adding three more features [神鼎 (constitution), 神鼎 (attributes), whether it has a 妖虎] and training the same network, the loss converges to 0.15, again a strong correlation.

After adding the remaining 重楼 equipment pieces and refitting, the loss converges to around 0.14.

Tonight the plan is to store a fuller feature set; the code is just written. There are 32 dimensions in total, with the price label as the first. Also fixed one bug: 重楼 jade and rings must be counted cumulatively, since a high-end character might well carry more than one.
The current idea: after crawling, compute a price index per server and sect (门派); the network's output multiplied by this price index gives the actual price.
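That per-server/sect index could look something like the following. This is a hypothetical sketch: the column names and sample numbers are assumptions, not the scraper's actual field names.

```python
import pandas as pd

# Average price within each (server, menpai) group divided by the global
# average price; entries above 1.0 mark expensive server/sect combinations.
def price_index(df):
    return df.groupby(["server", "menpai"])["price"].mean() / df["price"].mean()

df = pd.DataFrame({
    "server": ["s1", "s1", "s2", "s2"],
    "menpai": [1, 1, 2, 2],
    "price":  [1000, 3000, 4000, 4000],
})
idx = price_index(df)
# The network's raw output would then be multiplied by the matching index entry.
```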
Parsing code for the saved pages:

```python
for name in file_cut:
    file_html = open("./html/" + name, "r", encoding="utf-8")
    html = file_html.read()  # read the saved page

    # character data is embedded in the page as a JSON object named `charObj`
    index_data = html.find("charObj")
    to_data = html.find(";", index_data)
    data = html[index_data + 10: to_data]
    dict_data = json.loads(data)  # str -> dict

    # 1. price (assumes every page contains the ¥ symbol before the figure)
    current_price_index = html.find("¥")
    if current_price_index != -1:
        big_text = html[current_price_index: current_price_index + 11]
        for m in range(11):
            if big_text[m] == "<":
                current_price = big_text[1:m]
                break
    price = int(current_price)

    # 2. sect  3. sex
    menpai = dict_data["menpai"]
    sex = dict_data["sex"]

    # 4. level bucket
    rank_raw = dict_data["level"]
    if rank_raw <= 89:
        rank = 1
    elif rank_raw <= 99:
        rank = 2
    elif rank_raw <= 109:
        rank = 3
    else:  # the original `rank_raw > 100` overlapped bucket 3; fixed to 110+
        rank = 4

    # 5. equipment score  6. gem score
    score_equipment = dict_data["equipScoreHH"]
    score_diamond = dict_data["gemJinJieScore"]

    # 7. any 10/10/10 pet
    three10 = 0
    for item in dict_data["petList"]:
        if item["savvy"] == 10 and item["lingXing"] == 10 and item["fitValue"] == 10:
            three10 = 1
            break

    # 8. Wuyi level  9/10. pierce damage / reduction  11. HP
    wuyi_level = dict_data["martialDB"]["martialLevel"]
    chuanCiShangHai = dict_data["chuanCiShangHai"]
    chuanCiJianMian = dict_data["chuanCiJianMian"]
    blood = dict_data["maxHp"]

    # 12. best elemental attack (the "postion" spelling is the site's own key)
    attack = max(dict_data["coldAtt"], dict_data["fireAtt"],
                 dict_data["lightAtt"], dict_data["postionAtt"])

    # 13. best resistance reduction  14. its floor
    jiankang = max(dict_data["resistColdDef"], dict_data["resistFireDef"],
                   dict_data["resistLightDef"], dict_data["resistPostionDef"])
    xiaxian = max(dict_data["resistColdDefLimit"], dict_data["resistFireDefLimit"],
                  dict_data["resistLightDefLimit"], dict_data["resistPostionDefLimit"])

    # 15. hit  16. dodge
    mingzhong = dict_data["hit"]
    shanbi = dict_data["miss"]

    # 17-21. Chonglou pieces (jade and rings accumulate; a whale may own several)
    chonglou_jian = chonglou_jia = chonglou_jie = chonglou_yu = chonglou_lian = 0
    for item in dict_data["items"]["equip"]:
        data_id = dict_data["items"]["equip"][item]["dataId"]
        if data_id == 10413102:
            chonglou_jia += 1
        if data_id == 10422016:
            chonglou_jie += 1
        if data_id == 10420090:
            chonglou_lian += 1
        if data_id == 10423024:
            chonglou_yu += 1
        if data_id == 10415056:
            chonglou_jian = 2
        if data_id == 10415055:
            chonglou_jian = max(chonglou_jian, 1)

    # 24. Yaohu pet (identified by its perception stats)
    yaohu = 0
    for item in dict_data["petList"]:
        if item["strPerception"] == 6005 and item["conPerception"] == 4289:
            yaohu = 1
            break

    # 22/23. Shending counts
    shending1 = dict_data["shenDing"]["danYaoCount"]
    shending2 = dict_data["shenDing"]["lianDanCount"]

    # 25/26. Zhenyuan totals
    zhenyuan_level = 0
    zhenyuan_grade = 0
    for item in dict_data["zhenYuanList"]:
        zhenyuan_grade += item["grade"]
        zhenyuan_level += item["level"]

    # 27. Xinfa score  28. Xiulian score
    xinfa = dict_data["xinFaScore"]
    xiulian = dict_data["xiuLianScore"]

    # 29-31. fashion / weapon-skin / mount collection sizes
    clothes_number = len(dict_data["tujianInfo"]["playerDressInfo"])
    huanshi_number = len(dict_data["tujianInfo"]["playerExWeaponInfo"])
    ride_number = len(dict_data["tujianInfo"]["playerExRideInfo"])

    # 32. yuanbao balance, converted at 40 yuanbao : 1
    yuanbao = dict_data["bkBgBaseInfo"]["yuanBao"] / 40

    current_data = np.array([price, menpai, sex, rank, score_equipment, score_diamond,
                             three10, wuyi_level, chuanCiShangHai, chuanCiJianMian,
                             blood, attack, jiankang, xiaxian, mingzhong, shanbi,
                             chonglou_jian, chonglou_jia, chonglou_yu, chonglou_lian,
                             chonglou_jie, shending1, shending2, yaohu,
                             zhenyuan_level, zhenyuan_grade, xinfa, xiulian,
                             clothes_number, huanshi_number, ride_number, yuanbao])
    np_data[count, :] = current_data
    print(count)
    count += 1
    file_html.close()
```
Below is the training code:
```python
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = np.load("data3.npy")

# keep level-110+ characters priced 100-30000 with a sane price/score ratio
filtered_data = np.zeros(data.shape)
count_index = 0
for i in range(data.shape[0]):
    if 100 <= data[i][1] <= 30000 and data[i][1] / data[i][5] > 0.001 and data[i][4] == 4:
        filtered_data[count_index, :] = data[i, :]
        count_index += 1
filtered_data = filtered_data[:count_index, :]
np.save("filtered_data.npy", filtered_data)
print(filtered_data.shape)

x_train_all, x_test, y_train_all, y_test = train_test_split(
    np.hstack((filtered_data[:, 5:18], filtered_data[:, 22:-1])),
    np.log(filtered_data[:, 1:2]),
    random_state=5, test_size=0.1)
x_train, x_valid, y_train, y_valid = train_test_split(
    x_train_all, y_train_all, random_state=5, test_size=0.1)

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_valid_scaled = scaler.transform(x_valid)
x_test_scaled = scaler.transform(x_test)

# print dataset shapes
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
print(x_valid.shape, y_valid.shape)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation=tf.nn.leaky_relu),
    tf.keras.layers.Dense(64, activation=tf.nn.leaky_relu),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss=tf.keras.losses.mean_squared_error)

checkpoint_path = "./checkpoints/price_model.ckpt"  # path not shown in the original snippet

# load a previous checkpoint and spot-check predictions against labels
model.load_weights(checkpoint_path)
output = model.predict(x_valid_scaled[50:100, :])
print(np.hstack([np.exp(output), np.exp(y_valid[50:100, :])]))
# tfjs.converters.save_keras_model(model, "./model_js")

cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                 save_weights_only=True, verbose=1)

# train
history = model.fit(x_train_scaled, y_train,
                    validation_data=(x_valid_scaled, y_valid),
                    epochs=100, batch_size=64, callbacks=[cp_callback])

# plot the learning curve
def plot_learning_curves(history):
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True)
    plt.title("Loss Curve")
    plt.show()

plot_learning_curves(history)
```
After several rounds of tuning and testing, the loss converged to around 0.09, but after taking exp the predictions still differ noticeably from the labels.
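That gap is expected from the arithmetic alone: a small loss in log space still translates into a sizable multiplicative error on the raw price.

```python
import math

# MSE of 0.09 on log(price) means an RMSE of 0.3 in log units, so a
# typical prediction is off by a factor of about e^0.3 after taking exp.
mse_log = 0.09
rmse_log = math.sqrt(mse_log)   # 0.3 in log-price units
factor = math.exp(rmse_log)     # ~1.35, i.e. roughly +-35% on the price itself
```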
Price-range fitting
To be continued~
Please credit the source when reposting: http://moyubao.net/tlbb-news/2051/