Thu. Sep 28th, 2023

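Since mid-April I have been scraping, bit by bit, part of the sold-character data from 畅易阁 (Changyi Ge), the Tianlong character marketplace, and have exported the database to an Excel spreadsheet. If you are interested, you can register on this site and download the spreadsheet. More detailed data will have to wait until the sale pages are scraped again.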

畅易阁4万条成交角色数据.xlsx

I plan to download all of these sale pages locally and then analyze the data. For now, these are the factors that may affect the price:

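  • Historical highest equipment score
  • Gem advancement score
  • Level
  • Server
  • Whether the character has a Chonglou (重楼) shoulder, jade, ring, necklace, or armor
  • Whether the character has a "30" pet (30附体)
  • Shending (神鼎)
  • Wuyi (武意) level
  • Pierce damage and pierce reduction
  • HP
  • Hit
  • Elemental attack
  • Resistance reduction
  • Resistance-reduction lower limit
  • Whether the character has a Jiuli Yaohu (九黎妖虎)
  • Number of costumes
  • Number of mounts
  • Number of rare costumes or mounts held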

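Another approach is to split the price into two parts: a hardware part and an appearance part.

The hardware part takes only the character's attributes as input (strength, spirit, stamina, concentration, agility, external attack, internal attack, defense, pierce, elemental attack, and so on). The appearance part is an extra price increment, and the two parts are added together.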
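As a rough illustration of this two-part idea, the sketch below fits an additive model on synthetic data: price = hardware part + appearance part. Everything in it is assumed for illustration (the feature counts, the linear form, the random data); the post does not say how either part would actually be modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 3 "hardware" stats and 2 "appearance" counts per character.
n = 200
x_hw = rng.uniform(0, 1, size=(n, 3))
x_look = rng.integers(0, 5, size=(n, 2)).astype(float)

# Ground-truth weights, used only to generate the toy prices.
w_hw_true = np.array([1000.0, 500.0, 200.0])
w_look_true = np.array([50.0, 80.0])
price = x_hw @ w_hw_true + x_look @ w_look_true

# Fit both parts jointly by least squares on the concatenated features.
x_all = np.hstack([x_hw, x_look])
w, *_ = np.linalg.lstsq(x_all, price, rcond=None)
w_hw, w_look = w[:3], w[3:]

def predict(hw, look):
    """Return (hardware part, appearance increment, total price)."""
    hardware = hw @ w_hw
    appearance = look @ w_look
    return hardware, appearance, hardware + appearance
```

Keeping the two parts separate makes it possible to inspect how much of a predicted price comes from stats and how much from cosmetics.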

Scraping starts today. With 40,000 records at one per second, that is 40,000 seconds, so this will take a while.
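At one request per second, 40,000 pages take 40,000 s, which is roughly 11 hours. A minimal sketch of such a throttled loop follows. `fetch` is a hypothetical callable supplied by the caller (the post does not show its download code), and the sleep function is injectable so the loop can be tested without waiting.

```python
import time

def crawl(ids, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch(id) for each id, pausing `delay` seconds between requests."""
    pages = {}
    for i, page_id in enumerate(ids):
        if i > 0:
            sleep(delay)  # throttle: wait before every request except the first
        pages[page_id] = fetch(page_id)
    return pages

def estimated_hours(n_pages, delay=1.0):
    """Rough crawl time in hours, ignoring the time each request itself takes."""
    return n_pages * delay / 3600
```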

In total I scraped 23.2 GB of data, 44,975 records.

[Figure: the 23.2 GB folder of scraped data]

Environment: Python 3.7, TensorFlow 2.5

Price fitting

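First I tried the 8 features I consider most important [equipment score, gem score, "30" pet, Wuyi level, pierce damage, pierce reduction, HP, main attribute] with a simple fully connected network that has two hidden layers. The network output is the price to be fitted (as a log).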

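import tensorflow as tf

# Two hidden fully connected layers; the single output is log(price).
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(1)
])

model.compile(optimizer=tf.keras.optimizers.Adam(0.001), loss=tf.keras.losses.mean_squared_error)

# x_train_scaled, y_train, x_valid_scaled, y_valid and cp_callback are
# defined in the full training script further down.
history = model.fit(x_train_scaled, y_train,
                   validation_data=(x_valid_scaled, y_valid),
                   epochs=100, batch_size=128,
                   callbacks = [cp_callback])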

After 100 epochs of training, the loss curve looks like the figure below (it converges to about 0.2).

[Figure: neural-network training loss curve]

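Adding two more features, [Chonglou shoulder, hit], and training the same network brings the loss down to 0.17. This suggests both are strongly related to price, especially the Chonglou shoulder: a single shoulder piece is worth several thousand in price.

Adding three more features, [Shending (stamina), Shending (attribute), whether the character has a Yaohu], and training the same network brings the loss down to 0.15, so these are strongly related to price as well.

After adding the other Chonglou equipment and fitting again, the loss converges to about 0.14.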

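Tonight I plan to store a more complete set of features, and I have just written the code. There are 32 dimensions in total, the first being the price label. I also fixed a bug: Chonglou jade and ring pieces need to be counted cumulatively, since a high-end account may carry more than one.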
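The counting fix can be sketched as follows: tally pieces instead of setting a 0/1 flag. The `dataId` values are copied from the parsing loop later in this post; the helper name and the sample equipment dict are made up for illustration.

```python
from collections import Counter

# dataId values taken from the parsing loop later in this post.
CHONGLOU_IDS = {
    10413102: "jia",   # armor
    10422016: "jie",   # ring
    10420090: "lian",  # necklace
    10423024: "yu",    # jade
}

def count_chonglou(equip):
    """Count Chonglou pieces by type; `equip` maps slot keys to item dicts."""
    counts = Counter()
    for item in equip.values():
        name = CHONGLOU_IDS.get(item["dataId"])
        if name is not None:
            counts[name] += 1
    return counts

# Made-up equipment: two Chonglou rings, one jade, one unrelated item.
equip = {
    "0": {"dataId": 10422016},
    "1": {"dataId": 10422016},
    "2": {"dataId": 10423024},
    "3": {"dataId": 1},
}
counts = count_chonglou(equip)
```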

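The current plan is to compute a price index per server and sect once the scraping is done. The network output multiplied by that price index then gives the actual price.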
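One possible reading of this price index is sketched below: the median sale price within each (server, sect) group divided by the overall median. The choice of the median, the group key, and the sample records are all assumptions for illustration; the post does not define the index.

```python
from collections import defaultdict
from statistics import median

def price_index(records):
    """Map (server, sect) -> group median price / overall median price."""
    groups = defaultdict(list)
    for server, sect, price in records:
        groups[(server, sect)].append(price)
    overall = median(price for _, _, price in records)
    return {key: median(prices) / overall for key, prices in groups.items()}

# Made-up records: (server, sect, price)
records = [
    ("server_a", "sect_x", 1000),
    ("server_a", "sect_x", 3000),
    ("server_b", "sect_x", 500),
    ("server_b", "sect_y", 1500),
]
index = price_index(records)

# A baseline price (standing in for the network output) scaled by the group's index.
baseline = 2000
actual = baseline * index[("server_a", "sect_x")]
```

A median is used here rather than a mean because a few very expensive accounts would otherwise dominate a group's index.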

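import json
import numpy as np

# file_cut is the list of saved HTML file names under ./html/ (built elsewhere).
# Assumed setup, not shown in the original: one row per file, 32 columns, price first.
np_data = np.zeros((len(file_cut), 32))
count = 0

for name in file_cut:
    file_html = open("./html/"+name, "r",encoding='utf-8')
    # Read the whole page
    html = file_html.read()
    # Slice out the JSON that follows 'charObj', up to the next ';'
    index_data = html.find('charObj')
    to_data = html.find(';', index_data)
    data = html[index_data+10: to_data]
    dict_data = json.loads(data)  # str -> dict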
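    # 1 price: the digits between "¥" and the next "<"
    current_price = None
    current_price_index = html.find("¥")
    if current_price_index != -1:
        big_text = html[current_price_index: current_price_index + 11]
        end = big_text.find('<')
        if end != -1:
            current_price = big_text[1:end]
    if current_price is None:
        # No price on this page: skip it instead of reusing the previous file's value
        file_html.close()
        continue
    price = int(current_price)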
    # 2 sect (menpai)
    menpai = dict_data['menpai']
    # 3 sex
    sex = dict_data['sex']
    # 4 level bracket
    rank_raw = dict_data['level']
    if rank_raw <= 89:
        rank = 1
    elif rank_raw <= 99:
        rank = 2
    elif rank_raw <= 109:
        rank = 3
    else:  # 110 and above
        rank = 4
    # 5 equipment score
    score_equipment = dict_data['equipScoreHH']
    # 6 gem advancement score
    score_diamond = dict_data['gemJinJieScore']
    # 7 whether any pet has savvy, lingXing and fitValue all equal to 10
    three10 = 0
    for item in dict_data['petList']:
        if item['savvy'] == 10 and item['lingXing'] == 10 and item['fitValue'] == 10:
            three10 = 1
            break
    # 8 Wuyi level
    wuyi_level = dict_data['martialDB']['martialLevel']
    # 9 pierce damage
    chuanCiShangHai = dict_data['chuanCiShangHai']
    # 10 pierce reduction
    chuanCiJianMian = dict_data['chuanCiJianMian']
    # 11 max HP
    blood = dict_data['maxHp']
    # 12 strongest elemental attack
    attack_arr = [dict_data["coldAtt"], dict_data["fireAtt"], dict_data["lightAtt"], dict_data["postionAtt"] ]
    attack = max(attack_arr)
    # 13 strongest resistance reduction (jiankang)
    jiankang_arr = [dict_data["resistColdDef"], dict_data["resistFireDef"], dict_data["resistLightDef"], dict_data["resistPostionDef"] ]
    jiankang = max(jiankang_arr)
    # 14 strongest resistance-reduction lower limit (xiaxian)
    xiaxian_arr = [dict_data["resistColdDefLimit"], dict_data["resistFireDefLimit"], dict_data["resistLightDefLimit"], dict_data["resistPostionDefLimit"] ]
    xiaxian = max(xiaxian_arr)
    # 15 hit
    mingzhong = dict_data['hit']
    # 16 dodge
    shanbi = dict_data["miss"]
    # 17-21 Chonglou equipment: shoulder (jian), armor (jia), ring (jie), jade (yu), necklace (lian)
    chonglou_jian = 0
    chonglou_jia = 0
    chonglou_jie = 0
    chonglou_yu = 0
    chonglou_lian = 0
    for equip in dict_data['items']['equip'].values():
        data_id = equip['dataId']
        if data_id == 10413102:
            chonglou_jia += 1
        elif data_id == 10422016:
            chonglou_jie += 1
        elif data_id == 10420090:
            chonglou_lian += 1
        elif data_id == 10423024:
            chonglou_yu += 1
        elif data_id == 10415056:
            # two shoulder variants; this one is scored higher
            chonglou_jian = 2
        elif data_id == 10415055:
            chonglou_jian = max(chonglou_jian, 1)
    # 24 Yaohu pet, identified by its strPerception / conPerception values
    yaohu = 0
    for item in dict_data['petList']:
        if item['strPerception'] == 6005 and item['conPerception'] == 4289:
            yaohu = 1
            break
    # 22, 23 Shending counts
    shending1 = dict_data['shenDing']['danYaoCount']
    shending2 = dict_data['shenDing']['lianDanCount']
    # 25, 26 zhenyuan: summed level and summed grade
    zhenyuan_level = 0
    zhenyuan_grade = 0
    for item in dict_data["zhenYuanList"]:
        zhenyuan_grade += item['grade']
        zhenyuan_level += item['level']
    # 27 xinfa score
    xinfa = dict_data["xinFaScore"]
    # 28 xiulian score
    xiulian = dict_data["xiuLianScore"]
    # 29 number of costumes
    clothes_number = len(dict_data["tujianInfo"]["playerDressInfo"])
    # 30 number of huanshi (playerExWeaponInfo entries)
    huanshi_number = len(dict_data["tujianInfo"]["playerExWeaponInfo"])
    # 31 number of mounts
    ride_number = len(dict_data["tujianInfo"]["playerExRideInfo"])
    # 32 yuanbao, divided by 40
    yuanbao = dict_data["bkBgBaseInfo"]["yuanBao"] / 40
    current_data = np.array([price, menpai, sex, rank, score_equipment, score_diamond,three10,wuyi_level,chuanCiShangHai,chuanCiJianMian,
                             blood,attack,jiankang,xiaxian,mingzhong,shanbi,chonglou_jian,chonglou_jia, chonglou_yu,chonglou_lian,
                             chonglou_jie, shending1, shending2, yaohu, zhenyuan_level, zhenyuan_grade, xinfa, xiulian, clothes_number,huanshi_number,
                             ride_number,yuanbao])
    # print(current_data)
    np_data[count, :] = current_data
    print(count)
    count += 1
    file_html.close()
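
The slicing in the loop above stops at the first `;` after `charObj`, which would cut the JSON short if any string value contained a semicolon. A possible alternative is to let `json.JSONDecoder.raw_decode` find the end of the object. This sketch assumes the page embeds the data as `charObj = {...};`, which is inferred from the `+10` offset and not checked against a real page; the helper name and sample fragment are made up.

```python
import json

def extract_char_obj(html, marker="charObj"):
    """Parse the JSON object that follows `marker` in an HTML page."""
    start = html.find(marker)
    if start == -1:
        raise ValueError("marker not found")
    brace = html.find("{", start)
    if brace == -1:
        raise ValueError("no JSON object after marker")
    # raw_decode parses one JSON value starting at `brace` and ignores what follows.
    obj, _end = json.JSONDecoder().raw_decode(html, brace)
    return obj

# Made-up page fragment; the field names mirror those used in the loop above.
sample = '<script>var charObj = {"menpai": 3, "note": "a;b", "level": 105};</script>'
char = extract_char_obj(sample)
```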

The training code is below:

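import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumed definitions: the original post does not show these two lines.
scaler = StandardScaler()
checkpoint_path = "./checkpoints/cp.ckpt"  # hypothetical path

data = np.load("data3.npy")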

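# Keep rows priced between 100 and 30000 whose price / column-5 ratio exceeds 0.001
# and whose column 4 equals 4. Column 1 is the price (it is the log target below).
# Columns 4 and 5 look like the level bracket and the equipment score, which would mean
# data3.npy has one more leading column than the 32-column array built above.
mask = ((data[:, 1] >= 100) & (data[:, 1] <= 30000)
        & (data[:, 1] / data[:, 5] > 0.001) & (data[:, 4] == 4))
filtered_data = data[mask]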

np.save("filtered_data.npy", filtered_data)

print(filtered_data.shape)

# Features: columns 5-17 plus column 22 up to (not including) the last column; target: log(price)
features = np.hstack((filtered_data[:, 5:18], filtered_data[:, 22:-1]))
log_price = np.log(filtered_data[:, 1:2])
x_train_all, x_test, y_train_all, y_test = train_test_split(
    features, log_price, random_state=5, test_size=0.1)
x_train, x_valid, y_train, y_valid = train_test_split(
    x_train_all, y_train_all, random_state=5, test_size=0.1)

x_train_scaled = scaler.fit_transform(x_train)
x_valid_scaled = scaler.transform(x_valid)
x_test_scaled = scaler.transform(x_test)
# Print the dataset shapes
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
print(x_valid.shape, y_valid.shape)


model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation=tf.nn.leaky_relu),
    tf.keras.layers.Dense(64, activation=tf.nn.leaky_relu),
    tf.keras.layers.Dense(1)
])


model.compile(optimizer=tf.keras.optimizers.Adam(0.001), loss=tf.keras.losses.mean_squared_error)

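# Load the saved model and spot-check it (this needs a checkpoint from an earlier run)
model.load_weights(checkpoint_path)     # load weights straight from the checkpoint
output = model.predict(x_valid_scaled[50:100,:])
# Left column: predicted price; right column: actual price
print(np.hstack([np.exp(output), np.exp(y_valid[50:100,:])]))
# tfjs.converters.save_keras_model(model, "./model_js")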

cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
# Train
history = model.fit(x_train_scaled, y_train,
                   validation_data=(x_valid_scaled, y_valid),
                   epochs=100, batch_size=64,
                   callbacks = [cp_callback])

# Plot
def plot_learning_curves(history):
    pd.DataFrame(history.history).plot(figsize=(8,5))
    plt.grid(True)
    plt.title("Loss Curve")
    plt.show()


plot_learning_curves(history)

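After many rounds of tuning and testing, the loss finally converged to about 0.09, but after applying exp the predictions still differ considerably from the labels.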
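
To see why a loss of 0.09 still leaves a visible gap, assume the loss is the mean squared error of the natural-log price, as in the code above. Its square root is a typical log-space error, and exponentiating turns that into a multiplicative error on the price. This is a back-of-the-envelope reading, not a result reported in this post.

```python
import math

mse_log = 0.09                   # loss reported in the text
rmse_log = math.sqrt(mse_log)    # typical error in log(price)
factor = math.exp(rmse_log)      # corresponding multiplicative error on the price

print(f"typical log error: {rmse_log:.2f}")
print(f"price off by a factor of about {factor:.2f}")
```

Under that assumption, a typical prediction is off by roughly a third in either direction, which matches the impression that the fit is still loose.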

Price-range fitting

To be continued~~~~~~~

   
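 摸鱼堡 (Moyubao), all rights reserved. Unless otherwise noted, all content is original. This site is licensed under BY-NC-SA.
When reposting, please credit the source: http://moyubao.net/tlbb-news/2051/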