
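PUBG (PlayerUnknown's Battlegrounds) is hugely popular, which raises a natural question: can a player's final placement in a match be predicted from their end-of-match statistics? The task is challenging and also fun. This article walks through it step by step.
Game Background
PUBG is a popular sandbox-style tactical shooter with a large worldwide player base and many new players joining every day. It uses a battle royale format in which up to 100 players compete in a single match. Players land on the island with nothing and must scavenge for resources in cities, wilderness, and other terrain while fighting off opponents. The game offers a high degree of freedom and varied play styles, and combat is both intense and strategic. From the moment they parachute in, players have one goal: survive.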
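Required Knowledge
Predicting placement requires two kinds of background. The first is data processing: you need to clean, organize, and analyze a large volume of match data efficiently. The second is basic machine learning: you need to know how to build a model with a standard algorithm. Many teams run technical training before starting this kind of work, and algorithm courses offered jointly by universities and training providers can help. This knowledge takes time to build up.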
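Dataset Overview
The dataset provides anonymized PUBG match data covering several match types, including solo, duo, and squad modes. Note that not every match has 100 players, and each group has at most 4 players. The training and test sets are provided as .csv files. Each row holds one player's post-game statistics, and each column is a feature. Inspecting the files gives a feel for their structure. The data was collected from players around the world and covers a wide range of match situations. Reading it correctly is the foundation for accurate placement prediction.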
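To make the row-per-player layout concrete, here is a minimal sketch that counts players per match and players per group. The toy DataFrame and its IDs are made up for illustration; on the real data the same groupby calls would be applied to the loaded training set.

```python
import pandas as pd

# Toy rows standing in for the training file; all IDs are hypothetical.
toy = pd.DataFrame({
    "Id":      ["p1", "p2", "p3", "p4", "p5", "p6"],
    "groupId": ["g1", "g1", "g2", "g3", "g3", "g3"],
    "matchId": ["m1", "m1", "m1", "m2", "m2", "m2"],
})

# One row per player, so counting rows per match gives players per match.
players_per_match = toy.groupby("matchId")["Id"].count()

# Counting rows per (match, group) pair gives the size of each group.
group_sizes = toy.groupby(["matchId", "groupId"])["Id"].count()

print(players_per_match.to_dict())
print(int(group_sizes.max()))
```

Checking the largest group size against the expected limit of 4 is a quick way to spot custom or event matches that do not follow the standard modes.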
Field descriptions:
Id [player ID]
Player’s Id
groupId [group ID]
ID to identify a group within a match. If the same group of players plays in different matches, they will have a different groupId each time.
matchId [match ID]
ID to identify match. There are no matches that are in both the training and testing set.
assists [number of assists]
Number of enemy players this player damaged that were killed by teammates.
boosts [boost items used]
Number of boost items used.
damageDealt [total damage]
Total damage dealt. Note: Self inflicted damage is subtracted.
DBNOs [enemies knocked down]
Number of enemy players knocked.
headshotKills [headshot kills]
Number of enemy players killed with headshots.
heals [healing items used]
Number of healing items used.
killPlace [kill ranking in this match]
Ranking in match of number of enemy players killed.
killPoints [kills-based Elo ranking]
Kills-based external ranking of player. (Think of this as an Elo ranking where only kills matter.) If there is a value other than -1 in rankPoints, then any 0 in killPoints should be treated as a “None”.
kills [number of kills]
Number of enemy players killed.
killStreaks [kill streak]
Max number of enemy players killed in a short amount of time.
longestKill [longest kill distance]
Longest distance between player and player killed at time of death. This may be misleading, as downing a player and driving away may lead to a large longestKill stat.
matchDuration [match duration]
Duration of match in seconds.
matchType [match type (group size)]
String identifying the game mode that the data comes from. The standard modes are “solo”, “duo”, “squad”, “solo-fpp”, “duo-fpp”, and “squad-fpp”; other modes are from events or custom matches.
maxPlace [worst placement in the match]
Worst placement we have data for in the match. This may not match with numGroups, as sometimes the data skips over placements.
numGroups [number of groups]
Number of groups we have data for in the match.
rankPoints [Elo ranking]
Elo-like ranking of player. This ranking is inconsistent and is being deprecated in the API’s next version, so use with caution. Value of -1 takes place of “None”.
revives [teammates revived]
Number of times this player revived teammates.
rideDistance [distance traveled in vehicles]
Total distance traveled in vehicles measured in meters.
roadKills [kills from a vehicle]
Number of kills while in a vehicle.
swimDistance [distance swum]
Total distance traveled by swimming measured in meters.
teamKills [teammates killed]
Number of times this player killed a teammate.
vehicleDestroys [vehicles destroyed]
Number of vehicles destroyed.
walkDistance [distance traveled on foot]
Total distance traveled on foot measured in meters.
weaponsAcquired [weapons picked up]
Number of weapons picked up.
winPoints [win-based Elo ranking]
Win-based external ranking of player. (Think of this as an Elo ranking where only winning matters.) If there is a value other than -1 in rankPoints, then any 0 in winPoints should be treated as a “None”.
winPlacePerc [percentile placement]
The target of prediction. This is a percentile winning placement, where 1 corresponds to 1st place, and 0 corresponds to last place in the match. It is calculated off of maxPlace, not numGroups, so it is possible to have missing chunks in a match.
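The "None" conventions described for rankPoints, killPoints, and winPoints are easy to overlook, so here is a minimal sketch of one way to turn those sentinel values into NaN with pandas. The toy values are made up, and replacing sentinels with NaN is an assumption about how one might handle them, not something the dataset description prescribes.

```python
import pandas as pd

# Toy values standing in for three columns of the training set (hypothetical).
toy = pd.DataFrame({
    "rankPoints": [-1, 1500, 1480, -1],
    "killPoints": [1200, 0, 0, 0],
    "winPoints":  [1400, 0, 0, 0],
})

cleaned = toy.copy()

# rankPoints uses -1 as its "None" marker.
has_rank = cleaned["rankPoints"] != -1

# When rankPoints holds a real value, a 0 in killPoints or winPoints means "None".
for col in ["killPoints", "winPoints"]:
    is_placeholder = has_rank & (cleaned[col] == 0)
    cleaned[col] = cleaned[col].astype("float64").mask(is_placeholder)

# Finally, replace the -1 marker in rankPoints itself.
cleaned["rankPoints"] = cleaned["rankPoints"].astype("float64").mask(~has_rank)

print(cleaned)
```

The last toy row keeps its killPoints of 0 because its rankPoints is -1, so the rule for treating 0 as "None" does not apply to it.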
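Evaluation Method
The model predicts each player's placement as a value between 0 and 1, where 1 means first place and 0 means last place. Predictions are scored with the mean absolute error (MAE): the average of the absolute differences between the predicted and actual values. A lower MAE means the predictions are closer to the true placements. MAE is a widely used regression metric in both research papers and industry projects.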
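Written out, MAE = (1/n) * sum of |y_i - p_i| over all n samples, where y_i is the actual value and p_i the prediction. A minimal pure-Python sketch of that calculation follows; the function name and the sample values are illustrative only.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up placements: absolute errors are 0.1, 0.2, and 0.1.
actual = [1.0, 0.5, 0.0]
predicted = [0.9, 0.7, 0.1]

mae = mean_absolute_error(actual, predicted)
print(round(mae, 4))
```

Because every error contributes in proportion to its size, MAE is less dominated by a few large misses than squared-error metrics are.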
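Data Analysis
Data analysis is at the core of the project. Start by importing the required libraries (for example, pandas as pd), loading the data, and reviewing its basic information. Next, handle missing values: the target winPlacePerc is NaN for exactly one sample, so that row is simply dropped. Finally, identify and handle anomalous records, that is, rows whose values are implausible for a normal match.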
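The two cleaning steps above can be sketched as follows. The toy data is made up, and the anomaly rule (kills recorded with zero distance traveled) is a hypothetical example of an implausible record, not a rule stated in this article.

```python
import pandas as pd

# Toy rows standing in for the training set (hypothetical values).
toy = pd.DataFrame({
    "kills":        [0, 3, 12, 1],
    "walkDistance": [150.0, 900.0, 0.0, 400.0],
    "rideDistance": [0.0, 0.0, 0.0, 1200.0],
    "swimDistance": [0.0, 0.0, 0.0, 0.0],
    "winPlacePerc": [0.2, float("nan"), 0.6, 0.8],
})

# Step 1: drop rows whose target value is missing.
cleaned = toy.dropna(subset=["winPlacePerc"])

# Step 2 (assumed rule): flag players who scored kills without moving at all.
total_distance = cleaned[["walkDistance", "rideDistance", "swimDistance"]].sum(axis=1)
suspicious = (cleaned["kills"] > 0) & (total_distance == 0)

# Keep only the rows that pass the check.
cleaned = cleaned[~suspicious].reset_index(drop=True)

print(cleaned)
```

Whether to drop, cap, or keep flagged rows is a judgment call; dropping is shown here only because it is the simplest option.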
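Model Training and Optimization
The model is trained with the random forest algorithm, which performs well on this kind of tabular data and is a common first choice for similar projects. Tuning matters during training: adjusting the model's parameters, guided by what the data analysis revealed, improves accuracy and makes the model more useful in practice. A tuned model handles complex data better and predicts PUBG player placement more accurately.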
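As a rough illustration of this step, the sketch below fits scikit-learn's RandomForestRegressor on synthetic data and scores it with MAE on a held-out split. The three synthetic features, the target formula, and the hyperparameter values are all assumptions chosen for the example; they are not the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and a target in [0, 1].
rng = np.random.default_rng(0)
n_samples = 500
X = rng.uniform(0.0, 1.0, size=(n_samples, 3))
noise = rng.normal(0.0, 0.05, size=n_samples)
y = np.clip(0.6 * X[:, 0] + 0.3 * X[:, 1] + noise, 0.0, 1.0)

# Hold out 20% of the rows to estimate generalization error.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative hyperparameters; in practice these are what tuning adjusts.
model = RandomForestRegressor(n_estimators=50, min_samples_leaf=3, random_state=0)
model.fit(X_train, y_train)

# Clip predictions to the valid placement range before scoring.
pred = np.clip(model.predict(X_valid), 0.0, 1.0)
mae = mean_absolute_error(y_valid, pred)
print(f"validation MAE: {mae:.4f}")
```

On the real data, every row of a match would normally go into the same split so that the validation score is not inflated by information shared within a match.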
What do you think about predicting player placement in PUBG, and how might it affect the game in the future? Comments and discussion are welcome.
import pandas as pd
train = pd.read_csv("./data/train_V2.csv")  # load the training set
train.describe()  # summary statistics for the numeric columns
assists boosts damageDealt DBNOs headshotKills heals killPlace killPoints kills killStreaks ... revives rideDistance roadKills swimDistance teamKills vehicleDestroys walkDistance weaponsAcquired winPoints winPlacePerc
count 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 ... 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446966e+06 4.446965e+06
mean 2.338149e-01 1.106908e+00 1.307171e+02 6.578755e-01 2.268196e-01 1.370147e+00 4.759935e+01 5.050060e+02 9.247833e-01 5.439551e-01 ... 1.646590e-01 6.061157e+02 3.496091e-03 4.509322e+00 2.386841e-02 7.918208e-03 1.154218e+03 3.660488e+00 6.064601e+02 4.728216e-01
std 5.885731e-01 1.715794e+00 1.707806e+02 1.145743e+00 6.021553e-01 2.679982e+00 2.746294e+01 6.275049e+02 1.558445e+00 7.109721e-01 ... 4.721671e-01 1.498344e+03 7.337297e-02 3.050220e+01 1.673935e-01 9.261157e-02 1.183497e+03 2.456544e+00 7.397004e+02 3.074050e-01
min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
25% 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.400000e+01 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.551000e+02 2.000000e+00 0.000000e+00 2.000000e-01
50% 0.000000e+00 0.000000e+00 8.424000e+01 0.000000e+00 0.000000e+00 0.000000e+00 4.700000e+01 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 6.856000e+02 3.000000e+00 0.000000e+00 4.583000e-01
75% 0.000000e+00 2.000000e+00 1.860000e+02 1.000000e+00 0.000000e+00 2.000000e+00 7.100000e+01 1.172000e+03 1.000000e+00 1.000000e+00 ... 0.000000e+00 1.909750e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.976000e+03 5.000000e+00 1.495000e+03 7.407000e-01
max 2.200000e+01 3.300000e+01 6.616000e+03 5.300000e+01 6.400000e+01 8.000000e+01 1.010000e+02 2.170000e+03 7.200000e+01 2.000000e+01 ... 3.900000e+01 4.071000e+04 1.800000e+01 3.823000e+03 1.200000e+01 5.000000e+00 2.578000e+04 2.360000e+02 2.013000e+03 1.000000e+00
8 rows × 25 columns
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4446966 entries, 0 to 4446965
Data columns (total 29 columns):
Id object
groupId object
matchId object
assists int64
boosts int64
damageDealt float64
DBNOs int64
headshotKills int64
heals int64
killPlace int64
killPoints int64
kills int64
killStreaks int64
longestKill float64
matchDuration int64
matchType object
maxPlace int64
numGroups int64
rankPoints int64
revives int64
rideDistance float64
roadKills int64
swimDistance float64
teamKills int64
vehicleDestroys int64
walkDistance float64
weaponsAcquired int64
winPoints int64
winPlacePerc float64
dtypes: float64(6), int64(19), object(4)
memory usage: 983.9+ MB
The training set contains 4,446,966 rows and 29 columns in total:
train.shape
(4446966, 29)