Artificial Intelligence: Dataset Splitting
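1. Function
sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
Parameters:
*arrays: one or more datasets of the same length to split.
test_size: size of the test set (a float is a fraction of the dataset, an int is a number of samples).
train_size: size of the training set, specified the same way.
random_state: an integer is used as the random seed; otherwise a random number generator (a RandomState instance) or None.
stratify: an array of labels used for stratified sampling, or None for no stratification.
Return value: a list giving, in order, the split of each input dataset: its training part, then its test part.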
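As a minimal sketch of the return order: when several arrays are passed, the result holds a train/test pair for each input, in input order. The sample weights `w` and the 25% test size are illustrative assumptions and do not come from the example in this post.

```python
from sklearn.model_selection import train_test_split

X = [[i, i + 1] for i in range(8)]   # 8 samples, 2 features each
y = [1, 1, 0, 0, 1, 1, 0, 0]         # labels
w = [0.5] * 8                        # hypothetical per-sample weights

# Three input arrays -> six outputs: (train, test) for each input, in order.
parts = train_test_split(X, y, w, test_size=0.25, random_state=0)
X_train, X_test, y_train, y_test, w_train, w_test = parts

print(len(parts))                    # number of returned pieces
print(len(X_train), len(X_test))     # sizes of the two partitions
```

All outputs are cut along the same shuffled indices, so row i of X_train still corresponds to element i of y_train and w_train.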
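2. Code (without stratification)
from sklearn.model_selection import train_test_split

X = [[1, 2, 3, 4],
     [11, 12, 13, 14],
     [21, 22, 23, 24],
     [31, 32, 33, 34],
     [41, 42, 43, 44],
     [51, 52, 53, 54],
     [61, 62, 63, 64],
     [71, 72, 73, 74]]
y = [1, 1, 0, 0, 1, 1, 0, 0]
# Split; the test set is 40% of the original dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)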
print("X_train=",X_train)
print("X_test=",X_test)
print("y_train=",y_train)
print("y_test=",y_test)
3. Output
[out]:
X_train= [[31, 32, 33, 34], [1, 2, 3, 4], [51, 52, 53, 54], [41, 42, 43, 44]]
X_test= [[61, 62, 63, 64], [21, 22, 23, 24], [11, 12, 13, 14], [71, 72, 73, 74]]
y_train= [0, 1, 1, 1]
y_test= [0, 0, 1, 0]
4. Code (stratified)
# Stratified split; the test set is 40% of the original dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0, stratify=y)
print("Stratify:X_train=",X_train)
print("Stratify:X_test=",X_test)
print("Stratify:y_train=",y_train)
print("Stratify:y_test=",y_test)
5. Output
[out]:
Stratify:X_train= [[41, 42, 43, 44], [61, 62, 63, 64], [1, 2, 3, 4], [71, 72, 73, 74]]
Stratify:X_test= [[21, 22, 23, 24], [31, 32, 33, 34], [11, 12, 13, 14], [51, 52, 53, 54]]
Stratify:y_train= [1, 0, 1, 0]
Stratify:y_test= [0, 0, 1, 1]
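To see what stratification changed, the label lists printed above can be tallied per class. This is a small sketch that uses only the outputs shown in sections 3 and 5; the helper name `class_counts` is hypothetical.

```python
from collections import Counter

def class_counts(labels):
    """Return {label: count} for a list of labels (hypothetical helper)."""
    return dict(sorted(Counter(labels).items()))

y = [1, 1, 0, 0, 1, 1, 0, 0]

# Label lists copied from the printed results above.
plain = {"train": [0, 1, 1, 1], "test": [0, 0, 1, 0]}
stratified = {"train": [1, 0, 1, 0], "test": [0, 0, 1, 1]}

print("full:", class_counts(y))
for name, split in (("plain", plain), ("stratified", stratified)):
    for part, labels in split.items():
        print(name, part, class_counts(labels))
```

The original labels are balanced 4:4. Without stratification the training set here ends up 1:3, while with stratify=y both partitions keep the 2:2 balance of the full dataset.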
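test_size and train_size: when given as fractions, the resulting sample count is rounded to a whole number. A fractional test count is rounded up (here 0.4 × 8 = 3.2, so the test set gets 4 samples), while a fractional train_size is rounded down; if train_size is omitted, the training set takes the remaining samples.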
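The rounding can be checked by hand. The sketch below reflects how scikit-learn, as far as I know, turns fractional sizes into sample counts (test size rounded up, train size rounded down, and the unspecified side taking the remainder); `split_sizes` is a hypothetical helper, not part of the library.

```python
import math

def split_sizes(n_samples, test_size=None, train_size=None):
    """Hypothetical helper: sample counts for fractional test/train sizes."""
    if test_size is None and train_size is None:
        test_size = 0.25  # assumed default when neither size is given
    # A fractional test count is rounded up, a fractional train count down.
    n_test = math.ceil(test_size * n_samples) if test_size is not None else None
    n_train = math.floor(train_size * n_samples) if train_size is not None else None
    # Whichever side was not specified takes the remaining samples.
    if n_train is None:
        n_train = n_samples - n_test
    if n_test is None:
        n_test = n_samples - n_train
    return n_train, n_test

print(split_sizes(8, test_size=0.4))    # 0.4 * 8 = 3.2 -> (4, 4)
print(split_sizes(8, train_size=0.4))   # 0.4 * 8 = 3.2 -> (3, 5)
```

With the 8 samples used in this post, test_size=0.4 gives 4 test and 4 training samples, which matches the lengths of the printed results.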