AI: Splitting a Dataset into Training and Test Sets

期末保佑徐徐子 · published 2022-05-25 16:42:21

1. The function

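sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
- *arrays: the data to split (for example X and y); every array must have the same number of samples.
- test_size: size of the test set. A float between 0 and 1 is a fraction of the samples; an int is an absolute number of samples. If both test_size and train_size are None, 0.25 is used.
- train_size: size of the training set, with the same float/int convention.
- random_state: an int is used as the random seed, so the split is reproducible; otherwise pass a RandomState instance, or None for a different split on every call.
- stratify: an array of class labels used for stratified sampling, or None (no stratification).
Return value: a list holding the split of each input in order. For every input array you get its training part followed by its test part, so passing X, y returns X_train, X_test, y_train, y_test.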
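The return-value rule is easiest to see with more than one input array. A minimal sketch, assuming NumPy is installed alongside scikit-learn; the toy arrays and the extra `ids` array are illustrative and not part of the original post:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 8 samples with 2 features; row i is [2*i, 2*i + 1]
X = np.arange(16).reshape(8, 2)
y = np.array([0, 1] * 4)
ids = np.arange(8)  # illustrative third array, e.g. sample IDs

# Three input arrays -> six outputs: (train, test) for each array, in input order
parts = train_test_split(X, y, ids, test_size=0.25, random_state=42)
X_train, X_test, y_train, y_test, id_train, id_test = parts

print(len(parts))                 # 6
print(len(X_train), len(X_test))  # 6 2  (float: 25% of 8 samples go to the test set)

# An int test_size is an absolute number of test samples, not a fraction
X_tr, X_te = train_test_split(X, test_size=3, random_state=42)
print(len(X_tr), len(X_te))       # 5 3
```

All arrays are split with the same row indices, so `id_train` tells you exactly which original rows ended up in `X_train`.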

2. Code (without stratification)

from sklearn.model_selection import train_test_split

X=[[1,2,3,4],
    [11,12,13,14],
    [21,22,23,24],
    [31,32,33,34],
    [41,42,43,44],
    [51,52,53,54],
    [61,62,63,64],
    [71,72,73,74]]
y=[1,1,0,0,1,1,0,0]
# Split; the test set is 40% of the original dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
print("X_train=",X_train)
print("X_test=",X_test)
print("y_train=",y_train)
print("y_test=",y_test)
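Because `random_state=0` is an integer, rerunning the call reproduces the same split. A short sketch checking that, and also showing `shuffle=False` (a real `train_test_split` parameter the post does not use), which turns shuffling off so the first rows become the training set and the last rows the test set:

```python
from sklearn.model_selection import train_test_split

X = [[1, 2, 3, 4], [11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34],
     [41, 42, 43, 44], [51, 52, 53, 54], [61, 62, 63, 64], [71, 72, 73, 74]]
y = [1, 1, 0, 0, 1, 1, 0, 0]

# Same integer random_state -> identical split on every call
first = train_test_split(X, y, test_size=0.4, random_state=0)
second = train_test_split(X, y, test_size=0.4, random_state=0)
print(first == second)  # True

# shuffle=False: no randomness; 40% of 8 rounds up to 4 test samples, taken from the end
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, shuffle=False)
print(X_te)  # [[41, 42, 43, 44], [51, 52, 53, 54], [61, 62, 63, 64], [71, 72, 73, 74]]
print(y_te)  # [1, 1, 0, 0]
```

Note that `shuffle=False` cannot be combined with `stratify`; scikit-learn raises a ValueError in that case.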

3. Output

[out]:

X_train= [[31, 32, 33, 34], [1, 2, 3, 4], [51, 52, 53, 54], [41, 42, 43, 44]]
X_test= [[61, 62, 63, 64], [21, 22, 23, 24], [11, 12, 13, 14], [71, 72, 73, 74]]
y_train= [0, 1, 1, 1]
y_test= [0, 0, 1, 0]
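The unstratified split above does not keep the 50/50 class balance of y: three of the four training labels are 1, and three of the four test labels are 0. A small sketch that computes per-class fractions from the printed labels (the helper `class_ratio` is hypothetical, written only for this check):

```python
from collections import Counter

# Labels copied from the unstratified output above
y = [1, 1, 0, 0, 1, 1, 0, 0]
y_train = [0, 1, 1, 1]
y_test = [0, 0, 1, 0]

def class_ratio(labels):
    """Fraction of samples in each class, keyed by class label."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in sorted(counts)}

print(class_ratio(y))        # {0: 0.5, 1: 0.5}
print(class_ratio(y_train))  # {0: 0.25, 1: 0.75}
print(class_ratio(y_test))   # {0: 0.75, 1: 0.25}
```

This skew is what stratified sampling is meant to prevent.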

4. Code (with stratification)

# Stratified split (reuses X and y from above); the test set is 40% of the original dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0, stratify=y)
print("Stratify:X_train=",X_train)
print("Stratify:X_test=",X_test)
print("Stratify:y_train=",y_train)
print("Stratify:y_test=",y_test)

5. Output

[out]:

Stratify:X_train= [[41, 42, 43, 44], [61, 62, 63, 64], [1, 2, 3, 4], [71, 72, 73, 74]]
Stratify:X_test= [[21, 22, 23, 24], [31, 32, 33, 34], [11, 12, 13, 14], [51, 52, 53, 54]]
Stratify:y_train= [1, 0, 1, 0]
Stratify:y_test= [0, 0, 1, 1]
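With stratify=y, both y_train and y_test above contain two samples of each class, matching the 50/50 balance of y. Conceptually, stratification splits each class separately and then merges the pieces. The sketch below is a simplified, hypothetical illustration of that idea (`stratified_split_indices` is not a scikit-learn function); scikit-learn itself delegates to StratifiedShuffleSplit, which allocates and rounds the per-class counts in its own way, so this will not reproduce the exact rows shown above:

```python
import random
from collections import defaultdict

def stratified_split_indices(labels, test_size, seed=0):
    """Hypothetical helper: shuffle each class's indices, then send
    round(n_c * test_size) of them to the test set and the rest to train."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # per-class share of the test set
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

y = [1, 1, 0, 0, 1, 1, 0, 0]
train_idx, test_idx = stratified_split_indices(y, test_size=0.4)
# Each class has 4 samples; round(4 * 0.4) = 2 of each go to the test set
print([y[i] for i in train_idx].count(1), [y[i] for i in test_idx].count(1))  # 2 2
```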