Big Data Study Notes: Basic HBase Shell Commands

airleaya · published 2020-10-15 20:50:58

HBase shell operations:

Enter the HBase shell command-line interface:

[kgg@hadoop201 hbase]$ bin/hbase shell

Table operations

list

List the tables

hbase(main):001:0> list
TABLE                                                                                                         
0 row(s) in 0.1380 seconds

create
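- Creates a table. You must specify the table name and at least one column family. Table attributes can be set at the same time, but they are specified on the column family.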
```
hbase(main):002:0> create 'student','info'
0 row(s) in 1.3190 seconds
```

desc

Show table information

hbase(main):003:0> desc 'student'
Table student is ENABLED                                                                                      
student                                                                                                       
COLUMN FAMILIES DESCRIPTION                                                                                   
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', D
ATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'tru
e', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                           
1 row(s) in 0.0930 seconds

disable
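- Disables a table, mainly to stop clients from writing data while the table is under maintenance. A table must be disabled before it is dropped. Modifying a column family may also require disabling the table first, depending on the HBase version and whether online schema updates are enabled.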
```
hbase(main):006:0> disable 'student'
0 row(s) in 2.2530 seconds

hbase(main):007:0> desc 'student'
Table student is DISABLED 
```
enable
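- The counterpart of disable. enable 'table_name' enables a table, and is_enabled 'table_name' checks whether a table is enabled.
- enable_all 'regex' filters tables by a regular expression and enables every table whose name matches.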
```
hbase(main):008:0> enable 'student'
0 row(s) in 1.2420 seconds

hbase(main):009:0> desc 'student'
Table student is ENABLED   
```
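How enable_all picks its tables can be pictured with a short Python sketch. This is only an illustration, not HBase code: the helper `select_tables` and the table names are hypothetical, and it assumes the pattern must match the whole table name (as in the shell's documented `enable_all 't.*'` form).

```python
import re
from typing import List


def select_tables(pattern: str, table_names: List[str]) -> List[str]:
    """Return the table names that the regex matches in full (assumed semantics)."""
    compiled = re.compile(pattern)
    return [name for name in table_names if compiled.fullmatch(name)]


# Hypothetical table names, used only for illustration.
tables = ["student", "student_bak", "teacher"]

print(select_tables("student.*", tables))  # ['student', 'student_bak']
print(select_tables("student", tables))    # ['student']
```

Under this assumption a bare name such as 'student' selects only that table, while 'student.*' also selects tables that merely start with the same prefix, so it is worth checking the match list before enabling or disabling in bulk.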

exists

Check whether a table exists

hbase(main):010:0> exists 'student'
Table student does exist                                                                                      
0 row(s) in 0.0100 seconds

count

Count the number of rows in a table

hbase(main):011:0> count 'student'
0 row(s) in 0.0550 seconds

=> 0

drop

Drop (delete) a table

A table must be disabled before it can be dropped

hbase(main):012:0> disable 'student'
0 row(s) in 2.2270 seconds

hbase(main):013:0> drop 'student'
0 row(s) in 1.2430 seconds

hbase(main):014:0> exists 'student'
Table student does not exist                                                                                  
0 row(s) in 0.0100 seconds
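The disable-then-drop rule can be modelled as a tiny state machine. The sketch below is a toy in-memory model written for these notes, not the HBase implementation; the class names `Catalog` and `TableNotDisabledError` are hypothetical.

```python
class TableNotDisabledError(Exception):
    """Raised when drop is attempted on a table that is still enabled."""


class Catalog:
    """Toy catalog mapping table name -> enabled flag."""

    def __init__(self):
        self._enabled = {}

    def create(self, name):
        self._enabled[name] = True  # a newly created table starts out enabled

    def disable(self, name):
        self._enabled[name] = False

    def exists(self, name):
        return name in self._enabled

    def drop(self, name):
        # Mirror the rule from the notes: only a disabled table may be dropped.
        if self._enabled[name]:
            raise TableNotDisabledError(f"{name} must be disabled before drop")
        del self._enabled[name]


catalog = Catalog()
catalog.create("student")
try:
    catalog.drop("student")
except TableNotDisabledError as err:
    print("drop refused:", err)

catalog.disable("student")
catalog.drop("student")
print(catalog.exists("student"))  # False
```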

truncate

Truncate (empty) a table

hbase(main):034:0> truncate 'student'
Truncating 'student' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 3.3460 seconds

hbase(main):036:0> scan 'student'
ROW                          COLUMN+CELL                                                                      
0 row(s) in 0.1210 seconds

get_splits

Get the number of regions of a table

hbase(main):043:0> get_splits 'student'
Total number of splits = 1

=> []

alter

Modify the attributes of a table, usually the attributes of one of its column families

hbase(main):044:0> alter 'student',{NAME=>'info',VERSIONS=>'5'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9000 seconds
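What VERSIONS => '5' means for a single column can be sketched in Python. This is a simplified model under stated assumptions: the class `VersionedColumn` is hypothetical, timestamps are small integers, and surplus versions are evicted on every write, whereas real HBase cleans them up later during compaction.

```python
class VersionedColumn:
    """Toy model of one cell (row + column) that keeps at most max_versions values."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self._cells = {}  # timestamp -> value

    def put(self, timestamp, value):
        self._cells[timestamp] = value
        # Keep only the newest max_versions timestamps (eager eviction is a simplification).
        for ts in sorted(self._cells, reverse=True)[self.max_versions:]:
            del self._cells[ts]

    def get(self, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        newest_first = sorted(self._cells.items(), reverse=True)
        return newest_first[:versions]


column = VersionedColumn(max_versions=5)
for ts in range(1, 8):  # seven writes with timestamps 1..7
    column.put(ts, f"v{ts}")

print(column.get())            # [(7, 'v7')]
print(column.get(versions=5))  # [(7, 'v7'), (6, 'v6'), (5, 'v5'), (4, 'v4'), (3, 'v3')]
```

With the default VERSIONS => '1' shown in the desc output, the same model would retain only the newest value.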

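Data operations

put

put adds a new record and can also set attributes on it. The main forms are:

put 'table_name', 'row_key', 'column', 'value'

put 'table_name', 'row_key', 'column', 'value', timestamp

put 'table_name', 'row_key', 'column', 'value', {'attribute' => 'value'}

put 'table_name', 'row_key', 'column', 'value', timestamp, {'attribute' => 'value'}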

# First recreate the student table that was dropped earlier
hbase(main):021:0> create 'student','info'
0 row(s) in 1.2370 seconds

=> Hbase::Table - student
hbase(main):022:0> put 'student','1001','info:name','Nick'
0 row(s) in 0.0570 seconds

hbase(main):023:0> put 'student','1001','info:sex','male'
0 row(s) in 0.0070 seconds

hbase(main):024:0> put 'student','1001','info:age','18'
0 row(s) in 0.0040 seconds

hbase(main):025:0> put 'student','1002','info:name','Janna'
0 row(s) in 0.0040 seconds

hbase(main):026:0> put 'student','1002','info:sex','female'
0 row(s) in 0.0040 seconds

hbase(main):027:0> put 'student','1002','info:age','20'
0 row(s) in 0.0180 seconds

scan
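- The scan command traverses the data of a table in lexicographic order of the rowkey.
- The main forms of scan are:
  - scan 'table_name': scans all column families of the table by default.
  - scan 'table_name', {COLUMNS => ['family:qualifier'], ...}: scans only the specified columns.
  - scan 'table_name', {STARTROW => 'start_row_key', ENDROW => 'end_row_key'}: restricts the rowkey range (the shell's built-in help names the upper bound STOPROW). If no range is given, the scan runs from the start of the table to its end. The range is closed on the left and open on the right.
  - scan 'table_name', {LIMIT => row_count}: limits the number of rows returned.
  - scan 'table_name', {VERSIONS => version_count}: returns multiple versions of each cell.
  - scan 'table_name', {TIMERANGE => [min_timestamp, max_timestamp]}: restricts the timestamp range. Note that this range is closed on the left and open on the right, so the result includes records at the minimum timestamp but excludes records at the maximum timestamp.
  - scan 'table_name', {RAW => true, VERSIONS => version_count}: shows the raw cell records. In HBase, a deleted record is not removed from disk immediately; it is first marked with a tombstone and then physically removed at the next major compaction. Note that RAW is used together with VERSIONS and cannot be combined with the COLUMNS parameter.
  - scan 'table_name', {FILTER => "filter1 AND|OR filter2"}: scans with filters; several filters can be combined with AND or OR inside the filter string.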
```
hbase(main):028:0> scan 'student'
ROW                          COLUMN+CELL                                                                      
 1001                        column=info:age, timestamp=1602764954114, value=18                               
 1001                        column=info:name, timestamp=1602764954060, value=Nick                            
 1001                        column=info:sex, timestamp=1602764954089, value=male                             
 1002                        column=info:age, timestamp=1602764955707, value=20                               
 1002                        column=info:name, timestamp=1602764954132, value=Janna                           
 1002                        column=info:sex, timestamp=1602764954149, value=female                           
2 row(s) in 0.0300 seconds
```
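  Note: although we ran put several times and scan prints more than one line of output, HBase identifies rows by rowkey, so all data with the same rowkey belongs to the same row. That is why the scan above reports 2 row(s).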
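The ordering and range rules of scan can be illustrated with a small Python model. It is a sketch under assumptions, not HBase code: the function `scan` and the extra rowkey '11' are hypothetical, and Python string comparison stands in for HBase's byte-wise comparison, which agrees for ASCII keys like these.

```python
from bisect import bisect_left


def scan(cells, start_row=None, stop_row=None):
    """Return (rowkey, column, value) triples in rowkey order for [start_row, stop_row)."""
    keys = sorted(cells)  # (rowkey, column) tuples sort lexicographically
    # A 1-tuple (rowkey,) sorts before every (rowkey, column), so bisect_left
    # finds the first cell whose rowkey is >= the given bound.
    lo = 0 if start_row is None else bisect_left(keys, (start_row,))
    hi = len(keys) if stop_row is None else bisect_left(keys, (stop_row,))
    return [(row, col, cells[(row, col)]) for row, col in keys[lo:hi]]


def row_keys(result):
    """Distinct rowkeys in the order they were returned."""
    return list(dict.fromkeys(row for row, _, _ in result))


cells = {
    ("1001", "info:name"): "Nick",
    ("1001", "info:age"): "18",
    ("1002", "info:name"): "Janna",
    ("11", "info:name"): "Zed",  # hypothetical extra row to show string ordering
}

print(row_keys(scan(cells)))  # ['1001', '1002', '11']
print(scan(cells, start_row="1001", stop_row="1002"))
# [('1001', 'info:age', '18'), ('1001', 'info:name', 'Nick')]
```

Two things stand out in this model: '11' sorts after '1002' because rowkeys are compared as strings rather than numbers, and the stop key itself is excluded from the result. Several cells sharing rowkey '1001' still count as a single row.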

get

get reads a single row by rowkey and supports most of the attributes that scan supports, such as COLUMNS, TIMERANGE, VERSIONS, and FILTER

hbase(main):030:0> get 'student','1001','info:name'
COLUMN                       CELL                                                                             
 info:name                   timestamp=1602764954060, value=Nick                                              
1 row(s) in 0.0060 seconds

delete

Delete an entire row (with deleteall)

hbase(main):006:0> deleteall 'student','1001'
0 row(s) in 0.1870 seconds

hbase(main):007:0> scan 'student'
ROW                          COLUMN+CELL                                                                      
 1002                        column=info:age, timestamp=1602765806166, value=20                               
 1002                        column=info:name, timestamp=1602765805624, value=Janna                           
 1002                        column=info:sex, timestamp=1602765805640, value=female                           
1 row(s) in 0.0490 seconds

Delete the data of a specified column

hbase(main):008:0> delete 'student','1002','info:sex'
0 row(s) in 0.0130 seconds

hbase(main):009:0> scan 'student'
ROW                          COLUMN+CELL                                                                      
 1002                        column=info:age, timestamp=1602765806166, value=20                               
 1002                        column=info:name, timestamp=1602765805624, value=Janna                           
1 row(s) in 0.0100 seconds
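The tombstone behaviour mentioned under scan's RAW option explains why a deleted column disappears from a normal scan while still being stored. The following Python sketch is a toy model only: the class `ToyStore` and the integer timestamps are hypothetical, and it assumes a delete marker hides every value of the same column whose timestamp is not newer than the marker.

```python
class ToyStore:
    """Toy cell store: entries are (row, column, timestamp, kind, value)."""

    def __init__(self):
        self.entries = []

    def put(self, row, column, ts, value):
        self.entries.append((row, column, ts, "Put", value))

    def delete(self, row, column, ts):
        # A delete only appends a tombstone; nothing is removed yet.
        self.entries.append((row, column, ts, "Delete", None))

    def _masked(self, row, column, ts):
        return any(
            kind == "Delete" and r == row and c == column and ts <= dts
            for r, c, dts, kind, _ in self.entries
        )

    def scan(self, raw=False):
        order = lambda e: (e[0], e[1], -e[2])  # rowkey, column, newest first
        if raw:
            return sorted(self.entries, key=order)
        visible = [
            e for e in self.entries
            if e[3] == "Put" and not self._masked(e[0], e[1], e[2])
        ]
        return sorted(visible, key=order)

    def major_compact(self):
        # Physically drop tombstones and the values they hide.
        self.entries = self.scan(raw=False)


store = ToyStore()
store.put("1002", "info:name", 1, "Janna")
store.put("1002", "info:sex", 1, "female")
store.delete("1002", "info:sex", 2)

print(len(store.scan()))          # 1 (only info:name is visible)
print(len(store.scan(raw=True)))  # 3 (both puts plus the tombstone)
store.major_compact()
print(len(store.scan(raw=True)))  # 1
```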
