HBase (7): HBase Java API - Filter

User submission · 924 · 2025-04-03

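Filters refine the results of a get or scan by applying further conditions, for example on column names or specific values. HBase ships with many built-in filter implementations, and you can also write your own.

Filters implement predicate pushdown: every filter is evaluated on the server side, so data that is filtered out is never sent to the client. Your own code should likewise avoid filtering on the client wherever possible.

One filter instance is created per region per scan.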

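The common interface is org.apache.hadoop.hbase.filter.Filter. Among the existing implementations:

Most concrete filter classes extend org.apache.hadoop.hbase.filter.FilterBase.

Another group extends org.apache.hadoop.hbase.filter.CompareFilter, which adds a compare() method on top of FilterBase.

For the remaining implementations, see the API documentation of the Filter interface.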

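A CompareFilter takes two parameters: a CompareFilter.CompareOp, which is the comparison operator, and a WritableByteArrayComparable, which is the comparator.

Semantically, a comparison filter returns the values that match, which differs from the original purpose of HBase filters (screening out unwanted data).

CompareOp is an enum type whose values are LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER, and NO_OP.

WritableByteArrayComparable implements the org.apache.hadoop.io.Writable and Comparable interfaces. HBase ships with several ready-made subclasses, including BinaryComparator, RegexStringComparator, and SubstringComparator, which the examples in this article use.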

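RowFilter filters data by row key. Comparisons follow lexicographic (byte) order, so selecting rows less than "row2" also returns row1, row11, row100, and so on. The usual way to avoid this mismatch with numeric intuition is to zero-pad the keys when the data is written.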

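import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "t1");
Scan scan = new Scan(Bytes.toBytes("row0"), Bytes.toBytes("row9"));

// Rows whose key sorts before "row2" (byte-wise comparison)
Filter f1 = new RowFilter(CompareOp.LESS, new BinaryComparator(Bytes.toBytes("row2")));
scan.setFilter(f1);
ResultScanner rs = table.getScanner(scan);
for (Result r : rs) {
    System.out.println(r);
}
rs.close();
System.out.println("===");

// Rows whose key matches the regular expression. Note that [1,3] is a
// character class matching '1', ',' or '3'; it is not the range 1 to 3.
Filter f2 = new RowFilter(CompareOp.EQUAL, new RegexStringComparator("row[1,3]"));
scan.setFilter(f2);
ResultScanner rs2 = table.getScanner(scan);
for (Result r : rs2) {
    System.out.println(r);
}
rs2.close();
System.out.println("===");

// Rows whose key contains the substring "ro"
Filter f3 = new RowFilter(CompareOp.EQUAL, new SubstringComparator("ro"));
scan.setFilter(f3);
ResultScanner rs3 = table.getScanner(scan);
for (Result r : rs3) {
    System.out.println(r);
}
rs3.close();
table.close();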
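To see why a "less than row2" filter also returns row11 and row100, and how padding changes that, here is a minimal plain-Java sketch. It does not call the HBase API; the class RowKeyOrder, its methods, and the three-digit padding scheme are hypothetical and only imitate byte-wise key comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowKeyOrder {
    // Unsigned, byte-by-byte lexicographic comparison.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Keys that sort strictly before the bound, in input order.
    static List<String> lessThan(List<String> keys, String bound) {
        byte[] b = bound.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            if (compare(k.getBytes(StandardCharsets.UTF_8), b) < 0) {
                out.add(k);
            }
        }
        return out;
    }

    // Hypothetical padding scheme: a fixed three-digit suffix.
    static String padded(int n) {
        return String.format("row%03d", n);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("row1", "row2", "row3", "row11", "row100");
        System.out.println(lessThan(raw, "row2"));      // [row1, row11, row100]

        List<String> fixed = new ArrayList<>();
        for (int n : new int[] {1, 2, 3, 11, 100}) {
            fixed.add(padded(n));
        }
        System.out.println(lessThan(fixed, padded(2))); // [row001]
    }
}
```

With padded keys, byte order and numeric order agree, so the same LESS comparison returns only the row that is numerically smaller.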

FamilyFilter is used in the same way as RowFilter, except that it compares column families.

new FamilyFilter(CompareOp.LESS, new BinaryComparator(Bytes.toBytes("f2")));

QualifierFilter selects specific columns by comparing the column qualifier.

new QualifierFilter(CompareOp.LESS, new BinaryComparator(Bytes.toBytes("c2")));

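ValueFilter compares cell values. It is most often combined with RegexStringComparator or SubstringComparator. Keep in mind that the pattern is a regular expression rather than a glob: "v*2" means zero or more "v" characters followed by "2", so it matches any value that contains a "2".

new ValueFilter(CompareOp.EQUAL, new RegexStringComparator("v*2"));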
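The difference between a regular expression and a glob is easy to miss, so here is a small plain-Java sketch using java.util.regex. It assumes the comparator searches for the pattern anywhere in the value (a find()-style match); the class RegexFindDemo is hypothetical and does not touch HBase.

```java
import java.util.regex.Pattern;

class RegexFindDemo {
    // True when the pattern occurs anywhere in the value.
    static boolean matches(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("v*2", "value2"));  // true
        System.out.println(matches("v*2", "x2"));      // true: zero "v" characters, then "2"
        System.out.println(matches("v*2", "value1"));  // false: no "2" at all
        System.out.println(matches("v.*2", "x2"));     // false: a "v" is required
        System.out.println(matches("v.*2", "value2")); // true
    }
}
```

If the intent is "starts with v and ends with 2", an anchored pattern such as "^v.*2$" expresses it more precisely.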

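DependentColumnFilter takes one column as a reference and filters the other columns against it, using the reference column's timestamp as the condition. It can also filter on the column value, so it can be thought of as a ValueFilter combined with a timestamp filter. It is incompatible with scan.setBatch(), because batching may prevent the reference column from being read.

The dropDependentColumn parameter controls whether the reference column itself is dropped from the result. In testing, the reference column is not returned unless dropDependentColumn=false.

new DependentColumnFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"), true);

Several other constructors are available for different filtering scopes.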

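The second category of HBase filters extends FilterBase directly. Some of them are only suitable for scans, because when applied to a get they either include the whole row or nothing at all.

SingleColumnValueFilter uses the value of one column to decide whether the entire row is filtered out. Its parameters resemble those of the comparison filters, but note that it does not inherit from CompareFilter.

new SingleColumnValueFilter(Bytes.toBytes("f1"), Bytes.toBytes("c1"), CompareOp.EQUAL, new RegexStringComparator("v*1*"));

Note that "v*1*" can match the empty string, so as a regular expression it matches every value; a stricter pattern such as "v.*1" may be closer to what is intended.

When the reference column is missing, the row is included by default; call setFilterIfMissing(true) to exclude such rows.

By default only the latest version of the reference column is checked; call setLatestVersionOnly(false) to check all versions.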
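The row-level decision described here, including the missing-column case, can be pictured with the following plain-Java sketch. It is a simplified model, not the HBase implementation: a row is a map from a hypothetical "family:qualifier" string to its latest value, and SingleColumnCheck.keepRow is an invented helper.

```java
import java.util.HashMap;
import java.util.Map;

class SingleColumnCheck {
    // Returns true when the whole row should be kept.
    static boolean keepRow(Map<String, String> row, String column,
                           String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            // Reference column missing: kept by default, dropped if filterIfMissing is set.
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("f1:c1", "v1");
        System.out.println(keepRow(row, "f1:c1", "v1", false)); // true
        System.out.println(keepRow(row, "f1:c1", "v2", false)); // false
        System.out.println(keepRow(row, "f1:c2", "v1", false)); // true: column missing
        System.out.println(keepRow(row, "f1:c2", "v1", true));  // false: column missing
    }
}
```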

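SingleColumnValueExcludeFilter extends SingleColumnValueFilter; the one semantic difference is that the reference column is not included in the result.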

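PrefixFilter takes a prefix and returns to the client the rows whose keys start with it. Matching again follows lexicographic order. It is normally used with scans.

new PrefixFilter(Bytes.toBytes("row1"));

This returns the rows whose keys start with row1.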

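PageFilter takes a pageSize parameter and paginates the result. It simply limits the number of rows returned; the client has to keep track of where the next page starts. A single scan may return more rows than the page size, because the filter is applied separately on each region server: the servers run in parallel and cannot share their current state or boundaries, so each one may return up to a full page. The client code has to handle this case if it matters.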

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "t1");
final byte[] POSTFIX = new byte[] { 0x00 };
Filter filter = new PageFilter(2);
int totalRows = 0;
byte[] lastRow = null;
while (true) {
    System.out.println("=======");
    Scan scan = new Scan();
    scan.setFilter(filter);
    if (lastRow != null) {
        // Append the smallest possible increment, new byte[] { 0x00 },
        // so the next page starts just after the last row already seen
        byte[] startRow = Bytes.add(lastRow, POSTFIX);
        System.out.println("start row: " + Bytes.toStringBinary(startRow));
        scan.setStartRow(startRow);
    }
    ResultScanner scanner = table.getScanner(scan);
    int localRows = 0;
    Result result;
    while ((result = scanner.next()) != null) {
        System.out.println(localRows++ + ": " + result);
        totalRows++;
        lastRow = result.getRow();
    }
    scanner.close();
    // An empty page means the table has been exhausted
    if (localRows == 0)
        break;
}
System.out.println("total rows: " + totalRows);
table.close();
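The start-row bookkeeping in the loop above can be tried without a cluster. The sketch below is a plain-Java simulation under stated assumptions: a TreeSet of ASCII strings stands in for the sorted table, and PagingDemo.pages is a hypothetical helper. It only illustrates why appending a zero byte yields the smallest key strictly greater than the last row seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class PagingDemo {
    // Splits the sorted keys into pages of at most pageSize rows.
    static List<List<String>> pages(TreeSet<String> keys, int pageSize) {
        List<List<String>> out = new ArrayList<>();
        String lastRow = null;
        while (true) {
            // Next start key: last row plus a trailing zero character.
            String start = (lastRow == null) ? "" : lastRow + '\0';
            List<String> page = new ArrayList<>();
            for (String k : keys.tailSet(start, true)) {
                if (page.size() == pageSize) {
                    break;
                }
                page.add(k);
                lastRow = k;
            }
            if (page.isEmpty()) {
                break; // nothing left to read
            }
            out.add(page);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(
                Arrays.asList("row1", "row2", "row3", "row4", "row5"));
        System.out.println(pages(keys, 2)); // [[row1, row2], [row3, row4], [row5]]
    }
}
```

For ASCII keys, Java's String ordering agrees with byte order, which is why a TreeSet is an acceptable stand-in here.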

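KeyOnlyFilter is for cases where only the keys are needed: it returns just the key of each KeyValue and overwrites the value with an empty one.

The constructor KeyOnlyFilter(boolean lenAsVal) changes the overwrite strategy. The no-argument constructor defaults to false, which overwrites the value with a zero-length byte array. When set to true, the value is overwritten with the length of the original value, encoded as bytes, which can be used for secondary sorting or similar purposes.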
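The two overwrite strategies can be sketched in plain Java as follows. This is an illustration only: KeyOnlyDemo.rewriteValue is a hypothetical helper, and encoding the length as a 4-byte big-endian int is an assumption of this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class KeyOnlyDemo {
    // Returns what the value would be replaced with under each strategy.
    static byte[] rewriteValue(byte[] value, boolean lenAsVal) {
        if (!lenAsVal) {
            return new byte[0]; // value dropped entirely
        }
        // Value replaced by its own length, as a 4-byte big-endian int.
        return ByteBuffer.allocate(4).putInt(value.length).array();
    }

    public static void main(String[] args) {
        byte[] value = "value1".getBytes(StandardCharsets.UTF_8);
        System.out.println(rewriteValue(value, false).length);                  // 0
        System.out.println(ByteBuffer.wrap(rewriteValue(value, true)).getInt()); // 6
    }
}
```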

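FirstKeyOnlyFilter returns only the first KeyValue of each row, where "first" follows HBase's implicit sort order.

It is typically used for row counting, because in a column-oriented store a row that exists must have at least one column. Once the first column has been checked, the filter framework tells the region server to finish the current row and skip to the next one, so it performs much better than a full table scan.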

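A scan covers the range [startrow, stoprow). InclusiveStopFilter includes the last row as well and at the same time defines the scan's stop row. The code below scans from the start of the table up to and including row2.

Filter f = new InclusiveStopFilter(Bytes.toBytes("row2"));
scan.setFilter(f);
ResultScanner rs = table.getScanner(scan);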
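The difference between the default half-open range and an inclusive stop row can be illustrated with a sorted set in plain Java. StopRowDemo and its scan method are hypothetical names, and the TreeSet merely stands in for a table sorted by row key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class StopRowDemo {
    // Reads from the first key up to stopRow, optionally including stopRow itself.
    static List<String> scan(TreeSet<String> keys, String stopRow, boolean inclusive) {
        return new ArrayList<>(keys.headSet(stopRow, inclusive));
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(Arrays.asList("row1", "row2", "row3"));
        System.out.println(scan(keys, "row2", false)); // [row1]
        System.out.println(scan(keys, "row2", true));  // [row1, row2]
    }
}
```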

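Despite being named after timestamps, TimestampsFilter is really a form of version control. The code below returns the values of two specific versions:

List<Long> ts = Arrays.asList(1L, 3L);
Filter f = new TimestampsFilter(ts);

The filter can also be combined with the scan's setTimeRange() method to narrow the range further.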

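ColumnCountGetFilter limits how many columns are retrieved per row. Once the configured number of columns is reached, the filter stops the entire scan, so it is generally not used with scans and is better suited to gets. The column count is set directly in the constructor: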

new ColumnCountGetFilter(2);

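ColumnPaginationFilter works like PageFilter, but limits the number of columns returned rather than the number of rows.

ColumnPaginationFilter(int limit, int offset)

The constructor takes two parameters: it returns at most limit columns, starting at column index offset, that is, the columns at positions [offset, offset + limit).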
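As a quick check of the limit and offset semantics, here is a plain-Java sketch over a sorted list of column names. ColumnPageDemo.page is a hypothetical helper, not part of HBase, and it assumes limit and offset are non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnPageDemo {
    // Returns at most `limit` columns, starting at index `offset`.
    static List<String> page(List<String> sortedColumns, int limit, int offset) {
        int from = Math.min(offset, sortedColumns.size());
        int to = Math.min(from + limit, sortedColumns.size());
        return new ArrayList<>(sortedColumns.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("c1", "c2", "c3", "c4", "c5");
        System.out.println(page(columns, 2, 1));  // [c2, c3]
        System.out.println(page(columns, 2, 4));  // [c5]
        System.out.println(page(columns, 2, 10)); // []
    }
}
```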

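ColumnPrefixFilter is similar to PrefixFilter but operates on columns: it returns all columns whose name matches the prefix.

With RandomRowFilter, the rows included in the result are random. The constructor RandomRowFilter(float chance) takes a chance between 0 and 1. Internally, the result of Java's Random.nextFloat() is compared with chance to decide whether a row is filtered out, so chance < 0 filters out every row and chance > 1 includes every row.

This filter is therefore useful for sampling: chance is effectively the sampling ratio, and the larger it is, the more data is kept.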
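The sampling rule described above can be sketched in plain Java. This is a simulation built from the description in the text, not the HBase source; RandomSampleDemo.countKept and the fixed seed are hypothetical and exist only to make the example repeatable.

```java
import java.util.Random;

class RandomSampleDemo {
    // Counts how many of `rows` rows survive the filter for a given chance.
    static int countKept(int rows, float chance, long seed) {
        Random random = new Random(seed);
        int kept = 0;
        for (int i = 0; i < rows; i++) {
            boolean filterOut;
            if (chance < 0) {
                filterOut = true;   // everything is dropped
            } else if (chance > 1) {
                filterOut = false;  // everything is kept
            } else {
                filterOut = !(random.nextFloat() < chance);
            }
            if (!filterOut) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(countKept(1000, -0.1f, 42L)); // 0
        System.out.println(countKept(1000, 1.5f, 42L));  // 1000
        System.out.println(countKept(1000, 0.5f, 42L));  // roughly half of the rows
    }
}
```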

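The following filters use the decorator pattern and wrap other filters.

SkipFilter wraps a filter F: as soon as F rejects any KeyValue, SkipFilter filters out the entire row that KeyValue belongs to. The wrapped filter must implement filterKeyValue(), because SkipFilter relies on that method's result to decide how to treat the row; it is therefore incompatible with some filters, as noted in the summary at the end of this article.

The code below returns only rows in which every column value is greater than value1:

Filter f1 = new ValueFilter(CompareOp.GREATER, new BinaryComparator(Bytes.toBytes("value1")));
Filter f = new SkipFilter(f1);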

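WhileMatchFilter wraps a filter F and terminates the scan as soon as F's condition is not met; the results collected before that point are returned to the client. In the code below, without the wrapper every row except row2 is returned; with it, the scan stops at row2, so only the rows before row2 are returned.

Filter f1 = new RowFilter(CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes("row2")));
Filter f = new WhileMatchFilter(f1);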
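To contrast the two decorators, here is a plain-Java simulation. It is a sketch of the behavior described in the text, not HBase code: DecoratorDemo, skip, and whileMatch are hypothetical, rows are modeled as an ordered map from row key to values, and predicates stand in for the wrapped filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class DecoratorDemo {
    // SkipFilter idea: drop the whole row if any single value fails the check.
    static List<String> skip(Map<String, List<String>> rows, Predicate<String> valueOk) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : rows.entrySet()) {
            boolean allOk = true;
            for (String v : e.getValue()) {
                if (!valueOk.test(v)) {
                    allOk = false;
                    break;
                }
            }
            if (allOk) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    // WhileMatchFilter idea: stop the scan at the first row that fails the check.
    static List<String> whileMatch(List<String> rowKeys, Predicate<String> rowOk) {
        List<String> kept = new ArrayList<>();
        for (String k : rowKeys) {
            if (!rowOk.test(k)) {
                break;
            }
            kept.add(k);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("row1", Arrays.asList("value2", "value3"));
        rows.put("row2", Arrays.asList("value1", "value5"));
        rows.put("row3", Arrays.asList("value4"));
        System.out.println(skip(rows, v -> v.compareTo("value1") > 0)); // [row1, row3]
        System.out.println(whileMatch(new ArrayList<>(rows.keySet()),
                k -> !k.equals("row2")));                               // [row1]
    }
}
```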

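A custom filter usually extends the FilterBase class, though it can also implement the Filter interface directly. FilterBase provides default implementations of all the interface's methods, so you only override what you need.

The Filter interface contains an enum, Filter.ReturnCode, which filterKeyValue() uses to tell the execution framework what to do next.

The Filter interface defines several methods that are called at different stages of a client's retrieval operation, in the following order:

1. filterRowKey(byte[], int, int): returning true drops the row.

2. filterKeyValue(KeyValue): if the row was not dropped above, each KeyValue is checked and the current value is handled according to the returned Filter.ReturnCode.

3. filterRow(List<KeyValue> kvs): gives the filter access to the KeyValue instances that survived the previous two methods. DependentColumnFilter uses this method to drop data that does not match the reference column.

4. filterRow(): the final decision on whether to filter out the row. PageFilter uses this method to check whether the number of rows returned in the current iteration has reached the page size, and returns true if it has. The default is false, meaning the current row is included in the result.

5. reset(): resets the filter for each new row during an iterative scan. It is called implicitly after the server has read a row.

6. filterAllRemaining(): returning true ends the whole scan; returning false continues. It is mainly used as an optimization to end a scan early.

Note that a filter using filterRow(List<KeyValue> kvs) or filterRow() must also override hasFilterRow() and return true. The framework uses this flag to make sure the filter is compatible with the scan's parameters. When a scan uses batching, these methods are not called for each batch; they are called when the data for the current row ends.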
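The call order listed above can be pictured with a toy scan loop. The sketch below is a deliberately simplified plain-Java model, not the region server's actual logic: ToyFilter is a hypothetical, trimmed-down stand-in for the Filter interface, includeValue plays the role of filterKeyValue(), and ContainsValue keeps only rows that contain a given value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LifecycleDemo {
    // Hypothetical, reduced version of the filter callbacks.
    interface ToyFilter {
        boolean filterRowKey(String rowKey); // true = drop the row
        boolean includeValue(String value);  // stands in for filterKeyValue()
        boolean filterRow();                 // true = drop the row
        void reset();                        // called once per row
        boolean filterAllRemaining();        // true = end the scan
    }

    static class ContainsValue implements ToyFilter {
        private final String target;
        private boolean filterRow = true;

        ContainsValue(String target) {
            this.target = target;
        }

        public boolean filterRowKey(String rowKey) { return false; }

        public boolean includeValue(String value) {
            if (value.equals(target)) {
                filterRow = false; // the row qualifies
            }
            return true;
        }

        public boolean filterRow() { return filterRow; }

        public void reset() { filterRow = true; }

        public boolean filterAllRemaining() { return false; }
    }

    // Drives the callbacks in the order described in the text.
    static List<String> scan(Map<String, List<String>> rows, ToyFilter f) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            if (f.filterAllRemaining()) {
                break;
            }
            if (!f.filterRowKey(row.getKey())) {
                for (String v : row.getValue()) {
                    f.includeValue(v);
                }
                if (!f.filterRow()) {
                    kept.add(row.getKey());
                }
            }
            f.reset();
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("row1", Arrays.asList("value1", "value2"));
        rows.put("row2", Arrays.asList("value3"));
        System.out.println(scan(rows, new ContainsValue("value1"))); // [row1]
    }
}
```

Because the per-row flag lives in the filter instance, forgetting reset() would let the state of one row leak into the next.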

Compile the custom filter into a jar, upload it to the region servers, and add the jar's path to HBASE_CLASSPATH in hbase-env.sh. Restart HBase for the change to take effect.

Code sample:

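import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomFilter extends FilterBase {
    private byte[] value = null;
    private boolean filterRow = true;

    // Writable deserialization on the server needs a public no-argument constructor
    public CustomFilter() {
    }

    public CustomFilter(byte[] value) {
        // The value to compare against
        this.value = value;
    }

    @Override
    public void reset() {
        // Reset for every new row
        this.filterRow = true;
    }

    @Override
    public ReturnCode filterKeyValue(KeyValue kv) {
        if (Bytes.compareTo(value, kv.getValue()) == 0) {
            // Include everything for now; filterRow() decides whether the row is dropped
            filterRow = false;
        }
        return ReturnCode.INCLUDE;
    }

    @Override
    public boolean hasFilterRow() {
        // Required because filterRow() is overridden
        return true;
    }

    @Override
    public boolean filterRow() {
        return filterRow;
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        // Write the value so the server can read it when it instantiates the filter
        Bytes.writeByteArray(dataOutput, this.value);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        // The server calls this to initialize the filter with the comparison value
        this.value = Bytes.readByteArray(dataInput);
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t1");
        Scan scan = new Scan(Bytes.toBytes("row0"), Bytes.toBytes("row9"));
        Filter filter = new CustomFilter(Bytes.toBytes("value1"));
        scan.setFilter(filter);
        ResultScanner rs = table.getScanner(scan);
        for (Result r : rs) {
            System.out.println(r);
        }
        rs.close();
        table.close();
    }
}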

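At run time the author hit the error below (left as a TODO in the original). The cause was not determined; a jar missing from the region server classpath or a filter class without a public no-argument constructor are possible candidates to check.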

2015-09-10 11:30:08,588 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 11.13.1.30

java.io.IOException: Error in readFields

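FilterList also implements the Filter interface, so it is used in the same way, but its purpose is to combine several filters. It has several constructors:

FilterList(Filter... rowFilters)
FilterList(FilterList.Operator operator)
FilterList(FilterList.Operator operator, Filter... rowFilters)
FilterList(FilterList.Operator operator, List<Filter> rowFilters)
FilterList(List<Filter> rowFilters)

There are really just two core parameters: the combining logic, FilterList.Operator, and the collection of filters to combine. FilterList.Operator is an enum; the default is FilterList.Operator.MUST_PASS_ALL, meaning that a result is kept only if every filter passes it. It can be changed to FilterList.Operator.MUST_PASS_ONE.

The order in which filters are added to the List determines the order in which they run; with an ArrayList, for example, the execution order is exactly the insertion order.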
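The two combining modes behave like logical AND and OR evaluated in list order. The following plain-Java sketch illustrates that idea with predicates; FilterListDemo, its Operator enum, and passes are hypothetical names that only mirror the naming in the text and do not use the HBase classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class FilterListDemo {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Evaluates the filters in list order and combines their verdicts.
    static <T> boolean passes(Operator op, List<Predicate<T>> filters, T item) {
        for (Predicate<T> f : filters) {
            boolean ok = f.test(item);
            if (op == Operator.MUST_PASS_ALL && !ok) {
                return false; // one failure is enough to reject
            }
            if (op == Operator.MUST_PASS_ONE && ok) {
                return true;  // one success is enough to accept
            }
        }
        return op == Operator.MUST_PASS_ALL;
    }

    public static void main(String[] args) {
        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(s -> s.startsWith("row"));
        filters.add(s -> s.endsWith("1"));
        System.out.println(passes(Operator.MUST_PASS_ALL, filters, "row1")); // true
        System.out.println(passes(Operator.MUST_PASS_ALL, filters, "row2")); // false
        System.out.println(passes(Operator.MUST_PASS_ONE, filters, "row2")); // true
        System.out.println(passes(Operator.MUST_PASS_ONE, filters, "x2"));   // false
    }
}
```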

[a] Filter supports Scan.setBatch(), i.e., the scanner batch mode.

[b] Filter can be used with the decorating SkipFilter class.

[c] Filter can be used with the decorating WhileMatchFilter class.

[d] Filter can be used with the combining FilterList class.

[e] Filter has optimizations to stop a scan early, once there are no more matching rows ahead.

[f] Filter can be usefully applied to Get instances.

[g] Filter can be usefully applied to Scan instances.

[h] Depends on the included filters.

Please credit the source when reposting: Huawei Cloud Blog https://portal.hwclouds.com/blogs
