-
无线增值产品部
Hive
函数及语法说明
版本:
Hive
0.7.0.001
eagooqi
2011-7-19
版本
V1.0
日期
2010-07-20
2012-1-4
修订人
eagooqi
Eagooqi
描述
初稿
||
、
cube
目录
?
函数说明
.
..................................................
..................................................
......................
2
?
内置函数
.
...........................
..................................................
.....................................
2
?
p>
增加
oracle
函数
.
........................
..................................................
..........................
1
4
?
增加业务函数
.
................................................ .................................................. ......
1
7
扩展函数开发规范
.
..............................................
..................................................
........
1
9
语法说明
.
..................................................
..................................................
....................
2
1
?
内置语法
.
...........................
..................................................
...................................
2
1
?
增加语法
.
..................................................
..................................................
............
2
8
扩展语法开发规范(语法转换)
.<
/p>
........................................
........................................
2
9
ORACLE sql
对应的
hSQL
语法支持
............................................... ...............................
2
9
?
?
?
?
?
函数说明
参考链接:
/confluence/display/Hive/LanguageManual
CLI
下
,
使用以下命令显示最新系统函数说明
SHOW FUNCTIONS;
DESCRIBE
FUNCTION
DESCRIBE
FUNCTION EXTENDED
?
内置函数
数值函数
Mathematical Functions
返回类型
BIGINT
函数名称
(
参数
)
round(double a)
描述
返回
double
类型的四舍五入的
BIGINT
值
Returns the rounded BIGINT value of the
double
DOUBLE
round(double a,
int d)
返回
a
的<
/p>
d
进制的四舍五入值
Returns the double rounded to d decimal
places
返回小于等于
a
的最大
整数值
Returns the maximum
BIGINT value that is equal or
less than
the double
BIGINT
floor(double a)
BIGINT
ceil(double a),
ceiling(double
a)
返回
a
的最
小的大于等于
a
的整数值
Returns the minimum BIGINT value that
is equal or
greater than the double
Returns a random number
(that changes from row to
row) that is
distributed uniformly from 0 to 1.
Specifiying the seed will make sure the
generated
random number sequence is
deterministic.
double
rand(), rand(int
seed)
double
exp(double a)
自然指数函数
Returns e^a where e is
the base of the
natural logarithm
double
ln(double a)
自然对数函数
Returns the natural
logarithm of the
argument
double
log10(double a)
以
10
为底对数函数
Returns
the base-10 logarithm of
the argument
double
log2(double a)
以
2
为底对数函数
Returns
the base-2 logarithm of
the argument
double
log(double
base, double a)
对数函数
Return the
base
argument
幂运算函数
Return a^p
double
pow(double a,
double p)
power(double
a, double p)
double
sqrt(double a)
开平方函数
Returns the square root
of a
string
bin(BIGINT a)
二进制函数
Returns the number in
binary format (see
[
/doc/refman/5.0/en/string-
functi
#function_bin
])
string
hex(BIGINT a)
hex(string a)
十六进制函数
If the
argument is an int, hex returns
the
number as a string in hex format. Otherwise if the
number is a string, it converts each
character into its
hex representation
and returns the resulting string.
(see
[
/doc/refman/5.0/en/string-
functi
#function_hex
])
string
unhex(string a)
反转十六进制函数
Inverse of hex.
Interprets each
pair of characters as a
hexidecimal number and
converts to the
character represented by the number.
string
conv(BIGINT
num, int
from_base, int
to_base)
进制转换函数
Converts a
number from a given base
to another
(see
[
/doc/refman/5.0/en/mat
hematic
#function_conv
])
绝对值函数
Returns the
absolute value
正取余函数
Returns the positive
value of a mod b
double
int double
abs(double a)
pmod(int a, int
b)
pmod(double
a, double b)
double
double
sin(double a)
asin(double a)
正弦函数
Returns the sine of a (a
is in radians)
反正弦函数
Returns the arc sin of
x if -1<=a<=1 or
null otherwise
double
double
cos(double a)
acos(double a)
余弦函数
Returns the cosine of a
(a is in radians)
反余弦函数
Returns the arc cosine
of x if -1<=a<=1 or
null otherwise
int double
positive(int a)
positive(double
a)
positive
函数
Returns
a
int double
negative(int a)
negative(dou
bl
negative
函数
Retur
ns -a
e a)
集合函数
Collection Functions
返回类
型
int
size(Map
)
int
size(Array
返
回
Array
类型的元素个数
返回
Map
类型的元素个数
函数名称
(
参数
)
描述
类型转换函数
Type Conversion
Functions
返回类型
函数名称
(
参数
)
cast(expr as
描述
转换
expr
为
类型
例如
.
cast('1' as
BIGINT)
转换
string '1'
为整数
.
如果转换失败
返回
null .
日期函数
Date Functions
string
from_unixtime(bigint
unixtime[, string
format])
Converts the
number of seconds from unix epoch
(1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that
moment in the
current system time zone
in the format of
bigint
unix_timestamp()
Gets current time stamp using the
default time
zone.
bigint
unix_timestamp(string
date)
Converts time string in
format
yyyy-MM-dd
HH:mm:ss
to Unix time stamp,
return 0 if fail:
unix_timestamp('2009-03-20 11:30:01') =
1237573801
bigint
unix_timestamp(string
date,
string pattern)
Convert
time string with given pattern (see
[
p>
/j2se/1.4.2/docs/api/java/text/
]) to Unix time stamp,
return 0 if fail:
unix_timestamp('2009-03-20',
'yyyy-MM-
dd') = 1237532400
string
to_date(string
timestamp)
int
year(string date)
Returns the date part of a
timestamp string:
to_date(
Returns the year part of a date or a
timestamp
string:
year(
year(
int
month(string date)
Returns the month part of a
date or a timestamp
string:
month(
month(
int
day(string date)
dayofmonth(date)
Return the day part of a date or a
timestamp
string:
day(
day(
int
hour(string date)
Returns the hour of the
timestamp:
hour('2009-07-30 12:58:59')
= 12, hour('12:58:59')
= 12
int
int
int
minute(string date)
second(string date)
weekofyear(string date)
Returns the minute of the timestamp
Returns the second of the
timestamp
Return the week
number of a timestamp string:
weekofyea
r(
weekofyear(
int
datediff(string enddate,
string startdate)
int
date_add(string
startdate,
int days)
int
date_sub(string
startdate,
int days)
Return the number
of days from startdate to
enddate:
datediff('2009-03-01', '2009-02-27') = 2
Add a number of days to
startdate:
date_add('2008-12-31', 1) =
'2009-01-01'
Subtract a
number of days to startdate:
date_sub('2008-12-31', 1) =
'2008-12-30'
条件判断函数
Conditional Functions
T
T
T
T
CASE a WHEN b THEN c [WHEN d
THEN e]* [ELSE f] END
CASE WHEN a THEN b [WHEN c
THEN d]* [ELSE e] END
if(boolean testCondition, T
valueTrue, T valueFalseOrNull)
COALESCE(T v1, T v2, ...)
Return valueTrue when
testCondition is
true, returns
valueFalseOrNull otherwise
Return the first v that is not NULL, or
NULL
if all v's are NULL
When a = b, returns c; when a = d,
return e;
else return f
When a = true, returns b; when c =
true,
return d; else return e
字符串函数
String
Functions
返回类型
int
string
string
< br>函数名称
(
参数
)
length(string A)
reverse(string A)
concat(string A,
string B...)
描述
返回字符串
A
的长度
返回
A
的反序字符串
按照参数顺序返回连接的字符串
.
例如
.
concat('foo',
'bar')
返回
'foobar'.
注
:
该函数
接受任意长度参数
.
string
concat_ws(string
SEP, string
A, string
B...)
string
substr(string A, int
start)
substring(string A,
int
start)
string
substr(string A, int
start,
int len)
substring(string A,
int start, int len)
返回
A
的子串从
start
位置开始直到结尾,
例如
.
substr('foobar', 4)
返回
'bar' (
参见
[
/doc/refman/5.0/en/stri
#function_substr
])
返回
A
的子串从
start
位置开始直到
len
长
度
例如
substr('foobar', 4, 1)
返回
'b'
(
参见
[
/
doc/refman/5.0/en/stri
#function_substr<
/p>
])
如同
concat(),
但间隔连接自定义的分隔符
SEP.
string
upper(string A)
ucase(string
A)
Returns the string
resulting from converting
all
characters of A to upper case e.g.
upper('fOoBaR') results in 'FOOBAR'
string
lower(string A)
lcase(string
A)
Returns the string
resulting from converting
all
characters of B to lower case e.g.
lower('fOoBaR') results in 'foobar'
string
trim(string A)
Returns the string resulting from
trimming
spaces from both ends of A
e.g. trim(' foobar
') results in
'foobar'
string
ltrim(string A)
Returns the string resulting from
trimming
spaces from the beginning(left
hand side) of
A e.g. ltrim(' foobar ')
results in 'foobar '
string
rtrim(string A)
Returns the string resulting from
trimming
spaces from the end(right hand
side) of A
e.g. rtrim(' foobar ')
results in ' foobar'
string
regexp_replace(strin
g A,
string B, string
C)
Returns the string resulting from
replacing
all substrings in B that
match the Java
regular expression
syntax(See Java regular
expressions
syntax) with C e.g.
regexp_replace(
'fb.' Note
that some care is necessary in
using
predefined character classes: using
's'
as the second argument will match the
letter s; '
s' is necessary
to match whitespace, etc.
string
regexp_extract(strin
g
subject, string
pattern, int index)
Returns the string
extracted using the
pattern. e.g.
regexp_extract('foothebar',
'foo(.*?)(bar)', 2) returns 'bar.' Note
that
some care is necessary in using
predefined
character classes: using 's'
as the second
argument will match the
letter s; '
s' is necessary to match
whitespace, etc.
The 'index' parameter
is the Java regex
Matcher group()
method index. See
docs/api/java/util/regex/ for
more information on the 'index' or Java
regex group() method.
string
parse_url(string
urlString, string
partToExtract [,
string
keyToExtract])
Returns the
specified part from the URL.
Valid
values for partToExtract include
HOST,
PATH, QUERY, REF, PROTOCOL,
AUTHORITY,
FILE, and USERINFO. e.g.
parse_url('/pa
th1/?
k1=v1&k2=v2#Ref1', 'HOST') returns
''. Also a value of a particular
key in QUERY can be extracted by
providing
the key as the third
argument, e.g.
parse_url('/path1/?
k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns
'v1'.
string
get_json_object(strin
g
json_string, string
path)
Extract json object from a json string
based
on json path specified, and
return json string
of the extracted
json object. It will return null
if the
input json string is invalid.
NOTE: The
json path can only have the characters
[0-9a-z_], i.e., no upper-case or
special
characters. Also, the keys
*cannot
start
with numbers.*
This is due to restrictions on
Hive
column names.
string
string
space(int n)
repeat(string str, int
n)
返回
n
个空格字符串
返回
n
次重复
str
连接的字符串
int
string
ascii(string str)
lpad(string str, int
len,
string pad)
返回
st
r
的首个字符的
ascii
码对应数字
Returns str,
left-padded with pad to a length
of len
Returns str, right-padded
with pad to a
length of len
按照
pat
正则拆分
str
(pat
是正则表达式
)
string
rpad(string str, int
len,
string pad)
array
split(string str, string
pat)
int
find_in_set(string str,
string strList)
Returns the first occurance of str in
strList
where strList is a comma-
delimited string.
Returns null if
either argument is null.
Returns 0 if
the first argument contains any
commas.
e.g. find_in_set('ab',
'abc,b,ab,c,def') returns 3
<
br>y <
br>} <
br>函数名称
array
ring>>
sentences(string str,
string lang, string
locale)
Tokenizes a string of
natural language text
into words and
sentences, where each
sentence is
broken at the appropriate
sentence
boundary and returned as an array
of
words. The 'lang' and 'locale' are optional
arguments. e.g. sentences('Hello there!
How
are you?') returns (
(
array
ring,double>>
ngrams(array
K, int pf)
Returns the top-k N-grams from a set of
tokenized sentences, such as those
returned by the sentences() UDAF. See
[Hive-StatisticsAndDataMining] for more
information.
array
ring,double>>
context_ngrams(arra
array
int pf)
Returns the top-k contextual N-grams
from a
set of tokenized sentences,
given a string of
[Hive-
StatisticsAndDataMining] for more
information.
其他函数
Misc. Functions
xpath
The
following functions are described in [Hive-
LanguageManual-XPathUDF]:
?
xpath, xpath_short, xpath_int,
xpath_long, xpath_float, xpath_double,
xpath_number,
xpath_string
get_json_object
A limited
version of JSONPath is supported:
?
?
?
?
$$ : Root object
. : Child
operator
[] : Subscript operator for
array
* : Wildcard for []
Syntax not supported that's worth
noticing:
?
?
: Zero length
string as key
.. : Recursive descent
?
?
?
?
?
@ : Current
object/element
() : Script expression
?() : Filter (script) expression.
[,] : Union operator
[start:] : array slice operator
Example: src_json table is a single
column (json), single row table:
json
{
:
<
/p>
{
:[{
:8,
:
},{
:9,
:
}],
:{
:19.95,
:
}
},
:
,
:
}
The fields of the json
object can be extracted using these queries:
hive>
SELECT
get_json_object(src_,
'$$.owner')
FROM
src_json;
amy
hive>
SELECT
get_json_object(src_,
'$$.[0]')
FROM
src_json;
{
:8,
:
hive> SELECT
get_json_object(src_, '$$.non_exist_key') FROM
src_json;
NULL
内置聚合函数
Built-in Aggregate
Functions (UDAF)
返回类型
bigint
(
参数
)
count(*), count(expr),
count(DISTINCT expr[,
expr_.])
描述
count(*)
–
返回总行数,包括空
值
NULL;
count(expr) - Returns
the number of
rows for which
the supplied expression
is
non-NULL; count(DISTINCT
expr[, expr]) - Returns the
number of rows for which the
supplied expression(s) are
unique and non-NULL.
double
sum(col),
sum(DISTINCT col)
Returns the sum of the
elements in the group or the
sum of the distinct values of
the column in the group
double
avg(col),
avg(DISTINCT
col)
Returns the average of the
elements in the group or the
average of the distinct values
of the column in the group
double
min(col)
Returns the minimum of the
column in the group
double
max(col)
Returns the maximum value
of
the column in the group
double
var_pop(col)
Returns the variance of a
numeric column in the group
double
var_samp(col)
Returns the unbiased sample
variance of a numeric column
in the group
double
stddev_pop(col)
Returns the standard deviation
of a numeric column in the
group
double
stddev_samp(col)
Returns the unbiased sample
standard deviation of a
numeric column in the group
double
covar_pop(col1, col2)
Returns the population
covariance of a pair of numeric
columns in the group
double
covar_samp(col1, col2)
Returns the sample covariance
of a pair of a numeric columns
in the group
double
corr(col1, col2)
Returns the Pearson
coefficient of correlation of a
pair of a numeric columns in
the group
double
percentile(col, p)
Returns the exact p^th^
percentile of an integer column
in the group (does not work
with floating point types). p
must be between 0 and 1.
array
percentile(col,
array(p~1,,
[, p,,2,,]...))
Returns the
exact percentiles
p,,1,,, p,,2,,, ...
of an integer
column in the group (does
not
work with floating point types).
p,,i~ must be between 0 and 1.
double
percentile_approx(col, p
[,
B])
Returns an approximate
p^th^
percentile of a numeric column
(including floating point types)
in the group. The B parameter
controls approximation
accuracy at the cost of
memory. Higher values yield
better approximations, and the
default is 10,000. When the
number of distinct values in col
is smaller than B, this gives an
exact percentile value.
array
percentile_approx(col,
array(p~1,, [, p,,2_]...) [,
B])
Same as
above, but accepts
and returns an array
of
percentile values instead of a
single one.
array
'x','y'
}>
histogram_numeric(col,
b)
Computes a histogram of a
numeric column in the group
using b non-uniformly spaced
bins. The output is an array of
size b of double-valued (x,y)
coordinates that represent the
bin centers and heights
array
collect_set(col)
Returns a set of objects with
duplicate elements eliminated
?
p>
增加
oracle
函数
返回类型
T
函数名称
(
参数
)
Nvl(T1,T2)
描述
如果
T
1
为空,返回
T2
,否则返回
T1
;
其中
T1
和
T2
的类型必须一样。
参数类型同返回值
的类型一致。
(
若
参数类型不
一致,
函数的行为不可预见,
可能执
行正确,
可能执行失败
)
String
s1 || s2
如同
concat(s1,s2)
T
nvl2(T1,T2,T3)
T1
为
NULL
,
则返回
T2
,
否则返回
T3
;
其中
T1
、
T2
、
T3
的类型必须一样。
参
数类型同返回值的类型一致。
T
decode (T cond, T
params[])
判断
cond
是
否与
params
的基数位置
的值是否
相等,是则返回偶数位置值;
否则返回
params
最后位置的值。
string
systimestamp
返回当前的系统日期、
时间
(不支持
时区);输
出时间格式为:
yyyy-MM-dd
hh24:mi:ss:ff3
,当
前仅支持
24
小时制
< br>
string
sysdate
返回当前的系统日期
,
输
出时间格式为:
yyyy-MM-dd
int
INSTR((strin
g,str[,start][
,appea
r])
返回
string
中
str
出现的位置。
start
代表开始搜
索的位置,可选参数,默
认为
1
(
1
表
示第一个
字符位置);
appear
指示搜索第几
次出现的,可
选参数,默认为
1
;若没有匹配到,
返回
0
。(
start
和
appear
取值为大于
0
的整数)
int/long/f
loa
t/double
Mod
(
n1,n2
)
返回一个
n1
除以
n2
的余数。
< br>支持的
类型有
Byte
、
Short
、
In
teger
、
Long
、
Float
、
Double
。
返回值的正负和
n1
相关。使用此函
数需要注意
的
2
个问题:
1
、
对小数执行
mod
计算可能会产生
精度丢失,
(如
mod(3.97384626
,3.14159265
35897384
627
,返回结果为
0.0)
)此问题由
java
环境本身导致,若
需要高精度
的浮点计算,慎用此函数;
2
、
传入比
MAX_LONG
还大的整数作为
参数,
则参数会被自动升级成
Double
类
型,函数也可
以正常计算结果,
但返回的结果是小
数类型。
int
Bitand(expr1,expr2)
指定按位进行
AND
运算的两个数值。
当前只
支持
Byte
、
Shor
t
、
Integer
、
Long
类
型。如果
expr1
和
expr2
的位都是
1<
/p>
,
相应的结果位就
是
1;
否
则相应的结果位是
0
。若
expr1
和
expr2
类型不
一样,直接进行
and
计算
string
next_day(date,day)
计算给出日期
date
之后的下一个星
期
day
的日
期。
day
是数字,
1-7
分别表示星期
日
-
六;返
回日期的格式为“
YYYY-MM-
DD
”
int
to_number(string)
将给出的字符转换为数字;
string
必须为全数字
串。
(
Oracle
中的
to_number
很复
杂,可变参数
且支持多种类型。
当前仅支持整数类
型,包括
short
、
int
、
long
)
string
to_char(date,format)
将日期
date
转化为一个字符串;
date
的格式固
定为
yyyy-mm-dd hh24:mi:ss:ff3<
/p>
,
输出的格式由
format
指定。
format
当前支持的格
式如下(不
区分大小写):
yyyymmdd
,
年月日;
yyyymm
,年月;
mm
,月
dd
,日
yyyy-mm-dd
yyyy-mm
yyyymmddhh24miss
,年月日时分秒
(
p>
24
小时制)
yyyy-mm-dd hh24:mi:ss
hh24miss
yyyymmddhh24missff3
,年月日时分
秒毫秒(
24
小时制)
-
-
-
-
-
-
-
-
-
上一篇:PS术语及操作指令中英对照
下一篇:全新版大学英语综合教程4课后词汇填空及翻译