关键词不能为空

当前您在: 主页 > 英语 >

Hive函数及语法说明

作者:高考题库网
来源:https://www.bjmy2z.cn/gaokao
2021-02-08 07:52
tags:

-

2021年2月8日发(作者:再看一眼)







无线增值产品部



Hive


函数及语法说明




版本:


Hive 0.7.0.001



eagooqi


2011-7-19



版本



V1.0





日期



2010-07-20


2012-1-4


修订人



eagooqi


Eagooqi


描述



初稿



||



cube


目录



?



函数说明



.


.................................................. .................................................. ......................


2



?



内置函数



.


........................... .................................................. .....................................


2



?



增加


oracle


函数


.


........................ .................................................. ..........................


1


4


?



增加业务函数



.

................................................ .................................................. ......


1


7


扩展函数开发规范



.


.............................................. .................................................. ........


1


9


语法说明



.


.................................................. .................................................. ....................


2


1


?



内置语法



.


........................... .................................................. ...................................


2


1


?



增加语法



.


.................................................. .................................................. ............


2


8


扩展语法开发规范(语法转换)



.< /p>


........................................ ........................................


2


9


ORACLE sql


对应的


hSQL


语法支持

............................................... ...............................


2


9


?



?



?



?





?



函数说明




参考链接:



/confluence/display/Hive/LanguageManual



CLI



,


使用以下命令显示最新系统函数说明



SHOW FUNCTIONS;


DESCRIBE FUNCTION ;


DESCRIBE FUNCTION EXTENDED ;





?



内置函数



数值函数


Mathematical Functions



返回类型



BIGINT



函数名称

< p>
(


参数


)




round(double a)



描述




返回


double


类型的四舍五入的



BIGINT




Returns the rounded BIGINT value of the double



DOUBLE



round(double a,


int d)



返回


a


的< /p>


d


进制的四舍五入值



Returns the double rounded to d decimal places


返回小于等于


a


的最大 整数值



Returns the maximum BIGINT value that is equal or


less than the double


BIGINT



floor(double a)



BIGINT



ceil(double a),


ceiling(double


a)



返回


a


的最 小的大于等于


a


的整数值



Returns the minimum BIGINT value that is equal or


greater than the double



Returns a random number (that changes from row to


row) that is distributed uniformly from 0 to 1.


Specifiying the seed will make sure the generated


random number sequence is deterministic.



double



rand(), rand(int


seed)



double



exp(double a)



自然指数函数


Returns e^a where e is the base of the


natural logarithm



double



ln(double a)



自然对数函数


Returns the natural logarithm of the


argument



double



log10(double a)




10


为底对数函数


Returns the base-10 logarithm of


the argument



double



log2(double a)




2


为底对数函数


Returns the base-2 logarithm of


the argument



double



log(double


base, double a)



对数函数


Return the base


argument



幂运算函数


Return a^p



double



pow(double a,


double p)


power(double


a, double p)



double



sqrt(double a)



开平方函数


Returns the square root of a



string



bin(BIGINT a)



二进制函数


Returns the number in binary format (see


[


/doc/refman/5.0/en/string- functi


#function_bin


])



string



hex(BIGINT a)


hex(string a)



十六进制函数


If the argument is an int, hex returns


the number as a string in hex format. Otherwise if the


number is a string, it converts each character into its


hex representation and returns the resulting string.


(see


[


/doc/refman/5.0/en/string- functi


#function_hex


])



string



unhex(string a)



反转十六进制函数


Inverse of hex. Interprets each


pair of characters as a hexidecimal number and


converts to the character represented by the number.



string



conv(BIGINT


num, int


from_base, int


to_base)



进制转换函数


Converts a number from a given base


to another (see


[


/doc/refman/5.0/en/mat hematic


#function_conv


])



绝对值函数


Returns the absolute value



正取余函数


Returns the positive value of a mod b



double



int double



abs(double a)



pmod(int a, int


b) pmod(double


a, double b)



double



double



sin(double a)



asin(double a)



正弦函数


Returns the sine of a (a is in radians)



反正弦函数


Returns the arc sin of x if -1<=a<=1 or


null otherwise



double



double



cos(double a)



acos(double a)



余弦函数


Returns the cosine of a (a is in radians)



反余弦函数


Returns the arc cosine of x if -1<=a<=1 or


null otherwise



int double



positive(int a)


positive(double


a)



positive


函数


Returns a



int double



negative(int a)


negative(dou bl


negative


函数


Retur ns -a



e a)



集合函数


Collection Functions



返回类





int



size(Map


)



int



size(Array)



返 回


Array


类型的元素个数




返回


Map


类型的元素个数




函数名称


(


参数


)




描述




类型转换函数


Type Conversion Functions


返回类型







函数名称


(


参数


)




cast(expr as


)



描述




转换



expr





类型



例如


. cast('1' as


BIGINT)


转换



string '1'


为整数


.


如果转换失败


返回



null .



日期函数


Date Functions


string



from_unixtime(bigint


unixtime[, string


format])




Converts the number of seconds from unix epoch


(1970-01-01 00:00:00 UTC) to a string


representing the timestamp of that moment in the


current system time zone in the format of




bigint



unix_timestamp()



Gets current time stamp using the default time


zone.



bigint



unix_timestamp(string


date)



Converts time string in format


yyyy-MM-dd


HH:mm:ss


to Unix time stamp, return 0 if fail:


unix_timestamp('2009-03-20 11:30:01') =


1237573801



bigint



unix_timestamp(string


date, string pattern)



Convert time string with given pattern (see


[


/j2se/1.4.2/docs/api/java/text/



]) to Unix time stamp,


return 0 if fail: unix_timestamp('2009-03-20',


'yyyy-MM- dd') = 1237532400



string



to_date(string


timestamp)



int



year(string date)



Returns the date part of a timestamp string:


to_date(



Returns the year part of a date or a timestamp


string: year(


year(



int



month(string date)



Returns the month part of a date or a timestamp


string: month(


month(



int



day(string date)


dayofmonth(date)



Return the day part of a date or a timestamp


string: day(


day(



int



hour(string date)



Returns the hour of the timestamp:


hour('2009-07-30 12:58:59') = 12, hour('12:58:59')


= 12



int



int



int



minute(string date)



second(string date)



weekofyear(string date)



Returns the minute of the timestamp



Returns the second of the timestamp



Return the week number of a timestamp string:


weekofyea r(


weekofyear(



int



datediff(string enddate,


string startdate)



int



date_add(string


startdate, int days)



int



date_sub(string


startdate, int days)



Return the number of days from startdate to


enddate: datediff('2009-03-01', '2009-02-27') = 2



Add a number of days to startdate:


date_add('2008-12-31', 1) = '2009-01-01'



Subtract a number of days to startdate:


date_sub('2008-12-31', 1) = '2008-12-30'



条件判断函数


Conditional Functions


T




T




T




T




CASE a WHEN b THEN c [WHEN d


THEN e]* [ELSE f] END



CASE WHEN a THEN b [WHEN c


THEN d]* [ELSE e] END



if(boolean testCondition, T


valueTrue, T valueFalseOrNull)



COALESCE(T v1, T v2, ...)



Return valueTrue when testCondition is


true, returns valueFalseOrNull otherwise



Return the first v that is not NULL, or NULL


if all v's are NULL



When a = b, returns c; when a = d, return e;


else return f



When a = true, returns b; when c = true,


return d; else return e



字符串函数


String Functions


返回类型




int



string



string


< br>函数名称


(


参数


)




length(string A)



reverse(string A)



concat(string A,


string B...)



描述



返回字符串

A


的长度



返回


A


的反序字符串



按照参数顺序返回连接的字符串


.


例如


.


concat('foo', 'bar')


返回



'foobar'.



:

< p>
该函数


接受任意长度参数


.



string



concat_ws(string


SEP, string A, string


B...)



string



substr(string A, int


start)


substring(string A,


int start)



string



substr(string A, int


start, int len)


substring(string A,


int start, int len)



返回



A


的子串从


start


位置开始直到结尾,


例如


. substr('foobar', 4)


返回



'bar' (


参见



[


/doc/refman/5.0/en/stri


#function_substr


])



返回



A


的子串从


start


位置开始直到



len





例如



substr('foobar', 4, 1)


返回



'b' (


参见



[


/ doc/refman/5.0/en/stri


#function_substr< /p>


])



如同



concat(),


但间隔连接自定义的分隔符



SEP.



string



upper(string A)


ucase(string A)



Returns the string resulting from converting


all characters of A to upper case e.g.


upper('fOoBaR') results in 'FOOBAR'



string



lower(string A)


lcase(string A)



Returns the string resulting from converting


all characters of B to lower case e.g.


lower('fOoBaR') results in 'foobar'



string



trim(string A)



Returns the string resulting from trimming


spaces from both ends of A e.g. trim(' foobar


') results in 'foobar'



string



ltrim(string A)



Returns the string resulting from trimming


spaces from the beginning(left hand side) of


A e.g. ltrim(' foobar ') results in 'foobar '



string



rtrim(string A)



Returns the string resulting from trimming


spaces from the end(right hand side) of A


e.g. rtrim(' foobar ') results in ' foobar'



string



regexp_replace(strin


g A, string B, string


C)



Returns the string resulting from replacing


all substrings in B that match the Java


regular expression syntax(See Java regular


expressions syntax) with C e.g.


regexp_replace(


'fb.' Note that some care is necessary in


using predefined character classes: using


's' as the second argument will match the


letter s; '


s' is necessary to match whitespace, etc.



string



regexp_extract(strin


g subject, string


pattern, int index)



Returns the string extracted using the


pattern. e.g. regexp_extract('foothebar',


'foo(.*?)(bar)', 2) returns 'bar.' Note that


some care is necessary in using predefined


character classes: using 's' as the second


argument will match the letter s; '


s' is necessary to match whitespace, etc.


The 'index' parameter is the Java regex


Matcher group() method index. See


docs/api/java/util/regex/ for


more information on the 'index' or Java


regex group() method.


string



parse_url(string


urlString, string


partToExtract [,


string keyToExtract])



Returns the specified part from the URL.


Valid values for partToExtract include


HOST, PATH, QUERY, REF, PROTOCOL,


AUTHORITY, FILE, and USERINFO. e.g.


parse_url('/pa th1/?


k1=v1&k2=v2#Ref1', 'HOST') returns


''. Also a value of a particular


key in QUERY can be extracted by providing


the key as the third argument, e.g.


parse_url('/path1/?

< p>
k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns


'v1'.



string



get_json_object(strin


g json_string, string


path)



Extract json object from a json string based


on json path specified, and return json string


of the extracted json object. It will return null


if the input json string is invalid.


NOTE: The


json path can only have the characters


[0-9a-z_], i.e., no upper-case or special


characters. Also, the keys *cannot


start


with numbers.* This is due to restrictions on


Hive column names.



string



string



space(int n)



repeat(string str, int


n)



返回


n


个空格字符串




返回


n


次重复


str


连接的字符串



int



string



ascii(string str)



lpad(string str, int


len, string pad)



返回


st r


的首个字符的


ascii


码对应数字




Returns str, left-padded with pad to a length


of len



Returns str, right-padded with pad to a


length of len



按照



pat


正则拆分


str (pat


是正则表达式


)



string



rpad(string str, int


len, string pad)



array



split(string str, string


pat)



int



find_in_set(string str,


string strList)



Returns the first occurance of str in strList


where strList is a comma- delimited string.


Returns null if either argument is null.


Returns 0 if the first argument contains any


commas. e.g. find_in_set('ab',


'abc,b,ab,c,def') returns 3



array


ring>>



sentences(string str,


string lang, string


locale)



Tokenizes a string of natural language text


into words and sentences, where each


sentence is broken at the appropriate


sentence boundary and returned as an array


of words. The 'lang' and 'locale' are optional


arguments. e.g. sentences('Hello there! How


are you?') returns ( (




array


ring,double>>



ngrams(array


>, int N, int


K, int pf)



Returns the top-k N-grams from a set of


tokenized sentences, such as those


returned by the sentences() UDAF. See


[Hive-StatisticsAndDataMining] for more


information.



array


ring,double>>



context_ngrams(arra

< br>y>,


array, int K,


int pf)



Returns the top-k contextual N-grams from a


set of tokenized sentences, given a string of



[Hive- StatisticsAndDataMining] for more


information.


其他函数


Misc. Functions



xpath


The following functions are described in [Hive- LanguageManual-XPathUDF]:


?



xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number,


xpath_string



get_json_object


A limited version of JSONPath is supported:


?



?



?



?



$$ : Root object


. : Child operator


[] : Subscript operator for array


* : Wildcard for []


Syntax not supported that's worth noticing:


?



?



: Zero length string as key


.. : Recursive descent


?



?



?



?



?



@ : Current object/element


() : Script expression


?() : Filter (script) expression.


[,] : Union operator


[start:] : array slice operator


Example: src_json table is a single column (json), single row table:


json


{



:


< /p>


{



:[{



:8,



:


< p>
},{



:9,


< p>
:



}],




:{



:19.95,



:



}


},




:



,




:




}


The fields of the json object can be extracted using these queries:


hive>


SELECT


get_json_object(src_,


'$$.owner')


FROM


src_json;


amy



hive>


SELECT


get_json_object(src_,


'$$.[0]')


FROM


src_json;


{



:8,



:


< br>}



hive> SELECT get_json_object(src_, '$$.non_exist_key') FROM


src_json;


NULL


内置聚合函数


Built-in Aggregate Functions (UDAF)


返回类型




bigint


< br>函数名称


(


参数


)



count(*), count(expr),


count(DISTINCT expr[,


expr_.])



描述



count(*)




返回总行数,包括空



NULL; count(expr) - Returns


the number of rows for which


the supplied expression is


non-NULL; count(DISTINCT


expr[, expr]) - Returns the


number of rows for which the


supplied expression(s) are


unique and non-NULL.



double



sum(col),


sum(DISTINCT col)



Returns the sum of the


elements in the group or the


sum of the distinct values of


the column in the group



double



avg(col), avg(DISTINCT


col)



Returns the average of the


elements in the group or the


average of the distinct values


of the column in the group



double



min(col)



Returns the minimum of the


column in the group



double



max(col)



Returns the maximum value of


the column in the group



double



var_pop(col)



Returns the variance of a


numeric column in the group



double



var_samp(col)



Returns the unbiased sample


variance of a numeric column


in the group



double



stddev_pop(col)



Returns the standard deviation


of a numeric column in the


group



double



stddev_samp(col)



Returns the unbiased sample


standard deviation of a


numeric column in the group



double



covar_pop(col1, col2)



Returns the population


covariance of a pair of numeric


columns in the group



double



covar_samp(col1, col2)



Returns the sample covariance


of a pair of a numeric columns


in the group



double



corr(col1, col2)



Returns the Pearson


coefficient of correlation of a


pair of a numeric columns in


the group



double



percentile(col, p)



Returns the exact p^th^


percentile of an integer column


in the group (does not work


with floating point types). p


must be between 0 and 1.



array



percentile(col,


array(p~1,, [, p,,2,,]...))



Returns the exact percentiles


p,,1,,, p,,2,,, ... of an integer


column in the group (does not


work with floating point types).


p,,i~ must be between 0 and 1.



double



percentile_approx(col, p


[, B])



Returns an approximate p^th^


percentile of a numeric column


(including floating point types)


in the group. The B parameter


controls approximation


accuracy at the cost of


memory. Higher values yield


better approximations, and the


default is 10,000. When the


number of distinct values in col


is smaller than B, this gives an


exact percentile value.



array



percentile_approx(col,


array(p~1,, [, p,,2_]...) [,


B])



Same as above, but accepts


and returns an array of


percentile values instead of a


single one.



array


'x','y'


}>


histogram_numeric(col,


b)



Computes a histogram of a


numeric column in the group


using b non-uniformly spaced


bins. The output is an array of


size b of double-valued (x,y)


coordinates that represent the


bin centers and heights



array



collect_set(col)



Returns a set of objects with


duplicate elements eliminated




?



增加


oracle


函数


返回类型



T

< p>
函数名称


(


参数


)


Nvl(T1,T2)



描述



如果


T 1


为空,返回


T2


,否则返回


T1




其中


T1



T2


的类型必须一样。



参数类型同返回值 的类型一致。




参数类型不



一致,


函数的行为不可预见,


可能执


行正确,



可能执行失败





String


s1 || s2


如同


concat(s1,s2)


T


nvl2(T1,T2,T3)


T1



NULL



则返回


T2



否则返回


T3




其中


T1



T2



T3


的类型必须一样。




数类型同返回值的类型一致。




T


decode (T cond, T params[])


判断


cond


是 否与


params


的基数位置


的值是否 相等,是则返回偶数位置值;


否则返回


params

< p>
最后位置的值。



string


systimestamp


返回当前的系统日期、

< p>
时间


(不支持


时区);输



出时间格式为:


yyyy-MM-dd


hh24:mi:ss:ff3


,当



前仅支持


24


小时制

< br>



string


sysdate


返回当前的系统日期


,




出时间格式为:


yyyy-MM-dd



int


INSTR((strin g,str[,start][


,appea


r])




返回


string



str


出现的位置。


start


代表开始搜



索的位置,可选参数,默 认为


1



1


表 示第一个



字符位置);


appear


指示搜索第几


次出现的,可



选参数,默认为


1


;若没有匹配到,

< p>
返回


0


。(


start



appear


取值为大于


0


的整数)




int/long/f


loa


t/double




Mod



n1,n2




返回一个


n1


除以


n2


的余数。

< br>支持的


类型有



Byte



Short



In teger



Long



Float



Double




返回值的正负和


n1


相关。使用此函


数需要注意




2


个问题:



1



对小数执行


mod


计算可能会产生


精度丢失,



(如



mod(3.97384626 ,3.14159265


35897384


627


,返回结果为


0.0)


)此问题由


java


环境本身导致,若


需要高精度



的浮点计算,慎用此函数;



2



传入比


MAX_LONG


还大的整数作为


参数,



则参数会被自动升级成


Double



型,函数也可


以正常计算结果,


但返回的结果是小


数类型。




int


Bitand(expr1,expr2)


指定按位进行


AND


运算的两个数值。


当前只


< p>
支持


Byte



Shor t



Integer



Long



型。如果


expr1



expr2


的位都是


1< /p>



相应的结果位就


1;




则相应的结果位是


0


。若


expr1



expr2


类型不



一样,直接进行


and


计算




string


next_day(date,day)


计算给出日期


date


之后的下一个星



day


的日



期。


day


是数字,


1-7


分别表示星期



-


六;返



回日期的格式为“


YYYY-MM- DD





int


to_number(string)



将给出的字符转换为数字;


string


必须为全数字



串。




Oracle


中的


to_number


很复


杂,可变参数



且支持多种类型。


当前仅支持整数类



型,包括


short



int



long





string


to_char(date,format)


将日期


date


转化为一个字符串;


date


的格式固



定为


yyyy-mm-dd hh24:mi:ss:ff3< /p>



输出的格式由



format


指定。


format


当前支持的格


式如下(不



区分大小写):



yyyymmdd




年月日;



yyyymm


,年月;



mm


,月



dd


,日



yyyy-mm-dd


yyyy-mm

yyyymmddhh24miss


,年月日时分秒



24


小时制)



yyyy-mm-dd hh24:mi:ss


hh24miss


yyyymmddhh24missff3


,年月日时分


秒毫秒(


24


小时制)


-


-


-


-


-


-


-


-



本文更新与2021-02-08 07:52,由作者提供,不代表本网站立场,转载请注明出处:https://www.bjmy2z.cn/gaokao/613126.html

Hive函数及语法说明的相关文章