Thursday, November 05, 2015

Null Handling in Hadoop Pig Latin

For chararray fields, PigStorage converts empty fields to null when you load a dataset. So in a freshly loaded relation you will never find an empty string, only nulls.

However, a '' constant written in a Pig script is an empty string and is not treated as null.

So '' IS NOT NULL returns true.
'' IS NULL returns false.

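If A is a relation immediately after a load, comparing its first field with $0 == '' will never be true: every empty field was loaded as null, and comparing null with anything yields null rather than true.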
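
To make this concrete, here is a minimal sketch. The file name data.txt and the field names name and city are assumptions for illustration only; the point is that the test for an empty field after a load has to be IS NULL, not a comparison with ''.

```pig
-- 'data.txt' is a hypothetical tab-delimited file in which some rows
-- have an empty first field.
A = LOAD 'data.txt' USING PigStorage('\t') AS (name:chararray, city:chararray);

-- Never matches: empty fields were loaded as null, and null == '' yields null.
empty_names = FILTER A BY name == '';

-- Matches the rows whose first field was empty in the file.
missing_names = FILTER A BY name IS NULL;

DUMP missing_names;
```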

If you build a field yourself with GENERATE, it keeps the value you gave it: an empty string stays an empty string, and a null stays null.

B = FOREACH A GENERATE $0, $1, ''; -- Will keep the value as empty string
C = FOREACH A GENERATE $0, $2, (chararray) null; -- Will keep the value as null

Sorting for NULLs

NULL is always treated as the smallest value. With ORDER BY ... DESC, nulls come last; with ASC, they come first.
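
As a small sketch of that ordering (again, data.txt and the field names are assumptions, not from a real dataset):

```pig
-- Hypothetical input in which some rows have an empty (hence null) name.
A = LOAD 'data.txt' USING PigStorage('\t') AS (name:chararray, city:chararray);

-- Ascending: rows with a null name are placed first.
asc_sorted = ORDER A BY name ASC;

-- Descending: rows with a null name are placed last.
desc_sorted = ORDER A BY name DESC;
```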
