Setting up Pig

Apache Pig is a high-level procedural language platform developed to simplify querying large data sets in Apache Hadoop and MapReduce. Pig is popular for querying data in Hadoop through its “Pig Latin” language, a layer that enables SQL-like queries to be performed on distributed datasets within Hadoop applications. Its appeal comes from a simple interface and support for complex operations such as joins and filters. Pig has the following key properties:

Ease of programming: Pig programs are easy to write and can accomplish large tasks that would otherwise require hand-written MapReduce programs.
Optimization: The system optimizes the execution of Pig jobs automatically, allowing the user to focus on semantics rather than efficiency.
Extensibility: Pig users can write their own user-defined functions (UDFs) for special-purpose processing, using Java, Python, or JavaScript.

Objective

The objective of this tutorial is to set up Pig and run Pig scripts.

Prerequisites

The following is the prerequisite for setting up Pig and running Pig scripts.

You should have the latest stable build of Hadoop up and running. To install Hadoop, please check my previous blog article on Hadoop Setup.

Setting up Pig

Procedure

1. Download a stable version of Pig from the Apache download mirrors. For this tutorial we are using pig-0.11.1; this release works with Hadoop 0.20.X, 1.X, 0.23.X and 2.X.

wget http://apache.mirrors.hoobly.com/pig/pig-0.11.1/pig-0.11.1.tar.gz

2. Copy the Pig archive into the /usr/local/pig directory, creating the directory first if it does not exist.

sudo mkdir -p /usr/local/pig
sudo cp pig-0.11.1.tar.gz /usr/local/pig/

3. Change to the /usr/local/pig directory with this command:

cd /usr/local/pig

4. Unpack the compressed Pig archive in the /usr/local/pig directory:

sudo tar xvzf pig-0.11.1.tar.gz

5. Set PIG_HOME in $HOME/.bashrc so that it is set every time you log in. Add the following line to the file:

export PIG_HOME=<path_to_pig_home_directory>

e.g.
export PIG_HOME='/usr/local/pig/pig-0.11.1'
export PATH=$HADOOP_HOME/bin:$PIG_HOME/bin:$JAVA_HOME/bin:$PATH

6. Set the environment variable JAVA_HOME to point to the Java installation directory, which Pig uses internally.

export JAVA_HOME=<Java_installation_directory>
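
Once the variables are in place, a quick check like the one below can confirm that PIG_HOME and JAVA_HOME point at real directories. This is only a minimal sketch: check_dir_var is a hypothetical helper written for this post, not something that ships with Pig or Hadoop.

```shell
#!/bin/sh
# Sketch: report whether a variable names an existing directory.
# check_dir_var is a hypothetical helper, not part of Pig or Hadoop.
check_dir_var() {
    name=$1
    value=$2
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name OK: $value"
}

# Check the two variables set in steps 5 and 6; keep going if one fails.
check_dir_var PIG_HOME "${PIG_HOME:-}" || true
check_dir_var JAVA_HOME "${JAVA_HOME:-}" || true
```

If both lines report OK, open a new shell (or source $HOME/.bashrc) and run pig -version, which should print the installed Pig release.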

Execution Modes

Pig has two modes of execution – local mode and MapReduce mode.

Local Mode

Local mode is usually used to verify and debug Pig queries and scripts on smaller datasets that a single machine can handle. It runs in a single JVM and accesses the local filesystem.

To run in local mode, use the following command:

$ pig -x local 
grunt>

MapReduce Mode

This is the default mode. Pig translates the queries into MapReduce jobs, which requires access to a Hadoop cluster.

$ pig
2013-10-28 11:39:44,767 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-10-28 11:39:44,767 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hduser/pig_1382985584762.log
2013-10-28 11:39:44,797 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hduser/.pigbootup not found
2013-10-28 11:39:45,094 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://Hadoopmaster:54310
2013-10-28 11:39:45,592 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: Hadoopmaster:54311
grunt>

You can see the log reports from Pig stating the filesystem and jobtracker it connected to. Grunt is an interactive shell for your Pig queries. You can run Pig programs in three ways: via a script, via Grunt, or by embedding the script in Java code. Running in the interactive shell is shown in the Problem section. To run a batch of Pig statements, it is recommended to place them in a single file with a .pig extension and execute them in batch mode. I will explain this in depth in coming posts.
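
To give a rough idea of what batch mode looks like, the sketch below writes a tiny input file and a .pig script, then hands the script to pig in local mode. The file names, the sample records and the relation names (people, older) are all made up for illustration, and the run step is skipped if pig is not on your PATH.

```shell
#!/bin/sh
# Sketch: write a small data file and a Pig script, then run the script in
# batch mode with local execution. File names and data are hypothetical.
workdir=$(mktemp -d)

# Sample input: one "name,age" record per line.
cat > "$workdir/people.txt" <<'EOF'
alice,30
bob,35
carol,41
EOF

# Pig Latin script: load the records, keep ages above 30, print the result.
# The heredoc is unquoted so that $workdir expands to the real path.
cat > "$workdir/filter.pig" <<EOF
people = LOAD '$workdir/people.txt' USING PigStorage(',') AS (name:chararray, age:int);
older = FILTER people BY age > 30;
DUMP older;
EOF

# Run in batch mode only if the pig launcher is available.
if command -v pig >/dev/null 2>&1; then
    pig -x local "$workdir/filter.pig"
else
    echo "pig not found on PATH; script written to $workdir/filter.pig"
fi
```

Dropping -x local from the command would run the same script in the default MapReduce mode, in which case the input path would need to exist on HDFS rather than on the local filesystem.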
