Channel: BigData Handler » BigData Handler

Setting up Pig

Pig

Apache Pig is a high-level platform developed to simplify querying large data sets on Apache Hadoop and MapReduce. Queries are written in Pig's own procedural language, Pig Latin, which lets you run SQL-like operations on distributed datasets within Hadoop applications. Pig is popular because of its simple interface and its support for complex operations such as joins and filters. It has the following key properties:

  • Ease of programming: Pig programs are easy to write and can accomplish the same large tasks as hand-written MapReduce programs.
  • Optimization: The system optimizes the execution of Pig jobs automatically, allowing the user to focus on semantics rather than efficiency.
  • Extensibility: Users can write their own user-defined functions (UDFs) for special-purpose processing, using Java, Python, or JavaScript.

Objective

The objective of this tutorial is to set up Pig and run Pig scripts.

Prerequisites

The following is the prerequisite for setting up Pig and running Pig scripts.

  • You should have the latest stable build of Hadoop up and running. To install Hadoop, please check my previous blog article on Hadoop Setup.

Setting up Pig

Procedure

1. Download a stable version of Pig from the Apache download mirrors. This tutorial uses pig-0.11.1; this release works with Hadoop 0.20.X, 1.X, 0.23.X and 2.X.
wget http://apache.mirrors.hoobly.com/pig/pig-0.11.1/pig-0.11.1.tar.gz

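2. Copy the Pig archive into the /usr/local/pig directory, creating the directory first if it does not exist (otherwise cp would create a plain file named pig).

sudo mkdir -p /usr/local/pig
sudo cp pig-0.11.1.tar.gz /usr/local/pig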

3. Change to the /usr/local/pig directory with this command:

cd /usr/local/pig

4. Unpack the compressed Pig archive in the /usr/local/pig directory:

sudo tar xvzf pig-0.11.1.tar.gz

5. Set PIG_HOME in $HOME/.bashrc so that it is set every time you log in. Add the following line to the file, and extend PATH as in the example below it.

export PIG_HOME=<path_to_pig_home_directory>

e.g.
export PIG_HOME='/usr/local/pig/pig-0.11.1'
export PATH=$HADOOP_HOME/bin:$PIG_HOME/bin:$JAVA_HOME/bin:$PATH

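6. Set the JAVA_HOME environment variable to point to the Java installation directory, which Pig uses internally. In $HOME/.bashrc, place this line before the PATH line from step 5, because that line references $JAVA_HOME.

export JAVA_HOME=<Java_installation_directory>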
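Once the procedure above is complete, a quick check can confirm that the variables from steps 5 and 6 took effect. The following is a minimal sketch, not part of Pig: check_pig_env is a hypothetical helper name, and it assumes you have opened a new shell (or re-sourced $HOME/.bashrc) so the exports are active.

```shell
#!/bin/sh
# Post-install sanity check (sketch). check_pig_env is a hypothetical helper,
# not something shipped with Pig. It only inspects PIG_HOME, JAVA_HOME and PATH.
check_pig_env() {
    missing=0
    # Each variable should be set and should point to an existing directory.
    for name in PIG_HOME JAVA_HOME; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            missing=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            missing=1
        else
            echo "$name=$value"
        fi
    done
    # The pig launcher should be reachable through PATH.
    if command -v pig >/dev/null 2>&1; then
        echo "pig found at $(command -v pig)"
    else
        echo "pig is not on PATH"
        missing=1
    fi
    return "$missing"
}

check_pig_env || echo "Environment is incomplete; re-check steps 5 and 6."
```

If any line reports a problem, the most likely cause is a typo in $HOME/.bashrc or a shell that was started before the file was edited.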

Execution Modes

Pig has two modes of execution – local mode and MapReduce mode.

Local Mode

Local mode is usually used to verify and debug Pig queries and scripts on smaller datasets that a single machine can handle. It runs in a single JVM and accesses the local filesystem.

To run in local mode, use the following command:

$ pig -x local 
grunt>
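To try local mode without typing into Grunt, one option is to write a small Pig Latin script to a file and pass it to pig -x local. The sketch below is illustrative only: the sample data, the file names words.txt and wordcount.pig, and the relation names are all made up for this example, and the script is executed only if a pig launcher is found on PATH.

```shell
#!/bin/sh
# Work in a scratch directory so nothing else on disk is touched.
workdir=$(mktemp -d)

# Hypothetical sample data: one word per line.
printf 'pig\nhadoop\npig\n' > "$workdir/words.txt"

# A small Pig Latin script: load the words, group them, and count each group.
cat > "$workdir/wordcount.pig" <<EOF
words   = LOAD '$workdir/words.txt' AS (word:chararray);
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group, COUNT(words);
DUMP counts;
EOF

# Run the script in local mode only if the pig launcher is installed.
if command -v pig >/dev/null 2>&1; then
    pig -x local "$workdir/wordcount.pig"
else
    echo "pig not found on PATH; script written to $workdir/wordcount.pig"
fi
```

Because local mode reads from the local filesystem, the LOAD path here is an ordinary local path rather than an HDFS location.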

MapReduce Mode

This is the default mode. Pig translates the queries into MapReduce jobs, which requires access to a Hadoop cluster.

$ pig

2013-10-28 11:39:44,767 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-10-28 11:39:44,767 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/hduser/pig_1382985584762.log
2013-10-28 11:39:44,797 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/hduser/.pigbootup not found
2013-10-28 11:39:45,094 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://Hadoopmaster:54310
2013-10-28 11:39:45,592 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: Hadoopmaster:54311

grunt>

The log output from Pig shows the filesystem and the job tracker it connected to. Grunt is an interactive shell for your Pig queries. You can run Pig programs in three ways: as a script, through Grunt, or by embedding the script in Java code. Running in the interactive shell is shown in the Problem section. To run a batch of Pig statements, it is recommended to place them in a single file with a .pig extension and execute it in batch mode. I will explain these in depth in upcoming posts.
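As a rough illustration of batch mode, the sketch below builds the command line for running a .pig file in either execution mode. pig_batch_cmd is a hypothetical helper written for this post, and wordcount.pig is a placeholder file name; the helper only prints the command, so it can be tried before Pig or a cluster is available.

```shell
#!/bin/sh
# pig_batch_cmd is a hypothetical helper: it prints the command line for
# running a script file in the chosen execution mode. It does not run Pig.
pig_batch_cmd() {
    mode=$1      # "local" or "mapreduce"
    script=$2    # path to a file with a .pig extension
    case $mode in
        local|mapreduce) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    case $script in
        *.pig) ;;
        *) echo "expected a .pig file: $script" >&2; return 1 ;;
    esac
    echo "pig -x $mode $script"
}

# Print the command for each mode; wordcount.pig is a placeholder name.
pig_batch_cmd local wordcount.pig
pig_batch_cmd mapreduce wordcount.pig
```

Running the printed command in MapReduce mode assumes the Hadoop cluster from the prerequisites is reachable, as shown in the startup log above.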

