As I said earlier, Apache Hive is an open-source data warehouse infrastructure built on top of Hadoop for summarizing, querying, and analyzing large datasets stored in Hadoop files. It was originally developed at Facebook, and it provides:
- Tools to enable easy data extract/transform/load (ETL)
- A mechanism to impose structure on a variety of data formats
- Access to files stored either directly in Apache HDFS or in other data storage systems such as Apache HBase
- Query execution via MapReduce
In this post we will see how to set up Hive on top of a Hadoop cluster.
Objective
The objective of this tutorial is to set up Hive and run HiveQL scripts.
Prerequisites
The following are the prerequisites for setting up Hive.
You should have the latest stable build of Hadoop up and running. To install Hadoop, please check my previous blog article on Hadoop Setup.
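Before installing Hive, it can help to confirm that the prerequisites are actually reachable from your shell. The following is a minimal sketch and not part of the original setup: the helper names have_cmd and check_prereqs are made up for illustration, and including java in the list is an assumption based on Hadoop needing a JVM.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one status line per required command.
# The list (java, hadoop) is an assumption; adjust it to your cluster.
check_prereqs() {
    for cmd in java hadoop; do
        if have_cmd "$cmd"; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
        fi
    done
}

check_prereqs
```

If hadoop is reported as found, running "hadoop version" shows which release is installed.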
Setting up Hive:
Procedure
1. Download a stable version of Hive from the Apache download mirrors. For this tutorial we are using Hive 0.12.0; this release works with Hadoop 0.20.x, 1.x, 0.23.x and 2.x.
wget http://apache.osuosl.org/hive/hive-0.12.0/hive-0.12.0.tar.gz
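Downloads from mirrors are occasionally truncated, so a quick integrity check before unpacking may save some confusion. This is only a sketch: archive_ok is a hypothetical helper name, and the file name assumes the wget command above was run in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE exists and is a readable gzip archive.
archive_ok() {
    [ -f "$1" ] && gzip -t "$1" 2>/dev/null
}

# Assumed file name: the tarball fetched by the wget command above.
if archive_ok hive-0.12.0.tar.gz; then
    echo "archive looks intact"
else
    echo "archive missing or corrupt; download it again"
fi
```

Note that "gzip -t" only tests that the compressed stream is complete. It does not prove the file matches what Apache published.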
2. Unpack the compressed Hive archive in your home directory:
tar xvzf hive-0.12.0.tar.gz
3. Create a hive directory under /usr/local as the root user and change its ownership to hduser, as shown. This is for our convenience, so that each framework, software package and application is kept under a different user.
cd /usr/local
mkdir hive
sudo chown -R hduser:hadoop /usr/local/hive
4. Log in as hduser and move the uncompressed hive-0.12.0 directory to the /usr/local/hive folder:
mv hive-0.12.0/ /usr/local/hive
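After the move, you may want to confirm that the Hive launcher ended up where the later steps expect it. A minimal sketch, assuming the directory layout used in this tutorial; hive_layout_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR contains an executable bin/hive launcher.
hive_layout_ok() {
    [ -x "$1/bin/hive" ]
}

# Assumed location, matching the mv command above.
if hive_layout_ok /usr/local/hive/hive-0.12.0; then
    echo "Hive files are in place"
else
    echo "bin/hive not found under /usr/local/hive/hive-0.12.0"
fi
```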
5. Set HIVE_HOME in $HOME/.bashrc so it will be set every time you log in.
Open the .bashrc file in a text editor and add the following entries.
export HIVE_HOME='/usr/local/hive/hive-0.12.0'
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
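If you prefer to add these entries from the command line rather than in an editor, one possible approach is to append each line only when it is not already present, so that repeating the step does not duplicate entries. This is a sketch under that assumption; append_once is a hypothetical helper, not something shipped with Hive or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line is already there.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x: match whole lines, -F: treat the pattern as a fixed string, -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls (commented out so that pasting this block changes nothing):
# append_once "export HIVE_HOME='/usr/local/hive/hive-0.12.0'" "$HOME/.bashrc"
# append_once 'export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH' "$HOME/.bashrc"
```

The single quotes around the PATH line keep the variables unexpanded, so they are evaluated each time .bashrc is read rather than once when the line is written.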
7. Reload the .bashrc file in the current shell using this command (run it from your home directory):
. .bashrc
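Once .bashrc has been reloaded, a quick check can confirm that the new settings took effect. The snippet below is a sketch; path_contains is a hypothetical helper that tests whether a directory is one of the colon-separated entries in PATH.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR is one of the colon-separated entries in PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -n "${HIVE_HOME:-}" ] && path_contains "$HIVE_HOME/bin"; then
    echo "HIVE_HOME=$HIVE_HOME and its bin directory is on PATH"
else
    echo "HIVE_HOME is unset or \$HIVE_HOME/bin is not on PATH"
fi
```

Wrapping both PATH and the directory in colons avoids false matches on partial names, for example /usr/local/hive matching /usr/local/hive/hive-0.12.0/bin.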
Hive is now set up on top of Hadoop. Let's test it:
8. Start Hive by executing the following command:
hive
9. Create a table in Hive with the following command. After creating it, also check that the table exists.
create table test (field1 string, field2 string);
show tables;
10. Show extended details for the table:
Describe extended test;
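The same statements can also be run non-interactively from a script file, which is closer to the stated objective of running HiveQL scripts. The Hive command line accepts "-f" to read statements from a file (and "-e" for an inline query string). The sketch below assumes hive is on your PATH; write_smoke_test is a hypothetical helper, and "if not exists" is added so the script can be re-run after the table was created in the earlier step.

```shell
#!/bin/sh
# Hypothetical helper: write the tutorial's test statements to FILE.
write_smoke_test() {
    cat > "$1" <<'EOF'
create table if not exists test (field1 string, field2 string);
show tables;
describe extended test;
EOF
}

script=$(mktemp)
write_smoke_test "$script"

# Run the script only if the hive launcher is on PATH.
if command -v hive >/dev/null 2>&1; then
    hive -f "$script"
else
    echo "hive not found on PATH; script left at $script"
fi
```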
From this output we know that Hive was set up correctly on top of the Hadoop cluster. It's time to learn HiveQL.