
Installing Sun Java in Ubuntu 14.04


Ubuntu ships with OpenJDK by default, but application development often demands the Sun (Oracle) Java JDK.

Steps to install Sun Java in Ubuntu 14.04

1. Initial commands to execute

sudo apt-get install software-properties-common
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update

2. For Oracle JDK 6

sudo apt-get install oracle-java6-installer

3. For Oracle JDK 7

sudo apt-get install oracle-java7-installer

4. For Oracle JDK 8

sudo apt-get install oracle-java8-installer
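
Optionally, the same PPA also provides set-default companion packages (named, to the best of my knowledge, oracle-javaX-set-default; treat the package name as an assumption to verify) that configure JAVA_HOME and PATH automatically, for example:

sudo apt-get install oracle-java8-set-default

If available for your JDK version, this saves the manual JAVA_HOME edit in step 5.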

5. Setting JAVA_HOME
Copy the path where the preferred JDK is installed and edit the /etc/environment file.

sudo nano /etc/environment
JAVA_HOME="YOUR_PATH"

6. Reload the file

source /etc/environment
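
7. Verify the installation

To confirm the setup, check JAVA_HOME and the Java version (this assumes the Oracle JDK 8 installer, which typically installs under /usr/lib/jvm/java-8-oracle; adjust for your version):

echo $JAVA_HOME
java -version

java -version should now report the Oracle runtime rather than OpenJDK.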

Meaningful Stories with Data


“If history were taught in the form of stories, it would never be forgotten.” The same applies to data: a simple analogy for why stories should be told with data.

In her “Persuasion and the Power of Story” video, Stanford University Professor of Marketing Jennifer L. Aaker explains that stories are meaningful when they are memorable, impactful and personal. Through the use of interesting visuals and examples, she details the way people respond to messaging when it’s delivered either with statistics or through story. Although she says engagement is quite different from messaging, she does not suggest one over the other. Instead, Aaker surmises that the future of storytelling incorporates both, stating, “When data and stories are used together, they resonate with audiences on both an intellectual and emotional level.”

 


Install hadoop on OpenSuse 12.1


First, pseudo-distributed mode is effectively a one-node Hadoop cluster setup. This is really the best way to get started with Hadoop, as it makes it easy to modify the configuration to be fully distributed once you have got a handle on the basics.

Step 1: Update OpenSuse packages from Software manager.

Step 2: Install the Sun JDK (refer to the previous post on installing the Sun JDK in OpenSuse 12.1).

Create a user “hadoop” on your SUSE machine and log in as that user to carry out the activities below.

Step 3: Set up passwordless SSH. Activate sshd and enable it at boot from a root shell.

>sudo bash
#rcsshd  start
#chkconfig  sshd  on

Exit the root shell, then, as the hadoop user, create an SSH key and authorize it so SSH to localhost works without a password.
>ssh-keygen -t dsa -N '' -q -f ~/.ssh/id_dsa
>cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
>ssh-add ~/.ssh/id_dsa
Identity added: /home/hadoop/.ssh/id_dsa (/home/hadoop/.ssh/id_dsa)

Test connecting over SSH without a password, using the key:
>ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 05:22:61:78:05:04:7e:d1:81:67:f2:d5:8a:42:bb:9f.
Are you sure you want to continue connecting (yes/no)? Type yes.

Step 4: Hadoop Installation:
Download the hadoop-0.21.0.tar.gz file from http://www.apache.org/dyn/closer.cgi/hadoop/core/

Create a directory /home/hadoop/hadoop-install
/home/hadoop> mkdir hadoop-install

Extract the hadoop-0.21.0 tar file into this new directory.
/home/hadoop> sudo tar -zxvf /home/hadoop/Downloads/hadoop-0.21.0.tar.gz -C /home/hadoop/hadoop-install

Edit the following files in the /home/hadoop/hadoop-install/hadoop-0.21.0/conf directory.

conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-install/hadoop-datastore/</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>
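
It is safest to create the hadoop.tmp.dir location up front so that it exists and is owned by the hadoop user (the path matches the value configured above):

/home/hadoop> mkdir -p /home/hadoop/hadoop-install/hadoop-datastore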

conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>100</value>
</property>
</configuration>

conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

conf/masters
localhost

conf/slaves
localhost

conf/hadoop-env.sh
Uncomment the line where you provide the JAVA_HOME details. It should point to the Sun JDK, as shown below.
export JAVA_HOME=/usr/java/default

Setting the environment variables for the JDK and Hadoop
Open the ~/.bashrc file and add the two lines below at the end of the file.

>vi ~/.bashrc

export JAVA_HOME=/usr/java/default
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-install/hadoop-0.21.0

To apply the .bashrc changes immediately, run the following command.
$ source ~/.bashrc

Starting hadoop processes

Format the namenode using the following command
bin/hdfs namenode -format

Start the dfs:
hadoop@localhost:~/hadoop/hadoop-0.21.0>bin/start-dfs.sh

Start the mapred:
hadoop@localhost:~/hadoop/hadoop-0.21.0>bin/start-mapred.sh

Check for running processes.
hadoop@localhost:~/hadoop/hadoop-0.21.0>jps
SecondaryNameNode
NameNode
DataNode
TaskTracker
JobTracker
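
As a quick sanity check, list the HDFS root and run the bundled pi example. The examples jar name below is my assumption for the 0.21.0 layout; adjust it to whatever examples jar ships with your build.

bin/hadoop fs -ls /
bin/hadoop jar hadoop-mapred-examples-0.21.0.jar pi 2 10

If both commands complete without errors, the pseudo-distributed cluster is working.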


Installing Sun JDK in OpenSuse 12.1


Most applications require the Sun JDK as a prerequisite. OpenSuse 12.1 and later versions do not include the Sun Java package in the default repositories due to licensing issues.

Follow the below steps to install Sun JDK in OpenSuse 12.1.

Check the current version from a terminal window.

>java -version

By default, OpenJDK will be installed. Find the installed OpenJDK version so you can uninstall it.

# rpm -qa | grep jdk

Remove it from the system. Replace the package name below with the OpenJDK version you got from the command above.

# rpm -e java-1_6_0-openjdk-1.6.0.0_b24.1.11.5-16.1.x86_64

Verify that the default Java package is uninstalled.

which java

Download the latest JDK rpm package from the Oracle site (jdk-7u25-linux-x64.rpm):

http://www.oracle.com/technetwork/java/javase/downloads/index.html

Change to the Downloads directory and install the JDK.

localhost:/home/hadoop/Downloads # rpm -ivh  jdk-7u25-linux-x64.rpm

All the essential Java commands should now work, but there is one final thing to do: setting the JAVA_HOME directory and adding it to the PATH.

SUSE stores its profile scripts in the /etc/profile.d directory; switch to the root user to get write access under /etc/profile.d.

localhost:/etc/profile.d # su

Create a jdk.sh file under /etc/profile.d and write the output of the echo command below into it.

# echo 'export JAVA_HOME=/usr/java/jdk1.7.0_25'>/etc/profile.d/jdk.sh

Append the PATH setting with a second echo command.

# echo 'export PATH=$JAVA_HOME/bin:$PATH'>>/etc/profile.d/jdk.sh

Source jdk.sh to apply it in the current shell.

# source /etc/profile.d/jdk.sh

Finally, log out and log back in for the change to take effect for your own user.
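
After logging back in, verify the setup; the exact version string depends on the JDK build installed (here jdk-7u25):

> java -version
> echo $JAVA_HOME

java -version should report the Oracle JDK, and echo $JAVA_HOME should print /usr/java/jdk1.7.0_25.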

 


‘Big data’ and ‘Tweet’ enter the Oxford Dictionary..!!


The Oxford English Dictionary becomes part of the social media technology revolution.


The Oxford English Dictionary has a rule that “a new word needs to be current for ten years before consideration for inclusion”.

Chief Editor John Simpson announced in a blog post that the OED is breaking this rule to keep pace with the tech-savvy era, adding the words ‘big data’ and ‘tweet’ to the dictionary.

From the quarterly update of the Oxford English Dictionary:

The word “tweet,” appearing both as a noun and a verb, was added to the dictionary.


The term ‘big data’ was also added to the dictionary.


The OED also got on board with other tech lingo: the words “crowdsourcing,” “e-reader,” “mouseover,” “stream,” “redirect,” “flash mob,” “3D printer” and “live-blogging” also made their entry into the century-old dictionary.


What is the point with Hadoop…???


Whenever I have a chitchat or a formal talk with a BI or analytics person, the most frequently asked question is:

‘What is the point of Hadoop?’


It is a more fundamental question than ‘what analytic workloads is Hadoop used for’ and really gets to the heart of uncovering why businesses are deploying or considering deploying Apache Hadoop. There are three core roles:

  • Big data storage: Hadoop as a system for storing large, unstructured data sets
  • Big data integration: Hadoop as a data ingestion/ETL layer
  • Big data analytics: Hadoop as a platform for new exploratory analytic applications

While much of the attention around Apache Hadoop use cases focuses on the innovative analytic applications it has enabled and on its high-profile adoption at Web properties, initial adoption at traditional enterprises and later adopters is more likely to be triggered by the first two roles. Indeed, there are some good examples of these three roles representing an adoption continuum.

We also see the multiple roles playing out at a vendor level, with regards to strategies for Hadoop-related products. Oracle’s Big Data Appliance, for example, is focused very specifically on Apache Hadoop as a pre-processing layer for data to be analyzed in Oracle Database.

While Oracle focuses on Hadoop’s ETL role, it is no surprise that the other major incumbent vendors showing interest in Hadoop can be grouped into three main areas:

  • Storage vendors
  • Existing database/integration vendors
  • Business intelligence/analytic vendors

This is just a small example I chose to showcase how the major data players are slowly adopting this new technology, harnessing its capabilities to retain their position among the major players.


CONFIGURING PERL ON WAMP


My 7th semester lab exams were scheduled for 22 Nov 2011, and to work out programs at the hostel I had been struggling to configure Perl for my “Web Programming Lab” programs. It took a lot of time to finally get it working, so I thought I would share the configuration steps, which might be useful for anyone stuck the way I was a few days ago.
Perl is a scripting language developed by Larry Wall in 1987, and it has enjoyed a huge user response for its simplicity in text processing since the day of its release.
Here I chose WAMP, a package of independently created programs that includes Apache (web server), MySQL (open-source database) and PHP as principal components.
Let's start with the step-by-step configuration instructions.
STEP 1: Download and install WampServer 2 from the WampServer website.
STEP 2: Similar to step 1, download and install ActivePerl 5.10.0 build 1005 from the ActiveState website.
STEP 3: Right-click the WampServer icon in the Windows taskbar notification area and select the "Put Offline" option, or else stop all the services. Once all WAMP services are stopped, right-click the WampServer icon again, select Apache, and open the httpd.conf file.
STEP 4: Now we need to make some changes in this httpd.conf file; let's do them one by one.
a) Scroll down and look for the line "Options Indexes FollowSymLinks" and replace it with "Options Indexes FollowSymLinks Includes ExecCGI".


b) Scroll down and look for the line "#AddHandler cgi-script .cgi" and replace it with:
AddHandler cgi-script .cgi
AddHandler cgi-script .pl


c) Now look for the line "DirectoryIndex index.php index.php3 index.html index.htm" and add index.cgi and index.pl to this line.


STEP 5: The server is now configured and ready to run Perl and CGI scripts. Next we need to add an additional PPM repository and install from that repository. For that:
1. Open a command prompt, then type
ppm repo add uwinnipeg


2. After the "uwinnipeg" repository is added successfully, install DBD-mysql by typing this command
ppm install DBD-mysql
Hmmm, now we're done with the configuration. Try writing some simple Perl scripts and save them in C:\wamp\bin\apache\Apache2.2.11\cgi-bin\
To run a script, open the browser and go to the URL http://localhost/cgi-bin/ followed by your program name, for example as shown below.
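
For example, a minimal test script could look like the one below. hello.pl is just an illustrative name, and the interpreter path on the first line assumes a default ActivePerl install in C:\Perl; adjust it to your setup.

#!C:/Perl/bin/perl.exe
# hello.pl - minimal CGI test script
use strict;
use warnings;

# A CGI response must start with a content-type header followed by a blank line
print "Content-type: text/html\n\n";
print "<html><body><h1>Hello from Perl on WAMP!</h1></body></html>\n";

Save it as hello.pl in the cgi-bin directory and open http://localhost/cgi-bin/hello.pl in the browser.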

NOTE:  Please make sure that no process is running on port 80
