Natural Language Processing(NLP)

Treebank Tag-set


Here are the most important tags used in POS tagging:

POS Tag | Description | Example
CC | coordinating conjunction | and
CD | cardinal number | 1, three
DT | determiner | the
EX | existential there | there is
FW | foreign word | d’oeuvre
IN | preposition/subordinating conjunction | in, of, like
JJ | adjective | green
JJR | adjective, comparative | greener
JJS | adjective, superlative | greenest
LS | list marker | 1)
MD | modal | could, will
NN | noun, singular or mass | table
NNS | noun, plural | tables
NNP | proper noun, singular | John
NNPS | proper noun, plural | Vikings
PDT | predeterminer | both the boys
POS | possessive ending | friend’s
PRP | personal pronoun | I, he, it
PRP$ | possessive pronoun | my, his
RB | adverb | however, usually, naturally, here, good
RBR | adverb, comparative | better
RBS | adverb, superlative | best
RP | particle | give up
TO | to | to go, to him
UH | interjection | uhhuhhuhh
VB | verb, base form | take
VBD | verb, past tense | took
VBG | verb, gerund/present participle | taking
VBN | verb, past participle | taken
VBP | verb, singular present, non-3rd person | take
VBZ | verb, 3rd person singular present | takes
WDT | wh-determiner | which
WP | wh-pronoun | who, what
WP$ | possessive wh-pronoun | whose
WRB | wh-adverb | where, when
Natural Language Processing(NLP)

What is Part of Speech Tagging or POS tagging?


POS tagging is the process of marking up each word in a text as corresponding to a particular part of speech, based on both its definition and its context. Before we dive into POS tagging, it is important to know about parts of speech. There are mainly eight parts of speech, which group words into different categories. Here is a short summary of them.

Part of speech | Function or “job” | Example words | Example sentences
Verb | action or state | (to) be, have, do, like, work, sing, can, must | EnglishClub.com is a web site. I like EnglishClub.com.
Noun | thing or person | pen, dog, work, music, town, London, teacher, John | This is my dog. He lives in my house. We live in London.
Adjective | describes a noun | a/an, the, 69, some, good, big, red, well, interesting | My dog is big. I like big dogs.
Adverb | describes a verb, adjective or adverb | quickly, silently, well, badly, very, really | My dog eats quickly. When he is very hungry, he eats really quickly.
Pronoun | replaces a noun | I, you, he, she, some | Tara is Indian. She is beautiful.
Preposition | links a noun to another word | to, at, after, on, but | We went to school on Monday.
Conjunction | joins clauses or sentences or words | and, but, when | I like dogs and I like cats. I like cats and dogs. I like dogs but I don’t like cats.
Interjection | short exclamation, sometimes inserted into a sentence | oh!, ouch!, hi!, well | Ouch! That hurts! Hi! How are you? Well, I don’t know.

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word (and to other tokens, such as punctuation). Let’s take an example.

Suppose the input to the POS tagger is:

The strongest rain ever recorded in India shut down the financial hub of Mumbai, snapped communication lines, closed airports and forced thousands of people to sleep in their offices or walk home during the night, officials said today.

Then the output of the POS tagger should look like this:

The/DT strongest/JJS rain/NN ever/RB recorded/VBN in/IN India/NNP
shut/VBD down/RP the/DT financial/JJ hub/NN of/IN Mumbai/NNP ,/,
snapped/VBD communication/NN lines/NNS ,/, closed/VBD airports/NNS
and/CC forced/VBD thousands/NNS of/IN people/NNS to/TO sleep/VB in/IN
their/PRP$ offices/NNS or/CC walk/VB home/NN during/IN the/DT night/NN
,/, officials/NNS said/VBD today/NN ./.

Here the NN tag refers to a singular or mass noun, JJ refers to an adjective, and so on. To know more about the tags, see the Treebank tag set.
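Since the tagged output is just whitespace-separated word/TAG tokens, it is easy to post-process. Here is a minimal sketch in Java, assuming exactly that format. The class TaggedTextReader and its methods are hypothetical names made up for this illustration; they are not part of the Stanford tools or any other tagger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of any tagger library): reads "word/TAG" text.
public class TaggedTextReader {

    // Splits the text on whitespace, then splits each token on its LAST slash,
    // so a token such as "1/2/CD" would keep "1/2" as the word.
    // Each entry holds the word as key and the tag as value.
    public static List<Map.Entry<String, String>> parse(String tagged) {
        List<Map.Entry<String, String>> result = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash <= 0 || slash == token.length() - 1) {
                continue; // skip tokens that are not in word/TAG form
            }
            result.add(Map.entry(token.substring(0, slash), token.substring(slash + 1)));
        }
        return result;
    }

    // Counts how often each tag occurs; the TreeMap keeps tags in sorted order.
    public static Map<String, Integer> tagCounts(List<Map.Entry<String, String>> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> word : words) {
            counts.merge(word.getValue(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String tagged = "The/DT strongest/JJS rain/NN ever/RB recorded/VBN in/IN India/NNP";
        List<Map.Entry<String, String>> words = parse(tagged);
        System.out.println(words.get(2).getKey() + " is tagged " + words.get(2).getValue());
        System.out.println(tagCounts(words));
    }
}
```

Splitting on the last slash rather than the first is a design choice for this sketch: punctuation tokens such as ,/, still parse correctly, and words that themselves contain a slash are not cut short.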

Reference:  http://nlp.stanford.edu/software/lex-parser.shtml

Personal Encounters

10 Lessons from Einstein


Einstein, the father of modern physics, is a favourite personality of most high school students, not just because of his mass–energy equivalence formula E = mc² but also because of his childhood stories, which they read as lessons in the curriculum. I personally became a crazy fan of Einstein after reading the book “At The Speed of Light” by Prof. Chandrashekar.

Knowingly or unknowingly, all of us have read the stories related to relativity (e.g. the freely falling lift, the space journey of the twin sisters). Einstein has truly made a huge impact on many talented young people. I just want to share a few lessons from Einstein with all my fellow fans of the ‘Man of Relativity’ on the occasion of his 133rd birthday on March 14.

1. Follow Your Curiosity “I have no special talent. I am only passionately curious.”

2. Perseverance is Priceless “It’s not that I’m so smart; it’s just that I stay with problems longer.”

3. Focus on the Present “Any man who can drive safely while kissing a pretty girl is simply not giving the kiss the attention it deserves.”

4. The Imagination is Powerful “Imagination is everything. It is the preview of life’s coming attractions. Imagination is more important than knowledge.”

5. Make Mistakes “A person who never made a mistake never tried anything new.”

6. Live in the Moment “I never think of the future – it comes soon enough.”

7. Create Value “Strive not to be a success, but rather to be of value.”

8. Don’t be repetitive “Insanity: doing the same thing over and over again and expecting different results.”

9. Knowledge Comes From Experience “Information is not knowledge. The only source of knowledge is experience.”

10. Learn the Rules and Then Play Better “You have to learn the rules of the game. And then you have to play better than anyone else.”

“The most beautiful thing we can experience is the mysterious. It is the source of all true art and all science. He to whom this emotion is a stranger, who can no longer pause to wonder and stand rapt in awe, is as good as dead: his eyes are closed.”

Source: Dumb Little Man

Bigdata

CONFIGURING PERL ON WAMP


My 7th-semester lab exams were scheduled for 22 Nov 2011, and to work through the programs at the hostel I had been struggling to configure Perl to run my “Web Programming Lab” programs. It took a lot of time, but I finally got it working, so I thought I would share the configuration steps, which might be useful for anyone who is stuck the way I was a few days ago.
Perl is a scripting language developed by Larry Wall in 1987. It has been popular since the day of its release for its simplicity in text processing.
Here I chose WAMP, a package of independently created programs whose principal components are Apache (web server), MySQL (open source database) and PHP.
OK, let’s start with the step-by-step configuration instructions.
STEP 1: Download and install WAMP (a 2.x version).
STEP 2: Similarly, download and install ActivePerl 5.10.0 build 1005 from the ActiveState website.
STEP 3: Right-click the WAMP server icon in the notification area of the Windows taskbar and select the “Put Offline” option, or else stop all the services. Once all WAMP services are stopped, click the WAMP server icon again, select Apache and open the httpd.conf file.
STEP 4: Now we need to make some changes in this httpd.conf file. Let’s do them one by one.
a) Scroll down and look for the line
Options Indexes FollowSymLinks
and replace it with
Options Indexes FollowSymLinks Includes ExecCGI

b) Scroll down and look for the line
#AddHandler cgi-script .cgi
and replace it with
AddHandler cgi-script .cgi
AddHandler cgi-script .pl

c) Now look for the line
DirectoryIndex index.php index.php3 index.html index.htm
and add index.cgi and index.pl to it:
DirectoryIndex index.php index.php3 index.html index.htm index.cgi index.pl

STEP 5: The server is now configured and ready to run Perl and CGI scripts. Next, add an additional repository and install a module from it:
1. Open a command prompt and type
ppm repo add uwinnipeg

2. After the “uwinnipeg” repository has been added successfully, install DBD-mysql with the command
ppm install DBD-mysql
Now we are done with the configuration. Try writing some simple Perl scripts and save them in C:\wamp\bin\apache\Apache2.2.11\cgi-bin\
To run a script, open the browser and type the URL http://localhost/cgi-bin/ followed by your program name.

NOTE: Please make sure that no other process is using port 80.

Workshops and Conferences

BDotNet-“Bangalore .NET” User Group


BDotNet, the Bangalore .NET user group, was born eight years ago when .NET users in Bangalore identified the need for a community to share and exchange knowledge on rapidly growing technologies. Mr. Kashinath and Mr. Vic Parmar, the UG leads of BDotNet, have always been the driving force behind the successful UG meets and Community TechEds.

BDotNet has constantly supported young and talented minds, motivating them and pointing them in the right direction. I have been very fortunate to be part of this community. Since the day I joined, I have been able to keep myself up to date with the newest technologies.
I’m a 7th-semester student at Bangalore Institute of Technology. When I approached the BDotNet members (Vic, Kashinath, Lohith and Amar) to give sessions for students, they wholeheartedly accepted my request and agreed to come to my college to give sessions on the Microsoft developer platform.

figure: Metro Style Flyer Created by Amar Nityananda

Yesterday, November 5, we had sessions on Windows 8 (by Vic Parmar), HTML5 and CSS3 (by Lohith) and Windows Phone 7 (by Amar N). Around 250 students registered for the event, and we got a huge response from them about the sessions and about BDotNet too.
It was all possible because of the BDotNet community and its members, who always find time in their busy schedules to contribute to the community by sharing their knowledge.
I’m looking forward to seeing BDotNet become much more popular, so that all the techies and geeks out there get to know about the community and the kind of contribution it makes to society, and can benefit from the regular sessions conducted by BDotNet.

Bigdata, Workshops and Conferences

25th CSI Student Convention


CSI, the Computer Society of India, conducted its 25th student convention at R.V. College of Engineering on 13th and 14th October 2011. I got an opportunity to be part of the convention and present our paper, entitled “Map/Reduce Algorithm Performance Analysis in Computing Frequency of Tweets”, along with my co-author Nagashree.

The convention was a great time for all the students, who came from across the state to learn about the latest trends in the field of Information Technology. It was also a wonderful platform for innovative young minds to share their ideas and innovations. Students from different parts of Karnataka took part in the convention and presented their papers.

Hadoop and Map/Reduce being my area of interest, we decided to present a paper on “Map/Reduce Algorithm Performance Analysis” so that more and more students get to know about this emerging technology. We had just 10 minutes to present our paper, impress the judges and communicate our ideas to our peers who were present at the convention. It was a wonderful experience to present a paper in front of the eminent professionals who were judging the event. By the way, we were quite nervous, as it was our first ever paper presentation. The day became much more memorable when we learned that we had won 3rd place for our presentation.

Here is the abstract of our presentation:

Abstract of the paper presentation

Title: Map/Reduce Algorithm Performance Analysis in Computing Frequency of Tweets

Background

This paper proposes a method to extract tweets from Twitter and analyses the efficiency of the Map/Reduce algorithm on the Hadoop framework, with the aim of achieving maximum performance.

New research in cloud computing has shown that implementing MapReduce influences not only performance but also the reliability of storage management.

For about a decade, distributed computing was considered more complex to handle than expanding the memory of a single-node cluster, since inter-process communication (IPC) had to be used to communicate between the nodes. This was tedious to implement, as the communication code would run longer than the computation procedure itself. Apache Hadoop now offers a more scalable and reliable platform for implementing distributed computing. In this paper we analyse how the Map/Reduce algorithm run on Hadoop significantly influences performance when handling a huge data set stored on different nodes of a multi-node cluster.

Aim of the study

Cloud computing is the future, and it will focus more on distributed computing. To evaluate the features Hadoop offers for cloud computing, a huge unstructured data set is required. The present study investigated those questions.

The main focus of the study was to analyse the performance of the Map/Reduce algorithm in computing the frequency of tweets.

Method

About 6 to 10 lines of Python were used to extract people’s tweets, taking input from the Twitter Search API. Tweets were extracted continuously for about one week, resulting in a huge data set of up to 50MB.

The study was carried out in two parts. The first part was extracting tweets as mentioned above, and the second was implementing a customized Map/Reduce algorithm to compute the frequency of tweets on a particular keyword (say “Anna Hazare”).

Result

It was found that this approach offers a more reliable method for analysing huge data sets than the classic methods.
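The keyword-frequency computation described in the Method section can be illustrated without a cluster. The following is a minimal in-memory sketch in plain Java that mimics the map and reduce phases. It is not the Hadoop job from the paper: the class name TweetKeywordCount, the method names and the sample tweets are all made up for illustration, and a tweet is assumed to "mention" a keyword when it contains it as a case-insensitive substring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the map and reduce phases; this is not Hadoop code.
public class TweetKeywordCount {

    // Map phase: emit a (keyword, 1) pair for every tweet that mentions a keyword.
    public static List<Map.Entry<String, Integer>> map(List<String> tweets, List<String> keywords) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String tweet : tweets) {
            String lower = tweet.toLowerCase(Locale.ROOT);
            for (String keyword : keywords) {
                if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                    pairs.add(Map.entry(keyword, 1));
                }
            }
        }
        return pairs;
    }

    // Reduce phase: sum the emitted counts per keyword.
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample tweets, for illustration only.
        List<String> tweets = List.of(
                "Anna Hazare begins fast in Delhi",
                "Traffic is terrible today",
                "Support for anna hazare grows");
        System.out.println(reduce(map(tweets, List.of("Anna Hazare"))));
    }
}
```

On a real cluster the framework would run the map step on many nodes in parallel and group the emitted pairs by key before the reduce step; here both steps simply run one after the other on a single list.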

Here are the slides of our presentation:

Finally, after the presentation, I learned that Hadoop is the platform used for India’s UID (Aadhaar card) project, and I felt proud to have some knowledge of it.

Bigdata

txtWeb: browse the internet through SMS


 

“There are roughly 700 million mobile subscribers in India. But, out of those 700 million, more than 600 million Indians  do NOT have access to a computer or mobile data.”

txtWeb is a global platform where anyone with a mobile phone can access the internet just by sending keywords by SMS (like web addresses in a browser) to ONE national number and receiving content back (up to 900 characters per SMS). Each keyword represents an application that the user can use to get content from the internet. These applications are created by an open community of publishers and developers. Applications include Wikipedia content, local market prices, government programs, financial literacy tips, etc.

txtWeb is an SMS-based browser with which one can browse the internet at no charge (provided you have a free SMS plan on your mobile), and it is much more accessible than web browsers on computers, since anyone with a simple feature phone can use it. Deploying existing content via a txtWeb site takes only 5 minutes. Creating and deploying an SMS-based app on txtWeb usually takes about 5 hours.

Using txtWeb:

Just type the keyword and send it as an SMS to the txtWeb Indian national number, 9243342000.

Ex: “@cat ignite”. This SMS would search for the meaning of the word ignite.

Working of txtWeb:

  1. The user sends a request to the txtWeb number, e.g. @dictionary alibi to 9243342000.
  2. The request is forwarded from the phone carrier to the platform as an SMS.
  3. The platform accepts the keyword and maps it to the external URL of the application (or to the text provided, if it is a text site). The app URL or text is provided by the developer when building the app. If it is a txtSite, the content is retrieved from the platform’s database. If it is a txtApp, steps 4 and 5 below are followed.
  4. An HTTP call is made to the URL of the application.
  5. The content of the app is sent back to the platform over HTTP.
  6. The platform accepts the content and converts it to an SMS.
  7. The SMS is transferred to the phone carrier.
  8. The SMS reaches the end user.
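The keyword lookup in step 3 can be pictured with a small sketch. This is only an illustration of the idea, not txtWeb's actual implementation: the class KeywordRouter, its methods, the "TEXT:" and "CALL:" prefixes and the example URL are all hypothetical, and it assumes that a request has the form "@keyword rest of the message" and that the message is handed to a txtApp as a query parameter.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of keyword routing; not txtWeb's actual implementation.
public class KeywordRouter {

    private final Map<String, String> txtSites = new HashMap<>(); // keyword -> stored text
    private final Map<String, String> txtApps = new HashMap<>();  // keyword -> application URL

    public void registerSite(String keyword, String text) {
        txtSites.put(keyword.toLowerCase(Locale.ROOT), text);
    }

    public void registerApp(String keyword, String url) {
        txtApps.put(keyword.toLowerCase(Locale.ROOT), url);
    }

    // Returns "TEXT:<content>" for a txtSite, "CALL:<url>" for a txtApp
    // (the platform would then make the HTTP call), or "UNKNOWN" otherwise.
    public String route(String sms) {
        String trimmed = sms.trim();
        if (!trimmed.startsWith("@") || trimmed.length() == 1) {
            return "UNKNOWN";
        }
        // Split into the keyword and the remainder of the message.
        String[] parts = trimmed.substring(1).split("\\s+", 2);
        String keyword = parts[0].toLowerCase(Locale.ROOT);
        String message = parts.length > 1 ? parts[1] : "";
        if (txtSites.containsKey(keyword)) {
            return "TEXT:" + txtSites.get(keyword);
        }
        if (txtApps.containsKey(keyword)) {
            // Assumption: the message travels as a URL-encoded query parameter.
            return "CALL:" + txtApps.get(keyword) + "?txtweb-message="
                    + URLEncoder.encode(message, StandardCharsets.UTF_8);
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        KeywordRouter router = new KeywordRouter();
        router.registerSite("hello", "Hello World!! I am live");
        router.registerApp("dictionary", "http://example.com/dictionary"); // placeholder URL
        System.out.println(router.route("@hello"));
        System.out.println(router.route("@dictionary alibi"));
    }
}
```

Keeping txtSites and txtApps in separate maps mirrors the two branches of step 3: static text is answered directly from storage, while an app keyword only yields the URL that steps 4 and 5 would then call.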

txtSites are static text pages used to publish information. A txtSite is analogous to a static web page on the internet. A publisher can provide content, and it can be published as a txtSite for consumption over SMS.

Steps to build your first txtSite-

  1. Click on “Create a txtSite” on your home page.
  2. Enter a keyword which will be the handle for your application (say the keyword is Hello).
  3. Give your txtSite an appropriate description. This description helps users discover your application: the search on the platform takes the description into account when looking for apps relevant to the search term entered by the user. (You could enter “This is my first text application”.)
  4. Enter the text to be sent to the end user when they access your application, e.g. “Hello World!! I am live”.
  5. Click “Publish” to get your app up and running on the platform.

txtApps are dynamic pages used to provide information to an end user on the basis of the request they make via SMS. A txtApp is analogous to a dynamically populated web page on the internet. Unlike a txtSite, a txtApp requires you to develop a web application that renders dynamic information for the end user.

There are 3 parameters that the platform sends to an application:

Txtweb-mobile: the mobile number of the end user, in hashed form

Txtweb-message: the message sent by the end user

TxtWeb-location: the location as set by the end user

This information is accessed via API calls. The relevant information is passed as XML.

Example code to build a Hello World txtApp:

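private String testMessage() {
    // The markup that preceded "Hello World" in the original listing was lost,
    // so a plain HTML wrapper is used here in its place.
    String resp = "<html><body>Hello World <br/><br/></body></html>";
    return resp;
}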

This is an HTML response that displays Hello World in the browser once the servlet is invoked. The String resp is passed to a method sendResponse, which is given below:

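// Imports needed at the top of the servlet class:
// import java.io.IOException;
// import java.io.PrintWriter;
// import javax.servlet.http.HttpServletResponse;

private void sendResponse(HttpServletResponse response, String resp) {
    try {
        // resp contains the HTML version of Hello World
        PrintWriter out = response.getWriter();
        out.println(resp);
    } catch (IOException e) {
        // Report the failure instead of silently ignoring it
        e.printStackTrace();
    }
}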

So wherever you are, if you want to google something, just send an SMS “goog <search parameter>” to 9243342000. Have fun using txtWeb.
