Demo entry 6333538

NLTK tokenizer

   

Submitted by anonymous on Dec 04, 2016 at 05:31
Language: Python. Code size: 623 Bytes.

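from nltk.tokenize import sent_tokenize, word_tokenize

# Note: both tokenizers need the NLTK Punkt data, installed once with
# nltk.download("punkt") (some newer NLTK releases ask for "punkt_tab").

para = "Hello there this is the blog about NLP. In this blog I have made some posts. I can come up with new content."

# Split the paragraph into sentences and into word-level tokens.
# word_tokenize returns punctuation marks as separate tokens, so the
# "words" count below includes them.
sentences = sent_tokenize(para)
words = word_tokenize(para)

# Print the totals
print("this paragraph has " + str(len(sentences)) + " sentences and " + str(len(words)) + " words")

# Print each sentence, numbered from 1
for k, sentence in enumerate(sentences, start=1):
    print("sentence " + str(k) + " = " + sentence)

# Print each word-level token, numbered from 1
for k, word in enumerate(words, start=1):
    print("word " + str(k) + " = " + word)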
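
Note on the word count: word_tokenize returns punctuation marks such as "." as separate tokens, so the total printed by the snippet is a token count rather than a strict word count. A minimal sketch of one way to count only word-like tokens follows. The helper name count_words and the hand-written token list are illustrative assumptions, not part of the original entry or of NLTK.

```python
def count_words(tokens):
    """Count tokens that contain at least one letter or digit."""
    return sum(1 for tok in tokens if any(ch.isalnum() for ch in tok))


# Hand-written token list (an assumption for illustration), standing in for
# the kind of output a word tokenizer gives for the last sentence.
tokens = ["I", "can", "come", "up", "with", "new", "content", "."]

print("tokens:", len(tokens))        # counts every token, including "."
print("words:", count_words(tokens)) # skips punctuation-only tokens
```

This keeps tokens like "don't" or "3.5" because they contain at least one alphanumeric character, and drops tokens made only of punctuation. Whether that matches the intended definition of a "word" is a judgment call.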
