In the previous articles we discussed Tokenizing, Sequencing and Padding sentences…now we will apply those methods to a real dataset.
News Headlines Dataset For Sarcasm Detection — each record consists of three attributes:
1. is_sarcastic: 1 if the record is sarcastic, otherwise 0
2. headline: the headline of the news article
3. article_link: link to the original news article, useful for collecting supplementary data

Follow this link to learn more about the dataset…Kaggle
Now we shall see how to apply the methods we have learned.
1. Loading the dataset and creating 3 lists to store the ‘headline’, ‘is_sarcastic’ and ‘article_link’ values from each data point, as in the sketch below.
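A minimal sketch of this step, assuming the dataset has been downloaded from Kaggle and saved locally as sarcasm.json (the file name is an assumption) as a single JSON array; note that some distributions of this dataset are newline-delimited JSON instead, in which case each line must be parsed separately.

```python
import json

# Assumed local file name; adjust to wherever the Kaggle download lives
with open('sarcasm.json', 'r') as f:
    datastore = json.load(f)

sentences = []  # headlines
labels = []     # is_sarcastic flags
urls = []       # article links

for item in datastore:
    sentences.append(item['headline'])
    labels.append(item['is_sarcastic'])
    urls.append(item['article_link'])
```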

2. Tokenizing, Sequencing and Padding the sentences list, as shown in the sketch below.
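A sketch of the preprocessing using the Keras Tokenizer and pad_sequences covered in the earlier articles; the ‘<OOV>’ token and post-padding are assumed choices here, not requirements of the dataset.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Build the vocabulary from every headline; '<OOV>' stands in for unseen words
tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
print(len(word_index))  # 29657 for this dataset

# Convert each headline into a list of integer tokens
sequences = tokenizer.texts_to_sequences(sentences)

# Pad every sequence with trailing zeros to the length of the longest headline
padded = pad_sequences(sequences, padding='post')
print(padded[0])
print(padded.shape)  # (number of headlines, 40)
```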

The length of word_index is 29657, and the padded sequences have length 40, i.e. the longest headline produces a sequence of 40 tokens.