Hello, and welcome to this bonus video. My name is Partha Majumdar. One very popular application in the modern world is the word cloud. A word cloud gives a lot of information about the matter being discussed, so it is used in various applications like HR analytics, blog analytics, etc. In this video, we will see how to create a word cloud. We will discuss the code that we need for generating it.
I will provide you the code as an attachment to this video. We will also walk through the code and see a demonstration of the software. The basic steps for creating a word cloud are as follows. First, we gather the data for which we need to form the cloud. Then we form the frequency table from this data. And from this frequency table of the words, we generate the word cloud. The data that you get for generating a word cloud can be either structured data or unstructured data. By structured data, we mean data which is in the form of a table, or maybe coming from a database. We normally call this structured data because it has a proper structure that tells us what the fields or columns are, what the nature of the data is, etc.
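The three steps just described can be sketched in base R before any packages come in. This is only an illustrative sketch: the `words` vector is made-up sample data, and `table()` stands in for whatever frequency-table function the actual code uses.

```r
# Step 1: gather the data (a made-up character vector for illustration).
words <- c("grocery", "fuel", "grocery", "rent", "fuel", "grocery")

# Step 2: form the frequency table, one row per distinct word.
freq_table <- as.data.frame(table(words), stringsAsFactors = FALSE)
names(freq_table) <- c("word", "freq")

# Put the most frequent words first.
freq_table <- freq_table[order(-freq_table$freq), ]
print(freq_table)

# Step 3: freq_table$word and freq_table$freq are the two inputs
# that a word cloud function needs.
```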
When we talk about unstructured data, we are talking about data coming from sources like social media, such as Twitter or Facebook, or from documents, etc. We will first create the word cloud from structured data. The data that we will use is the expenses that I track on a daily basis. This contains the date on which the expense was made, the narration which describes what the expense was, the amount spent, and the category of the expense. Now, the narration is a text column, which we can use for making a word cloud. Similarly, we can also use the category column for making a word cloud.
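To picture the four columns, here is a small stand-in data frame. The column names (`Date`, `Narration`, `Amount`, `Category`) and the rows are assumptions made for illustration; the real file may name and format them differently.

```r
# Hypothetical sample rows with the four columns described above.
expenses <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02")),
  Narration = c("Milk", "Petrol", "Milk"),
  Amount = c(50, 1200, 50),
  Category = c("Grocery", "Fuel", "Grocery"),
  stringsAsFactors = FALSE
)

head(expenses)  # a sample of the rows
str(expenses)   # column names and types
```

Narration and Category are the two text columns, so either one can feed a word cloud.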
You can see a sample of the data, which has been displayed using the head command. The structure of the data can be seen using the str command. You will notice that there are 7091 data points for the four variables that we described earlier. Now we will walk through the code that is required for generating the word cloud from this data. We will create the word cloud for the narrations provided in the data. We first have to include the packages which are required for this program. One is plyr. The second package which we require is called wordcloud.
So using wordcloud, we can generate word clouds in R. We first have to read the data. We use read.csv, as this is a CSV file, to get the data that we require for forming the word cloud. Once we get the data, we use the count command to form the frequency table. Once we have formed the frequency table, we can use the wordcloud command from the wordcloud package to form the word cloud. We will see a demo of this entire code, and we will see how we can manipulate the different parameters of the wordcloud command to generate different kinds of word clouds. So here we are.
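The read, count, and draw sequence might look roughly like the sketch below. It assumes the plyr and wordcloud packages are installed, and it writes a tiny made-up CSV to a temporary file so that it runs on its own; the file contents and the column name `Narration` are assumptions, not the actual data.

```r
library(plyr)       # provides count()
library(wordcloud)  # provides wordcloud()

# Write a small stand-in CSV so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Narration,Amount,Category",
  "2020-01-01,Milk,50,Grocery",
  "2020-01-01,Petrol,1200,Fuel",
  "2020-01-02,Milk,50,Grocery",
  "2020-01-03,Bread,40,Grocery"
), csv_path)

# Read the data.
expenses <- read.csv(csv_path, stringsAsFactors = FALSE)

# Frequency table: one row per distinct narration, plus a freq column.
freq_table <- count(expenses, "Narration")
print(freq_table)

# Draw the cloud; min.freq = 1 because this sample is tiny.
wordcloud(words = freq_table$Narration, freq = freq_table$freq, min.freq = 1)
```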
In RStudio, I've got the same code that I displayed in the slide. So we invoke the libraries. You can see that this library is not present, so it is being downloaded. Once the library is downloaded, we read the file, and there we can see the head of this file. Now we see the structure. You can see we have 7091 observations of four variables.
Now we generate the frequency table. We'll have a look at the frequency table. There it is. So let's try to generate the word cloud. You see that we get an error. We get this error because the wordcloud library is not loaded.
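An error like this usually comes down to the difference between a package being installed and being attached with `library()`. A minimal sketch of how one might check both is given here; it is a diagnostic idea, not the code used in the demo.

```r
pkg <- "wordcloud"

# Installed: the package can be found in the library path.
# Attached: library() has put its functions on the search path.
is_installed <- requireNamespace(pkg, quietly = TRUE)
is_attached <- paste0("package:", pkg) %in% search()

if (!is_installed) {
  message(pkg, " is not installed; run install.packages(\"", pkg, "\") first.")
} else if (!is_attached) {
  library(pkg, character.only = TRUE)  # makes wordcloud() callable by name
}

# Check again after the attempt to attach.
now_attached <- paste0("package:", pkg) %in% search()
print(now_attached)
```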
So we will load the wordcloud library once again. Now it has loaded, and we will try this command again. Now you'll see it's generating the word cloud, and the word cloud has been generated. Now we will try to change some of the parameters and see how the different scenarios are displayed. So we change the random order to false. Now you see this is the display that you get from the word cloud. We will change the minimum frequency.
So now all the words which have a frequency of 20 or more will be displayed, and we will display a maximum of 150 words. So there, the word cloud has changed. Now we'll see what the impact of changing the color palette is. So there we have a different color scheme. We'll clear the plot, the variables, and the console. I will change the field on which the word cloud is generated. We will use the Category field for generating the word cloud.
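The parameters changed in this part of the demo correspond to the `random.order`, `min.freq`, `max.words`, and `colors` arguments of `wordcloud()`. The sketch below uses a made-up frequency table, and the `"Dark2"` palette is just one possible choice, not necessarily the one used in the video.

```r
library(wordcloud)
library(RColorBrewer)  # provides brewer.pal()

# Made-up frequency table for illustration.
freq_table <- data.frame(
  word = c("grocery", "fuel", "rent", "cinema", "gift"),
  freq = c(120, 45, 20, 19, 3),
  stringsAsFactors = FALSE
)

set.seed(42)  # the layout involves randomness; fix the seed for repeatability
wordcloud(
  words = freq_table$word,
  freq = freq_table$freq,
  min.freq = 20,                   # keep words with a frequency of 20 or more
  max.words = 150,                 # show at most 150 words
  random.order = FALSE,            # plot words in decreasing frequency
  colors = brewer.pal(8, "Dark2")  # a different color palette
)

# The words that pass the min.freq threshold in this sample:
kept <- freq_table$word[freq_table$freq >= 20]
print(kept)
```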
We'll run the entire code, and we get an error. This is basically because we changed the field for which we have taken the frequency. However, we have not changed the field name in the wordcloud command for which the word cloud has to be generated. So we have to make that change: we have to change narration to category. Now we run the entire code once again.
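One way to avoid this kind of mismatch is to name the field once and reuse that name in both places. The sketch below assumes the plyr package and uses made-up rows; the variable name `field` is hypothetical and not part of the original code.

```r
library(plyr)  # provides count()

# Made-up rows for illustration.
expenses <- data.frame(
  Narration = c("Milk", "Petrol", "Milk"),
  Category = c("Grocery", "Fuel", "Grocery"),
  stringsAsFactors = FALSE
)

# Name the field once and reuse it everywhere.
field <- "Category"

freq_table <- count(expenses, field)

# freq_table has a column called Category, not Narration,
# so freq_table$Narration would be NULL here.
words <- freq_table[[field]]
freqs <- freq_table$freq

# These two vectors are what the word cloud call needs, for example:
# wordcloud(words = words, freq = freqs, min.freq = 1)
print(freq_table)
```

Switching back to the narration only requires changing the value of `field`.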
So there you see the word cloud is generated for the category. So we have created a word cloud from structured data. Now we will see how we can create a word cloud from unstructured data. The data which we will use is my resume, which I have saved in text format. So it's a text file which contains running text from my resume. Now we have to use the package called tm. tm stands for text mining.
So using the text mining package in R, we can first massage the data which is coming from my resume, and then use that data for creating the word cloud. We will see how this is done in the demonstration. So we are back in RStudio. We load the packages first. We see that the tm package is not available.
So it is getting installed. So there, the tm package is getting installed. And now it's done. Now let's load the data first. So there, we load the data. Using the inspect command, we can see what the data actually looks like.
Now we will clean the data so that it is in the format that we want. So we run a series of commands. First we convert the data to lowercase. Then we remove the numbers. We can also remove the words that we don't want to include in the word cloud, and we can remove the white spaces, punctuation, etc. Once the data is cleaned, we form the DTM, the document-term matrix. Let's take a look at what the matrix looks like. So here is what has been extracted from the data.
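A cleaning sequence of this kind could be sketched with the tm package as follows. The two sentences stand in for the resume text, removing English stop words is an assumption about which words get dropped, and the exact order of steps in the original code may differ.

```r
library(tm)

# Made-up text standing in for the resume file.
text <- c(
  "Managed 12 projects in analytics.",
  "Analytics and data projects, 2015-2020."
)

# Load the text into a corpus and look at it.
docs <- VCorpus(VectorSource(text))
inspect(docs)

# Clean the text step by step.
docs <- tm_map(docs, content_transformer(tolower))       # lowercase
docs <- tm_map(docs, removeNumbers)                      # drop digits
docs <- tm_map(docs, removeWords, stopwords("english"))  # drop unwanted words
docs <- tm_map(docs, removePunctuation)                  # drop punctuation
docs <- tm_map(docs, stripWhitespace)                    # collapse extra spaces

# Form the document-term matrix and total each term across documents.
dtm <- DocumentTermMatrix(docs)
m <- as.matrix(dtm)
freq <- colSums(m)
print(freq)
```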
So these are all the words and their frequencies. We will sort this data in descending order. Now we'll create a data frame from the data which has been generated. So this is done. Let's take a look at what this data frame looks like. This should be familiar; it is something like what we had with the structured data. We generate the word cloud. There, you see the word cloud has been generated. So we clear it.
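The sort and data frame steps can be sketched in base R. The `freq` vector here is made up and stands in for the term totals taken from the matrix; the cloud is drawn only if the wordcloud package happens to be installed.

```r
# Made-up term frequencies, standing in for the totals from the matrix.
freq <- c(data = 1, analytics = 2, managed = 1, projects = 2)

# Sort in descending order of frequency.
freq <- sort(freq, decreasing = TRUE)

# One row per word, in the same shape as the structured-data frequency table.
d <- data.frame(
  word = names(freq),
  freq = unname(freq),
  stringsAsFactors = FALSE
)
print(d)

# Draw the cloud only if the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = d$word, freq = d$freq, min.freq = 1)
}
```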
And we'll try to change some parameters and see how it looks. We change the minimum frequency and the maximum number of words to be displayed. Now we select the entire code and run it. You see that the libraries are not installed again, and you have the fresh word cloud. That concludes our discussion. Thank you for listening.