So friends, now we are going to learn about text processing tools for manipulation, sorting and a wide range of applications. So let's start. We have text extraction tools for various purposes. To view excerpts of a file, we can see its first 10 lines and last 10 lines using head and tail. We can view the whole file content using the cat command. And we are familiar with the less command, which lets us view a file one page at a time. We can extract by column or field using the cut command, and we can extract by keyword using the grep command, g-r-e-p. The cat command is used to dump one or more files to standard output.
And multiple files can be concatenated together. So how do we use these tools? When you write the head command, it displays the first 10 lines of a file. The syntax is head followed by the file name. When you want to see more or fewer lines, you provide the number of lines with the -n argument: head -n 10, 3, 4, anything. Then that number of lines is displayed on the terminal. You can also redirect the output and store it in a text file.
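As a minimal sketch of cat and head, assuming hypothetical sample files created in a temporary directory (numbers.txt, extra.txt and first3.txt are made-up names for illustration), the commands might look like this:

```shell
# Hypothetical sample files in a temporary directory, for illustration only.
workdir=$(mktemp -d)
seq 1 20 > "$workdir/numbers.txt"            # 20 lines: 1, 2, ..., 20
printf 'extra line\n' > "$workdir/extra.txt"

# cat dumps one or more files to standard output, concatenated in order.
cat "$workdir/extra.txt" "$workdir/numbers.txt"

# head shows the first 10 lines by default.
head "$workdir/numbers.txt"

# -n changes how many lines are shown.
head -n 3 "$workdir/numbers.txt"

# Redirect the output to store it in a text file.
head -n 3 "$workdir/numbers.txt" > "$workdir/first3.txt"
```

After this runs, first3.txt holds just the lines 1, 2 and 3.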
The tail command is used to display the last 10 lines of a file. You can use the -f argument to follow subsequent additions to the file. Both commands are very useful for monitoring log files. Suppose you have a server environment which is running RHEL, and there are various system administrators. Some trouble occurred, and you want to check who was last in charge and what they did with that login. You can run the tail command on the log file, and the last 10 log entries will be displayed. Similarly, if you need to find the first 10 log entries, or more or fewer, from the start of the day, you can use the head command.
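Here is a small sketch of tail on a log file. The file app.log and its entries are hypothetical, made up so the example can run anywhere:

```shell
# Hypothetical log file with 20 entries, for illustration only.
workdir=$(mktemp -d)
for i in $(seq 1 20); do echo "entry $i"; done > "$workdir/app.log"

# tail shows the last 10 lines by default (entry 11 to entry 20 here).
tail "$workdir/app.log"

# -n changes how many lines are shown.
last_two=$(tail -n 2 "$workdir/app.log")
echo "$last_two"

# tail -f keeps running and prints lines as they are appended to the file.
# It is commented out so this example terminates; stop it with Ctrl+C.
# tail -f "$workdir/app.log"
```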
You can also extract text from a file using a keyword. The file could be any plain-text file: a txt file, or C, Java or Python source code, anything that contains plain text. Word-processor formats such as docx and odt are compressed archives, so grep will not find their text directly. We extract text by keyword using the grep command, g-r-e-p. Suppose I have to search for the string john in the file /etc/passwd. We write the command as grep, space, the string within single quotes, space, and then the full path of the file. Then the lines that contain the string are printed. The grep command takes three main options.
They are -i, -n and -v. You can use -i to search case-insensitively, where we need to match lower case, upper case, fully upper case, fully lower case, first letter capital and the rest small, all variations of the same spelling. Similarly, if you want to print the line numbers where the matches occurred, you can do so by writing -n. You can also print the lines that do not contain the keyword, using -v. Similarly, we can use other arguments like -A X and -B X to include X lines after each match and X lines before each match, respectively. You can also use the -r option to recursively search a directory.
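The grep options above can be sketched as follows. To keep the output predictable, this uses a hypothetical passwd-style file called users.txt rather than the real /etc/passwd:

```shell
# Hypothetical passwd-style sample file, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/users.txt" <<'EOF'
root:x:0:0:root:/root:/bin/bash
john:x:1000:1000:John Smith:/home/john:/bin/bash
mary:x:1001:1001:Mary Jones:/home/mary:/bin/sh
EOF

grep 'john' "$workdir/users.txt"         # lines containing john
grep -i 'JOHN' "$workdir/users.txt"      # same search, ignoring case
grep -n 'john' "$workdir/users.txt"      # prefix each match with its line number
grep -v 'john' "$workdir/users.txt"      # lines that do NOT contain john
grep -A 1 'root' "$workdir/users.txt"    # each match plus 1 line after it
grep -B 1 'mary' "$workdir/users.txt"    # each match plus 1 line before it
grep -r 'john' "$workdir"                # search every file under a directory
grep --color=auto 'john' "$workdir/users.txt"   # highlight matches on a terminal
```

With -r, each matching line is prefixed with the name of the file it was found in.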
Suppose we have multiple files in a folder or directory and we do not know exactly which file contains the string. We can search all the files; it may take some time, depending on your processor speed. You can use the --color argument to highlight the match in color. You can extract text by column or field using the cut command; cut is used to display specific columns or fields. It can also be used with a pipe, so it may take the output of the grep command as its input. There are three arguments for this: -d, -f and -c. Use -d to specify the column delimiter, which by default is tab.
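A short sketch of cut, again on a hypothetical passwd-style file (users.txt is a made-up name). It uses a colon as the delimiter, picks fields with -f, and slices characters with -c:

```shell
# Hypothetical colon-delimited sample file, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/users.txt" <<'EOF'
root:x:0:0:root:/root:/bin/bash
john:x:1000:1000:John Smith:/home/john:/bin/bash
EOF

# -d sets the delimiter, -f selects the field to print.
cut -d: -f1 "$workdir/users.txt"                  # root, then john

# cut can read from a pipe, here the output of grep.
grep 'john' "$workdir/users.txt" | cut -d: -f7    # /bin/bash

# -c cuts by character position instead of by field.
cut -c1-4 "$workdir/users.txt"                    # root, then john
```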
The delimiter could also be a comma, full stop, space or anything else. You can use -f to specify the column to print, and -c to cut by characters, that is, a slice. You can also gather text statistics: you can count the number of lines, words, characters or bytes in a text file. The command can act upon a file or on STDIN, so it can be operated on piped input as well. The command is wc, for word count. Suppose we have a file called book.txt, and we want to count the number of lines, words and characters. We simply write wc, space, the file name with its full path. In the output shown, the number 42 is the number of lines in the file.
198 is the number of words, and 2110 is the number of bytes, which for plain ASCII text is the same as the number of characters. We can use four different options with the wc command. If you want to see only the number of lines, you can use -l. If you want to see only the number of words, you can use -w. If you want to see only the number of bytes, you can use -c. And if you want to see only the number of characters, you can use -m. Then there come the sorting techniques. The sort command sorts text to STDOUT, and the original file is unchanged. You do so by writing the sort command, s-o-r-t, followed by various options and then a file name or file path. The options include -r, -n, -f, -u, -t c and -k X, where -r is used to perform a reverse, that is descending, sort.
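Going back to wc for a moment, its options can be sketched like this. The file book.txt here is a tiny hypothetical stand-in, not the 42-line file from the example, so the counts are different:

```shell
# Hypothetical two-line sample file, for illustration only.
workdir=$(mktemp -d)
printf 'one two three\nfour five\n' > "$workdir/book.txt"

# With no options, wc prints lines, words and bytes, then the file name.
wc "$workdir/book.txt"

# Reading from STDIN (via <) prints only the number, without a file name.
wc -l < "$workdir/book.txt"    # 2 lines
wc -w < "$workdir/book.txt"    # 5 words
wc -c < "$workdir/book.txt"    # 24 bytes
wc -m < "$workdir/book.txt"    # 24 characters (plain ASCII text)

# wc can also count the output of another command through a pipe.
printf 'a b c\n' | wc -w       # 3 words
```

Some systems pad the numbers with leading spaces, so the layout may differ slightly.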
With -r, you can sort a given text file in descending order; -n performs a numeric sort; -f ignores, or folds, the case of characters in a string; -u removes duplicate lines from the output, u standing for unique; -t c uses c as the field separator; and -k X sorts by the c-delimited field X, and it can be used multiple times. You can also eliminate duplicate lines using the uniq command or sort -u. The sort -u command removes duplicate lines from the input, and the uniq command, u-n-i-q, removes duplicate adjacent lines from the input. You can use -c with uniq to count the number of occurrences of each unique line. Use uniq together with sort for best effect, as a combination of both. We can also compare multiple files.
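The sort and uniq options above can be sketched as follows. The files numbers.txt, ages.txt and userlist.txt are hypothetical samples created just for this example:

```shell
# Hypothetical sample files, for illustration only.
workdir=$(mktemp -d)
printf '10\n9\n100\n9\n' > "$workdir/numbers.txt"
printf 'carol:30\nalice:25\nbob:35\n' > "$workdir/ages.txt"
printf 'bob\nalice\nbob\ncarol\nbob\n' > "$workdir/userlist.txt"

sort -n "$workdir/numbers.txt"         # numeric sort: 9, 9, 10, 100
sort -n -r "$workdir/numbers.txt"      # reverse numeric sort: 100, 10, 9, 9
sort -n -u "$workdir/numbers.txt"      # duplicates removed: 9, 10, 100

# -f folds case, so Cherry sorts after banana: apple, banana, Cherry
printf 'banana\napple\nCherry\n' | sort -f

# -t sets the field separator, -k picks the field to sort by.
sort -t: -k2 -n "$workdir/ages.txt"    # alice:25, carol:30, bob:35

# uniq only removes ADJACENT duplicates, so sort first; -c counts occurrences.
sort "$workdir/userlist.txt" | uniq -c
```

The original files are left unchanged; sort only writes the sorted text to STDOUT.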
Suppose we have two different text files where the content is almost the same, more than 90% is the same, but there is a little difference and we want to check what the difference between them is. We can write the command diff, d-i-f-f, for difference, and provide the names of the two text files. It will display the differences along with the line numbers at which they occur. We may also make various spelling mistakes, and if you want to check for them there is a utility called aspell, which is used to check the spelling of the whole file. It can also work non-interactively to list and count misspelled words, so you can check the misspelled words, or perform operations like word count and sort on them.
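As a rough sketch of diff, and of aspell in its non-interactive mode, assuming two hypothetical files old.txt and new.txt. The aspell part only runs if aspell happens to be installed on your system:

```shell
# Hypothetical sample files that differ on line 2, for illustration only.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/old.txt"
printf 'apple\nblueberry\ncherry\n' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running if it was started with "set -e".
diff_output=$(diff "$workdir/old.txt" "$workdir/new.txt" || true)
echo "$diff_output"
# The first line of the output, 2c2, says line 2 was changed.

# aspell list reads STDIN and prints the words it considers misspelled.
# This is skipped when aspell is not available.
if command -v aspell >/dev/null 2>&1; then
    aspell list < "$workdir/new.txt" | sort | uniq -c || true
fi
```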
We can also manipulate text in various ways. For example, we can translate characters from uppercase to lowercase, or perform a wide range of other operations, using the tr command. It converts characters in one set to the corresponding characters in another set, and it reads only from STDIN. For example, you can convert lowercase characters to capitals in a file. The format is to write tr, then within quotes the set of characters to be translated, then within quotes the new set they are translated to, and then a left-directed arrow, the less-than sign, followed by the file name. You can also alter the strings in a file using the sed command, which is called the stream editor. It performs search-and-replace operations on a stream of text, and it normally does not alter the source file. So the source file stays intact and only the stream is edited.
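Here is a minimal sketch of tr and sed, assuming a hypothetical file lowercase.txt. The output file edited.txt is also a made-up name, used to show one way of keeping the edited stream:

```shell
# Hypothetical sample file, for illustration only.
workdir=$(mktemp -d)
printf 'hello world\nhello again\n' > "$workdir/lowercase.txt"

# tr reads only from STDIN, so the file is supplied with <.
tr 'a-z' 'A-Z' < "$workdir/lowercase.txt"     # HELLO WORLD, then HELLO AGAIN

# sed prints the edited stream; lowercase.txt itself is not changed.
sed 's/hello/goodbye/' "$workdir/lowercase.txt"

# Redirect the stream to a new file if you want to keep the result.
sed 's/hello/goodbye/' "$workdir/lowercase.txt" > "$workdir/edited.txt"
```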
So we have to save the stream, otherwise it is lost. You can also use regular expressions for pattern matching. If you are familiar with languages like Python or Perl, you may already know regular expressions and have used them for a wide range of applications. A regular expression is used to match strings of characters or words based on certain patterns. We already had a brief introduction in the earlier videos. So, just to recap, the upper arrow symbol, the caret, represents the beginning of the line. Suppose we want to match a certain word or string at the beginning of the line.
If it is matched, we want to flag that it was found; otherwise, we do not need to report anything. Suppose I want to search for USA at the beginning of the line: if USA is written as the first word of a line, it will be matched. The format is to write the upper arrow symbol and then the string you want to search for. Similarly, we can match a string at the end of the line using the dollar symbol, written after the string. We can also include or exclude characters by placing them within square brackets.
If you write characters in square brackets, any one of those characters will be matched wherever it occurs. Similarly, if you place the upper arrow symbol as the first character inside the square brackets, it will match any character except the ones listed. These regular expressions are used by commands like grep, sed, less and so on. So, we have learned a few techniques for text manipulation and extraction. They are very useful across a wide range of operations, like manipulating log files, extracting data from text files and programs, and a wide range of utilities. You can use them on the command line interface as well as in shell scripts.
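The anchors and square brackets from the recap can be tried out with grep. This is only a sketch on a hypothetical file called places.txt:

```shell
# Hypothetical sample file, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/places.txt" <<'EOF'
USA is a country
I live in the USA
cat
cot
cut
EOF

grep '^USA' "$workdir/places.txt"      # USA at the beginning: USA is a country
grep 'USA$' "$workdir/places.txt"      # USA at the end: I live in the USA
grep 'c[ao]t' "$workdir/places.txt"    # c, then a or o, then t: cat, cot
grep 'c[^ao]t' "$workdir/places.txt"   # c, then anything except a or o, then t: cut
```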
So, give it a try. Try to write your own programs and practice them. Keep learning, keep moving ahead. We will be learning more in the coming videos.