Hi. In this lesson, we talk about grouping and how to use group functions. To start with, group function is capable of taking multiple rows and aggregate and return that. So a group function like some is going to give you the sum of the values present in the given roles. So sum, average minimum maximum standard deviation, variance and count. These are the group functions.
The general syntax is the function name followed by the column. But if you give distinctive before the column name, it means you are only considering distinct values are unique values. So, you're ignoring duplicates now Here are some examples. Select sum of salary from employees. This is going to give you the total salary of all employees, this is my sum, and then this is the column and this is going to give you the count of commission percent days. This is interesting because it counts only not null values.
So this is giving you how many employees have not null in commission percent days. And this is about getting the first and last the date on which employers joined. The minimum hire date is the latest hire date and maximum hire date is the most recent hire date. So, you can even subtract one from another to find out when you got the first employee joined and when you got the most recent template join, and how many days are they In between, and this is where I want to count the number of jobs employers are doing. Well, the number of distinct jobs because distinct is only counting unique values. And then this is about average salary of employees where the salary is greater than 10,000.
Of course, you can use WHERE clause you can use other clauses also. For example, later when we talk about grouping we can include order by etc. Now, this is about growth functions without any grouping. We'll see grouping shortly, but these are the group functions. Let's switch to this and see how we deal with that. So here we are trying to find out the average salary of all employees.
So this is what I want. To give, and that's going to give me the average salary. So that's the average. Of course, you can run date and truncate it and do all sorts of things that we have been doing. I say I just want to get only an in teaser, so I can go for trunk. And then if I say, I'm only interested in employees, where the department ID is something like 80, then I get it.
This is the average salary of employees in department 80. But what if I want to get average salary of employees for each department? Because this can give me an average salary for one. But I want that for each and every department. How do we get that? Well, that's where grouping comes into picture.
We'll say that very shortly. Before I go to that, let me count how many employees are there In the department at and this count star is just a count how many rows are there not counting any not null values in a single column. This is almost equivalent to saying count of employee ID because do remember, employee ID is a not null column so you can't have null value there. So this is going to be equal and to count star and I find count star more convenient to use it so I go with that. But that's not same as counting the commission person days. If I want to know the commission person days for employees and department 80 then it is still giving me 34 Well, don't be surprised because this department is where all employees are given commission percentages.
But if you go for some other department like 60 Then the count is zero. But it doesn't mean we don't have employees in that, let's try. Let's see how many employees are there. So that's about the number of employees we have, and how many employees are having commission person days. So this is about a count. It can either take star or a column.
But when you give a column, it counts number of not null values. Okay, that's about simple group functions. But how do we use that with the group by now grouped by class, if you switch back to this is to divide the data into different groups based on the value in a given column. So if I want to divide the employees into different groups based on department ID, then we need to use group by default meant ID. So let's switch to that. So here I say, I want to get the department ID.
And I want to know how many employees are there. And I want to divide the data using department ID. So saying, department ID, then count from employees group by department ID, this is what we need to do. So now you get each department and how many employees are there? Well, you can also use order by and say order by department ID. Then you get all the details in the sorted order of department ID.
Of course, you can sort even by second column, if you like. Then you get the data sorted by the count. So that departments with more employees are pushed down to the bottom. Well, if you want them to be at the top, you know what to do, just use descending. And interestingly, there is a row for a department ID. Now, one of the rows in employees table is having null in department ID, what if I want to exclude that I can.
So I need to come to this and say where department ID is not null. So consider only those rows where department ID is not null. Then group that data by department ID, and then we are saying we want to sort the data by the count of employees, you can change it to one to whatever you like. So go ahead, and this time, you don't see that now. Okay, I go back to one and I'm just going by the sorted order of department ID. So this is How you group the data and you calculate some aggregate for each group.
And you can count you can get the average salary, total salary, you can have whatever you like, this is the way it goes. Okay. Now, coming back, what if I want to know how many departments are which departments have employees with average salary more than 5000? Well, I can't do that. Let us say we want to get the average salary of employees. And I'm only interested in departments in which average salary is more than 5000.
So this is what I have to do. And if you think I can do that with wear, I'm sorry, you're wrong. Because wear cannot be used with any group functions because group functions are performing The operation on groups and groups are not created by the time where is executed. So, you have to introduce a new class called having. So we are saying having average salary greater than 5000. So I'm interested in only those departments where the average salary is greater than 5000.
So, this will give me only those departments in which the average salary is more than 5000. And to keep it simple, we would like to use trunk and we want to get only simple numbers like this. So you have to give them in the order that's important. First comes where then grew by then having and finally you have to put order by That's the order. And here you can see some more examples. The first one is going to get you the job ID and sum of salary.
And then we are talking about the department ID and count of the employees. And then we are talking about another important one, we need not always give a column we can also give an expression. So I'm giving an expression which is here of hire date. And this tells me how many employees joined in each ear. And we need to group the data by the exact expression. This is a very important point.
Whatever use in group by the same expression you need to give event for select. And if these two don't match, it's considered to be an error. And the next one is where we are talking about selecting a few rows, and then grouping them using job ID. And then finally, it's possible that you group the data by multiple columns. First you want to group by department ID. And within the same department, you want to group again, based on job ID.
And then this is about having class. And here are some more examples. And this is where I want to display the jobs with total salary greater than 50,000. And this is where I want to display the manager and count of the employees reporting to the manager, if we have more than five employees reporting to the manager, so manager ID is nothing but the employ whom an employer is reporting. So if we're not Three is reporting to one, not one, then in the row of one or three, we have one, not one as manager. So this is where I want to know, the manager.
So where we have more than five reports. And then the next one is where we divide the data by department ID, and select only those departments where the count is greater than five. So these are different examples. And coming back, I can go and find out how many employees joined in each ear. So this is how we can get hired date for wise count of star from employees, but don't forget group by exactly the same expression like this. So this will give me how many implies joined in each year.
Let's sort it by one and you You're going to get something like this. So this is about the number of employees joining each year. According to my data, we have 2001 2008 employees joining, and these are the counts. So if you want a more legible column heading, you can say something like a number of employees joined, that's fine. Well, you can give as I told you, either in uppercase or lowercase doesn't matter. For clarity, I use generally uppercase, but even if I use lowercase doesn't matter, it still works.
Okay. So this is about it. But if you want to consider only the years in which more than 10 employees join, then just use having count of star greater than 10 and you will be getting only a few like this. This is the way it works. So That's about having and this is the order. First we are going to apply where and filter rows.
So if I have 100 rows based on the condition might be only 70 rows are selected. Now those rows are divided into groups using group by. So that might be something like you end up having 10 different groups. And then you're going to use having, which will select a few out of those groups. So maybe from 10 different groups are selecting only four groups, because only four groups are satisfying the given condition. So having comes after group by group by shirt come after where that's exactly the order and you can't change it.
And if you are planning to use Audra, by that always comes at the end So that's about the way these classes are applied and this is the order of execution. Well, this is about this lesson, we understood how to divide the data into different groups, how we can create groups by a single column or multiple columns. We understood how we can use group functions, group functions can be used with or without group by, but it is generally a practice to divide the data into different groups and get some aggregates. We also understood how having class can be used. It is to select a few groups and having a is to select groups and where is to select rows. You have to understand that distinction clearly.
So do not interchange them. Where is used too much before grouping and having used after grouping, because it works on groups. So those are the important things. And this lesson is pretty important because grouping is such a common task. And you have to be very comfortable with grouping. Well, so keep practicing, keep trying queries.
I'll catch you in the next lesson.