Hey guys, in this video we will be looking at the SQL GROUP BY statement. The GROUP BY statement is an excellent way to literally group the data so that we see more distinct data. We will look at how it is written, the different variations, and the considerations that one must make in using this statement. In the rest of this section, we will be looking at the aggregate functions that rely on this statement. Now, I did start off by saying that GROUP BY can help us to identify distinct data, and I just want to inject a bit of disambiguation there with the SELECT DISTINCT statement.
So SELECT DISTINCT says that, whatever data I'm about to select, I'm only looking for unique rows. That's what SELECT DISTINCT does. So if I did a SELECT DISTINCT on enrollments and I had, say, rows one and two with the same data across the columns that I'm selecting, then it would only bring back one row. So let's take, for instance, a practical example in this enrollments table. In the real world, there should be no rows that have the same ID, teacher ID, student ID, and course ID. At the very least, take the ID out of the picture.
No teacher, student and course combination should be repeating in your tables. So there's that. But in this situation, I do have a teacher teaching a course, and I have two instances of that appearing here: one and four, and one and four. So I quickly wrote up two statements that select the teacher ID and the course ID from enrollments. If you want to get fancy, of course, you can join everything so you get more details, but just for expediency, I'm going to use this quick example. So select these two columns from the enrollments table.
And here's all of enrollments. So when I run this query, it should bring back 17 rows with only the teacher ID and the course ID. Now when we look at them together, we see that we have a few repetitions: we have two and two here, and two and two here; we have one and four, one and four, et cetera. So SELECT DISTINCT is going to say, okay, since I'm seeing two and two twice, I only need to bring it back one time. And that's the purpose of SELECT DISTINCT. So I'm just going to run that line.
And then you can compare the data output. So here we see that we're now down from 17 rows to just nine. SELECT DISTINCT eliminated all of those repeating rows. So if you're in a situation where you have repeating rows in a table and you really only need one row represented in your results, then the DISTINCT keyword, right after the word SELECT and right before the columns, is your keyword to eliminate those repetitions.
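To make the SELECT DISTINCT behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the column names (teacher_id, course_id and so on) and the six sample rows are assumptions made up for illustration; they are not the 17 rows shown in the video, and the video appears to use a different database tool.

```python
import sqlite3

# Hypothetical schema and rows; not the data shown in the video.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments (id INTEGER PRIMARY KEY, teacher_id INTEGER,"
    " student_id INTEGER, course_id INTEGER, grade TEXT)"
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 4, "A"),
        (2, 1, 2, 4, "B"),
        (3, 2, 1, 1, "A"),
        (4, 2, 1, 2, "B"),
        (5, 2, 3, 2, "C"),
        (6, 2, 3, 1, "A"),
    ],
)

# Plain SELECT: one output row per table row, repetitions included.
all_pairs = conn.execute(
    "SELECT teacher_id, course_id FROM enrollments"
).fetchall()

# SELECT DISTINCT: each teacher/course combination comes back once.
distinct_pairs = conn.execute(
    "SELECT DISTINCT teacher_id, course_id FROM enrollments"
    " ORDER BY teacher_id, course_id"
).fetchall()

print(len(all_pairs))   # 6
print(distinct_pairs)   # [(1, 4), (2, 1), (2, 2)]
```

Note that DISTINCT applies to the whole selected row, so it is the combination of teacher_id and course_id that has to repeat before a row is dropped.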
That is, however, not why we're here. Why we're here is to group. Now, grouping does something a bit different from SELECT DISTINCT. Grouping will bring back only one record for each set of rows that match on all of the grouped columns, and it does a kind of background mathematics to keep track of the rows that were there, allowing us to then layer certain mathematical functions on top. So if you need, for instance, the count, the average, the maximum or the minimum, those kinds of things, then you need to use a GROUP BY to bring back that one clump of the same data. DISTINCT will just omit the duplicates; GROUP BY actually clumps them. So we'll do some examples of this, and as we go through this section with the different aggregate functions, you will see exactly what I mean when I say GROUP BY helps with the mathematical functions.
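The difference between omitting and clumping can be sketched as follows, again with Python's sqlite3 module and a made-up enrollments table (the schema, column names and rows are assumptions, not the video's data). GROUP BY returns the same combinations DISTINCT would, but it can also report how many rows went into each clump.

```python
import sqlite3

# Hypothetical schema and rows; not the data shown in the video.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments (id INTEGER PRIMARY KEY, teacher_id INTEGER,"
    " student_id INTEGER, course_id INTEGER, grade TEXT)"
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 4, "A"),
        (2, 1, 2, 4, "B"),
        (3, 2, 1, 1, "A"),
        (4, 2, 1, 2, "B"),
        (5, 2, 3, 2, "C"),
        (6, 2, 3, 1, "A"),
    ],
)

# One row per teacher/course group, plus the number of rows in each group.
grouped = conn.execute(
    """
    SELECT teacher_id, course_id, COUNT(*) AS row_count
    FROM enrollments
    GROUP BY teacher_id, course_id
    ORDER BY teacher_id, course_id
    """
).fetchall()

print(grouped)  # [(1, 4, 2), (2, 1, 2), (2, 2, 2)]
```

COUNT is used here only as a preview of the aggregate functions covered later in the section.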
Now we'll just modify this query that's already here, selecting everything from our enrollments, and add a GROUP BY. The thing with a GROUP BY is that it comes late in the statement. In other words, if you have an inner join, it comes before the GROUP BY; if you have a WHERE, it comes before the GROUP BY. Only clauses such as HAVING and ORDER BY come after it. So we're going to use a WHERE in this situation. Let's run some queries against teacher ID two, so WHERE teacher ID equals two. And I'll just execute and see the results.
So this is all the data relating to teacher ID two. Now, the second thing to note with a GROUP BY is that every column referenced in the SELECT needs to be included in the GROUP BY, unless it sits inside an aggregate function. This is a blessing and a curse, because the more columns you put in, and the more variation there is in the data, the less grouping can actually occur. So let's take, for instance, ID. ID is different in every row in this result set, so if I included the ID in my GROUP BY, I literally would see no difference. All right, so the same way you would write the SELECT and specify the columns, you write GROUP BY column, comma, column, et cetera. And if I execute this query, you'll see that there is literally no variation in the data set that comes back.
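Here is a small sketch of that "no difference" result, assuming a hypothetical enrollments table in SQLite (the column names and rows are invented for illustration). Because id is unique, grouping by every column including id leaves each row in a group of its own.

```python
import sqlite3

# Hypothetical schema and rows; not the data shown in the video.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments (id INTEGER PRIMARY KEY, teacher_id INTEGER,"
    " student_id INTEGER, course_id INTEGER, grade TEXT)"
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 4, "A"),
        (2, 1, 2, 4, "B"),
        (3, 2, 1, 1, "A"),
        (4, 2, 1, 2, "B"),
        (5, 2, 3, 2, "C"),
        (6, 2, 3, 1, "A"),
    ],
)

# Ungrouped rows for teacher 2.
plain = conn.execute(
    """
    SELECT id, teacher_id, student_id, course_id, grade
    FROM enrollments
    WHERE teacher_id = 2
    ORDER BY id
    """
).fetchall()

# Same query with every selected column, including the unique id, grouped.
grouped_with_id = conn.execute(
    """
    SELECT id, teacher_id, student_id, course_id, grade
    FROM enrollments
    WHERE teacher_id = 2
    GROUP BY id, teacher_id, student_id, course_id, grade
    ORDER BY id
    """
).fetchall()

print(len(plain))                # 4
print(grouped_with_id == plain)  # True: nothing was clumped together
```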
That's because each row is unique, so there is nothing to group; it can't group distinctly different rows. So when you're writing a GROUP BY, you have to start with a process of elimination: what data is not absolutely necessary to my grouping? So let's say, for instance, ID. I really don't need to group by ID. All right, so my GROUP BY now starts with teacher ID. Now that I've taken ID out of the picture, it depends, once again, on what kind of data we want to get back.
So let's say our scenario is that we want to see all of the courses being taught by this teacher, which means that I just want to see the teacher ID and the course. And you can see that these start repeating, because he's already here for course two, he's already here for course one, et cetera. So obviously, the variation of the values in student ID across the courses would skew that result. So I can just eliminate student ID. I'm just commenting it out quickly, because I'm going to use it later. And then I execute. And there you go.
So the GROUP BY is now eliminating all of the additional rows and just bringing back the groups: teacher ID two with course one, course two, four and three. And remember that we use GROUP BY because we want to do some maths; we'll get into the maths and the aggregate functions a bit later. But for now, just appreciate how the GROUP BY works. So we see here, obviously, that if the grades were different, then there would be variations there also, so it would be distinctly different data again. Right now we don't need grade; I actually didn't remember to remove that. So I remove it and execute. And there we go.
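The effect of leaving grade in, and then taking it out, can be sketched like this. The table, column names and rows are assumptions for illustration (teacher 2 here only teaches two courses, unlike in the video), and the sketch uses SQLite through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical schema and rows; not the data shown in the video.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments (id INTEGER PRIMARY KEY, teacher_id INTEGER,"
    " student_id INTEGER, course_id INTEGER, grade TEXT)"
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 4, "A"),
        (2, 1, 2, 4, "B"),
        (3, 2, 1, 1, "A"),
        (4, 2, 1, 2, "B"),
        (5, 2, 3, 2, "C"),
        (6, 2, 3, 1, "A"),
    ],
)

# With grade still selected and grouped, differing grades split the groups.
with_grade = conn.execute(
    """
    SELECT teacher_id, course_id, grade
    FROM enrollments
    WHERE teacher_id = 2
    GROUP BY teacher_id, course_id, grade
    ORDER BY course_id, grade
    """
).fetchall()

# With grade removed, each course taught by teacher 2 appears once.
courses_only = conn.execute(
    """
    SELECT teacher_id, course_id
    FROM enrollments
    WHERE teacher_id = 2
    GROUP BY teacher_id, course_id
    ORDER BY course_id
    """
).fetchall()

print(with_grade)    # [(2, 1, 'A'), (2, 2, 'B'), (2, 2, 'C')]
print(courses_only)  # [(2, 1), (2, 2)]
```

The clause order in both queries is WHERE first, then GROUP BY, then ORDER BY.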
So the GROUP BY is clumping these together. What if we wanted to see all of the students that this teacher is teaching, regardless of the course? Then student ID would become our column of interest instead of course ID, so we can just comment out course ID and bring student ID back in, and then we run it. And so we see teacher two come back with only one record for each student. And of course, we can go ahead and join it on our other tables if we wish.
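Swapping the grouping column from course to student might look like the sketch below. As before, the enrollments schema and rows are hypothetical, and SQLite via Python's sqlite3 module stands in for whatever tool the video uses.

```python
import sqlite3

# Hypothetical schema and rows; not the data shown in the video.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments (id INTEGER PRIMARY KEY, teacher_id INTEGER,"
    " student_id INTEGER, course_id INTEGER, grade TEXT)"
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 4, "A"),
        (2, 1, 2, 4, "B"),
        (3, 2, 1, 1, "A"),
        (4, 2, 1, 2, "B"),
        (5, 2, 3, 2, "C"),
        (6, 2, 3, 1, "A"),
    ],
)

# One row per student taught by teacher 2, regardless of course.
students_of_teacher = conn.execute(
    """
    SELECT teacher_id, student_id   -- course_id left out on purpose
    FROM enrollments
    WHERE teacher_id = 2
    GROUP BY teacher_id, student_id
    ORDER BY student_id
    """
).fetchall()

print(students_of_teacher)  # [(2, 1), (2, 3)]
```

Students 1 and 3 each take two courses with teacher 2 in this sample, so four rows collapse into two.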
All right. So I wrote up my inner join statements, and I'm just showing you this error here. This is why I alluded to the use of aliases before: if two tables have the same column name, then there is going to be some amount of ambiguity as to which one is which. So that's why we put on our aliases, and we use those to distinctly identify which table it is that we're referring to. Now, having added both tables as inner joins to this query, we need to add the columns.
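The ambiguity problem, and the alias fix, can be reproduced in a small sketch. The students table, its columns, the names in it and the aliases e and s are all assumptions for illustration; the error wording comes from SQLite and will differ from the message shown in the video's tool.

```python
import sqlite3

# Hypothetical schema and rows; not the data shown in the video.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments (id INTEGER PRIMARY KEY, teacher_id INTEGER,"
    " student_id INTEGER, course_id INTEGER, grade TEXT)"
)
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY,"
    " first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?, ?, ?)",
    [(3, 2, 1, 1, "A"), (4, 2, 1, 2, "B"), (5, 2, 3, 2, "C"), (6, 2, 3, 1, "A")],
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (3, "Bob", "Ray")],
)

# Both tables have an id column, so an unqualified id is ambiguous.
error_message = ""
try:
    conn.execute(
        "SELECT id FROM enrollments"
        " INNER JOIN students ON students.id = enrollments.student_id"
    )
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)  # SQLite complains that the column name is ambiguous

# Aliases e and s make it explicit which table each id belongs to.
aliased = conn.execute(
    """
    SELECT e.id, s.id
    FROM enrollments AS e
    INNER JOIN students AS s ON s.id = e.student_id
    ORDER BY e.id
    """
).fetchall()
print(aliased)  # [(3, 1), (4, 1), (5, 3), (6, 3)]
```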
Now, notice I'm getting this error saying that certain columns are invalid because they're not a part of the GROUP BY clause. So remember that rule about any column that I put in the SELECT. What I did was include the student's first name and last name, and call it student name, and I included the teacher's first name and last name, and call it teacher name. Any column that I'm referencing in my SELECT has to be referenced in my GROUP BY, so I will also have to add s dot first name, s dot last name, and repeat the same for the teacher columns. So it really doesn't matter that I'm concatenating them to build the full name; it just matters that I am making a reference to them in my SELECT.
And if I'm selecting them, then they must be a part of the GROUP BY. So when I write this entire query, I must ensure that my GROUP BY contains all the columns that are part of my SELECT. And so below you'll see that the teacher ID and the student ID are coming back, and the student name is coming back as well as the teacher name. And really and truly, I don't need the teacher ID and the student ID in the output. So, process of elimination, cleaning up my report: here my output is just the student name and the teacher name, and in the back end I'm still grouping by the IDs, and I could probably even take them out of the GROUP BY clause. And there you go.
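The finished report query might look roughly like the sketch below. The students and teachers tables, the first_name and last_name columns, the people in them and the aliases e, s and t are all assumptions for illustration. One caveat: SQLite is more permissive than the tool in the video and would not raise the "not part of the GROUP BY clause" error, so the sketch only shows the corrected form, with every selected column listed in the GROUP BY.

```python
import sqlite3

# Hypothetical schema and rows; not the data shown in the video.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE enrollments (id INTEGER PRIMARY KEY, teacher_id INTEGER,
        student_id INTEGER, course_id INTEGER, grade TEXT);
    CREATE TABLE students (id INTEGER PRIMARY KEY,
        first_name TEXT, last_name TEXT);
    CREATE TABLE teachers (id INTEGER PRIMARY KEY,
        first_name TEXT, last_name TEXT);
    """
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 4, "A"),
        (2, 1, 2, 4, "B"),
        (3, 2, 1, 1, "A"),
        (4, 2, 1, 2, "B"),
        (5, 2, 3, 2, "C"),
        (6, 2, 3, 1, "A"),
    ],
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Raj", "Patel"), (3, "Bob", "Ray")],
)
conn.executemany(
    "INSERT INTO teachers VALUES (?, ?, ?)",
    [(1, "Dan", "Cole"), (2, "Eve", "Moss")],
)

# The names are concatenated in the SELECT, but the GROUP BY lists the
# underlying columns that the SELECT references.
report = conn.execute(
    """
    SELECT s.first_name || ' ' || s.last_name AS student_name,
           t.first_name || ' ' || t.last_name AS teacher_name
    FROM enrollments AS e
    INNER JOIN students AS s ON s.id = e.student_id
    INNER JOIN teachers AS t ON t.id = e.teacher_id
    WHERE e.teacher_id = 2
    GROUP BY s.first_name, s.last_name, t.first_name, t.last_name
    ORDER BY student_name
    """
).fetchall()

print(report)  # [('Ann Lee', 'Eve Moss'), ('Bob Ray', 'Eve Moss')]
```

One design consideration: grouping by names alone would merge two different people who happen to share a full name, so keeping the ID columns in the GROUP BY, even when they are not in the output, is the safer choice in a real report.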
So it's just a process of elimination, really and truly. Sometimes you're building from the ground up: you have a requirement for a report, you don't know where the data is coming from, so you add it all to the GROUP BY, and then when you see a sensible result, you can refine it so that you get the desired output. All right. So as we go along, we'll see how GROUP BY helps us with our aggregate functions.