Hey guys in this video we will start looking at the group by statements available to us in my SQL. The group by allows us to eliminate repeating data and select distinct similar looking records from a table. But it also affords us the opportunity to conduct certain mathematical operations like counting and you know, taking a sum on average, things like that. Though it is worthy to know that there is another way that SQL allows us to eliminate repeating data and that is using select distinct now if we take a quick look at our classes table, we see here that we have a few repeating records if we were to just want to eliminate them then we could use either the Select distinct or the group by so when I say repeating I mean that we have course ID one repeating course ID toward beating etc. Now what if we were in a situation where we only wanted to see which courses currently have classes, we don't need any additional data with only the time or the lecture, we just want to see which courses are currently rostered.
We don't need to see course number one twice. And of course number two twice, all we really want to see is course one course two, course five and course four. So we could just say select distinct and star but then what this will do is still bring back the same data set because it is looking for a distinct role, not that distinct value in the rule. So because this rule has an ID one, one and 1400, whereas this one as to one and 1700 they're not they're not the one they are distinctly different records. So in order for the distinct to be very effective, you have to make sure that you're bringing about the exact column or columns that you know are repeat And you really want to eliminate the repetition from so if I was to say courses on the score ID here, and I select this thing courses on the score IDs just going to look for the distinct values in this column, and only bring back 124 and five.
Now similarly, if I was to do a group by on this, I took out the distinct keyword, and a group by will just come after our SELECT statement where I say group by and whatever column I am selecting, I have to group by so the row column or columns I'm selecting have to be grouped by and then I execute, then you'll see that we get about the same results set. What is invisible to the naked eye, however, is the fact that the group by is actually keeping track of what it is grouping. And we can actually perform operations like counting each record that was grouped or getting an average of each thing that was grouped together. apply that. So that is the major difference between the group buy and the Select distinct. Now let us see that our supervisor asked us for a report of all of the courses that have currently been rostered for classes.
All right, so we have the courses, and the courses are in their own table. And then we have the classes, which is an instance of a course being given a lecture and a time. And so we wanted to see of all the the courses that are available, which ones have been rostered or put into the class table of our record in the cloud stable. No, we don't need to repeat course, one, we don't need to repeat course too. So really, actually, all we should be presenting in the report are the details for the courses 125 and four. And so we will probably want to refine our results set before we give that report to eliminate the double instance of one and the double instance.
So to know when I'm writing these reports, what I like to do is select as much data as I possibly can and then start my elimination. So I'm selecting star from classes, and then I'll just inner join the lectures table. Right? So I modified the query a bit, I just put in my use statement, because it's kind of annoying when I have to type school DB dot every time I want a table. So I just put in my use statements, and I INNER JOIN lectures. I'm using my NSC for classes for lecture.
And I'm also going to inner join my courses table. And then we can take a quick look at that result set. Alright, so we're here. We're in adjoining all of these but then we see where this is not clean, because we're inner joining on the lectures table, which is eliminating some of the classes that are available. So in this lesson, we'll also see that we can mix and match the type of joins that were using. So I'm I want all the classes because I need the course Well, I need all the courses.
And some of the courses or some of the classes associated with a course don't have a lecture. So the inner join is actually eliminating them from my results at what I want to see them because they have been rostered. So I can just change this inner join, because I know that this is the joint statement that is eliminating those, and I can change that to a left join. So I want to see all the classes. If you have any lecture details, then you can bring them back. And well I want to see courses and this one is safe because you know that this is not nullable.
So this condition must be met on either side of this query. So I can just execute this again. We see that we're getting back all the courses this time with any additional details if available for lectures. Now, I did say that I wanted only the course name. We can read About the number of credits and the course code, but I want to see only the courses I really don't need to enter join on lectures. Alright, so I can see that's old.
And the simpler your query, the better it is it runs faster. And it brings about the data that the answer that you need, then it's fine. So now I go through my process of elimination, where I changed my star to the exact columns that I want. Now do remember two things about this group by statement, one, the more diverse the data is, is the harder it is to group. So let's leave this on Star. And well, it's actually difficult if it's OnStar, because if I add a group by here, I would have to group by every single column that this star would bring back and that would be tedious at this moment.
So what I'm going to do is, I'm going to group by only the course related columns. So firstly, let me bring by the course related columns. So that's see all that course, Id are looking better. But we still have that repetition, we still have data mining coming to us the introduction to SQL coming twice. So what we can do now is group by, and then whatever columns are selecting, we have to group by those columns also. So I just add them there.
And then I execute. And there you see that now we have eliminated repetitions. And because the data was consistent going across, so we only had the ID being one being repeated data mining three, and this core school, it was repeating the day was the same data that was coming in more than one records, the group by was able to crunch them all together. And then process of elimination. Once again, if I'm presenting this report, I don't need to see the course ID. So I can see that's all it's off both the Select and the group by and do know if you use choose not to select but still grew by, that wouldn't cause an arrow.
But what it might lead to is, if this record or this columns data is diverse enough, you would think you're grouping by, but you're you're not seeing the the columns being split up or being crunched together the way you expect, then that means that your group by is not clean. So I'm going to introduce a wildcard column here, and I'm going to put in see that ID. So C is the alias for classes. So I'm no adding the class ID to this group by statement. When I execute, you see that everything is no doubling up again, because this value is different for each of these records. So because I'm grouping by this different value, it can't.
It's not going to actually I'd say, Oh, I'm only selecting these, it's only going to see what am i grouping by how diverse are these records are similar. If they're similar enough, I'll crunch them into one record. If they're diverse, then I can tell you so because of this diverse value, it's no no longer grouping by effectively. So I can just stay that out. And I can I can take this out because it's of no consequence I once I execute again, we're getting all of these records grouped. Now in a nutshell, once again, this and this select this thing look very similar to the naked eye because the Select distinct would do something very similar to this.
But as we go along, we'll see how grouping by helps us to do or aggregate my functions.