Let's take a few minutes to talk about CIFAR-10, the dataset that we'll be using throughout this course. This video is primarily aimed at people who haven't worked with the CIFAR-10 dataset yet. So if you're one of those people, stick around; otherwise, feel free to skip this video, and I'll see you in the next one. CIFAR-10 is one of the most famous multi-purpose datasets. The main reason we chose to work with it in this course is that it is well balanced and doesn't require much preprocessing, so you can jump right into developing your algorithm. This lets us focus on the product we are actually trying to build, which in this case is image-to-image search.
Don't worry about the small size of the dataset for now. In future lessons, you will get tips and tricks on how to improve the pipeline and scale it to much more challenging datasets. As you can see on the main page of the CIFAR-10 dataset, there is some core information about it. The link to download the dataset is in the resources for this video. It contains 60,000 images, of which 50,000 are for training and 10,000 are for testing. These images are split into 10 classes with 6,000 images per class: 5,000 for training and 1,000 for testing.
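The numbers above fit together neatly because the dataset is perfectly balanced. As a small sketch, here is that split written out in Python. The constant names and the `per_class_counts` helper are illustrative only and are not part of any official CIFAR-10 tooling.

```python
# CIFAR-10 split as described on the dataset's main page.
CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)
TOTAL_IMAGES = 60_000
TRAIN_IMAGES = 50_000
TEST_IMAGES = 10_000


def per_class_counts(num_classes: int = len(CLASSES)) -> dict:
    """Divide the split totals evenly across classes (CIFAR-10 is balanced)."""
    # Sanity check: the training and test splits add up to the full dataset.
    assert TRAIN_IMAGES + TEST_IMAGES == TOTAL_IMAGES
    return {
        "train": TRAIN_IMAGES // num_classes,
        "test": TEST_IMAGES // num_classes,
        "total": TOTAL_IMAGES // num_classes,
    }


print(per_class_counts())  # {'train': 5000, 'test': 1000, 'total': 6000}
```

Because every class has the same number of images, a model that simply guessed one class would score only 10% accuracy, which is part of what makes the dataset convenient to start with.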
And here you can see the classes that we are working with: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Although this is the main page of the dataset, we are better off downloading it from a different place. The main reason is that this version of the dataset is not stored as ordinary image files (it is distributed as serialized batches), and it would take us a while to preprocess. The alternative, and recommended, way to download the dataset is from the Darknet web page, where the images are already saved in PNG format and organized in separate folders for training and testing. And that's it for this video. Download the dataset and make sure everything is in order for the next video, where we are going to start working on the preprocessing phase.
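One quick way to check that everything is in order after extracting the Darknet download is to count the PNG files per class. This is a minimal sketch under a few assumptions: the archive extracts to a `cifar/` folder with `train/` and `test/` subfolders, and each file is named `<index>_<class>.png` (for example `0_frog.png`). Both the folder names and the naming pattern are assumptions, and `count_images_per_class` is a hypothetical helper, so adjust it to what you actually see on disk.

```python
from collections import Counter
from pathlib import Path


def count_images_per_class(split_dir):
    """Count PNG files per class in one split folder, e.g. 'cifar/train'.

    Assumption: files are named '<index>_<class>.png', such as '0_frog.png'.
    """
    counts = Counter()
    for path in Path(split_dir).glob("*.png"):
        # Treat everything after the first underscore in the file stem as the label.
        _, _, label = path.stem.partition("_")
        counts[label] += 1
    return counts


if __name__ == "__main__":
    root = Path("cifar")  # assumed extraction folder; change it if yours differs
    for split in ("train", "test"):
        split_dir = root / split
        if not split_dir.is_dir():
            print(f"{split_dir} not found")
            continue
        counts = count_images_per_class(split_dir)
        # Print the split name, the total image count, and the per-class counts.
        print(split, sum(counts.values()), dict(sorted(counts.items())))
```

Run it from the folder that contains `cifar/`. If the totals come out as 50,000 for training and 10,000 for testing, with ten equally sized classes, the download is ready for preprocessing.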
See you in the next video.