So if you've worked as a translator, you're probably familiar with OCR, optical character recognition. But maybe you're not. Anyway, it's a very important tool, and it becomes especially important when you become an agency. The main reason is that end clients have many different types of files, but translators really like to work with text. Whether they use CAT tools or not, it's just always easier to work with text. So if you can avoid sending them scanned copies or PDFs and send them actual text instead, that's a lot better.
And very often, the best way to do this is with OCR. Now, I'm going to be using one called Online OCR (onlineocr.net). This is my preferred program. I have absolutely zero affiliation with this company and nothing to do with it whatsoever, but I use it. It's a paid service. There are free ones, but this one is quite cheap; at least, I've found it to be one of the cheapest options. In fact, I can quickly show you what the prices are as of now: 50 pages is 4.95, 100 pages is 7.95, and then, because I use it quite a bit, it goes up to 24.95 for 500 pages. Anyway, these are my available pages; I have 170 right now. So I want to go over it quickly and show you how it works, just to give you an idea. I have a couple of files. I think I downloaded two, and I think they're both in English.
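To see why the bigger package is worth it if you use the service a lot, it helps to work out the price per page. A minimal sketch, taking the prices quoted in this walkthrough as 4.95 for 50 pages, 7.95 for 100 pages and 24.95 for 500 pages (the currency isn't stated, and prices may have changed since recording):

```python
# Package prices as quoted in the walkthrough: pages -> price.
# The currency is not stated, so the amounts are kept as plain numbers.
PACKAGES = {50: 4.95, 100: 7.95, 500: 24.95}


def cost_per_page(packages):
    """Return {pages: price per page}, rounded to four decimal places."""
    return {pages: round(price / pages, 4) for pages, price in packages.items()}


if __name__ == "__main__":
    for pages, per_page in cost_per_page(PACKAGES).items():
        print(f"{pages:>4} pages: {per_page:.4f} per page")
```

With these figures the 500-page package works out to roughly half the per-page price of the 50-page one (about 0.05 versus about 0.10).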
So first of all, I choose the recognition language, because it needs to know ahead of time what the language is so that it can work through its database. For the output format I choose Microsoft Word; I always want it in Word. Here, I don't want all pages, because I think the first file is actually pretty big, so let's just say I'll do pages 1, 2 and 3. I don't want to use up too many credits. For the scanned PDF type, I usually just choose automatic detection, which is fine. If I have issues, I'll go back and try to specify it, but usually automatic detection is fine. Then I select the file on the desktop.
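Limiting the job to pages 1 to 3 matters because each converted page uses up one of your available pages (170 here). The sketch below, assuming one credit per converted page and a simple "1-3" / "1,2,3" / "all" selection syntax of our own (the site's actual page field may work differently), estimates how many credits a job will leave you with. The helper names `parse_pages` and `credits_after_job` are hypothetical.

```python
def parse_pages(spec, total_pages):
    """Expand a hypothetical page selection ("1-3", "1,2,5-6" or "all")
    into a sorted list of page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, total_pages + 1))
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range like "2-5" includes both endpoints.
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"backwards range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))  # int("") raises ValueError for empty parts
    if min(pages) < 1 or max(pages) > total_pages:
        raise ValueError(f"pages must be between 1 and {total_pages}")
    return sorted(pages)


def credits_after_job(available, spec, total_pages):
    """Credits left after converting the selected pages (assumes 1 credit per page)."""
    needed = len(parse_pages(spec, total_pages))
    if needed > available:
        raise ValueError(f"job needs {needed} credits, only {available} left")
    return available - needed
```

For example, `credits_after_job(170, "1-3", 40)` returns 167 for a hypothetical 40-page file, while `"all"` on the same file would cost 40 credits.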
Okay, I think it was this one, 92. Yes, that was it. So now we open it and click Convert. Right now it's converting this PDF into text. And here it is.

I just click on this to download it, and that's pretty much it. It's done. Let's open it and see what we got. I'm going to open the original as well, so we can compare them side by side.
Let me find it. Let's see, "Exhibit". There we go. So this is the original. As you can see, with this original you can actually already select the text and probably copy it.

There are always issues when you do that from a PDF, and a lot of things get messed up, but you can still do it here, which makes it easier. A bit later, we'll go through a file where you can't do that. But for now, let's see. Here we go: "Sample Contract".
Let's compare how it looks: "Sample Contract", "Sample Contract", "Exhibit A". So everything came out quite well. Let's go down here, and as you can see, it all came out. Now this is a Word document, and it seems like pretty much all of it came out correctly.
Going through here, you'll notice the page break is a bit different, and stuff like that will happen. But it seems like all the text came out correctly, so you can pretty much take this as is and send it to the translator to work on, and it should work out well.
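The side-by-side check done here by eye can also be roughed out in code once you have both versions as plain text (for example, text copied from the original PDF and the text of the OCR'd Word file). This is a minimal sketch using Python's standard `difflib`, not a feature of the OCR service; it compares word by word, so differing line and page breaks are ignored and only wording differences are reported. The function name `word_differences` is our own.

```python
import difflib


def word_differences(original, ocr_output):
    """Return (original_words, ocr_words) pairs wherever the two texts disagree.

    Both texts are split on whitespace, so line and page breaks don't count
    as differences. An empty string on one side means words were dropped or added.
    """
    a = original.split()
    b = ocr_output.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


original = "Sample Contract\nExhibit A\nThe third party agrees"
ocr = "Sample Contract Exhibit A\n\nThe thlrd party agrees"
print(word_differences(original, ocr))  # [('third', 'thlrd')]
```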
So if you get this PDF from the client (by the way, I just took these PDFs off the internet; I think I literally searched for "sample contract PDF" or something like that, and it was the first thing that came up), you can pretty much send it straight to the translator. The translator can translate it and send back another Word file, and everything should be fine. I like to do a quick, cursory check. For example, I noticed a couple of things are underlined, and you need to see why. This is probably because it's not all set to the same language; I need to see how it's formatted in Microsoft Word and so on. Anyway, you want to make sure it's been done correctly. Here, for instance, I see this underlined, and that's probably because it's lowercase.
If I make it uppercase, yes, it's not underlined anymore. But in the original contract it's lowercase, so I should probably leave it that way, although it'll probably lose that after the translation anyway. But you get the idea, and that's the way it works.
Now let's take this other file, because it's kind of interesting: first of all, it's a JPEG. It's a bit different. With the other one you could highlight the text, but here you can't, because it's literally an image. I actually haven't tried this yet, so let's see what happens. I click on Convert again, select this file, and open it. Here I left "All pages"; it only has one page, so it doesn't matter.
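Because a JPEG and a PDF can behave differently in OCR, it can help to confirm what a client actually sent before uploading it, since file extensions are sometimes wrong. A minimal sketch that checks well-known file signatures: a JPEG starts with the bytes `FF D8 FF`, a PNG with `89 50 4E 47 0D 0A 1A 0A`, and a PDF normally begins with `%PDF-`. Note the assumption: this only identifies the container; a PDF can still be a scan with no selectable text inside.

```python
def detect_format(data):
    """Guess a file's format from its first few bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"


def detect_file_format(path):
    """Read just the header of the file at `path` and classify it."""
    with open(path, "rb") as f:
        return detect_format(f.read(8))
```

For instance, `detect_file_format("exhibit.jpg")` (a hypothetical filename) would return `"jpeg"` for a real JPEG and `"pdf"` if the file was actually a PDF with the wrong extension.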
And now it's working. I should also say that it usually works best with PDF documents, and this is a JPEG, so sometimes that can be a bit of a wild card. There are ways to turn JPEGs into PDFs, but I won't get into that; I'm sure there are 101 different tutorials about it online. For now, I'm just taking it straight as a JPEG, trying to convert it, and seeing what happens.
And this is very exciting because I'm doing it live. I haven't done it yet. And so we'll see what happens. First of all, it's taking its time. There we go. Okay.
Now let's click on it and see what we have here. First of all, it's zoomed out quite a bit, which is odd, but again, that's just formatting and can be adjusted. And see here: this part is an image, as you can see, but all of this became text. This part you're going to have to do yourself, but it did detect it as an image.
But all of this seems to be text. Here I have "third party", and then there's an extra space, it looks like. It underlined this, and I wasn't sure why, but this was misspelled in the original as well, so it actually caught a couple of errors in the original. Obviously, in this part it didn't get all of the text out, but as you can see, it works quite well. You can quickly spot the problem places: first of all, I caught mistakes in the original, but you can also quickly see where the OCR didn't work, because those spots will usually be underlined. Here, "food items", I assume that's what it means. Anyway, this is just to give you an idea of how OCR works, because, once again, when you're dealing with your translators and you send them a file like this, it's harder to work with; it's easier to work with straight text.
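The underlines Word shows are a quick way to find trouble spots. A rough stand-in for that check, useful when you want a list of places to look at before sending the text on, is sketched below. It assumes you supply your own `vocabulary` (a set of lowercase words standing in for a real spell-check dictionary) and flags three things: extra spaces between words, words not in the vocabulary, and tokens mixing letters and digits, a common OCR confusion such as "fo0d". Like Word's underlines, it will also flag errors that were already in the original.

```python
import re

PUNCTUATION = ".,;:!?()[]\"'"


def flag_suspicious(text, vocabulary):
    """Return (line_number, reason, snippet) tuples for spots worth checking by eye."""
    issues = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Two or more spaces between words often means OCR dropped or split something.
        if re.search(r"\S {2,}\S", line):
            issues.append((line_no, "extra space", line.strip()))
        for token in line.split():
            word = token.strip(PUNCTUATION)
            has_alpha = any(c.isalpha() for c in word)
            has_digit = any(c.isdigit() for c in word)
            if has_alpha and has_digit:
                # e.g. "fo0d": a letter read as a digit or vice versa.
                issues.append((line_no, "letters and digits mixed", word))
            elif word.isalpha() and word.lower() not in vocabulary:
                issues.append((line_no, "unknown word", word))
    return issues


vocab = {"the", "third", "party", "shall", "deliver", "food", "items"}
text = "The third party  shall\ndeliver fo0d items.\nThe thlrd party"
for issue in flag_suspicious(text, vocab):
    print(issue)
```

The letters-and-digits rule is deliberately crude: legitimate tokens such as "2nd" or "A1" will be flagged too, so treat the output as a checklist, not a verdict.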
And here you're able to send them straight text just by using this OCR (optical character recognition) scan, and it makes life a lot easier. Once again, I'm not trying to say you should use this particular one; there are plenty of others you can find. This is just the one I use, and I wanted to walk you through it briefly to show you how OCR scanning works. I think it's quite useful, and hopefully you'll find it useful too, especially when you're dealing with end clients who send you scanned documents in all different formats, and with translators who prefer to have everything in text.