Chapter 9: Image and Advanced Citrix Automation. In this chapter, we'll go through an overview of image and advanced Citrix automation. We'll go through some image-specific activities, we'll go through keyboard actions, and we'll also walk through output activities. Overview: in the recording chapter, we discussed automations within virtual environments, where the whole operating system screen is treated as an image by the workflow. Virtual machines, in simple terms, are emulations of a computer system. They typically run on a server, behaving much like a separate standalone computer, running applications and programs.
The interface between the client system and the server system streams a live image of the server, and that is what's presented to the user. The techniques implemented for automations in such scenarios are what we call image automation. Image automation is not limited to virtual environments, as we discussed with the OCR output method in chapter seven. To facilitate image automation, UiPath provides distinct capabilities to achieve the desired result. Note: even though the features provided by UiPath for image automation are pretty advanced, recording them to create an automatic workflow is still not an option; the activities need to be added manually to the workflow. Image-specific activities: there are quite a few image-specific activities supported within UiPath, like the click image activity, relative element selection, and the click text activity.
Right. We'll go through each of those with an example. So let's open UiPath Studio, and we'll be using these activities from the Citrix recorder. So, the click image activity is right here. Within Citrix, this is one of the major activities using image automation.
Similar to the click activity, where the mouse click operation is performed on the selected UI element, click image works by clicking on an image which is chosen as the UI element by the user. It's a commonly used activity, but it works under a limitation: the screen can't contain the same or a similar image elsewhere, as that might lead to erroneous operation execution by the workflow. However, it is perfect for cases where the image selected is unique to the overall screen. The click image activity is very sensitive to colors and the overall image look and feel,
and changes to those may not be handled effectively here. So say you want to click on one of these three options — it's the split deposit you want to click. You select that (we'll come back to this note in a moment), say OK, and we're seeing that it clicks on the center; that's what that note said. And in here, you see attach window there as the container, and inside it a click image activity. If you run this workflow, you see that split deposit is clicked, right?
Okay, this worked fine. But let's take another case where we want to click on this image. Let's see if the click... and it actually clicked on the first one — you see, because the cursor is right there. That's because all three — in fact all four — are pretty much identical; they're just text boxes. And if you had selected the whole image here, including the label, then if the label is pretty large, the center of that image may not be the actual textbox area where the input action needs to occur.
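UiPath's click image is a built-in activity, but the failure mode we just hit is easy to see in code. Here's a minimal, stdlib-only sketch of naive template matching over 2-D grids standing in for grayscale images (real tools use something like OpenCV's matchTemplate); the function names and grid representation are mine, purely for illustration. The point: a scan always returns the first of several identical matches, which is exactly why the selected image must be unique on the screen.

```python
def find_template(screen, template):
    """Return (row, col) of the first exact match of `template`
    inside `screen`, scanning top-to-bottom, left-to-right."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return (r, c)
    return None  # image not found anywhere on the screen

def click_point(match, template):
    """Centre of the matched region -- where click image clicks by default."""
    r, c = match
    return (r + len(template) // 2, c + len(template[0]) // 2)
```

If the screen contains two identical "text box" patterns, `find_template` silently picks the first — the same behavior that made our workflow click the wrong field above.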
So this brings us to the next concept we're going to discuss: relative element selection. There are times when RPA developers don't see the point of the manual recording feature within UiPath; they believe it's precisely the same as the usual workflow creation process, where activities are added to the designer panel from the activities pane. However, the concept of selecting a position for clicking an image clears up this confusion. Let's also delve into that note we were getting just now. So imagine you open the Citrix recorder, you have click image, and you choose this whole image.
Right. Now, once you've chosen the image, it's going to tell you where the center is, which is right here. This works fine, right? Now suppose it hadn't worked fine — let's try something like this. The center of this whole region, where the click operation would land, is actually in between the label and the text field, which is not where we want the click to happen.
We want the click to actually operate on the text field itself, right? Instead of the center, we can choose top left, bottom left, or bottom right — whichever suits our use — and then indicate a point on the screen element. Once we indicate it, we just click here, and that's it: save and exit. Now, if we go and check this click image activity in the properties, the offset X and offset Y have automatically been populated.
Trust me, if you had taken this activity from the activities panel, setting these offset X and Y values by hand would have been pretty challenging, because working out screen coordinates within a selected image is, I think, close to impossible. That's where selecting a relative element helps: within the selected image, it lets us point out exactly where we want our operation to act. It could be a type into operation, a click operation, or anything else.
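The arithmetic UiPath is doing with those recorded offsets is simple to sketch. A minimal illustration, assuming the matched image region comes back as (left, top, width, height) — the function and parameter names here are mine, not UiPath's:

```python
def resolve_click(region, anchor="center", offset=(0, 0)):
    """Compute absolute click coordinates for a matched image region,
    mimicking the cursor-position + offset model of click image.

    region: (left, top, width, height) of the matched image
    anchor: "center", "topleft", "topright", "bottomleft", "bottomright"
    offset: (offset_x, offset_y) added to the anchor point
    """
    left, top, w, h = region
    points = {
        "center":      (left + w // 2, top + h // 2),
        "topleft":     (left,          top),
        "topright":    (left + w,      top),
        "bottomleft":  (left,          top + h),
        "bottomright": (left + w,      top + h),
    }
    x, y = points[anchor]
    return (x + offset[0], y + offset[1])
```

For example, a 60x20 image matched at (100, 40) clicks its center at (130, 50), while choosing the top-left corner plus an offset of (5, 12) lands inside the text field at (105, 52) — which is what indicating a point in the recorder records for you.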
Right. This makes it convenient for the user to set the position, and the operation will take place at that particular position. So that covers relative element selection. The next one is the click text activity. The click text activity uses OCR technology to find a given text within an image and perform the click operation on it. For this, we'll again have to play around with the OCR — the data extraction wizard — and that's how the operation is going to be performed.
So let's take an example. We have pretty much the same application open, in a virtual server this time, because there the whole thing is considered as an image. Now, with Citrix recording, we want to click on the withdraw text, right? You will have to take the whole image into consideration.
Right now, this region is showing some real results. Again, you'll have to play around with the OCR: let's increase the scale to three to increase the resolution, and let's check invert so that the background gets inverted in the back end. Let's see if this works or not — and it is pretty good.
Right? You see, there are a few stray characters, but other than that, withdraw is working fine. Now, when you search for the withdraw text, there's also an occurrence setting. Say I had taken the whole application as the screen for the OCR to work upon, and there were identical labels or fields within the selected screen: then we can say on which occurrence — the first or the second — we want our input action to be performed. Here, we want to search for the withdraw text.
We could have set the mouse position as well, but this is pretty much doing it. We've selected a pretty good region — the transaction region — and we can test the click before actually making an activity out of it. And withdraw is working fine. So that is how the click text activity works: it finds a given text within an image and performs the click operation on it. It works fine even if there's a possibility of a change in the image's color or its overall look and feel, since the data is captured and the operation is based on the extracted result.
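Under the hood, click text is working over OCR output: a list of recognized words with their positions. Here's a hedged, stdlib-only sketch of the occurrence logic we just used — it assumes OCR results arrive as (text, left, top, width, height) tuples, which is my simplification, not UiPath's actual data structure:

```python
def click_text(ocr_words, target, occurrence=1):
    """Find the Nth occurrence of `target` among OCR word boxes and
    return the centre point to click, or None if it never appears.

    ocr_words: list of (text, left, top, width, height) tuples
    """
    seen = 0
    for text, left, top, w, h in ocr_words:
        if text.lower() == target.lower():
            seen += 1
            if seen == occurrence:
                return (left + w // 2, top + h // 2)
    return None
```

With two "Withdraw" labels on the screen, `occurrence=2` picks the second one — the same knob the wizard exposes when identical labels exist within the selected region.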
Even if the color might change, it's okay, because we are working with OCR, which doesn't care about format changes; it works by scraping the data out of the image and performing the operation accordingly. So that's how the click text activity is used within the Citrix recording. Okay — keyboard actions and output activities. In our daily work,
many times we tend to use keyboard shortcuts instead of performing the task via mouse. A few examples would be traversing to the next text field in an application using the TAB key, clicking a button by moving the keyboard cursor to it and pressing space, or maximizing a window by sending the combination of the Windows key and the up arrow key instead of clicking the maximize button, and so on. These methods also work efficiently if implemented correctly in an RPA workflow, and can tremendously reduce the painstaking effort of element identification and interaction. They effectively handle the shortcomings of the click image or click text activities, where a small change in graphical image format, or the addition or deletion of characters in a label, can disrupt the entire workflow. Let's go through this with an example. Suppose, in the virtual environment itself — because this is going to be a very decent example to take here —
we have an application open which is going to take new orders, and it already had some ABC value in it. Let's say this application is where we need to input some data; it's got some text fields. So what we should do — and I'm going to do this from scratch — is take the Citrix recording, and either click on text or click on image to make sure our cursor lands on the first field here. So we'll click on the image here, just to make sure that our cursor is right, and save and exit.
Even if, by default, the cursor starts somewhere else, this makes sure our workflow works. So we have click image, and everything is all set: it's going to click on this image, on the text window. Then, to type some data into it, we'll use the type into activity, which is right here. And where are we going to type? We're going to type to the whole virtual machine, because it doesn't matter: once we have the cursor in the correct position, we can take it from there, and the input is going to go to the virtual machine itself — because it's an image; that's how it's being treated.
Right. So whenever you're typing something, it's going to go to the whole virtual environment, and correspondingly to the correct element within that virtual environment. Let's say you want to enter the first product — "prod one", say. Then, instead of clicking — using another activity to find this image relative to this label and click on this text — think about what you would do yourself. If you were doing this on a local machine, yourself instead of the bot, you would have typed something, pressed the tab key on your keyboard, and typed the next field, right?
So let's follow exactly the same methodology to implement this, so that it becomes efficient. A very easy way: simply click this plus button, which helps you send any keystrokes to the element. What you're going to do is choose — it's in alphabetical sequence — simply click tab. And there needs to be a plus sign after this.
If we choose tab now — oops, my bad — what you've got to do is add a plus, and then put in the tab key. Now I know why that happened. You add the plus because that's how we concatenate, whether we're joining two separate strings, variables, or any keystrokes. That's how we concatenate a message within a type into window, or in any string expression: it's the valid connector between the two.
So we just keep it copied here. Next is the number required: how many units do we require? Let's say 30. And then what is the next one — unit price, how much is the price for a single unit?
Let's say $12 or something. Then again I add the plus and the tab. And then the cost center — let's put in anything, "cc" or something. If you click anywhere else and the blue sign is gone, that means there are no errors in terms of compilation issues. So we have a pretty decent workflow which works using image automation principles, and it's also going to use some keyboard strokes and shortcut keys. That's going to help automate the process and make it more efficient.
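The concatenation we just did by hand — value, plus, tab token, plus, next value — generalizes naturally. A small sketch of building that input string, assuming the `[k(tab)]` special-key token that UiPath's type into activity understands (the helper function itself is mine, for illustration):

```python
TAB = "[k(tab)]"   # type into special-key token for the Tab key

def tabbed_input(values, tab_token=TAB):
    """Join field values with Tab keystrokes so that a single
    type into activity fills a whole row of text fields."""
    return tab_token.join(str(v) for v in values)
```

Calling `tabbed_input(["prod1", 30, 12, "cc"])` produces `"prod1[k(tab)]30[k(tab)]12[k(tab)]cc"` — exactly the kind of expression we just assembled in the type into window, without hand-placing every plus sign.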
So if we run this, you can see it clicks right there, and then — boom, already done. All four fields have got the data in them. And it was way too fast, because with the tab key we made it look like the robot was working on just one field, just one image; but internally, it went to all four different text fields and input the data. These techniques are not specific to image automations: you can, and you should, use them for local desktop applications as well, because using the keyboard is always going to be more reliable.
The bot is going to work a hundred percent of the time; it's not going to skip one command over another or anything like that. The thing is, with these sorts of methodologies implemented, our overall bot becomes more robust and resistant to any formatting changes, color changes, or anything like that. It's also a very good approach if you want to, say, maximize the screen: you can definitely send a keystroke or a hotkey, and I'll show you how to do that as well. If, instead of sending all four parameter values in one activity, you had wanted to send the TAB key in separate activities, you could have used send hotkey as the activity.
In there — not inside the type into, actually, because you cannot nest it within the activity; it has to sit in a sequence, or the flowchart in this case — you can indicate on the screen which element you want to send the key to, and use tab or whatever you need. But for type into activities, the previous approach is even better, because you're handling a pretty sophisticated feature in just one activity.
So I highly recommend using keystrokes. Okay, next is application state. For any automation to be successful, it is of the utmost importance to know the current state of your application. There is a possibility that, due to a bad internet connection, your web page is taking too much time to load. You don't want to execute subsequent activities to input, say, your logon credentials into a page that's not visible yet.
Nor would you want to keep waiting for the application to load forever; that would lead to abrupt termination of your program, which is surely not a desired outcome. As per the definition itself, RPA is the practice of mimicking interactions with a computer system in the same way a human would. There are activities that can be inserted into a workflow to make sure the subsequent activities run only if a certain application state is met.
The logic of the application state check applies to local systems and web applications as well as Citrix environments. On local systems, UiPath is intuitive enough to wait for UI elements to appear and perform actions accordingly; that's not the case with virtual environments, so the concept of application state becomes more important there. Being treated as images, the UI elements of a Citrix machine are challenging to interact with, and hence, as a better practice, we need to look for visual cues to make sure the application is in the desired state and the activities can operate on it.
One additional piece of information: most of the web applications that we use have a favicon associated with them. It's a frequently used practice to wait for the favicon to appear — in other words, for the processing sign to disappear. For example, let's say we open youtube.com. I want you to focus on this: this is processing, right? And now it's showing the YouTube icon — this is the favicon.
So we would want the favicon to be visible before we actually perform any operations on YouTube, to make sure our web application has finished processing. That's a very commonly used practice in bots as well: waiting for the favicon to appear correctly. To accomplish this, we have an activity called find image, right here. This activity is used to find an image on the screen and carry out the subsequent workflow activities if the image is found; if not, it keeps looking for the image until the timeout is reached.
Here is where we set the timeout, written in milliseconds; by default it is 30 seconds. That means it will wait for up to 30 seconds, and if the element appears, execution passes on to the next activity. If not, we'll get an error saying that the element doesn't exist. And that's really a very good practice, because you don't want to wait forever.
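The wait-until-timeout behavior of find image is essentially a polling loop. A stdlib sketch of that pattern — the lookup function is passed in, so anything from template matching to OCR could sit behind it; the names and the exact polling interval are my assumptions, not UiPath internals:

```python
import time

def find_with_timeout(lookup, timeout_ms=30000, poll_ms=250):
    """Call `lookup()` repeatedly until it returns a truthy match or
    `timeout_ms` elapses; raise TimeoutError if the element never
    appears, much like find image's 'image not found' error."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        match = lookup()
        if match:
            return match
        if time.monotonic() >= deadline:
            raise TimeoutError("image not found within timeout")
        time.sleep(poll_ms / 1000.0)   # avoid a busy-wait between checks
```

The key design point is the explicit deadline: the workflow either gets a definite match to act on or a definite failure to handle, never a silent hang.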
Nor do you want your workflow — especially in image automations — to keep going through the activities in a haphazard fashion without even checking for the proper area of the screen where the actual input needs to take place. Remember the type into activity we used just now: because that was a desktop application it was okay, but we were sending the type into actions to the whole virtual environment, right?
So it's better to use these find image or similar activities as much as possible, so that we're always sure which application state we, or the processes, are in. Next: another activity that we often use to debug issues is the highlight activity. What the highlight activity does is this: suppose you want to highlight a particular element; that can be accomplished here.
Let's say you want to highlight this firewall icon — so it took that icon, and it's going to highlight it with that yellow color, the same way elements get highlighted in the selectors. We can either provide a selector, like in this case, or we can provide an entire element — this whole text box or something — or a clipping region, because here a clipping region for the whole selector is being used as well. It's a very easy-to-use activity.
And here it is: it showed, in the yellow box, the element that the activity is going to interact with. Okay. Next, we're going to talk about starting applications. Until now, in all the various scenarios and use cases we've discussed, we have performed a lot of operations on different applications, and we assumed the application was already open — whether in Word or in the Notepad windows, whatever we did, we assumed the application window had already been opened.
But in an actual scenario, we'll want the automation — the bot itself — to start the application, perform activities on it, and then close it upon finishing the execution. There are several ways to start an application. On a local machine the task is simple, as the application can be accessed from its root installation directory and the executable file can be started: we have, for example, open application, and we can directly provide the path to it, and the default file will be opened. It becomes a little more challenging in the case of image automations, so we'll discuss the different methodologies that are supported, and also the recommended ones.
First and foremost: manually, how would any person start an application? What would the person do? He would simply double-click on it, right? That's our first technique. Double-clicking is the most common way to start an application, and to implement it we simply use the click image activity. Let me show you. Suppose you want to start this training order application, which is there in the Citrix environment.
So we use click image, which is here — I'm going to take this one out. And in here, we select the image to be just this portion. Do you remember why? I'll tell you why: because the icon is selected right now. If it was not selected,
and I had taken the whole image, with the training order system label and everything, the image would have been different. The image is different right now versus this, because it's highlighted with a different color and background. So we always have to make sure we either select a portion which is not going to change easily, or use other approaches, which we'll discuss later. So we did this, and instead of single click, in the properties we set the click type to double. That way we can double-click on this element, and hopefully our application will be started.
Let's try and see if it works. And that's it — it worked, right? So that's how we use the double-click. It's fairly easy to use, very simple; the only thing you need to take care of is that the image selected as the UI element needs to be unique to the overall screen and shouldn't be changing. Next is assigning a keyboard shortcut.
We've established that using keyboard strokes for automation, especially with virtual environments, is a highly recommended practice. To accomplish this, you need to assign a shortcut to the application and then send that specific key combination to the virtual environment via the workflow. This, in turn, will open the application for further operations. Let's take a quick sample case to showcase how this works. We have the virtual environment open here, and let's say we open Excel — instead of opening it, I'm going to its original installation directory.
It could be any sample application, but that's how it is. Once we open the properties, in the shortcut tab we see that there is no shortcut key assigned to it. So we can click here and give it any shortcut key. It's better to use an unusual combination, so that we don't clash with the shortcuts like Ctrl+Alt+Delete, Alt+F4, or the others which are used by default by the operating system itself. So Ctrl+Alt+E seems okay; we apply it.
Yes, that's okay. All right. And just to give it a shot: if I press Ctrl+Alt+E here, it opens the MS Excel application. So we have assigned a shortcut. The easy way now is to simply use a send hotkey activity. In here, you have to select the element where the operation needs to happen, so you select the screen of the virtual environment, and then Ctrl, Alt, and E. That's it.
Right? Once you do that, you run it, and you see the activity finishes; you go to the virtual environment and you see Excel opening up. So this works fine, and then you can obviously use the find image activity and so on to wait for Excel to open and continue with your operations. So that's another way to open applications: by assigning shortcut keys. There is one more way to open applications, which is using parameters. There are situations when certain parameters need to be passed to an application — for example, in a virtual environment you may intend to open an Excel document along with one, two, or multiple files.
So we can open the Run command and provide the shortcut to open the Excel document, and alongside it pass some arguments; or, on a local desktop, we can use the open application activity — like this one — where we have the filename and the arguments that can be passed. So instead of opening the Excel application itself, it's going to open the specific document we ask it to open. To give an example of how this works: we have a sample-one Excel file, which on opening looks something like this — it's a sample file, opened successfully.
Right. And what we're going to do is go to where the Excel application is installed on our local system, which is here. We'll provide this under the file name: excel 2013 dot lnk. How did I get this information? Because if you go to the properties here, we see the file name and the extension. This is the file name.
And this is the .lnk shortcut file. It could have been stored somewhere else as well, or it could be accessed from the .exe file, which we would have to look for. But this should get the job done, because I can access the application from anywhere. And then, as the argument, we provide the file path for the file that we want to open.
So it's going to be the path, backslash, sample one dot xlsx. That's how it is. And we still need to have a selector, because it needs to know what sort of application is going to open. So, just for now, to make it work, we're going to add an XML selector saying that the application is going to be excel.exe. We don't want this "Book1" in there, right?
So we're omitting that from the selector, but it's going to be an Excel window. That way we have passed the file name, which is still correct — the excel.exe file; it took excel.exe automatically, or it could have been accessed from the .lnk file as I was showing — and we're passing the argument, which is sample1.xlsx. So now if we run this, within one activity we're accomplishing the task of three or four separate activities. And there we have the file open, because the argument was passed. Using these parameters can come in very handy.
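The filename + arguments pair of the open application activity maps directly onto launching a process with an argument vector. A sketch with Python's standard subprocess module — the Excel path and workbook name below are placeholders standing in for this example's values, not paths you should hard-code:

```python
import subprocess

def build_launch(exe, *args):
    """Assemble the argument vector the OS receives: the executable
    first, then each argument (e.g. the workbook path to open)."""
    return [exe, *args]

def open_application(exe, *args):
    """Launch the application -- the code equivalent of filling in the
    filename and arguments fields of the open application activity."""
    return subprocess.Popen(build_launch(exe, *args))
```

So `open_application(r"C:\Program Files\Microsoft Office\Office15\excel.exe", r"D:\sample1.xlsx")` would start Excel with the workbook as its argument, just as the activity does; passing the argument list (rather than one shell string) avoids quoting problems with spaces in paths.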
There can be a real challenge when using virtual automation, though: when you're passing the filename and the arguments, there's a possibility that the bot will look for the file name or the application directory tree on the local system instead. One way to remediate or escape these issues: whenever you're working on your virtual machine, maximize it to full-screen view mode. That way, all the input that you, or the bot, give will reach the virtual machine and will not operate on the local desktop. Okay.
Next are output activities. Just as we have provided input to the virtual environments, which are treated as images by the bot, we need to extract data as output from them as well. Because the operations run on images, the behavior of these activities is a bit different. There are two most commonly used output activities in the automation of virtual environments. The first one is select and copy. Let's show it here.
There it is — the select and copy activity. It is the simplest output activity: it copies data from the currently active field within the virtual environment. The only limitation is that this activity works for selectable text only — text boxes, text areas, table cells, etc. So from any editable text field, it can select and copy the data. Let's try it: we have the virtual environment in front of us, and we want to get the data from, let's say, this staff number.
Something is written in here, right? First and foremost, you also have to understand the behavior of the application, because understanding an application's behavior actually helps you efficiently create the bot and automate the process. So let's say you click on this window and make the application active; then, automatically, the cursor will be in the first text field. You click anywhere — it doesn't matter — the cursor is going to land in the first text field. So imagine you have some data in here, but you want to make the application active first.
So the first activity you're going to do is click on this sign-in — or click anywhere — so that the application becomes active; it could be anywhere, it doesn't matter. Once you've done that, you use select and copy, because now your cursor is on the first text area: simply select select and copy, send the whole thing to the virtual environment, and save and exit. You see, it also got selected and everything — that means when we're performing these steps in the recorder, the actual operation is happening on the screen.
You can see that. So, save and exit, and a recording sequence is generated. You open it up and you see that we have the container to attach to the virtual environment window; then we have a click image, because we want to make the application active — or "a foreground process", to use the correct terminology. And then it types into it a Ctrl+A, Ctrl+C sort of shortcut — so even UiPath uses shortcuts to do a lot of its operations. It's doing Ctrl+A or something like that: selecting all the data, and then copying the selected text using this.
And the selected text that gets copied is put into this output variable, copy selected text. To see whether that works, we simply bring this data out — it doesn't matter, we can display it however we want. Once we run it, it first makes sure the application is active by clicking on the sign-in, it copies the whole thing, and it shows us the data. So it works for all the editable text fields.
That's the only limitation, but if you're working on a sophisticated application, most of the time you'll be working on these sorts of text fields anyway. And its reliability is pretty good, because you're not depending on OCR or any scraping methods — anything that could break with changes in formatting or the overall look and feel of the application. So that's the first output activity.
The second one is scrape relative. Similar to screen scraping, scrape relative provides the option to scrape data relative to an image provided to it. The image that is used as the relative point to scrape data is also called an anchor; we'll be using this term in further chapters. The anchor serves as the relative point for any place where we need to do extraction or scraping. For example, take this option number, which is a label: it is going to serve as an anchor to this text field.
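The geometry behind anchors is the same translation idea as the click offsets earlier: the target region is stored relative to the anchor image, so it stays correct even if the whole dialog moves on screen. A minimal sketch — the tuple layout and function name are my assumptions for illustration:

```python
def relative_region(anchor, offset, size):
    """Given the anchor image's matched position this run, plus the
    stored offset and size of the target region, return the absolute
    rectangle (left, top, width, height) to scrape.

    anchor: (left, top) where the anchor image was found
    offset: (dx, dy) of the target region's top-left from the anchor
    size:   (width, height) of the target region
    """
    ax, ay = anchor
    dx, dy = offset
    w, h = size
    return (ax + dx, ay + dy, w, h)
```

If the anchor label is found at (210, 388) and the number sits 150 pixels to its right, the scrape region is (360, 388, 60, 18); if the whole dialog shifts down 50 pixels next run, the anchor is re-found and the region follows it automatically — which is the whole value of scraping relative to an anchor rather than at fixed screen coordinates.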
We do need relative elements so that the scraping, or any input action — any interaction with the application's UI elements — happens at the correct place. So selecting a relative element is really, really important, and we're going to use it a lot. Scrape relative, as the name itself suggests, scrapes data relative to the anchor. So suppose we have this application open, and in here we have a product code or something, and we submit the order.
Once we submit the order, we get our order confirmation, and we want this order confirmation number. As you can see, the text "please take note of your order reference" together with the colon serves as the anchor point, the relative element, and 694 is the number that we need in our workflow. That number could later be saved into an Excel sheet, printed out, displayed, sent to a queue; there is a lot of other stuff that could be done with it. So we do Scrape Relative and select this whole image, and we select a little extra, because the confirmation number might turn out to be four or five digits. Just to be on the safer side, we do this, and then we indicate the relative region.
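Once the OCR text around the anchor comes back, the number still has to be pulled out of it. A hedged sketch of that post-processing step in Python, assuming the order reference is three to five digits (the function name and the digit range are my assumptions, not part of the workflow):

```python
import re

# Hypothetical post-processing of the OCR output around the anchor:
# extract the order reference number that follows the anchor text.
def extract_order_ref(ocr_text):
    match = re.search(r"order reference\s*:?\s*(\d{3,5})", ocr_text, re.I)
    return match.group(1) if match else None

print(extract_order_ref("Please take note of your order reference: 694"))
# -> 694
```

Anchoring the regex on the label text mirrors what the Scrape Relative anchor does visually.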
This is the relative region, so we draw something like this. Once we've selected it, the Screen Scraper wizard is shown automatically, and it shows nothing as of now, because the Full Text and Native methods are not suitable for image automation; we have to use OCR.
If we refresh it, we see the value 694. But we may still want to tune a few more parameters: I think there's a good chance that if we invert the colors and use the scaling option to get a better resolution, we'll get cleaner output, because this resolution is not as clean as it should be. That's just a personal opinion, based on the practice I've had on other automation projects; these are the things you learn as you play around with the tool. In this case it gives pretty much the same result, so we finish the activity, save and exit, and the recording sequence is generated. Let's go into it.
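The two OCR preprocessing options just mentioned, inverting colors and scaling up, can be sketched in plain Python to show what they amount to (UiPath applies these internally; this is only the idea, on a tiny grayscale "image" represented as nested lists):

```python
# Invert: flip each 8-bit pixel value, so dark text on light background
# becomes light-on-dark (or vice versa), which some OCR engines prefer.
def invert(image):
    return [[255 - p for p in row] for row in image]

# Scale: enlarge by nearest-neighbour replication, giving the OCR engine
# more pixels per character (a scale of 3 matches the setting used above).
def scale(image, factor):
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend([wide] * factor)
    return out

img = [[0, 255],
       [255, 0]]
print(invert(img))         # [[255, 0], [0, 255]]
print(len(scale(img, 3)))  # 6 rows after scaling by 3
```

Real engines use smarter interpolation, but the effect on OCR accuracy comes from the same principle: more, and higher-contrast, pixels per glyph.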
So now we see that we have an Attach Window, which is the container at the root level. Inside it, the workflow finds the relative image, which is this whole image; it translates the clipping region; and then it gets the OCR text from it using the Google OCR engine, with a scale of 3 and the Invert option checked. We then reset the clipping region, and in the Get OCR Text activity itself we set the output variable (the name doesn't matter). Okay, so let's give it a shot and see whether this works; we'll describe the activities that have been used in detail in a moment. We're getting 694. Let's try this again to see whether it works in every case. The product code could be anything, say A4, so let's run it. Because of the OCR method, there is a possibility that the data might not be read correctly; in this case it is, but the possibility exists.
So always remember: OCR is the least reliable of the three output methods, and that factor does come into play in many cases. Okay, this seems to be working fine: we are able to scrape data based on an anchor point, and we are getting the correct result in a text variable.
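Because OCR is the least reliable method, it's worth validating the scraped value before the workflow uses it. A minimal sketch, assuming the order reference should be all digits (the function name and check are illustrative, not part of the generated workflow):

```python
# Guard against common OCR misreads (e.g. "6g4" instead of "694")
# before the value is saved or sent downstream.
def validate_order_ref(text):
    cleaned = text.strip()
    if not cleaned.isdigit():
        raise ValueError(f"OCR returned unexpected value: {text!r}")
    return cleaned

print(validate_order_ref(" 694 "))  # -> 694
```

In a UiPath workflow the equivalent would be an If activity (or a Try Catch) on the OCR output before using it.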
Just to walk through it: the Find Image activity, this one, finds the anchor image relative to which the data needs to be scraped. The Translate Clipping Region activity gets the region where the scraping operation needs to take place; it specifies which part of the image needs to be focused on to perform the operation. Then we use Get OCR Text on that region to extract the text, with the engine and its properties set to whatever we want. Then we print out the data, simply that variable, and we reset the clipping region by setting the sizes back to 0, 0, 0, 0. These activities are generated automatically, but it's good to know how they work.
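What "translate clipping region" amounts to is simple rectangle arithmetic: offsetting the scraping rectangle relative to the box where Find Image located the anchor. A hedged sketch, with coordinates as (left, top, right, bottom) and all names illustrative:

```python
# Offset a scraping rectangle relative to the anchor box Find Image returned.
# This mirrors the idea behind UiPath's Translate Clipping Region activity.
def translate_region(anchor_box, offset):
    l, t, r, b = anchor_box
    dl, dt, dr, db = offset
    return (l + dl, t + dt, r + dr, b + db)

anchor = (100, 200, 300, 220)  # where the anchor text was found on screen
region = translate_region(anchor, (200, 0, 260, 0))
print(region)  # (300, 200, 560, 220) - the area just right of the anchor
```

Resetting the clipping region with sizes 0, 0, 0, 0 is then simply the identity offset, which restores the original scraping area.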
That's how the whole scrape-relative operation works. A quick note to take: all activities of the Scrape Relative action are listed inside a container attached to the virtual environment image. Also, to check whether the generated output is correct, you could have used a Write Line instead of a Message Box; that way the execution wouldn't have paused, it would have finished in one go, and you can always check the result from the Output panel. Okay. All right.