Home / Nerdio Academy / Nerdio / Nerdio Fundamentals: Logging

Nerdio Fundamentals: Logging

0 commentsJune 04, 2019Videos

Joseph Landes
In this session, we are going to talk about how logging works in Nerdio for Azure. We’ll look at management and tasks and go deep into the caching functionality in the product, including things like intelligent caching and caching of shared mailboxes. Enjoy the session.

Vadim Vladimirskiy
The first thing I want to cover today are logs and how to look at logs, how to view errors, where things are, where things are stored, et cetera. So let’s start there. We spent some time the other day talking about the various roles in Nerdio. Hopefully everyone has watched that video, but just to remind, quick reminder, we are currently logged in as a partner administrator that has the ability to add new accounts and manage existing customer accounts, and logging is aggregated according to the roles. So there is a logs menu here in under navigation on the left hand side. There are three types of logs available at the partner admin level, provisioning, management and desktop logins, and that’s going to be an aggregation of all of the accounts that are visible or that are assigned to that particular partner admin account. If we go on their provisioning tasks, we should see the various provisioning tasks that are relevant to this particular partner account.

Vadim Vladimirskiy
Then we can search provisioning tasks by account ID or by the type of job, so for instance, if we searched for 5,009 we should see here’s when the 5,009 was provisioned. If we searched for provisioning, then we should be able to find all the provisioning jobs for the three accounts that are in the scope of this particular user. Okay, so I think that’s pretty simple. Then you have the same thing going on with management tasks, and we’ll talk about those in a lot of detail of briefly and then desktop logins, similar story. We have desktop logins information that is date filtered. So for instance, if we wanted to see all the logins in December, we would look at it like this, and it will show me logins from all the various accounts. If I wanted to narrow this down to a particular account, then obviously I would just do it like that. Okay? So again, pretty pretty simple stuff.

Vadim Vladimirskiy
Okay. So now let’s log into account 5,009 and take a look at management tasks. Again, provisioning tasks and desktop logs, one thing on the provisioning task, there’s a little button here actually expands the sub tasks. There isn’t much additional detail available from here, and then the time started is when the task started, so basically the time started of the task and the time started of the subsequent tasks minus the time started of the previous task gives you how long this took, right? So as soon as this one is done, this one starts, so we know this one took five minutes. Okay, that’s the logins. Again, not much here. We can filter by username, by account ID, by date, or by the description, which is generally the the host or the collection or the VDI machine that the user logs into, so that’s what desktop logins are. Desktop logins are pulled in from the SMS passcode database. Every authentication for a desktop login, whether it’s VDI, RDS session host, or an RDS collection triggers an event inside of the SMS passcode database, whether or not that user has two factor authentication enabled. So even if they don’t have it enabled, what will happen is that they’re going to go through that database and the desktop login will be pulled from that database.

Vadim Vladimirskiy
All right, let’s take a look at management tasks. Management task is where most of the activity is. So here you can see all the various tasks that have been done. You can see who triggered them, so the user that triggered them is listed in this column. You can see what the task was, the description of the task, a category by which you can sort. Then you can obviously filter by date. You can filter by status, complete, in progress, pending or an error task. Then you can also filter by individual users, descriptions, kind of a more of a free search type of way limit it. If I wanted to look for instance, for anything that was done by Nerdio admin at nerdio.com, I would do it like that, and here are all the tasks that that user did. Okay? In addition, we have these two columns on the left that are useful sometimes. When the task is complete, there obviously shouldn’t be anything under errors, so clicking on the errors was going to be empty. The message generally is success for any tasks that is complete. For any task that is pending or in progress, again, also there won’t be much useful information under message or error. Those are going to be empty for the most part. There should be no pending tasks now either.

Vadim Vladimirskiy
The error tasks is when this comes in handy. So these are the two different places where an error can show up. So there’s two types of errors. Sometimes when you click on the message, it will just tell you an error has occurred, which means there’s likely going to be more detail under the errors tab or under the errors links. If we click here, it’s actually going to give us the underlying error that generated that that process, so again, this case, cannot process command because one of the more missing mandatory parameters. Again, not sure exactly what this one was all about, but you can see more detail on their errors. Sometimes when you perform a task in NAP, there will be a little I next to the error right here, so instead of a checkbox, it’s going to say error, and there’s going to be a little tool tip, a letter I that you can mouse over, and it’s going to give you a message. For example, some very common situations will get an I next to them. For instance, if someone tries to add a user and that user already exists in Azure ID, by default, NAP will do the validation to make sure that it’s not going to conflict, and then it’s going to error out, but it’s going to have a little I that you can mouse over, and it will tell you what the error message is.

Vadim Vladimirskiy
It’s a pre-programmed error message to make things easier for the administrator. In many cases it’s not kind of an expected error message, may be not a validation but something that failed on the back end. Maybe Azure timed out, maybe something in active directory didn’t respond as expected, in which case it’s just going to say error without the little I, but always going to management tasks and looking at error states here will allow you to view the error message. So the the I error message is going to be under view, and then anything else that just says error has occurred that’s a generic message is going to be under the error popup so you can see what’s happening. Okay?

Vadim Vladimirskiy
A couple other things about the management tasks. What you’re seeing here are tasks that were triggered either by users or by things like Auto Scale. Okay, again, it will have no username, which means it was triggered by the system. In this case it was triggered by Auto Scale, and there no user associated with it. Now, there’s a whole bunch of other tasks that we’re not seeing here, things that run in the background. The way you can unhide them is by clicking on this little settings gear on top and say “display automated tasks” and this will reset itself every time that you refresh the page, so make sure to turn it on. It’s always off by default. What this does is it unhides the various tasks that run on a schedule. There are a number of sort of background refreshes that happen in the system, and we’ll talk about some of these caches when we talk about users and groups and active directory, but there are tasks that run in the background that are not visible by folders because they would otherwise just be appearing in list too often, and there’s no reason for an administrator to see them. But if you want to see what’s going on, you would just unhide them and then look at them. If there is an error, you can click on the errors link and see exactly what caused that task to fail. Okay?

Vadim Vladimirskiy
Let me mention a couple of other things about pending and in progress. There are two categories, so the entire NAP concept for the most part, I would say 90% plus everything, every action you do in NAP gets submitted as a task to be executed asynchronously, right? So when you create a new user, it’s actually going to package that request up and it’s going to submit it as a task. That task will go into active directory going into Azure ID, go into RDS and do everything as part of that task. You can track it under management tasks, what the status is, what it’s doing right now. There are certain tasks that are blocking and certain tasks that are not blocking. For example, if you are adding a user, and that task is in progress, that means you will not be able to perform any other actions on that user object while that task is running, right? So that kind of makes sense. Sometimes when there is a UI component involved, we actually block the UI components.

Vadim Vladimirskiy
If I were to go in here and I would say, I don’t know, let’s do reset RDS session in this one. I say confirm, so it actually locks this UI component while this task starts out as pending, and when it gets picked up it goes into in progress. So you’ve got picked up, or it will get picked up in a second, it will go to in progress, and then only when it’s complete does the role get unlocked. However, there are certain tasks that don’t have a UI component associated with them, and when they go into either pending or in progress status, until they’re done, certain other things can’t happen. An example would be if there is an active directory cache refresh that’s running in the background, then when new users are added, those new users cannot be completed, that task cannot run until the AD cache update process is done, so therefore a lot of times people will ask, “Hey, I submitted this thing and it’s taking very long time to run.” It could be that the task itself is taking a while to run, but that’s pretty unlikely. Usually once something starts, it’s sort of consistent of how long it takes, right?

Vadim Vladimirskiy
If you do a password reset and it normally takes one to two minutes, and in this case it’s taking 10 minutes, it’s not that the password reset task is taking 10 minutes, it’s likely that there is something else running that’s preventing that task from starting and it’s sitting in pending status. Okay? The way you would check for that is you would go under logs, you would go under management tasks, you would say show me everything that’s currently in progress, and by looking at this list you would see how okay, I see something else is running, and whatever else I submitted to happen must be dependent on it, which is why it hasn’t started yet. Now, a lot of times you won’t see anything here and you’ll have to go up here and unhide the task and see maybe it’s one of the background scheduled tasks that are currently in progress and they are blocking the execution of something else. Again, whenever something takes longer than expected, look for blocking tasks that may be happening at the same time, preventing whatever it is you’re trying to do from moving forward.

Vadim Vladimirskiy
Then let’s talk about caches and such. So the way NAP is designed, it’s really an aggregation of a bunch of metadata from other systems. So for instance, when you go to the user’s page, you will see that at the top of the user’s field there is a date stamp, right? This is the date stamp of the user list that you’re seeing here. This user list is stored in the sequel database that NAP is running on top of, and it has information about everything related to the users. Obviously you know the username and licensing information and desktop assignment and whether the account is locked or unlocked, what group membership it’s a part of. It also has information about mailboxes, who’s forwarding to it, who it’s forwarding to. Everything about that user object is stored in the database. Now, the information does not come from Nap, right? It pulls that information in from somewhere else, but if we were to do a on demand refresh of that information, every time you open the user’s screen, you can imagine that will take multiple minutes and depending on the number of users, it could be even longer than that.

Vadim Vladimirskiy
Every time you click on users, if you were pulling up the live data from all the various systems and kind of correlating and putting it all together and then displaying it, it will take a very long time. So primarily for performance reasons, we decided that we are going to be caching this data in a database. So when you click on the user’s list, it’s actually querying a local database that has a cache of that data. The date of that cache is listed up here, and then this cache gets updated under certain conditions. Number one, if no one is doing anything and the system gets updated every three hours, and that’s what these automated tasks are. So you’ll see automated tasks such as, let’s look at a few. So you have something called update performance monitoring data, submit Auto Scale schedule, refresh security settings, update security score, update share mailboxes, update active directory, accounts statistics.

Vadim Vladimirskiy
So every part of the system that you’ll see has a date stamp, has an associated background scheduled tasks that runs on a regular basis, typically once every three hours, some of them every hour. We have some intelligence that if let’s say your cache is more than x, and let’s say more than an hour long, and you come to the homepage, it will be smart enough to start refreshing for you automatically. There’s all kinds of algorithms in the back that figure out when to update it, but the important thing to remember is you’re almost always looking at the cache date and not live data. So again, scheduled task is one thing that updates the cache. The other thing that updates the cache is anytime you modify anything that the cache contains. So for instance, if I were to go in here and change the name of the user, what would happen is this change would be submitted to active directory right away, so it would be a real time a modification, but then the NAP would automatically trigger a refresh of that information to actually reflect it when the user list comes up. So any changes inside of NAP will intelligently refresh the cache when those changes are made.

Vadim Vladimirskiy
Okay, so why am I telling you all this? The reason I’m telling you this is that if a change occurs outside of NAP and it hasn’t been long enough for the NAP to automatically refresh the cache on a scheduled basis, the change won’t be reflected in the screen. What are some examples? An example is we have this account lock out settings. You have this button, you can actually click unlock user, it will unlock the user. A red icon with a closed lock will indicate the user is locked versus this gray one indicates that user is online. Now what if the user actually goes into AV right now and just hits the password too many times and correctly and then gets locked? Will the NAP know about that right away? And the answer is it will not. Right? So if you go in immediately into the NAP, you will not see that the user is locked until the list refreshes was. It looks like it’s refreshing right now, which can again happen on a scheduled basis or you can click the refresh button, which is these spinning arrows right here. You can come into the screen, hit the spinning arrows, that will refresh the cache. The new date and timestamp will show up, and then the latest status of that user will then be updated.

Vadim Vladimirskiy
Also, if someone goes into active directory on the DCO1 and makes changes, let’s say to the active directory username, or maybe someone goes into Office 365 and removes the office license, all of that will be pulled into the NAP, but only when the cache refreshes. Okay, so right now let’s illustrate this. You have a refresh process running. We go on management tasks and you see nothing is in progress. In progress. See, nothing is running. But if we unhide these tasks, you can see that there is that thing that just started, and it looks like it was started by the system rather than manually kicked off by a specific user, because otherwise there would be a username in this field right here. So users, groups, shared mailboxes, performance statistics, licensing and login data, all of those things are cached. How can you know? Whenever there is a date stamp at the top of the section and the refresh button somewhere, that means that information is cache. Some information is not.

Vadim Vladimirskiy
So how did we decide what information for cache and what information not to cache? Remember the goal of caching was primarily to improve performance and responsiveness. For screens that either don’t take a long time to refresh or are seldom used, they’re kind of not used on a regular basis where a few extra seconds to pull the realtime information is not a big deal, we decided to leave those as real time. So servers is an example of such a screen. So when you click on servers, what’s actually happening here is we’re pulling in real time information via Azure API. That’s why this is taking a few seconds. You will see this will update in just the little bit, and you’ll see the status of each server, whether the server’s stopped or running. Then you can also tell that there isn’t a date stamp next to the server. Okay? Usually that doesn’t take that long. Let’s go there again. See what happens. Hm. Interesting.

Vadim Vladimirskiy
So let’s give this a few seconds to see that come up. This section right here, we haven’t spoken about RDS collections in depth yet. We’ll have a whole session on it, but this status and all of these little spinning arrows will take a while. Those take probably a good minute if not more, because in addition to just pulling up the status of each individual VM inside, this is pulling in the information about who’s logged in on which server, whether the user is active or disconnected or or inactive, et cetera. So this one is intentionally sort of refreshing for awhile, so you will see this is going to continue running even though all of these guys are are now pulled. Then the reason that took a while is most likely because of this one. The status here, when it’s green, like that means that it’s on. If it’s the same icon but it’s red, it means that it’s off. When it’s a triangle like this, it means that either it couldn’t pull in the status from Azure successfully, or the status of the VM is that it stopped through the operating system, but it’s still not deallocated in Azure, so you’re still continuing to be charged for it.

Vadim Vladimirskiy
Okay. Networking is a similar thing, so if we go on their firewall, this is not cache information. This is going to be real time, so interacting in the firewall and the VPN and the backup screen, most of those are real time because they’re not used that often, not on a daily basis. It’s only something you do when you set it up initially. So again, see there’s no date stamp, there’s no refresh button. The way you would refresh it is just reload the page, and then you see all the information in real time. So any changes made to the firewall in Azure directly, as soon as I go to the firewall screen in NAP, it will appear immediately rather than waiting for that refresh interval. Shared mailboxes are also cached. Some of the things that we’ll talk about on the user’s configuration is you can do things like forward email or monitor email and, and the NAP will actually show you like a little arrow that you can mouse over, and you can see who is monitoring in particular user’s email.

Vadim Vladimirskiy
In order to get all of that information, you can imagine, if any IT admin is a monitoring some other user’s email, the NAP has to pull in all the mailboxes, all the forwarding rules or email permission assignments, and then basically figure out who’s got access to whom. That’s why in order for that arrow to show up, the NAP needs to pull in the list of everything that’s involved in displaying that information. In large environments, for instance, the mailbox information is the thing that takes the longest. We specifically split it out into a separate task, and we made it non blocking because in large environments with hundreds of users, it could take half an hour, it could take an hour to pull in all of the exchange online information and then correlate it all and figure out who’s got access to whose mailbox. We don’t want that to be blocking any other activities from happening, whereas initially it was actually doing that, right? So whenever someone reports, “Hey, I did something very simple and it took longer than anticipated to run,” we do a root cause analysis to figure out, okay, what was running at the same time, and whether that, which was running was justifiably blocking other things, or whether it could run in parallel with other things.

Videos in the series