Network applications suffer from latency or even downtime because of network complexity (e.g. hardware, topology, middleware, software; from a LAN setup to the Internet, from a single connection to billions). This complexity causes data transmission performance to vary widely over time. At one end of the data pipeline are end users who consume data from an IT service and perceive a certain quality from it: essentially, how responsive the application is in providing the requested data. Quality of service (QoS) assessment is business-critical for an IT provider, so application monitoring becomes a key issue. Ideally, this kind of monitoring should be continuous and ubiquitous: at a minimum, it is desirable to assess the QoS of IT services with frequent sampling (several times per hour) from different locations (network nodes relatively far from each other). With well-designed application monitoring, IT providers can properly configure and maintain their software, reallocate processing capacity by moving virtual hardware resources, and ultimately meet the SLAs agreed with their customers.

In practice, we have to monitor several types of network applications, which fall into three main categories. A first set of applications relies on standard communication protocols (known and open, e.g. HTTP, JSON), and it is quite straightforward to design checks that monitor their responses in terms of availability, responsiveness and even content, as sketched below. A second set of applications is built on proprietary communication protocols (e.g. custom socket programming), and a huge number of such clients exist. In this case, the only way to monitor them would be to request the development of an API with monitoring capabilities: that would entail non-negligible costs in time and money and, above all, would only solve one particular case. There is a third and even more severe scenario to face: virtualized applications served through remote desktop protocols via web browsers or receiver clients. In this scenario it is technically impossible to monitor the application in the traditional way: essentially, a video stream reaches end users, and a video stream cannot be called like an API.
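For the first category, a check can be as simple as timing a request over the open protocol and validating the payload. A minimal sketch in Python follows; the endpoint URL, the timeout and the expected JSON field are illustrative assumptions, not part of any specific service.

```python
import time
import requests  # assumes the requests library is available

def check_http_service(url: str, timeout: float = 5.0) -> dict:
    """Probe an open-protocol service: availability, latency and content."""
    start = time.perf_counter()
    try:
        response = requests.get(url, timeout=timeout)
        latency = time.perf_counter() - start
        payload = response.json()  # content check on the JSON body
        return {
            "available": response.status_code == 200,
            "latency_s": latency,
            "content_ok": "status" in payload,  # hypothetical expected field
        }
    except (requests.RequestException, ValueError):
        return {"available": False, "latency_s": None, "content_ok": False}

# Probe a hypothetical health endpoint
print(check_http_service("https://example.com/api/health"))
```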
We were looking for a radically different approach that could address all three scenarios (open, proprietary and remote desktop protocols), and we decided to design a so-called visual synthetic monitoring approach. The idea is to reproduce a user flow transaction by transaction on the graphical user interface, interacting with it exactly as a human would with mouse and keyboard. The approach is based on computer vision, because graphic elements must first be detected on the screen before they can be interacted with. The other crucial aspect of the solution we were looking for concerns performance measurement: monitoring the visual side of a transaction means detecting whether its expected result appears on the screen and how long it takes to do so. Working with video streams allows us to overcome, in an elegant manner, all the monitoring issues previously described.
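As a sketch of this computer-vision core, the snippet below uses OpenCV template matching to detect a reference image of a GUI element inside periodic screenshots, measuring how long the element takes to appear before clicking it. This is one plausible instance of the technique, not the actual implementation: the template file name, the match threshold and the use of pyautogui for screenshots and mouse control are all assumptions.

```python
import time
import cv2          # OpenCV, used here for template matching
import numpy as np
import pyautogui    # assumed here for screenshots and mouse control

def wait_for_element(template_path: str, threshold: float = 0.9,
                     timeout: float = 30.0):
    """Poll the screen until the template appears; return (x, y, seconds) or None."""
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    start = time.perf_counter()
    while time.perf_counter() - start < timeout:
        screen = cv2.cvtColor(np.array(pyautogui.screenshot()),
                              cv2.COLOR_RGB2GRAY)
        result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val >= threshold:
            elapsed = time.perf_counter() - start
            h, w = template.shape
            return (max_loc[0] + w // 2, max_loc[1] + h // 2, elapsed)
    return None  # transaction failed: the element never appeared

# Detect a hypothetical login button, record its latency, then click it
hit = wait_for_element("login_button.png")
if hit:
    x, y, latency = hit
    pyautogui.click(x, y)
```

The returned elapsed time is exactly the measurement described above: how long the visual result of a transaction takes to appear on the screen.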
Finally, the idea is to automate entire user interaction flows, synthesizing their transactions into so-called test cases. Every execution of a test case reports whether each transaction was available and how responsive it was in terms of time. The continuous execution of a test (e.g. every 10 minutes) produces a stack of time series (one per transaction) that describes the performance trend of the defined user interaction flow: latency spikes and service downtimes become evident. A system administrator can then perform a first level of analysis and diagnosis: within a latency spike, it is possible to identify which transactions suffered degraded performance (more time was needed to detect them) or, in case of a gap in the chart, the transaction that broke the test.
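A minimal sketch of this execution loop, assuming the detection primitive from the previous snippet: a scheduler runs the test case, records one (timestamp, duration, success) sample per transaction, and appends it to a per-transaction time series. The transaction names, the stub detection function and the in-memory storage are illustrative; the 10-minute period matches the example in the text.

```python
import time
from datetime import datetime, timezone

# Hypothetical test case: an ordered list of transactions to reproduce.
TEST_CASE = ["open_app", "login", "open_report"]

def run_transaction(name: str) -> bool:
    """Placeholder for the visual check of a single transaction
    (e.g. wait_for_element from the previous sketch)."""
    return True  # stub: replace with the actual visual detection

def run_test_case(series: dict) -> None:
    for name in TEST_CASE:
        start = time.perf_counter()
        ok = run_transaction(name)
        sample = (datetime.now(timezone.utc), time.perf_counter() - start, ok)
        series.setdefault(name, []).append(sample)
        if not ok:
            break  # a broken transaction aborts the test, leaving a gap

series: dict = {}
while True:
    run_test_case(series)  # one sample per transaction per run
    time.sleep(10 * 60)    # sampling period from the text: 10 minutes
```

Each list in `series` is one of the time series described above; a missing sample after an aborted run is precisely the chart gap that points the administrator to the transaction that broke the test.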