Performance Testing Tools : Testronic Labs

PERFORMANCE TESTING

WITH BORLAND® SILKPERFORMER®

Our client, the largest internet publisher in the Netherlands, was looking for an external consultant or company to load test its newly built web environment. After comparing proposals from several suppliers, the client chose Testronic Labs for its flexible approach and its ability to work through problems together with the client.

TEST SCOPE

The client’s network infrastructure hosts one of the Netherlands’ top news sites, a high-availability site that receives around 17 million page hits per day. The site is split into news, photo and video sections.
Having upgraded its infrastructure and back-end code, the client wanted a controlled QA process for the new environment. The aim was to measure the performance of the whole infrastructure and how gracefully it degraded under load, whilst keeping track of the user experience.

TECHNICAL CHALLENGES

Several technical challenges had to be overcome to ensure a realistic and balanced performance test for this application.

Because of the site's high profile, it was hosted redundantly in two data centres, each with its own high-speed internet connection. The environments in the two data centres were identical, each consisting of multiple layers of load balancing and caching. We therefore had to ensure that the user scripts passed through these layers realistically, rather than circumventing them or linking directly to any of the resources.

One of the goals was to generate a load of 5 million hits in a single test run, which would translate into roughly 30,000 concurrent users. Achieving this testing capacity required close cooperation with the client's technical staff and their hosting partners.

PROJECT FLOW

Clear test requirements are of paramount importance in a project of this size, so a thorough intake phase was essential. This phase gathered the right people around the table (and around noisy servers) so that our performance consultants could ask the necessary questions and process the answers, ensuring a smooth start and smooth progress throughout the project. The intake phase resulted in a high-level master test plan covering all the activities needed to conclude the project successfully. The plan documented not only the steps to be taken but also the risks and contingencies, avoiding unpleasant surprises as far as possible during the project.

The master test plan defined scripts covering all the important scenarios identified in interviews with technical staff. The scripts were parameterised for variables such as login credentials, and randomised so that usage was spread more evenly across the application.
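As an illustration of script parameters and randomisation, the sketch below builds the input data for one virtual user. It is not the SilkPerformer scripting used in the project; the credentials, section names, think-time range and the `build_user_params` helper are all hypothetical.

```python
import random

# Hypothetical test data: the case study does not list the real values.
CREDENTIALS = [("user1", "secret1"), ("user2", "secret2"), ("user3", "secret3")]
SECTIONS = ["news", "photo", "video"]


def build_user_params(user_id, seed=0):
    """Return the parameters for one virtual user.

    Seeding the generator from the user id keeps a run repeatable
    while still spreading users across the sections of the site.
    """
    rng = random.Random(seed * 1_000_003 + user_id)
    # Cycle through the credential pool so logins are shared evenly.
    username, password = CREDENTIALS[user_id % len(CREDENTIALS)]
    return {
        "username": username,
        "password": password,
        "section": rng.choice(SECTIONS),
        # Assumed pause between page requests, in seconds.
        "think_time_s": round(rng.uniform(1.0, 5.0), 2),
    }
```

Because the generator is seeded per user, two runs with the same seed give each virtual user the same journey, which makes later runs easier to compare with earlier ones.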

Baseline runs were made for each script to verify that it worked correctly and to record a reference timing for each element for later comparison. We chose the front page as our main measurement point because it featured in almost every user's journey through the site. It also appeared to be the most resource-intensive page, as its loading time was higher than that of the other pages from the outset.
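A baseline is useful mainly as a reference for later runs. The sketch below shows one way to express timings under load as a slowdown factor against the baseline; the page names and the `slowdown_vs_baseline` helper are illustrative assumptions, not part of the original test suite.

```python
def slowdown_vs_baseline(baseline_s, loaded_s):
    """Return the slowdown factor per page (loaded time / baseline time).

    Both arguments map a page name to a response time in seconds.
    Pages missing from either run, or with a non-positive baseline,
    are skipped.
    """
    factors = {}
    for page, base in baseline_s.items():
        if page in loaded_s and base > 0:
            factors[page] = loaded_s[page] / base
    return factors
```

A factor of 1.0 means the page is as fast under load as in the baseline run; a factor of 3.0 means it takes three times as long.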

Detailed focus runs were carried out on the video and photo subsections of the site. Run with 1,000 and 1,500 concurrent users respectively, these were textbook examples of how a site should perform. With 1.5 and 6.4 million HTTP hits per hour, we established that the site could handle the expected load even with heavy media usage.
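To put the hourly figures into perspective, they can be converted to hits per second and per user. The sketch below assumes that the two hit counts map to the video and photo runs in the same order as the user counts, which the text suggests but does not state outright.

```python
SECONDS_PER_HOUR = 3600


def hits_per_second(hits_per_hour):
    """Convert an hourly hit count to an average per-second rate."""
    return hits_per_hour / SECONDS_PER_HOUR


def hits_per_user_per_second(hits_per_hour, concurrent_users):
    """Average request rate generated by a single virtual user."""
    return hits_per_second(hits_per_hour) / concurrent_users


# Assumed pairing of the figures quoted above.
video_rate = hits_per_second(1_500_000)   # with 1,000 users
photo_rate = hits_per_second(6_400_000)   # with 1,500 users
```

Under that assumption, the video run averaged roughly 417 hits per second and the photo run roughly 1,778 hits per second.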

All further runs were carried out on the news section of the site in order to reach the high peak loads that would occur during an exceptional news event. With 30,000 concurrent users as the end goal, initial runs ramped up to 10,000 concurrent users to establish that this level could be reached and sustained for a reasonable period of time. These runs also focused on the various ad systems in place, looking for degradation points and for the point at which they might need to be disabled so as not to encumber the site.
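A stepped increase in users can be described as a simple schedule of start times and user counts. The sketch below is a generic illustration; the step size and hold time are assumptions, since the case study does not give the actual ramp-up profile.

```python
def ramp_schedule(target_users, step_users, hold_seconds):
    """Return (start_time_s, active_users) pairs for a stepped ramp-up.

    Each step adds step_users virtual users and is held for
    hold_seconds before the next step starts. The last step is
    capped at target_users.
    """
    if target_users <= 0 or step_users <= 0 or hold_seconds < 0:
        raise ValueError("target and step must be positive, hold non-negative")
    schedule = []
    users = 0
    start = 0
    while users < target_users:
        users = min(users + step_users, target_users)
        schedule.append((start, users))
        start += hold_seconds
    return schedule
```

For example, `ramp_schedule(10_000, 2_500, 300)` gives four steps, five minutes apart, ending at 10,000 users. Holding each step makes it easier to see at which load level degradation begins.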

The final set of runs then increased the number of concurrent users towards 30,000. During the first attempt we encountered a configuration error: one of the caching layers malfunctioned, so data was pulled directly from the media servers instead of from the cache. The media servers held out until approximately 11,000 users were pulling media from them, at which point they began refusing connections.

During the subsequent runs we reached between 15,000 and 30,000 concurrent users, and HTTP hits peaked at 26.6 million per hour before errors started to occur. The network infrastructure was tuned in several ways during the runs, resulting in a stable, better-performing environment by the end.

RESULTS ANALYSIS

The results gathered during the test runs were cleaned and analysed, then correlated with the system resource monitoring data collected in parallel with each scripted test run. Following the analysis, all relevant data was compiled into a full report. This test produced about 5 gigabytes of raw results data, so filtering out the required data and condensing it to the focus points of each test run was necessary to keep an overview.
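Correlating test results with monitoring data essentially means lining up two time series. The sketch below attaches to each response-time measurement the most recent monitoring sample taken at or before it. The data layout and the choice of CPU load as the monitored metric are assumptions for illustration only.

```python
from bisect import bisect_right


def correlate(timings, monitor_samples):
    """Pair each timing with the latest monitoring sample at or before it.

    timings:         list of (timestamp_s, response_time_s)
    monitor_samples: list of (timestamp_s, cpu_percent), sorted by timestamp
    Returns a list of (timestamp_s, response_time_s, cpu_percent or None).
    """
    sample_times = [t for t, _ in monitor_samples]
    rows = []
    for ts, response in timings:
        # Index of the last sample whose timestamp is <= ts.
        idx = bisect_right(sample_times, ts) - 1
        cpu = monitor_samples[idx][1] if idx >= 0 else None
        rows.append((ts, response, cpu))
    return rows
```

With the two series joined in this way, slow responses can be read side by side with the resource usage at that moment, which helps to narrow down which layer is the bottleneck.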

Some of the focus points were easy to summarise, but the bottlenecks in the high-end test runs with 30,000 users were harder to spot. After analysing all the runs and the monitoring data, the problem appeared to lie in one of the caching layers. An additional simulation run validated this finding and allowed us to estimate how efficiency could be increased without investing in more equipment.

The final report gave our client an overview of the application’s behaviour under different levels of load. It also provided a graceful degradation scheme for keeping the site at optimal performance under increasing load. Last but not least, the client gained confidence that the infrastructure could withstand a real load of 30,000 concurrent users without falling over and leaving customers with a blank screen.

[Figure: test plan flowchart]