Tuesday, July 7, 2009

Are QA teams responsible for delaying the product release?


I once delayed a product release at the last minute. I found a show-stopper issue just before the release date, and at the time I was quite pleased with myself, as if delaying the release were a great achievement.
The development team worked day and night to solve the issue, and we felt as though we had defeated the dev team; after all, fixing it was their responsibility.
Before the postmortem meeting, one big question suddenly hit me: it was my responsibility to find Sev-1/Sev-2 issues in the early stages of the testing cycle, so why and how did I miss this one? We then analyzed the root causes and took corrective measures for future testing.
And I realized that it is also QA's responsibility to help release the product on time, with quality.

Friday, June 12, 2009

Network Performance Identification and Tuning


Useful commands for finding latency and bandwidth:
netstat
ping
tracert
pathping
Tools: Wireshark (formerly Ethereal)
Using Wireshark, we can find out which protocols the application is using.
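For example, to get a quick first look at round-trip latency and per-hop behaviour, the commands above can be pointed at the application server (the address 10.0.0.25 below is just a placeholder):

    ping -n 20 10.0.0.25        (20 echo requests; note the minimum/average/maximum round-trip times)
    tracert 10.0.0.25           (shows the route and per-hop latency)
    pathping 10.0.0.25          (combines tracert with per-hop packet-loss statistics gathered over a few minutes)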
Important Perfmon Counters:
Network Interface\Bytes Received/Sec
Network Interface\Bytes Sent/sec
Network Interface\Bytes Total/sec
Network Interface\Current Bandwidth
Network Interface\Output Queue Length

If Network Interface\Bytes Total/sec (converted to bits) is consistently more than about 50 percent of Network Interface\Current Bandwidth, i.e. the link is over 50 percent utilized, the server is likely to run into network problems under peak load conditions.
Correlate the network counter values with PhysicalDisk\% Disk Time and Processor\% Processor Time. If % Disk Time and % Processor Time are low but the network output queue length is high, the bottleneck is probably the network card or the available bandwidth.
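As a sketch, the same counters can also be captured from the command line with typeperf (built into Windows); the 5-second interval and 60 samples below are just example settings:

    typeperf "\Network Interface(*)\Bytes Total/sec" "\Network Interface(*)\Current Bandwidth" -si 5 -sc 60

Because Bytes Total/sec is in bytes and Current Bandwidth is reported in bits per second, network utilization (%) can be worked out as (Bytes Total/sec x 8) / Current Bandwidth x 100.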

A few TCP stack parameters to tune:
1) RFC 1323

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

Create a new DWORD value named Tcp1323Opts and set it to 3.
Description: This parameter controls the use of RFC 1323 Timestamp and Window Scale TCP options. Explicit settings for timestamps and window scaling are manipulated with flag bits. Bit 0 controls window scaling, and bit 1 controls timestamps.
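As a minimal sketch, the same setting can be applied from an elevated command prompt on desktop/server Windows (a reboot is normally needed before TCP parameters take effect):

    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v Tcp1323Opts /t REG_DWORD /d 3 /f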
2) Maximum Transmission Unit size
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<adapter ID> -> MTU, set to the desired MTU size in bytes

MTU size dictates the maximum packet size (in bytes) that the TCP/IP transport will attempt to transmit over the underlying network.
How to find the MTU value:
You can check the MTU size for your environment by opening a command prompt on your administration workstation and typing ping -f -l <size> <gateway>, where <size> is the payload size you want to experiment with and <gateway> is the gateway of your network. Start out with a size of 1454, which makes the command look something like this:
ping -f -l 1454 147.100.100.50
The ping will either succeed or fail with a message stating that the packet needs to be fragmented. If you receive the error message, decrease the size and keep trying until you find a value that works. If the command works the first time, increase the value by 5 or 10 until you see the error message, and then narrow down to the largest size that still goes through. Note that the ping payload excludes the 28 bytes of IP and ICMP headers, so the path MTU is the largest working payload size plus 28.
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters -> EnablePMTUDiscovery, value = 1

This value can be either 0 (False) or 1 (True) to indicate whether TCP should perform Maximum Transmission Unit (MTU) discovery. Setting this value to 1 causes TCP to attempt to discover the MTU, or largest packet size, over the path to a remote host.

3) GlobalMaxTcpWindowSize
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters -> GlobalMaxTcpWindowSize
Valid Range: 0–0x3FFFFFFF (1073741823 decimal; however, values greater than 64 KB can only be achieved when connecting to other systems that support RFC 1323 window scaling, described in parameter 1 above.)
Default: This parameter does not exist by default.
Description: While TcpWindowSize can set the receive window on a per-interface basis, GlobalMaxTcpWindowSize sets an upper limit for the TCP window size on a system-wide basis.

4) TcpWindowSize
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters -> TcpWindowSize, suggested value 8760 (6 x the standard Ethernet MSS of 1460)
Valid Range: 0–0x3FFFFFFF (1073741823 decimal). In practice the TCP/IP stack will round the number set to the nearest multiple of the maximum segment size (MSS). Values greater than 64 KB can be achieved only when connecting to other systems that support RFC 1323 window scaling (see parameter 1 above).
Default: the smaller of the following values:
0xFFFF
GlobalMaxTcpWindowSize (another registry parameter)
the larger of (four times the MSS) and (16384 rounded up to an even multiple of the MSS)
The default can start at 17520 for Ethernet, but may shrink slightly when the connection is established to another computer that supports extended TCP header options.
Description: This parameter determines the maximum TCP receive window size offered. The receive window specifies the number of bytes that a sender can transmit without receiving an acknowledgment. In general, larger receive windows improve performance over high-delay, high-bandwidth networks. For greatest efficiency, the receive window should be an even multiple of the TCP maximum segment size (MSS). This parameter is both a per-interface parameter and a global parameter, depending upon where the registry key is located. If there is a value for a specific interface, that value overrides the system-wide value. See also GlobalMaxTcpWindowSize.
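As a rough sizing sketch (assuming, say, a 100 Mbps link with a 50 ms round-trip time), the receive window should cover the bandwidth-delay product:

    100,000,000 bits/s / 8 x 0.050 s = 625,000 bytes
    625,000 / 1460 is about 428, and 428 x 1460 = 624,880 bytes (an even multiple of the MSS)

A window this large only helps if RFC 1323 window scaling (parameter 1 above) is enabled on both ends, since it exceeds the 64 KB limit of the basic window field.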

5) MaxUserPort
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters -> MaxUserPort, value = 65534
Raising the upper bound of the ephemeral (client-side) port range helps load generators and servers avoid running out of ports when opening many connections.

6) TcpTimedWaitDelay
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters -> TcpTimedWaitDelay, value = 60
This shortens the time a closed connection is held in the TIME_WAIT state, so that ports are recycled faster during heavy load tests.
7) DefaultTTL (how long a packet stays alive on the network)
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters -> DefaultTTL, value = 255
This sets the default Time To Live, i.e. the maximum number of router hops an outgoing IP packet can traverse before it is discarded.
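A minimal sketch of applying parameters 5-7 with reg add, using the values suggested above (run from an elevated command prompt; the settings typically take effect after a reboot):

    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v MaxUserPort /t REG_DWORD /d 65534 /f
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v TcpTimedWaitDelay /t REG_DWORD /d 60 /f
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v DefaultTTL /t REG_DWORD /d 255 /f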

Friday, June 5, 2009

Response Time and Throughput




Response Time
The amount of time the application server takes to return the results of a client request. Response time is affected by factors such as network bandwidth, number of users, number and type of requests submitted, and average think time.
Response time (as reported by the tool) = server processing time + network time. LoadRunner does not include client processing/rendering time, and think time is excluded from transaction response time unless you explicitly choose to include it.
How will you calculate the response time experienced by the end user?
Response time = server processing time + network time + client processing time. As a quick made-up example: 300 ms of server processing + 120 ms on the network + 180 ms of browser rendering means the tool reports about 420 ms while the user experiences about 600 ms.

Throughput
Throughput measures the amount of work performed by the application server.
Application Server throughput is a function of many factors, including the nature and size of user requests, number of users, and performance of Application Server instances and back-end databases.

Throughput vs. number of users (concurrent users)
As the number of users increases, throughput initially increases correspondingly. However, once the number of concurrent requests exceeds what the server can handle, performance saturates and throughput begins to decline.
Ref: Fig. A
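As a rough rule of thumb (a simplified steady-state form of Little's law):

    throughput (requests/sec) ≈ number of concurrent users / (response time + think time)

For example, 200 users with a 1-second response time and 9 seconds of think time generate about 200 / 10 = 20 requests per second.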


Throughput and Response Time with Increasing Number of Users
Initially, as load increases, both throughput and response time increase. Beyond the saturation point they diverge: as the number of users on the system keeps growing, response time continues to climb even though the number of requests served per minute declines.
Ref: Fig. B

Perfmon Counter Ideal Values


Some of the key perfmon Counters and their acceptable values.

Monday, January 12, 2009

SQL Tuning Tips for performance


1) Concatenation of Different Data Types
An SQL query that concatenates values of different data types takes more time to execute, because the database has to perform implicit type conversion.

2) Usage of “WHERE” Instead of “HAVING”
Using a WHERE clause in place of HAVING is often more efficient in GROUP BY statements: the WHERE clause filters rows before the grouping is performed, whereas the HAVING clause is applied only after the groups have been built. Conditions on individual columns therefore belong in WHERE; reserve HAVING for conditions on aggregates such as SUM or COUNT.

3) Position of Table with Fewer Rows in the “SELECT…FROM” Query
It is advisable to put the table that returns the fewest rows at the end of the from list.

4) Usage of “BETWEEN” in Place of Comparison Operators
If a query involves searching over a range of values, then using “between” is advisable over a pair of comparison operators.

5) Usage of Table Aliases
If more than one table is used in a query, then it is advisable to use table aliases, as they
would enhance the speed of the parse phase of the query.

6) Index
An index enables faster retrieval of data, but adds overhead to insert, update, and delete operations.
The index is a separate structure attached to a table. This structure holds the indexed
column value and a pointer to the physical data.
Hence, any query which involves searching based on indexes would first access the index
structure and then would retrieve the data from the respective table.
However, if a table carries too many indexes (more than 4 or 5), performance comes down.
The selectivity of an index is determined by the ratio of unique values in a given column to the total number of rows.
If the ratio is nearer to 0, using an index is not advisable.
If the ratio is nearer to 1, an index will enhance the performance of the system. For example, a column with 9,500 distinct values in a 10,000-row table has a selectivity of 0.95 and is a good index candidate, whereas a status column with only 3 distinct values is not.

7) Usage of “ORDER BY”
Avoid “ORDER BY” wherever possible, as it sorts the output of the query and hence involves additional processing; use it only when the ordered output (for example, columns sorted in DESC order) is actually required.


8) Resource-Intensive Operations
Avoid using resource intensive operations like UNION, MINUS, DISTINCT, INTERSECT,
ORDER BY, and GROUP BY. DISTINCT uses one sort whereas other operators use
two sorts or more.

9) Usage of NULL
Null values are not stored in the index structure.
Any query using an “IS NULL” clause therefore does not make use of the index and performs a FTS (Full Table Scan), thereby taking more time to execute.


10) Usage of “EXISTS” and “NOT EXISTS” Clauses
Wherever possible, “EXISTS” or “NOT EXISTS” should be used, for example in place of IN / NOT IN with a subquery. Using the “EXISTS” clause may eliminate unnecessary table accesses.
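The easiest way to confirm that any of these rewrites actually helps is to compare the execution plan before and after the change. A minimal sketch, using the sqlite3 command-line shell as a stand-in (on Oracle, EXPLAIN PLAN or tkprof serves the same purpose; the database file, table, and column names here are hypothetical):

    sqlite3 app.db "EXPLAIN QUERY PLAN SELECT id, amount FROM orders WHERE amount BETWEEN 100 AND 200;"

A "SCAN" line in the output means a full table scan, while "SEARCH ... USING INDEX" means the query is using an index.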

Friday, January 9, 2009

Monitoring System Resources - Linux/Unix

What to monitor?

Some general performance expectations on any system.

Run Queues – A run queue should have no more than 1-3 threads queued per processor. For example, a dual-processor system should not have more than 6 threads in the run queue.
CPU Utilization – Even when a CPU is fully utilized, the time should be spent mostly in user mode rather than system mode; a commonly quoted balance is roughly 65-70% user time, 30-35% system time, and 0-5% idle.
Context Switches – The context-switch rate should always be judged in relation to CPU utilization; a very high rate on an already saturated CPU points to contention.

1) vmstat
vmstat is a real-time performance monitoring tool. It provides data that can be used to spot unusual system activity, such as high page faults or excessive context switches, that can degrade system performance.
Watch the cpu section in particular: it reports the percentage of total CPU time in terms of user (us), system (sy), true idleness (id), and waiting for I/O completion (wa).

You can also monitor bi and bo for the block I/O transfer rate, in for the interrupt rate, and swpd, si, and so to see whether the system is swapping.
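As a usage sketch, sampling every 5 seconds for 10 samples:

    vmstat 5 10

(The first line of output reports averages since boot and is usually ignored.)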

2) ps Command
To find out how memory is used within a particular process, use ps for an overview of memory used per process:
$ ps aux
The output of the ps aux command shows the total percentage of system memory that each process consumes, as well as its virtual memory footprint (VSZ) and the amount of physical memory that the process is currently using (RSS).

The RSS column provides the "resident set size" of a process; this is the amount of physical memory used by the process and a good indication of how much real memory a given process is using. The VSZ column details the total amount of virtual memory allocated to the process, including memory that may currently be swapped out to disk. Both of these columns are common to the majority of ps variants.
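As a sketch for watching one particular process over time (the PID 1234 below is hypothetical, e.g. a Java application server):

    ps -o pid,rss,vsz,pcpu,comm -p 1234

Re-running this periodically (or wrapping it in watch -n 60) and checking whether RSS keeps growing is a simple way to spot a memory leak.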

3) IOSTAT

iostat stands for input/output statistics; it provides data about I/O devices such as disks, terminals, and serial devices, as well as CPU utilization, but we will use it here for disk-related data only.
By default, iostat generates two reports, one for CPU utilization and one for device utilization. You can use the -c option to get just the CPU report or the -d option to get just the device report.
Syntax
The basic syntax is:
iostat -d -x interval count
-d : Display the device utilization report (d == disk)
-x : Display extended statistics including disk utilization
Option -- options may differ among operating systems.
Interval -- the time period in seconds between two samples; iostat 5 will report data at 5-second intervals.
Count -- the number of samples needed; iostat 5 4 will report data at 5-second intervals 4 times. If no count is specified, the samples continue indefinitely and must be terminated by pressing ^C. Commonly, the command is run without a count and the samples are observed to get a feel for the system state.

The values to look at in the iostat output are:
The average service time (svctm)
Percentage of CPU time during which I/O requests were issued (%util)
See if a hard disk reports consistently high reads/writes (r/s and w/s)

E.g.: iostat -d -x 1

4) NETSTAT

netstat provides information about network routes and cumulative statistics for network interfaces, including the number of incoming and outgoing packets and the number of packet collisions.

netstat
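A few netstat invocations that are handy during a load test (the flags shown are the common Linux ones):

    netstat -i                        (per-interface packet, error, and drop counters)
    netstat -s                        (cumulative per-protocol statistics, e.g. TCP retransmissions)
    netstat -an | grep -c TIME_WAIT   (counts connections stuck in TIME_WAIT, useful when ports run out)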

Tuesday, January 6, 2009

Reliability Testing or Endurance Testing

Definition:
Software reliability is the probability of failure-free operation of a computer program in a specified environment for a specified time; reliability (or endurance) testing evaluates this by running the application under load for an extended period.

Informally, software reliability is about how well an application provides, without failure, the services defined in the original specification. In addition to how long the application runs before failure, reliability engineering is about producing correct results and handling error detection and recovery in order to avoid failures.

The main intention is to find memory leaks, unexpected errors, and crashes that only show up when the application runs for long hours.

In real business-day scenarios the application usually runs for weeks, months, or longer, so we need to simulate a workload that mirrors that real usage; that is why a reliability test is required. In a short-duration run we are not able to find memory leaks or other unexpected behaviour in the application.


The objective of Reliability test is to measure:

Memory leaks
JVM Heap
Concurrency
Whether application performance remains consistent over time
Unhandled exceptions - crashes, hangs
Number of users
Number of hits/sec
% CPU utilization on all servers
Memory utilization on all servers
GC collection (if any)

Steps:

To execute a reliability test, follow these steps:
1. Finalize the business scenarios and decide the key performance indicators (KPIs) for each operation, memory, CPU, etc.
2. Verify the scenarios manually and load the required volumes of data.
3. Create the load scripts using any load testing tool, and customize the scripts to make them robust.
4. Check the environment settings and ensure that everything is configured as per the load testing recommendations.
5. Create the load test scenario based on the business-day scenarios.

Design the business-day scenario to simulate real application usage: for example, a peak-hours run of 8 hours, then 30 minutes of idle time, then ramp up again and run for another 8 hours, followed by another 30 minutes of idle time.
Continue this cycle around 8-10 times.

6. Add all the performance counters required.
7. Execute the load test for the pre-defined number of users, ramping the users up slowly.
8. Run the test for about 48 hours or more.
9. Collect the results and analyze them by comparing against the KPIs and against previous build and release results.

Results:
Plot the graph for private bytes.
Put all the performance monitor counters in the result sheet.
Response time
Throughput

Main Observation:

Monitor memory: RSS and VSZ on Unix, Private Bytes and Virtual Bytes on Windows.
Observe that all the transactions have passed successfully.
(A high number of failed transactions is often a sign that the application is hanging or crashing.)
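A minimal sketch for logging the Unix-side memory trend during a long run (the process name java and the 5-minute interval are just examples):

    while true; do date >> mem.log; ps -o pid,rss,vsz,comm -C java >> mem.log; sleep 300; done

A steadily climbing RSS over the 48-hour run is the classic signature of a memory leak; on Windows, watch Private Bytes the same way with perfmon.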

Tips:
--> Set the granularity to 5 seconds for all the counters.
--> Verify that the measurement scale is set to 1 for all the counters.

Thanks
Senthil