Performance Capabilities Reference
IBM i operating system Version 6.1
This document is intended for use by qualified performance-related programmers or analysts from IBM, IBM Business Partners, and IBM customers using IBM Power Systems running the IBM i operating system. Information in this document may be readily shared with IBM i customers to help them understand the performance and tuning factors in IBM i operating system 6.1 and earlier, where applicable.
Note! Before using this information, be sure to read the general information under “Special Notices.”
Twenty-Fifth Edition (January/April/October 2008) SC41-0607-13
This edition applies to IBM i operating system V6.1 running on IBM Power Systems. You can download a copy of this document from the System i Internet site at: http://www.ibm.com/systems/i/ .
Special Notices
DISCLAIMER NOTICE
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. This information is presented along with general recommendations to help the reader better understand IBM(*) products. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed.
Microsoft Corporation in the United States, other countries, or both. Intel, Intel Inside (logos), MMX and Pentium are trademarks of Intel Corporation in the United States, other countries, or both. Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Purpose of this Document
This document provides guidance on IBM i operating system performance, capacity planning information, and tips to obtain optimal performance on the IBM i operating system. It is typically updated with each new release, or more often if needed. This October 2008 edition of the IBM i V6.1 Performance Capabilities Reference Guide is an update to the April 2008 edition, reflecting new product functions announced on October 7, 2008.
Chapter 1. Introduction IBM System i and IBM System p platforms unified the value of their servers into a single, powerful lineup of servers based on industry leading POWER6 processor technology with support for IBM i operating system (formerly known as i5/OS), IBM AIX and Linux for Power. Following along with this exciting unification are a number of naming changes to the formerly named i5/OS, now officially called IBM i operating system.
Chapter 2. iSeries and AS/400 RISC Server Model Performance Behavior 2.1 Overview iSeries and AS/400 servers are intended for use primarily in client/server or other non-interactive work environments such as batch, business intelligence, network computing etc. 5250-based interactive work can be run on these servers, but with limitations. With iSeries and AS/400 servers, interactive capacity can be increased with the purchase of additional interactive features.
This chapter is organized into the following sections:
• Server Model Behavior
• Server Model Differences
• Performance Highlights of New Model 7xx Servers
• Performance Highlights of Current Model 170 Servers
• Performance Highlights of Custom Server Models
• Additional Server Considerations
• Interactive Utilization
• Server Dynamic Tuning (SDT)
2.1.4 V5R2 and V5R1
There were several new iSeries 8xx and 270 server model additions in V5R1 and the i890 in V5R2. However, with the exception of the DSD models, the underlying server behavior did not change from V4R5. All 27x and 8xx models, including the new i890, utilize the same server behavior algorithm that was announced with the first 8xx models supported by V4R5. The new server algorithm only applies to the new hardware available in V4R5 (2xx, 8xx and SBx models). The behavior of all other hardware, such as the 7xx models, is unchanged (see section 2.2.3, Existing Models, for the 7xx algorithm).
2.2.2 Choosing Between Similarly Rated Systems
Sometimes it is necessary to choose between two systems that have similar CPW values but different processor megahertz (MHz) values or L2 cache sizes.
grows at a rate which can eventually eliminate server/batch capacity and limit additional interactive growth. It is best for interactive workloads to execute below (less than) the knee of the curve. (However, for those models having the knee at 1/3 of the total interactive capacity, satisfactory performance can be achieved.) The following graph illustrates these points.
2.3 Server Model Differences
Server models were designed for a client/server workload while also accommodating an interactive workload. When the interactive workload exceeds an interactive CPW threshold (the “knee of the curve”), the client/server processing performance of the system degrades at an accelerating rate as the interactive workload continues to grow beyond the knee.
Figure 2.2. Custom Server Model behavior: CPU available for client/server versus the fraction of interactive CPW used (applies to AS/400e Custom Servers and AS/400e Mixed Mode Servers). A companion chart shows standard Server Model behavior, with the knee at 1/3 of the interactive CPW.
2.4 Performance Highlights of Model 7xx Servers 7xx models were designed to accommodate a mixture of traditional “green screen” applications and more intensive “server” environments. Interactive features may be upgraded if additional interactive capacity is required. This is similar to disk, memory, or other features. Each system is rated with a processor CPW which represents the relative performance (maximum capacity) of a processor feature running a commercial processing workload (CPW) in a client/server environment.
170, as 10 disk access arms is the maximum configuration. When the model 170 servers are running less than the published interactive workload, no Server Dynamic Tuning (SDT) is necessary to achieve balanced performance between interactive and client/server (batch) workloads.
The next chart shows the performance capacity of the current and previous Model 170 servers.
Figure 2.5. Previous vs. Current Server 170 Performance (processor features 2159, 2160, 2164, and 2176)
2.6 Performance Highlights of Custom Server Models
Custom server models were available in releases V4R1 through V4R3.
and higher than normal CFINT values. The goal is to avoid exceeding the threshold (knee of the curve) value of interactive capacity. 2.8 Interactive Utilization When the interactive CPW utilization is beyond the knee of the curve, the following formulas can be used to determine the effective interactive utilization or the available/remaining client/server CPW.
Logic was added in V4R1 and is still in use today so customers could better control the impact of interactive work on their client/server performance. Note that with the new Model 170 servers (features 2289, 2290, 2291, 2292, 2385, 2386 and 2388) this logic only affects the server when interactive requirements exceed the published interactive capacity rating.
If customers modify an IBM-supplied class description, they are responsible for ensuring the priority value is 35 or less after each new release or cumulative PTF package has been installed. One way to do this is to include the Change Class (CHGCLS) command in the system Start Up program. NOTE: Several IBM-supplied class descriptions already have RUNPTY values of 35 or less.
Server Dynamic Tuning Recommendations
On the new systems and mixed-mode servers, have the QDYNPTYSCD and QDYNPTYADJ system values set on. This preserves non-interactive capacity, and interactive response times will be dynamic beyond the knee regardless of the setting. Also set non-interactive class run priorities to less than 35. On earlier servers and the 2/98 Model 170 systems, use your interactive requirements to determine the settings.
2.10 Managing Interactive Capacity
Interactive/Server characteristics in the real world. The graphs and formulas presented thus far hold provided the workload on the system is highly regular and steady in nature. Of course, very few systems have workloads like that. The more typical case is a dynamic combination of transaction types, user activity, and batch activity.
There are other means for determining interactive utilization. The easiest of these is the performance monitoring function of Management Central, which became available with V4R3. Management Central can provide:
• Graphical, real-time monitoring of interactive CPU utilization
• Creation of an alert threshold, at which an alert feature is turned on and the graph is highlighted
• Creation of a reverse threshold below which the highlights are turned off
• Multiple methods of handling the alert, from a console message to the execution of a command to the forwarding of the alert to another system.
2. A similar effect can be found with index builds. If parallelism is enabled, index creation (CRTLF, Create Index, opening a file with MAINT(*REBUILD), or running a query that requires an index to be built) will be sent to service jobs that operate in non-interactive mode, but charge their work back to the job that requested the service.
2.11 Migration from Traditional Models This section describes a suggested methodology to determine which server model is appropriate to contain the interactive workload of a traditional model when a migration of a workload is occurring. It is assumed that the server model will have both interactive and client/server workloads. To get the same performance and response time, from a CPU perspective, the interactive CPU utilization of the current traditional model must be known.
Member . . . : Q960791030     Model/Serial . : 310-2043/10-0751D     System name. . : TEST01
Itv End    Tns/hr    Rsp/Tns
10:36      6,164
10:41      7,404
10:46      5,466
10:51      5,622
10:56      4,527
11:51      5,068
11:56      5,991
Itv End - Interval end time (hour and minute)
Tns/hr - Number of interactive transactions per hour
Rsp/Tns - Average interactive transaction response time
one third of the total possible interactive workload, for non-custom models. The equation shown in this section will migrate a traditional system to a server system and keep the interactive workload at or below the knee of the curve, that is, using less than two thirds of the total possible interactive workload. In some environments these equations will be too conservative.
2.13 iSeries for Domino and Dedicated Server for Domino Performance Behavior
In preparation for future Domino releases which will provide support for DB2 files, the processing limitations previously associated with DSD models have been removed in i5/OS V5R3. In addition, a PTF is available for V5R2 which also removes the processing limitations for DSD models and allows full use of DB2.
Domino-Complementary Processing Prior to V5R1, processing that did not spend the majority of its time in Domino code was considered non-Domino processing and was limited to approximately 10-15% of the system capacity. With V5R1, many applications that would previously have been treated as non-Domino may now be considered as Domino-complementary when they are used in conjunction with Domino.
Similar to previous DSD performance behavior for interactive processing, the Interactive CPW rating of 0 allows for system administrative functions to be performed by a single interactive user. In practice, a single interactive user will be able to perform necessary administrative functions without constraint. If multiple interactive users are simultaneously active on the DSD, the Interactive CPW capacity will likely be exceeded and the response times of those users may significantly lengthen.
processing present in the Linux logical partition, and all resources allocated to the Linux logical partition can essentially be used as though they were complementary processing. It is not necessary to proportionally increase the amount of Domino processing in the OS/400 logical partition to account for the fact that Domino processing is not present in the Linux logical partition.
Chapter 3. Batch Performance
In a commercial environment, batch workloads tend to be I/O intensive rather than CPU intensive. The factors that affect batch throughput for a given batch application include the following:
• Memory (pool size)
• CPU (processor speed)
• DASD (number and type)
• System tuning parameters
Batch Workload Description
The Batch Commercial Mix is a synthetic batch workload designed to represent multiple types of batch...
3.3 Tuning Parameters for Batch There are several system parameters that affect batch performance. The magnitude of the effect for each of them depends on the specific application and overall system characteristics. Some general information is provided here. Expert Cache Expert Cache did not have a significant effect on the Commercial Mix batch workload.
improve performance by eliminating disk I/O operations. If communications lines are involved in the batch application, try to limit the number of communications I/Os by doing fewer (and perhaps larger) application sends and receives. Consider blocking data in the application, as in the sketch below. Try to place the application on the same system as the frequently accessed data.
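Blocking can be illustrated with a short, hedged Java sketch (the file argument and buffer sizes are arbitrary choices, not from this document): reading in large blocks turns many small I/O operations into fewer, larger ones, which is the same principle as blocking application sends and receives.

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BlockedRead {
    public static void main(String[] args) throws IOException {
        // 64 KB buffer: many small logical reads become few large physical reads.
        InputStream in = new BufferedInputStream(new FileInputStream(args[0]), 64 * 1024);
        try {
            byte[] block = new byte[64 * 1024];
            long total = 0;
            int n;
            while ((n = in.read(block)) != -1) {
                total += n; // process the whole block here instead of byte-at-a-time
            }
            System.out.println("Read " + total + " bytes");
        } finally {
            in.close();
        }
    }
}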
Chapter 4. DB2 for i5/OS Performance This chapter provides a summary of the new performance features of DB2 for i5/OS on V6R1, V5R4 and V5R3, along with V5R2 highlights. Summaries of selected key topics on the performance of DB2 for i5/OS are provided.
DB2 Multisystem tables
New functions available in V6R1 whose use may affect SQL performance are derived key indexes, decimal floating point data type support, and the select from insert statement. A derived key index can have an expression in place of a column name; the expression can use built-in functions, user-defined functions, or some other valid expression.
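As a hedged sketch only (library, table, and column names are hypothetical, and the exact CREATE INDEX syntax should be verified against the V6R1 SQL Reference), a derived key index might be created through JDBC as follows:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DerivedKeyIndex {
    public static void main(String[] args) throws Exception {
        // Native JDBC driver; *LOCAL names the local database.
        Connection conn = DriverManager.getConnection("jdbc:db2:*LOCAL");
        Statement stmt = conn.createStatement();
        // The key is the expression UPPER(CUSTNAME) rather than a bare column,
        // so queries that compare on UPPER(CUSTNAME) can use this index.
        stmt.execute("CREATE INDEX MYLIB.CUST_NAME_IX " +
                     "ON MYLIB.CUSTOMER (UPPER(CUSTNAME))");
        stmt.close();
        conn.close();
    }
}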
the statement is complete. The implementation to invoke the locking causes a physical DASD write to the journal for each record, which causes journal waits. Journal caching on allows the journal writes to accumulate in memory and have one DASD write per multiple journal entries, greatly reducing the journal wait time.
Table Expressions (RCTE) which allow for more elegant and better performing implementations of recursive processing. In addition, enhancements have been made in i5/OS V5R4 to the support for materialized query tables (MQTs) and partitioned table processing, which were both new in i5/OS V5R3.
i5/OS V5R4 SQE Query Coverage
The query dispatcher controls whether an SQL query will be routed to SQE or to CQE.
Enhancements to extend the use of materialized query tables (MQTs) were added in i5/OS V5R4. Newly supported functions in MQT queries recognized by the MQT matching algorithm are unions and partitioned tables, along with limited support for scalar subselects, UDFs and user-defined table functions, RCTE, and some scalar functions.
SQL queries which continue to be routed to CQE in i5/OS V5R3 have the following attributes:
• Sensitive cursor
• Like/Substring predicates
• LOB columns
• References to DDS logical files
i5/OS V5R3 SQE Performance Enhancements
Many enhancements were made in i5/OS V5R3 to enable faster query runtime and use less system resource.
Partitioned Table Support
Table partitioning is a new feature introduced in i5/OS V5R3. The design is localized on an individual table basis rather than an entire library. The user specifies one or more fields which collectively act as a partitioning key. The records in the table are then distributed into multiple disjoint sets based on the partitioning scheme used: either a system-supplied hashing function or a set of value ranges (such as dates by month or year) supplied by the user.
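For illustration (a hedged sketch; the table, column, and range values are hypothetical, and the full partitioning syntax is documented in the SQL Reference), a range-partitioned table might be defined like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PartitionedTable {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:db2:*LOCAL");
        Statement stmt = conn.createStatement();
        // Records are distributed into disjoint sets by the SALEDATE ranges;
        // each range becomes its own partition of the table.
        stmt.execute(
            "CREATE TABLE MYLIB.SALES (" +
            "  ORDERID INT, SALEDATE DATE, AMOUNT DEC(11,2)) " +
            "PARTITION BY RANGE (SALEDATE) " +
            "(STARTING FROM ('2008-01-01') ENDING AT ('2008-06-30') INCLUSIVE, " +
            " STARTING FROM ('2008-07-01') ENDING AT ('2008-12-31') INCLUSIVE)");
        stmt.close();
        conn.close();
    }
}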
Additional partitioned table topics include statistical strategies, SMP considerations, and administration examples (adding a partition, dropping a partition, and so on).
Materialized Query Table Support
The initial release of i5/OS V5R3 includes Materialized Query Table (MQT) support (MQTs are also referred to as automatic summary tables or materialized views) in UDB DB2 for i5/OS as essentially a technology preview.
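A hedged example of creating and populating a user-maintained MQT (names are hypothetical; see the DB2 for i5/OS documentation for the complete set of options):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SummaryMqt {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:db2:*LOCAL");
        Statement stmt = conn.createStatement();
        // Define the MQT from a summary query; the optimizer may rewrite
        // matching queries to read this table instead of the base table.
        stmt.execute(
            "CREATE TABLE MYLIB.SALES_SUM AS " +
            "(SELECT REGION, SUM(AMOUNT) AS TOTAL FROM MYLIB.SALES GROUP BY REGION) " +
            "DATA INITIALLY DEFERRED REFRESH DEFERRED " +
            "ENABLE QUERY OPTIMIZATION MAINTAINED BY USER");
        // MAINTAINED BY USER means the application populates and refreshes it.
        stmt.execute("INSERT INTO MYLIB.SALES_SUM " +
                     "SELECT REGION, SUM(AMOUNT) FROM MYLIB.SALES GROUP BY REGION");
        stmt.close();
        conn.close();
    }
}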
more information may be used in the query plan costing phase than was available to the optimizer previously. The optimizer may now use newly implemented database statistics to make more accurate decisions when choosing the query access plan. Also, the enhanced optimizer may more often select plans using hash tables and sorted partial result lists to hold partial query results during query processing, rather than selecting access plans which build temporary indexes.
should be made to determine if the needed statistics are available. Also in environments where long running queries are run only one time, it may be beneficial to ensure that statistics are available prior to running the queries. Some properties of database column statistics are as follows: Column statistics occupy little storage, on average 8-12k per column.
SQE for V5R2 Summary Enhancements to DB2 for i5/OS, called SQE, were made in V5R2. The SQE enhancements are object oriented implementations of the SQE optimizer, the SQE query engine and the SQE database statistics. In V5R2 a subset of the read-only SQL queries will be optimized and run with the SQE enhancements. The effect of SQE on performance will vary by workload and configuration.
4.6 DB2 Symmetric Multiprocessing feature Introduction The DB2 SMP feature provides application transparent support for parallel query operations on a single tightly-coupled multiprocessor System i (shared memory and disk). In addition, the symmetric multiprocessing (SMP) feature provides additional query optimization algorithms for retrieving data. The database manager can automatically activate parallel query processing in order to engage one or more system processors to work simultaneously on a single query.
limit the amount of data it brings into and keeps in memory to a job’s share of memory. The amount of memory available to each job is inversely proportional to the number of active jobs in a memory pool; for example, 20 active jobs sharing a 4000 MB pool would each get a share of roughly 200 MB. The memory-sharing algorithms discussed above provide balanced performance for all the jobs running in a memory pool.
Allows customers to replace current programming methods of capturing and transmitting journal entries between systems with more efficient system programming methods. This can result in lower CPU consumption and increased throughput on the source system. Can significantly reduce the amount of time and effort required by customers to reconcile their source and target databases after a system failure.
There are 3 sets of tasks which do the SMAPP work. These tasks work in the background at low priority to minimize the impact of SMAPP on system performance. The tasks are as follows: JO_EVALUATE-TASK - Evaluates indexes, estimates rebuild time for an index, and may start or stop implicit journaling of an index.
multiple nodes in the cluster, access to the database files is seamless and transparent to the applications and users that reference the database. To the users, the partitioned files still behave as though they were local to their system. The most important aspect of obtaining optimal performance with DB2 Multisystem is to plan ahead for what data should be partitioned and how it should be partitioned.
4.10 Referential Integrity In a database user environment, there are frequent cases where the data in one file is dependent upon the data in another file. Without support from the database management system, each application program that updates, deletes or adds new records to the files must contain code that enforces the data dependency rules between the files.
The following are performance tips to consider when using trigger support:
• Triggers are activated by an external call. The user needs to weigh the benefit of the trigger against the cost of the external call.
• If a trigger is going to be used, leave as much validation to the trigger program as possible.
• Avoid opening files in a trigger program under commitment control if the trigger program does not cause changes to commitable resources.
To create the variable length field just described, use the following DB2 statement: CREATE TABLE library/table-name (field VARCHAR(50) ALLOCATE(20) NOT NULL) In this particular example the field was created with the NOT NULL option. The other two options are NULL and NOT NULL WITH DEFAULT. Refer to the NULLS section in the SQL Reference to determine which NULLS option would be best for your use.
01 DESCR.
   49 DESCR-LEN    PIC S9(4) COMP-4.
   49 DESCRIPTION  PIC X(50).
...
EXEC SQL
   FETCH C1 INTO :DESCR
END-EXEC.
For more detail about the varying-length character string, refer to the SQL Programmer's Guide. The above point is also true when using a high-level language to insert values into a variable length field.
In contrast, when reuse is active, the database support will process the added record more like an update operation than an add operation. The database support will maintain a bit map to keep track of deleted records and to provide fast access to them. Before a record can be added, the database support must use the bit-map to find the next available deleted record space, read the page containing the deleted record entry into storage, and seize the deleted record to allow replacement with the added record.
2. The System i Information Center section on DB2 for i5/OS, under Database and file systems, has information on all aspects of DB2 for i5/OS, including the section Monitor and Tune database under Administrative topics. This can be found at: http://www.ibm.com/eserver/iseries/infocenter
Chapter 5. Communications Performance There are many factors that affect System i performance in a communications environment. This chapter discusses some of the common factors and offers guidance on how to help achieve the best possible performance. Much of the information in this chapter was obtained as a result of analysis experience within the Rochester development laboratory.
IBM’s Host Ethernet Adapter (HEA) integrated 2-Port 10/100/1000 Base-TX PCI-E IOA supports checksum offloading, 9000-byte jumbo frames (1 Gigabit only) and LSO - Large Send Offload (IPv4 only). These adapters do not require an IOP to be installed in conjunction with the IOA. Additionally, each physical port has 16 logical ports that may be assigned to other partitions, allowing each partition to utilize the same physical port simultaneously, with the following limitation: one logical port, per physical port, per partition.
181A IBM 2-Port 10/100/1000 Base-TX PCI-e (UTP; copper wire cabling)
181B IBM 2-Port Gigabit Base-SX PCI-e (fiber optics)
181C IBM 4-Port 10/100/1000 Base-TX PCI-e (UTP; copper wire cabling)
1819 IBM 4-Port 10/100/1000 Base-TX PCI-e Virtual Ethernet Blade
The identifiers above are Custom Card Identification Numbers and System i Feature Codes. Virtual Ethernet enables you to establish communication via TCP/IP between logical partitions and can be used without any additional hardware or software.
To demonstrate communications performance in various ways, several workload scenarios are analyzed. Each of these scenarios may be executed with regular nonsecure sockets or with secure SSL using the GSK API: 1. Request/Response (RR): The client and server send a specified amount of data back and forth over a connection that remains active.
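As an illustrative sketch only (the host argument, port, loop count, and 128-byte sizes are arbitrary, chosen to mirror the RR scenario), the request/response pattern amounts to a client exchanging fixed-size messages over one persistent connection:

import java.io.EOFException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class RRClient {
    public static void main(String[] args) throws Exception {
        // One connection stays active across all request/response pairs.
        Socket s = new Socket(args[0], 9000);
        OutputStream out = s.getOutputStream();
        InputStream in = s.getInputStream();
        byte[] request = new byte[128]; // 128-byte request, as in the RR scenario
        byte[] reply = new byte[128];
        for (int i = 0; i < 1000; i++) {
            out.write(request);
            out.flush();
            int read = 0;
            while (read < reply.length) { // read until the full reply arrives
                int n = in.read(reply, read, reply.length - read);
                if (n < 0) throw new EOFException("server closed connection");
                read += n;
            }
        }
        s.close();
    }
}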
5.4 TCP/IP non-secure performance
In Table 5.4 you will find the payload information for the different Ethernet types. The most important factor with streaming is to determine how much data can be transferred. The results are listed in bits and bytes per second.
Table 5.5 covers the transaction types Request/Response (RR) 128 bytes and Asymmetric Connect/Request/Response (ACRR) 8K bytes.
Notes: Capacity metrics are provided for nonsecure transactions. The table data reflects System i as a server (not a client). The data reflects Sockets and TCP/IP. This is only a rough indicator for capacity planning; actual results may differ significantly. All measurements were taken with Packet Trainer off (see 5.6 for line-dependent performance enhancements).
Here the results show the difference in performance for different Ethernet cards compared with Virtual Ethernet.
Table 5.6 covers the transaction types Request/Response (RR) 128 bytes, Asymmetric Connect/Request/Response (ACRR) 8K bytes, and Large Transfer (Stream) 16K bytes.
Notes: Capacity metrics are provided for nonsecure transactions and each variation of security policy. The table data reflects System i as a server (not a client). This is only a rough indicator for capacity planning.
Table 5.7 covers the transaction types Request/Response (RR) 128 bytes, Asymmetric Connect/Request/Response (ACRR) 8K bytes, and Large Transfer (Stream) 16K bytes.
Notes: Capacity metrics are provided for nonsecure transactions and each variation of security policy. The table data reflects System i as a server (not a client). This is only a rough indicator for capacity planning.
The next table covers the same transaction types: Request/Response (RR) 128 bytes, Asymmetric Connect/Request/Response (ACRR) 8K bytes, and Large Transfer (Stream) 16K bytes.
Notes: Capacity metrics are provided for nonsecure transactions and each variation of security policy. The table data reflects System i as a server (not a client). VPN measurements used transport mode; TDES, AES128 or RC4 with a 128-bit key symmetric cipher; and MD5 message digest with RSA public/private keys.
For additional information regarding your Host Ethernet Adapter, please see your specification manual and the Performance Management pages for this topic. 1 Gigabit jumbo frame Ethernet enables 12% greater throughput compared to normal frame 1 Gigabit Ethernet, although this may vary significantly based on your system, network and workload attributes. Measured 1 Gigabit jumbo frame Ethernet throughput approached 1 Gigabit/sec. The jumbo frame option requires 8992-byte MTU support by all of the network components, including switches, routers and bridges.
only a few seconds may perform best. Setting this value too low may result in extra error handling that impacts system capacity. No single station can, or is expected to, use the full bandwidth of the LAN media; the media offers up to its rated speed as aggregate capacity for the attached stations to share.
there is network congestion or overruns to certain target system adapters, then increasing the value from the default=*NONE to 2 or something larger may improve performance. MAXLENRU for APPC on the mode description (MODD): If a value of *CALC is selected for the maximum SNA request/response unit (RU) the system will select an efficient size that is compatible with the frame size (on the LIND) that you choose.
• FTS is a less efficient way to transfer data. However, it offers built-in data compression for line speeds less than a given threshold. In some configurations, it will compress data when using a LAN; this significantly slows down LAN transfers.
5.8 HPR and Enterprise Extender considerations
Enterprise Extender is a protocol that allows the transmission of APPC data over an IP-only infrastructure.
5.9 Additional Information
Extensive information can be found at the System i Information Center web site at: http://www.ibm.com/eserver/iseries/infocenter
• For network information select “Networking”: see “TCP/IP setup” and “Network communications”.
• For application development select “Programming”: see “Communications”.
Information about Ethernet cards can be found at the IBM Systems Hardware Information Center. The link for this information center is located on the IBM Systems Information Centers page at: http://publib.boulder.ibm.com/eserver See “Managing your server and devices”...
Chapter 6. Web Server and WebSphere Performance This section discusses System i performance information in Web serving and WebSphere environments. Specific products that are discussed include: HTTP Server (powered by Apache) (in section 6.1), PHP - Zend Core for i (6.2), WebSphere Application Server and WebSphere Application Server - Express (6.3), Web Facing (6.4), Host Access Transformation Services (6.5), System Application Server Instance (6.6), WebSphere Portal Server (6.7), WebSphere Commerce (6.8), WebSphere Commerce Payments (6.9), and Connect for iSeries (6.10).
Information source and disclaimer: The information in the sections that follow is based on performance measurements and analysis done in the internal IBM performance lab. The raw data is not provided here, but the highlights, general conclusions, and recommendations are included. Results listed here do not represent any particular customer environment.
CGI: HTTP invokes a CGI program which builds a simple HTML page and serves it via the HTTP server. This CGI program can run in either a new or a named activation group. The CGI programs were compiled using a "named" activation group unless specified otherwise. Web Server Capacity Planning: Please use the IBM Systems Workload Estimator to do capacity planning for Web environments using the following workloads: Web Serving, WebSphere, WebFacing, WebSphere Portal Server, WebSphere Commerce.
Table 6.1 i5/OS V5R4 Web Serving Relative Capacity - Static Page. Transaction types: Static Page - IFS, Static Page - Local Cache, Static Page - FRCA.
Notes/Disclaimers: Data assumes no access logging, no name server interactions, KeepAlive on, LiveLocalCache off. Secure: 128-bit RC4 symmetric cipher and MD5 message digest with 1024-bit RSA public/private keys. These results are relative to each other and do not scale with other environments. Transactions using more complex programs or serving larger files will have lower capacities than what is listed here.
Table 6.2 i5/OS V5R4 Web Serving Relative Capacity - CGI. Transaction types: CGI - New Activation, CGI - Named Activation.
Notes/Disclaimers: Data assumes no access logging, no name server interactions, KeepAlive on, LiveLocalCache off. Secure: 128-bit RC4 symmetric cipher and MD5 message digest with 1024-bit RSA public/private keys. These results are relative to each other and do not scale with other environments. Transactions using more complex programs or serving larger files will have lower capacities than what is listed here.
Table 6.3 i5/OS V5R4 Web Serving Relative Capacity for Static Pages (varied sizes), with KeepAlive on and off, for Static Page - IFS, Static Page - Local Cache, and Static Page - FRCA.
Notes/Disclaimers: These results are relative to each other and do not scale with other environments. IBM System i CPU features without an L2 cache will have lower web server capacities than the CPW value would indicate. HTTP Server (powered by Apache) for i5/OS.
a. V5R4 provides similar Web server performance compared with V5R3 for most transactions (with similar hardware). In V5R4 there are opportunities to exploit improved CGI performance. More information can be found in the FAQ section of the HTTP server website, http://www.ibm.com/servers/eserver/iseries/software/http/services/faq.html, under the question on how to improve the performance of CGI programs.
variable overhead of encryption/decryption, which is proportional to the number of bytes in the transaction. Note the capacity factors in the tables above comparing non-secure and secure serving. From Table 6.1, note that for simple transactions (e.g., static page serving), the impact of secure serving is around 20%.
11. HTTP and TCP/IP Configuration Tips: Information to assist with the configuration for TCP/IP and HTTP can be viewed at http://www.ibm.com/servers/eserver/iseries/software/http/ a. The number of HTTP server threads: The reason for having multiple server threads is that when one server is waiting for a disk or communications I/O to complete, a different server job can process another user's request.
13. File System Considerations: Web serving performance varies significantly based on which file system is used. Each file system has different overheads and performance characteristics. Note that serving from the ROOT or QOPENSYS directories provides the best system capacity. If Web page development is done from another directory, consider copying the data to a higher-performing file system for production use.
6.2 PHP - Zend Core for i This section discusses the different performance aspects of running PHP transaction based applications using Zend Core for i, including DB access considerations, utilization of RPG program call, and the benefits of using Zend Platform. Zend Core for i Zend Core for i delivers a rapid development and production PHP foundation for applications using PHP running on i with IBM DB2 for i or MySQL databases.
The metrics reported are:
• Throughput - Orders Per Minute (OPM); each order actually consists of 10 web requests to complete the order
• Order response time (RT) in milliseconds
• Total CPU - total system processor utilization
• CPU Zend/AP - CPU for the Zend Core / Apache component
• CPU DB - CPU for the DB component
Database Access
The following four methods were used to access the backend database for the DVD Store application.
Conclusions: 1. The performance of each DB connection interface provides exceptional response time at very high throughput. Each order processed consisted of ten web requests. As a result, the capacity ranges from about 650 transactions per second up to about 870 transactions per second. Using Zend Platform will provide even higher performance (refer to the section on Zend Platform).
Conclusions: 1. As stated earlier, persistent connections can dramatically improve overall performance. When using persistent connections for all transactions, the DB CPU utilization is significantly less than when using non-persistent connections. 2. For any transactions that run with autocommit turned on, use persistent connections. If the transaction requires that autocommit be turned off, use of non-persistent connections may be sufficient for pages that don’t have heavy usage.
The table compares runs with and without Zend Platform, showing the OS/DB level, Zend version, connection method (db2_pconnect), orders per minute, response time (ms), total CPU, and the Zend/Apache and DB CPU components.
Conclusions: 1. In both cases above, the overall system capacity improved significantly when using Zend Platform, by about 15-35% for this workload. With each order consisting of 10 web requests, processing 6795 orders per minute translates into about 1132 transactions per second.
6.3 WebSphere Application Server
This section discusses System i performance information for the WebSphere Application Server, including WebSphere Application Server V6.1, WebSphere Application Server V6.0, WebSphere Application Server V5.0 and V5.1, and WebSphere Application Server Express V5.1. Historically, both WebSphere and i5/OS Java performance improve with each version. Note from the figures and data in this section that the most recent versions of WebSphere and/or i5/OS generally provide the best performance.
because the improvements largely resulted from significant reductions in pathlength and CPU, environments that are constrained by other resources such as IO or memory may not show the same level of improvements seen here.
Tuning changes in V6R1
As indicated above, most improvements will require no changes to an application. However, there are a few changes that will require some tuning in order to be realized:
Using direct map (native JDBC): For System i, the JDBC interfaces run more efficiently if direct mapping of data is used.
For WebSphere 5.1 and earlier, refer to the Performance Considerations guide at: www.ibm.com/servers/eserver/iseries/software/websphere/wsappserver/product/PerformanceConsiderations.html
For WebSphere 5.1, 6.0 and 6.1, please refer to the following page and follow the appropriate link: www.ibm.com/software/webservers/appserv/was/library/
Although some capacity planning information is included in these documents, please use the IBM Systems Workload Estimator as the primary tool to size WebSphere environments.
Trade 6 Benchmark (IBM Trade Performance Benchmark Sample for WebSphere Application Server) Description: Trade 6 is the fourth generation of the WebSphere end-to-end benchmark and performance sample application. The Trade benchmark is designed and developed to cover the significantly expanding programming model and performance technologies associated with WebSphere Application Server.
The Trade 6 application allows a user, typically using a Web browser, to perform the following actions:
• Register to create a user profile, user ID/password and initial account balance
• Login to validate an already registered user
• Browse current stock price for a ticker symbol
• Purchase shares
• Sell shares from holdings
• Browse portfolio...
WebSphere Application Server V6.1 Historically, new releases of WebSphere Application Server have offered improved performance and functionality over prior releases of WebSphere. WebSphere Application Server V6.1 is no exception. Furthermore, the availability of WebSphere Application Server V6.1 offers an entirely new opportunity for WebSphere customers.
Trade3 Measurement Results:
Figure: Trade on System i - Historical View (Trade3-EJB), comparing V5R2 WAS 5.0, V5R3 WAS 5.1, V5R3 WAS 5.0, and V5R3 WAS 6.0 (Trade6).
WebSphere Application Server Trade Results Notes/Disclaimers: Trade3 chart: WebSphere 5.0 was measured on both V5R2 and V5R3 on a 4-way (LPAR) 825/2473 system
Trade Scalability Results:
Figure: Trade on System i - Scaling of Hardware and Software (Trade 3, JDBC), comparing V5R2 WAS 5.0, V5R2 WAS 5.1, and V5R3 WAS 5.1.
WebSphere Application Server Trade Results Notes/Disclaimers: Trade 3 chart: V5R2 - 890/2488 32-way 1.3 GHz, measured with WebSphere 5.0 and WebSphere 5.1; V5R3 - 890/2488 32-way 1.3 GHz, measured with WebSphere 5.1. POWER5 chart:
Primitive Name - Description of Primitive
PingHtml - the most basic operation, providing access to a simple "Hello World" page of static HTML.
PingServlet - tests fundamental dynamic HTML creation through server-side servlet processing.
PingServletWriter - extends PingServlet by using a PrintWriter for formatted output vs. the output stream used by PingServlet.
Accelerator for System i Coinciding with the release of i5/OS V5R4, IBM introduces new entry IBM System i models. The models introduce accelerator technologies and/or L3 cache in order to improve options for clients in the low-end server space. As an overview, the Accelerator for System i affects two 520 Models: (1) 600 CPW with no L3 cache and (2) 1200 CPW with L3 cache.
Figure 6.6 provides insight into response time information regarding low-end System i models. There are two key concepts displayed in the data in Figure 6.6. The first is that Accelerator for System i models can provide substantially better response times than previous models for a single user or many users. The 600 CPW model accelerated to 3100 CPW reduces response time by a factor of 5, while the 1200 CPW model accelerated to 3800 CPW reduces response time by a factor of 2.5.
Performance Considerations When Using WebSphere Transaction Processing (XA) In a general sense, a transaction is the execution of a set of related operations that must be completed together. This set of operations is referred to as a unit-of-work. A transaction is said to commit when it completes successfully.
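As a plain JDBC illustration of a unit-of-work (a hedged sketch; the table and column names are hypothetical, and WebSphere XA configuration itself is managed by the container, not by code like this):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class UnitOfWork {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:db2:*LOCAL");
        conn.setAutoCommit(false);   // group related operations into one transaction
        Statement stmt = conn.createStatement();
        try {
            stmt.executeUpdate("UPDATE MYLIB.ACCOUNTS SET BAL = BAL - 100 WHERE ID = 1");
            stmt.executeUpdate("UPDATE MYLIB.ACCOUNTS SET BAL = BAL + 100 WHERE ID = 2");
            conn.commit();           // the unit-of-work completes together
        } catch (Exception e) {
            conn.rollback();         // or is undone together
            throw e;
        } finally {
            stmt.close();
            conn.close();
        }
    }
}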
Restriction: You cannot benefit from the one-phase commit optimization in the following circumstances:
• If your application uses a reliability attribute other than assured persistent for its JMS messages.
• If your application uses Bean Managed Persistence (BMP) entity beans, or JDBC clients.
Before you configure your system, ensure that you consider all of the components of your J2EE application that might be affected by one-phase commits.
6.4 IBM WebFacing The IBM WebFacing tool converts your 5250 application DDS display files, menu source, and help files into Java Servlets, JSPs, JavaBeans, and JavaScript to allow your application to run in either WebSphere Application Server V5 or V4. This is an easy way to bring your application to either the Internet, or the Intranet, both quickly and inexpensively.
details on the number of I/O fields for each of these workloads. We ran the workloads on three separate machines (see table 6.5) to validate the performance characteristics with regard to CPW. In our running of the workloads, we tolerated only a 1.5 second server response time per panel. This value does not include the time it takes to render the image on the client system, but only the time it took the server to get the information to the client system.
• (Advanced Edition Only) Struts-compliant code generated by the WebFacing Tool conversion process which sets the foundation for extending your Webfaced applications using struts-compliant action architecture • Automatic configuration for UTF-8 support when you deploy to WebSphere Application Server version 5.0 •...
When set to an appropriate level for the Webfaced application, the Record Definition Cache can provide a decrease in memory usage, and slightly decreased processor usage. The number of record definitions that the cache will retain is set by an initialization parameter in the Webfaced application’s deployment descriptor (web.xml).
To enable the servlet that will display the contents of the cache, first add the following segments to the Webfaced application's web.xml:

<servlet>
  <servlet-name>CacheDumper</servlet-name>
  <display-name>CacheDumper</display-name>
  <servlet-class>com.ibm.etools.iseries.webfacing.diags.CacheDumper</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>CacheDumper</servlet-name>
  <url-pattern>/CacheDumper</url-pattern>
</servlet-mapping>

This servlet can then be invoked with a URL that ends in the /CacheDumper pattern configured above. A Web page like that shown below will then be displayed.
Button - Operation
Reset Counters - Resets the cache hit and miss counters back to 0.
Set Limit - Temporarily sets the cache limit to a new value. Setting the value lower than the current value will cause the cache to be cleared as well.
Refresh - Refreshes the display of cache elements.
Refer to the following table for the functionality provided by the Record Definition Loader servlet.
Record Definition Loader Button operations:
Button - Operation
Infer from JSP Names - Causes the loader servlet to infer record definition names from the names of the JSPs contained in the RecordJsps directory.
WebSphere Application Server. On System i servers, the recommended WebSphere application configuration is to run Apache as the web server and WebSphere Application Server as the application server. Therefore, it is recommended that you configure HTTP compression support in Apache. However, in certain instances HTTP compression configuration may be necessary using the Webfacing/WebSphere Application Server support.
You also need to add the directive: SetOutputFilter DEFLATE to the container to be compressed, or globally if the compression can always be done. There is documentation on the Apache website on mod_deflate (http://httpd.apache.org/docs-2.0/mod/mod_deflate.html) that has information specific to setting up for compression.
6.5 WebSphere Host Access Transformation Services (HATS) WebSphere Host Access Transformation Services (HATS) gives you all the tools you need to quickly and easily extend your legacy applications to business partners, customers, and employees. HATS makes your 5250 applications available as HTML through the most popular Web browsers, while converting your host screens to a Web look and feel.
IBM Systems Workload Estimator for HATS The purpose of the IBM Systems Workload Estimator (WLE) is to provide a comprehensive System i sizing tool for new and existing customers interested in deploying new emerging workloads standalone or in combination with their current workloads. The Estimator recommends the model, processor, interactive feature, memory, and disk resources necessary for a mixed set of workloads.
6.7 WebSphere Portal The IBM WebSphere Portal suite of products enables companies to build a portal website serving the individual needs of their employees, business partners and customers. Users can sign on to the portal and view personalized web pages that provide access to the information, people and applications they need. This personalized, single point of access to resources reduces information overload, accelerates productivity and increases website usage.
6.9 WebSphere Commerce Payments Use the IBM Systems Workload Estimator to predict the capacities and resource requirements for WebSphere Commerce Payments. The Estimator allows you to predict a standalone WCP environment or a WCP environment associated with the buy visits from a WebSphere Commerce estimation. Work with your marketing representative to utilize this tool.
of access mechanisms. Please see the Connect for iSeries white paper located at the following URL for more information on Connect for iSeries. http://www-1.ibm.com/servers/eserver/iseries/btob/connect/pdf/whtpaperv11.pdf “B2B New Order Request” Workload Description: This workload is driven by a program that runs on a client work station that simulates multiple Web users.
Connector relative capacity: The different back-end connector types are meant to allow users a simple way to connect the Connect for iSeries product to their back-end application. Your choice of a connector type may be dictated by several factors. Clearly, one of these factors relates to your existing back-end application and the programming language it is written in.
Chapter 7. Java Performance
Highlights:
• Introduction
• What’s new in V6R1
• IBM Technology for Java (32-bit and 64-bit)
• Classic VM (64-bit)
• Determining Which JVM to Use
• Capacity Planning
• Tips and Techniques
• Resources
7.1 Introduction
Beginning in V5R4, IBM began a transition to a new VM implementation for i5/OS, IBM Technology for Java, to replace the Classic VM.
option for Java applications which require large amounts of memory. The Classic VM remains available in V6R1, but future i5/OS releases are expected to support only IBM Technology for Java. The default VM in V6R1 is IBM Technology for Java 5.0, 32-bit. Other supported versions of IBM Technology for Java include 5.0 64-bit, 6.0 32-bit, and 6.0 64-bit.
On i5/OS, IBM Technology for Java runs in i5/OS Portable Application Solutions Environment (i5/OS PASE) with either a 32-bit (for the 32-bit VM) or 64-bit (for the 64-bit VM) environment. Due to sophisticated memory management, both the 32-bit and 64-bit VMs provide a significant reduction in memory requirements over the Classic VM for most applications.
Fortunately, it is not too difficult to come up with parameter values which will provide good performance. If you are moving an application from the Classic VM to IBM Technology for Java, you can use a tool like DMPJVM or verbose GC to determine how large the heap grows when running your application. This value can be used as the maximum heap size for 64-bit IBM Technology for Java;...
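As a complementary sketch (this is not one of the tools named above, just the standard java.lang.Runtime API), an application can also report its own heap growth:

public class HeapReport {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        long maxMb = rt.maxMemory() / (1024 * 1024);
        // Sampled periodically, this shows how large the heap actually grows,
        // which can guide the maximum heap size choice discussed above.
        System.out.println("Heap used: " + usedMb + " MB of max " + maxMb + " MB");
    }
}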
performance, it pays to apply analysis and optimizations to the Java bytecodes, and the resulting machine code. One approach to optimizing Java bytecode involves analyzing the object code “ahead of time” – before it is actually running. This “ahead-of-time” (AOT) compiler technology was used exclusively by the original AS/400 Java Virtual Machine, whose success proved the power of such an approach.
applications with a large number of classes. Running CRTJVAPGM with OPTIMIZE(*INTERPRET) will create this program ahead of time, making the first startup faster. Garbage Collection Java uses Garbage Collection (GC) to automatically manage memory by cleaning up objects and memory when they are no longer in use.
display; rates of 20 to 30 faults per second are usually acceptable, but larger values may indicate a performance problem. In this case, the size of the memory pool should be increased, or the collection threshold value (GCHINL or -Xms) should be decreased so the heap isn’t allowed to grow as large. In many cases the scenario may be complicated by the fact that multiple applications may be running in the same memory pool.
later releases the cache is enabled and the maxpgms set to 20000 by default, so no adjustment is usually necessary. The verification cache operates by caching JVAPGMs that have been dynamically created for dynamically loaded classes. When the verification cache is not operating, these JVAPGMs are created as temporary objects, and are deleted as the JVM shuts down.
libraries and environments may require a particular version. The Classic VM continues to support JDK 1.3, 1.4, 1.5 (5.0), and 1.6 (6.0) in V5R4, and JDK 1.4, 1.5 (5.0), and 1.6 (6.0) in V6R1. 3. The Classic VM supported an i5/OS-specific feature called Adopted Authority. IBM Technology for Java does not support this feature, so applications which require Adopted Authority must run in the Classic VM.
application itself or a reasonably complete subset of the application, using a load generating tool to simulate a load representative of your planned deployment environment. WebSphere applications running with IBM Technology for Java will be subject to the same constraints as plain Java applications;...
Beware of misleading benchmarks. Many benchmarks are available to test Java performance, but most of these are not good predictors of server-side Java performance. Some of these benchmarks are single-threaded, or run for a very short period of time. Others will stress certain components of the JVM heavily, while avoiding other functionality that is more typical of real applications.
4. Database Specific. Use of database can invoke significant path length in i5/OS. Invoking it efficiently can maximize the performance and value of a Java application. i5/OS Specific Java Tips and Techniques Load the latest CUM package and PTFs To be sure that you have the best performing code, be sure to load the latest CUM packages and PTFs for all products that you are using.
does take advantage of programs created at optimization *INTERPRET. These programs require significantly less space and do not need to be deleted. Program objects (even at *INTERPRET) are not used by IBM Technology for Java. Consider the special property os400.jit.mmi.threshold. This property sets the threshold for the MMI of the JIT. Setting this to a small value will result in compilation of the classes at startup time and will increase the start up time.
• The I/O method readLine() (e.g. in java.io.BufferedReader) will create a new String.
• String concatenation (e.g.: “The value is: ” + value) will generally result in creation of a StringBuffer, a String, and a character array.
• Putting primitive values (like int or long) into a collection (like List or Map) requires wrapping it in a new object (e.g., an Integer for an int value).
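A hedged micro-example of the concatenation point above (purely illustrative): reusing one StringBuffer avoids the hidden StringBuffer/String/char[] creation that repeated concatenation causes.

public class ConcatDemo {
    // Each pass of this loop creates a new StringBuffer, String,
    // and character array behind the scenes.
    static String slowJoin(String[] parts) {
        String result = "";
        for (int i = 0; i < parts.length; i++) {
            result = result + parts[i];   // hidden object creation per pass
        }
        return result;
    }

    // Reusing one StringBuffer avoids most of that garbage.
    static String fastJoin(String[] parts) {
        StringBuffer buf = new StringBuffer(256);
        for (int i = 0; i < parts.length; i++) {
            buf.append(parts[i]);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String[] parts = { "The ", "value ", "is: ", "42" };
        System.out.println(slowJoin(parts).equals(fastJoin(parts))); // true
    }
}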
Avoid using exceptions for normal control flow, as in the following procedure:

public void badPrintArray(int arr[]) {
    int i = 0;
    try {
        while (true) {
            System.out.println(arr[i++]);
        }
    } catch (ArrayIndexOutOfBoundsException e) {
        // Reached the end of the array...exit
    }
}

Instead, the above procedure should be written as:

public void goodPrintArray(int arr[]) {
    int len = arr.length;
    for (int i = 0; i < len; i++) {
        System.out.println(arr[i]);
    }
}
applications. The Toolbox driver supports remote access, and should be used when accessing the database on a separate system. This recommendation is true for both the 64-bit Classic VM and the new 32-bit VM.
Pool Database Connections
Connection pooling is a technique for sharing a small number of database connections among a number of threads.
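A minimal hand-rolled pooling sketch, for illustration only (a production application would normally use a DataSource-based pool supplied by the driver or application server rather than code like this):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.concurrent.LinkedBlockingQueue;

public class SimpleConnectionPool {
    private final LinkedBlockingQueue<Connection> pool =
            new LinkedBlockingQueue<Connection>();

    // Create a fixed number of connections once; threads then share them
    // instead of paying the connect/disconnect cost per request.
    public SimpleConnectionPool(String url, int size) throws Exception {
        for (int i = 0; i < size; i++) {
            pool.put(DriverManager.getConnection(url));
        }
    }

    public Connection take() throws InterruptedException {
        return pool.take();          // blocks until a connection is free
    }

    public void give(Connection conn) throws InterruptedException {
        pool.put(conn);              // return the connection for reuse
    }
}

Threads call take() before doing database work and give() afterward, so the expensive connect happens only once per pooled connection.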
Resources The i5/OS Java and WebSphere performance team maintains a list of performance-related documents at http://www.ibm.com/systems/i/solutions/perfmgmt/webjtune.html. The Java Diagnostics Guide provides detailed information on performance tuning and analysis when using IBM Technology for Java. Most of the document applies to all platforms using IBM’s Java VM; in addition, one chapter is written specifically for i5/OS information.
Chapter 8. Cryptography Performance With an increasing demand for security in today’s information society, cryptography enables us to encrypt the communication and storage of secret or confidential data. This also requires data integrity, authentication and transaction non-repudiation. Together, cryptographic algorithms, shared/symmetric keys and public/private keys provide the mechanisms to support all of these requirements.
CSP API Sets
User applications can utilize cryptographic services indirectly via i5/OS functions (SSL/TLS, VPN IPSec) or directly via the following APIs:
• The Common Cryptographic Architecture (CCA) API set is provided for running cryptographic operations on a Cryptographic Coprocessor.
• The i5/OS Cryptographic Services API set is provided for running cryptographic operations within the Licensed Internal Code.
8.3 Software Cryptographic API Performance This section provides performance information for System i systems using the following cryptographic services; i5/OS Cryptographic Services API and IBM JCE 1.2.1, an extension of JDK 1.4.2. Cryptographic performance is an important aspect of capacity planning, particularly for applications using secure network communications.
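For orientation, a minimal JCE-style hashing call (standard java.security API; the 1024-byte buffer simply mirrors the transaction lengths used in this chapter's measurements):

import java.security.MessageDigest;

public class DigestDemo {
    public static void main(String[] args) throws Exception {
        // SHA-1 over a 1024-byte buffer.
        byte[] data = new byte[1024];
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] hash = md.digest(data);
        System.out.println("Digest length: " + hash.length + " bytes"); // 20 for SHA-1
    }
}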
Table 8.2: encryption performance by number of threads for the SHA-1 / RSA algorithm.
Notes: Transaction length set at 1024 bytes. See section 8.2 for test environment information.
Table 8.3: hash performance by number of threads for the SHA-1, SHA-256, SHA-384, and SHA-512 algorithms.
which is designed to meet FIPS 140-2 Level 4 security requirements. This new cryptographic card offers the security and performance required to support e-Business and emerging digital signature applications. For banking and finance applications the 4764 Cryptographic Coprocessor delivers improved performance for T-DES, RSA, and financial PIN processing.
Table 8.5: encryption performance by number of threads for the SHA-1 / RSA algorithm.
Notes: Transaction length set at 1024 bytes. See section 8.2 for test environment information.
Table 8.6: results by number of threads.
Notes: See section 8.2 for test environment information.
Supported number of 4764 Cryptographic Coprocessors (Table 8.8), by server model: IBM System i5 570 8/12/16W and 595; IBM System i5 520, 550, and 570 2/4W.
Applications requiring a FIPS 140-2 Level 4 certified, tamper-resistant module for storing cryptographic keys should use the IBM 4764 Cryptographic Coprocessor. Cryptographic functions demand a lot of a system CPU, but the performance does scale well when you add a CPU to your system.
Microsoft Windows XP Professional Version 2002, Service Pack 1
Controller PC: 6862-27U IBM PC 300PL, Pentium II 400 MHz, 512KB L2, 320 MB RAM, 6.4 GB disk drive, Intel® 8255x based PCI Ethernet Adapter 10/100, Microsoft Windows 2000 5.00.2195 Service Pack 4
Workload: PC Magazine’s NetBench®...
Chapter 10. DB2 for i5/OS JDBC and ODBC Performance DB2 for i5/OS can be accessed through many different interfaces. Among these interfaces are: Windows .NET, OLE DB, Windows database APIs, ODBC and JDBC. This chapter will focus on access through JDBC and ODBC by providing programming and tuning hints as well as links to detailed information.
• Use the lowest isolation level required by the application. Higher isolation levels can reduce performance levels as more locking and synchronization are required. Transaction levels in order of increasing level are: TRANSACTION_NONE, TRANSACTION_READ_UNCOMMITTED, TRANSACTION_READ_COMMITTED, TRANSACTION_REPEATABLE_READ, TRANSACTION_SERIALIZABLE
• Reuse connections. Minimize the opening and closing of connections where possible. These operations are very expensive. A sketch of these tips follows the list below.
• Employ efficient SQL programming techniques to minimize the amount of data processed
• Reuse prepared statements to minimize parsing and optimization overhead for frequently run queries
• Use stored procedures when appropriate to bundle processing into fewer database requests
• Consider extended dynamic package support for SQL statement and package caching
• Process data in blocks of multiple rows rather than single records when possible (e.g., a blocked fetch or batched insert, as sketched below)
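A hedged sketch combining several of these tips (URL, library, table, and column names are placeholders): one reused connection, a low isolation level, and a statement prepared once and executed as a batched block.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JdbcTipsDemo {
    public static void main(String[] args) throws Exception {
        // One connection, reused for all work (connect/disconnect is expensive).
        Connection conn = DriverManager.getConnection("jdbc:db2:*LOCAL");
        // Lowest isolation level the application can tolerate.
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
        // Prepared once, executed many times: parse/optimize cost is paid once.
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO MYLIB.ORDERS (ORDERID, AMOUNT) VALUES (?, ?)");
        for (int i = 0; i < 100; i++) {
            ps.setInt(1, i);
            ps.setBigDecimal(2, new BigDecimal("10.00"));
            ps.addBatch();       // accumulate rows into one block
        }
        ps.executeBatch();       // one request to the database instead of 100
        ps.close();
        conn.close();
    }
}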
Packages may be shared by several clients to reduce the number of packages on the System i server. To enable sharing, the default libraries of the clients must be the same and the clients must be running the same application. Extended dynamic support will be deactivated if two clients try to use the same package but have different default libraries.
‘All libraries on the system’ will cause all libraries on the system to be used for catalog requests and may cause significant degradation in response times due to the potential volume of libraries to process.
References for ODBC
DB2 Universal Database for System i SQL Call Level Interface (ODBC) is found in the System i Information Center under Printable PDFs and Manuals.
The System i Information Center: http://publib.boulder.ibm.com/iseries/...
Chapter 11. Domino on i
This chapter includes performance information for Lotus Domino on the IBM i operating system. Some of the information previously included in this section has been removed. Earlier versions of the document can be accessed at http://www.ibm.com/systems/i/solutions/perfmgmt/resource.html
April 2008 Update: Workload Estimator 2008.2...
IBM Lotus Domino V8 server with the IBM Lotus Notes V8 client: Performance, October 2007
http://www.ibm.com/developerworks/lotus/library/domino8-performance/index.html
Lotus Domino 7 Server Performance, Part 2, November 2005
http://www.ibm.com/developerworks/lotus/library/domino7-internet-performance/index.html
Lotus Domino 7 Server Performance, Part 3, November 2005
http://www.ibm.com/developerworks/lotus/library/domino7-enterprise-performance/
Best Practices for Large Lotus Notes Mail Files, October 2005
http://www.ibm.com/developerworks/lotus/library/notes-mail-files/
Lotus Domino 7 Server Performance, Part 1, September 2005
http://www.ibm.com/developerworks/lotus/library/nd7-perform/index.html...
Delete documents marked for deletion
Create 1 appointment (every 90 minutes)
Schedule 1 meeting invitation (every 90 minutes)
Close the view
Domino Web Access (formerly known as iNotes Web Access)
Each user completes the following actions an average of every 15 minutes except where noted:
Open mail database which contains documents that are 10K bytes in size.
optimal performance, but of course without the function provided in the Domino 7 templates. The following links refer to these articles:
Lotus Domino 7 Server Performance, Part 1, September 2005
http://www.ibm.com/developerworks/lotus/library/nd7-perform/index.html
Lotus Domino 7 Server Performance, Part 2, November 2005
http://www.ibm.com/developerworks/lotus/library/domino7-internet-performance/index.html
Lotus Domino 7 Server Performance, Part 3, November 2005
http://www.ibm.com/developerworks/lotus/library/domino7-enterprise-performance/...
[Table: Domino Web Access comparisons of Domino 5.0.11 versus Domino 6 at 2,000, 3,800, and 20,000 users.]
The 2,000 user comparison above was done on an iSeries model i270-2253, which has a 2-way 450MHz processor.
The 3,800 user comparison was done on a model i825-2473 with 6 1.1GHz POWER4 processors, 45GB of memory, and 60 18GB disk drives configured with RAID5, in a single Domino partition. The 20,000 user comparison used a single Domino partition on a model i890-0198 with 32 1.3GHz POWER4 processors.
shopping application, but would provide even better response times than the 270-2423 as projected in Figure 11.3. When using MHz alone to compare performance capabilities between models, it is necessary for those models to have the same processor technology and configuration. Factors such as L2 cache and type and speed of memory controllers also influence performance behavior and must be considered.
The eServer i5 Domino Edition builds on the tradition of the DSD (Dedicated Server for Domino) and the iSeries for Domino offering - providing great price/performance for Lotus software on System i5 and i5/OS. Please visit the following sites for the latest information on Domino Edition solutions:
http://www.ibm.com/servers/eserver/iseries/domino/
http://www.ibm.com/servers/eserver/iseries/domino/edition.html
11.7 Performance Tips / Techniques...
that the larger the buffer pool size, the higher the fault rate, but the lower the CPU cost. If the faulting rate looks high, decrease the buffer pool size. If the faulting rate is low but your CPU utilization is high, try increasing the buffer pool size. Increasing the buffer pool size allocates larger objects specifically for Domino buffers, thus increasing storage pool contention and making less storage available for the paging/faulting of other objects on the system.
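The buffer pool size is set through NOTES.INI. As an illustration only (the value shown is hypothetical, not a recommendation), the entry might look like:
NSF_Buffer_Pool_Size_MB=300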
Full text indexes
Consider whether to allow users to create full text indexes for their mail files, and avoid the use of them whenever possible. These indexes are expensive to maintain since they take up CPU processing time and disk space.
Replication.
11.8 Domino Web Access
The following recommendations help optimize your Domino Web Access environment:
1. Refer to the redbooks listed at the beginning of this chapter. The redbook, "iNotes Web Access on the IBM eServer iSeries server," contains performance information on Domino Web Access including the impact of running with SSL.
11.10 Performance Monitoring Statistics
Support for monitoring performance statistics was added in Domino Release 5.0.3. Domino tracks performance metrics of the operating system and outputs the results to the server console; type "show stat platform" at the server console to display them. This feature can be enabled by setting the parameter PLATFORM_STATISTICS_ENABLED=1 in the NOTES.INI file and restarting your server, and it is automatically enabled in some versions of Domino.
2. *MINIMIZE The main storage will be allocated to minimize the space used by the object. That is, as little main storage as possible will be allocated and used. This minimizes main storage usage while increasing the number of disk I/O operations since less information is cached in main storage. 3.
The following is an example of how to issue the command:
CHGATR OBJ(name-of-object) ATR(*MAINSTGOPT) VALUE(*NORMAL | *MINIMIZE | *DYNAMIC)
The chart below depicts V5R3-based paging curve measurements performed with the following settings for the mail databases: *NORMAL, *MINIMIZE, and *DYNAMIC.
[Figure 11.4: V5R3 Main Storage Options - Fault Rates.]
During the tests, the *DYNAMIC and *MINIMIZE settings used up to 5% more CPU resource than *NORMAL. Figure 11.5 below shows the response time data rather than fault rates for the same test shown in Figure 11.4 for the attributes *NORMAL, *DYNAMIC, and *MINIMIZE.
[Figure 11.5: V5R3 Main Storage Options - Response Times.]
NOTE: MCU ratings should NOT be used directly as a sizing guideline for the number of supported users. MCU ratings provide a relative comparison metric which enables System i models to be compared with each other based on their Domino processing capability. MCU ratings are based on an industry standard workload and the simulated users do not necessarily represent a typical load exerted by “real life”...
users. For environments with small numbers of users or relatively low transaction rates, response times may be significantly higher for a small LPAR (such as 0.2 processor) or partial processor model as compared to a full processor allocation of the same technology. The IBM Systems Workload Estimator will not recommend the 500 CPW or 600 CPW models for Domino processing.
Chapter 12. WebSphere MQ for iSeries
12.1 Introduction
The WebSphere MQ for iSeries product allows application programs to communicate with each other using messages and message queuing. The applications can reside either on the same machine or on different machines or platforms that are separated by one or more networks. For example, iSeries applications can communicate with other iSeries applications through WebSphere MQ for iSeries, or they can communicate with applications on other platforms by using WebSphere MQ for iSeries and the appropriate MQSeries product(s) for the other platform (HP-UX, OS/390, etc.).
enhancement should allow customers to run with smaller, more manageable receivers with less concern about the checkpoint taken following a receiver roll-over during business hours.
12.3 Test Description and Results
Version 5.3 of WebSphere MQ for iSeries includes several performance enhancements designed to significantly improve queue manager throughput and application response time, as well as improve the overall throughput capacity of MQSeries.
applications using MQ Series are running, you may need to consider adding memory to these pools to help performance. Nonpersistent messages use significantly less CPU and IO resource than persistent messages do because persistent messages use native journaling support on the iSeries to ensure that messages are recoverable.
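The persistence trade-off can be sketched with the WebSphere MQ classes for Java as follows (the queue manager and queue names are hypothetical):

import com.ibm.mq.MQC;
import com.ibm.mq.MQMessage;
import com.ibm.mq.MQPutMessageOptions;
import com.ibm.mq.MQQueue;
import com.ibm.mq.MQQueueManager;

public class MqPersistenceSketch {
    public static void main(String[] args) throws Exception {
        MQQueueManager qmgr = new MQQueueManager("QMGR1");  // hypothetical name
        MQQueue queue = qmgr.accessQueue("APP.QUEUE", MQC.MQOO_OUTPUT);

        MQMessage msg = new MQMessage();
        // Nonpersistent messages avoid the journaling cost; use them only
        // when the application can tolerate message loss on a failure.
        msg.persistence = MQC.MQPER_NOT_PERSISTENT;
        msg.writeString("example payload");

        queue.put(msg, new MQPutMessageOptions());
        queue.close();
        qmgr.disconnect();
    }
}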
Chapter 13. Linux on iSeries Performance
13.1 Summary
Linux on iSeries expands the iSeries platform solutions portfolio by allowing customers and software vendors to port existing Linux applications to the iSeries with minimal effort. But how does it shape up in terms of performance? What does it look like generally and from a performance perspective? How can one best configure an iSeries machine to run Linux?
Key Ideas...
Shared Processors. This variation of LPAR allows the Hypervisor to use a given processor in multiple partitions. Thus, a uni-processor might be divided in various fractions between (say) three LPAR partitions. A four way SMP might give 3.9 CPUs to one partition and 0.1 CPUs to another. This is a large and potentially profitable subject, suitable for its own future paper.
Linux can, by a suitable and reasonably conventional mount point strategy, intermix both native and virtual disks. The Virtual Disk is analogous to some of the less common Linux on Intel distributions where a Linux file system is emulated out of a large DOS/Windows file, except that on OS/400, the storage is automatically “striped”...
13.4 Basic Configuration and Performance Questions
Since, by definition, iSeries Linux means at least two independent partitions, questions of configuration and performance get surprisingly complicated, at least in the sense that not everything runs in one operating system, and overall performance is not visible to a single set of tools. Consider the following environments:
A machine with a Linux and an OS/400 partition, both running CPU-bound work with little I/O.
13.5 General Performance Information and Results
A limited number of performance related tests have been conducted to date, comparing the performance of iSeries Linux to other environments on iSeries and to similarly configured (especially in CPU MHz) pSeries machines running the application in an AIX environment.
Computational Performance -- C-based code
A factor not immediately obvious is that most Linux and Open Source code are constructed with a single compiler, the GNU compiler (gcc or g++).
[Figure: Linux computational environment performance as a fraction of ILE performance.]
One virtue of the i870, i890, and i825 machines is that the hardware floating point unit can make up for some of the code generation deficit due to its superior hardware scheduling capabilities.
Computational Performance -- Java
Generally, Java computational performance will be dictated by the quality of the JVM used.
Here, a model 840 was subdivided into the partition sizes shown and a typical web serving load was used. A "hit" is one web page or one image. The kttpd is a kernel-based daemon available on Linux which serves only static web pages or images. It can be cascaded with ordinary Apache to provide dynamic content as well.
As noted above, many distributions are based on the 2.95 gcc compiler. The more recent 3.2 gcc is also used by some distributions. Results there show some variability and not much net improvement. To the extent it improves, the gap with ILE should close somewhat. Floating point performance is improved, but proportionately.
The absolute numbers are less important than the fact that there is an advantage. The point is: To be sure of this level of performance from the Intel side, more work has to be done, including getting the right hardware, BIOS, and Linux tools in place. Similar work would also have to be done using Native Disk on iSeries Linux.
typically recommended because it allows the Linux partitions to leverage the storage subsystem the customer has in the OS/400 hosting partition. 2. As the application gains in complexity, it is probably less likely that the application should switch from one product to the other. Such applications tend to implicitly play to particular design choices of their current product and there is probably not much to gain from moving them between products.
do so, you may wish to compare with the next previous version. This would be especially important if you have one key piece of open source code largely responsible for the performance of a given partition. There is no way of ensuring that a new distribution is actually faster than the predecessor except to test it out.
substantial amount of Virtual I/O. This is probably on the high side, but it can be important to have something left over. If the hosting partition uses all of its CPU, Virtual I/O may slow substantially. Use Virtual LAN for connections between iSeries partitions, whether OS/400 or Linux. If your OS/400 PTFs are up to date, it performs roughly on a par with gigabit ethernet and has zero hardware cost: no switches, no wires, etc.
Chapter 14. DASD Performance
This chapter discusses DASD subsystems available for the System i platform. There are two separate considerations. Before IBM i operating system V6R1, one only had to consider particular devices, IOAs, IOPs, and SAN devices. All of these attached through similar strategies directly to the IBM i operating system and were supported natively.
14.1.2 iV5R2 Direct Attach DASD
This section discusses the direct attach DASD subsystem performance improvements that were new with the iV5R2 release. These consist of the following new hardware and software offerings:
2757 SCSI PCI RAID Disk Unit Controller (IOA)
2780 SCSI PCI RAID Disk Unit Controller (IOA)
2844 PCI Node I/O Processor (IOP)
4326 35 GB 15K RPM DASD
14.1.2.2 I/O Intensive Workload - 2778 IOA vs 2757 IOA
[Table: save and restore rates in GB/HR for the 2778 IOA and 2757 IOA with 15 RAID DASD, by operation (save to *SAVF, restore) and number of 35 GB DASD units.]
This restrictive test is intended to show the effect of the 2757 IOAs in a backup and recovery environment.
14.1.3 571B
iV5R4 offers two new options on DASD configuration:
RAID6, which offers improved system protection on supported IOAs. NOTE: RAID6 is supported under iV5R3, but we have chosen to look at performance data on an iV5R4 system.
IOPLess operation on supported IOAs.
14.1.3.1 571B RAID5 vs RAID6 - 10 15K 35GB DASD
[Figure: 571B RAID5 versus RAID6 comparison.]
14.1.4 571B, 5709, 573D, 5703, 2780 IOA Comparison Chart
In the following two charts we are modeling a System i 520 with a 573D IOA using RAID5, comparing 3 70GB 15K RPM DASD to 4 70GB 15K RPM DASD. The 520 is capable of holding up to 8 DASD, but many of our smaller customers do not need the storage.
The charts below are an attempt to allow the different IOAs available to be compared on a single chart. An I/O Intensive Workload was used for our throughput measurements. The system used was a 520 model with a single 5094 attached, which contained the IOAs for the measurements. Note: the 5709 and 573D are cache cards for the built-in IOA in the 520/550/570 CECs, even though the following chart presents them as if they were the IOA.
14.1.5 Comparing Current 2780/574F with the new 571E/574F and 571F/575B
NOTE: iV5R3 has support for the features in this section, but all of our performance measurements were done on iV5R4 systems. For information on the supported features see the IBM Product Announcement Letters.
14.1.6 Comparing 571E/574F and 571F/575B IOP and IOPLess
In comparing IOP and IOPLess runs we did not see any significant differences, including the system CPU used. The system we used was a model 570 4-way; on the IOP run the system CPU was 11.6% and on the IOPLess run the system CPU was 11.5%.
14.1.7 Comparing 571E/574F and 571F/575B RAID5 and RAID6 and Mirroring
System i protection information can be found in the System i Handbook or in the Information Center at http://publib.boulder.ibm.com/iseries/ . When comparing RAID5, RAID6 and mirroring we are interested in weighing the strength of failure protection against storage capacity and against the performance impacts to the system workloads.
In comparing Mirroring and RAID one of the concerns is capacity differences and the hardware needed. We tried to create an environment where the capacity was the same in both environments. To do this we built the same size database on “15 35GB DASD using RAID5” and “14 70GB DASD using Mirroring spread across 2 IOAs”.
14.1.8 Performance Limits on the 571F/575B
In the following charts we try to characterize the 571F/575B in different DASD configurations. The 15 DASD experiment is used to give a comparison point with the DASD experiments from charts 14.1.5.1 and 14.1.5.2. The 18, 24 and 36 DASD configurations are used to help in the discussion of performance vs capacity.
14.1.9 Investigating 571E/574F and 571F/575B IOA, Bus and HSL Limitations
With the new DASD controllers and IOPLess capabilities, IBM has created many new options for our customers. Customers who needed more storage in their smaller configurations can now grow. With the ability to add more storage into an HSL loop, the capacity and performance have the potential to grow.
14.1.10 Direct Attach 571E/574F and 571F/575B Observations
We did some simple comparison measurements to provide graphical examples for customers to observe characteristics of new hardware. We collected performance data using Collection Services and Performance Explorer to create our graphs after running our DASD IO workload (small block reads and writes).
14.2.2 RAID Hot Spare
[Figure: response time versus throughput for a 9406-570 4-way with 24 4328 140 GB DASD in four configurations: RAID5 with 24 active; RAID5 with 22 active and 2 hot spares; RAID6 with 24 active; RAID6 with 22 active and 2 hot spares.]
14.2.3 12X Loop Testing
[Figure: 12X loop throughput scaling from 1 571F to 8 571F IOAs with 36 DASD off each 571F.]
A 9406-MMA 8-way system with 96 GB of mainstore and 396 DASD in #5786 EXP24 Disk Drawers on 3 12X loops was used for the system ASP. ASP 2 was created on a 4th 12X loop by adding 5796 system expansion units with 571F IOAs attaching 36 4327 70 GB DASD in #5786 EXP24 Disk Drawers with RAID5 turned on.
14.3.2 57B8/57B7 IOA
With the addition of the POWER6 520 and 550 systems comes the new 57B8/57B7 SAS RAID Enablement Controller with Auxiliary Write Cache. This controller is only available in the POWER6 520 and 550 systems and provides RAID5/6 capabilities, with 175MB of redundant write cache.
The POWER6 520 and 550 also have an external SAS port, controlled by the 57B8/57B7, used to connect a single #5886 EXP 12S SAS Disk Drawer which can contain up to 12 SAS DASD. Below is a chart showing the addition of the #5886 EXP 12S SAS Disk Drawer.
[Figure: POWER6 520 57B8/57B7 with 6 RAID5 DASD in the CEC versus 6 RAID5 DASD in the CEC plus 12 RAID5 DASD in an EXP 12S SAS Disk Drawer.]
14.3.3 572A IOA
The 572A IOA is a SAS IOA that is mainly used for SAS tape attachment, but the 5886 EXP 12S SAS Disk Drawer can also be attached. Performance will be poor as the IOA does not have any cache. The following charts help to show the performance characteristics that resulted during experiments in the Rochester lab.
14.5 iV6R1M0 -- VIOS and IVM Considerations
Beginning in iV6R1M0, IBM i operating system will participate in a new virtualization strategy by becoming a client of the VIOS product. Customers will view the VIOS product in two different ways:
On blade products, through the regular configuration tool IVM (which includes an easy to use interface to VIOS).
14.5.1 General VIOS Considerations
14.5.1.1 Generic Concepts
520 versus 512. Long time IBM i operating system users know that IBM i operating system disks are traditionally configured with 520 byte sectors. The extra eight bytes beyond the 512 used for data are used for various purposes by Single Level Store. For a variety of reasons, VIOS will always surface 512 byte sectors to IBM i operating system, whatever the actual sector size of the disk may be.
14.5.1.2 Generic Configuration Concepts
There are several important principles to keep track of in terms of getting good performance. Most of the following are issues when the disks are configured. A great many problems can be eliminated (or created) when the drives are originally configured. The exact nature of some of these difficulties might not be easily predicted.
3. Prefer external disks attached directly to IBM i operating system over those attached via VIOS. This is basically a question of which partition owns the Fibre Channel adapter. In some cases, it affects which adapter is purchased. If you do not need to share a given external disk's resources with non-IBM i operating system partitions, and the support is available, avoiding VIOS altogether will give better performance.
8. Ensure, within reason, that a reasonable number of virtual disks are created and made available to IBM i operating system. One is tempted to simply lump all the storage one has in a virtual environment into a couple of large virtual disks (or even one). Avoid this if at all possible. For traditional (non-blade) systems: There is a great deal of variability here, so generalizations are difficult.
14.5.1.3 Specific VIOS Configuration Recommendations -- Traditional (non-blade) Machines
1. Avoid volume groups if possible. VIOS "hdisks" must have a volume identifier (PVID). Creating a volume group is an easy way to assign one, and some literature will lead you to do it that way.
3. Limited number of virtual devices per virtual SCSI adapter. You will have to configure some number of virtual SCSI adapters so that VIOS can provide a path for IBM i operating system to talk to VIOS as if these were really physical SCSI devices. These adapters, in turn, implement some existing rules, so that only 16 virtual disks can be made part of a given virtual adapter.
14.5.1.3 VIOS and JS12 Express and JS22 Express Considerations
Most of our work consisted of measurements with the JS22 offering and external disks using the DS4800 product. The following are results obtained in various measurements, and then a few general comments about configuration will follow.
14.5.1.3.1 BladeCenter H JS22 Express running IBM i operating system/VIOS
The following tests were run using a 4 processor JS22 Express in a BladeCenter H chassis, 32 GB of memory and a DS4800 with a total of 90 DDMs, (8 DDMs using RAID1 externalized in...
[Figure: JS22 Express VIOS/IBM i operating system response time versus throughput for four processor allocations: IBM i 0.8 processor with VIOS 0.2 processor; IBM i 1.7 with VIOS 0.3; IBM i 2.6 with VIOS 0.4; IBM i 3.5 with VIOS 0.5.]
The following charts are a view of the characteristics we observed during our Commercial Performance Workload testing on our JS22 Express. The first chart shows the effect on the Commercial Performance Workload when we apply 3 Dedicated processors and then switch to 3 shared processors.
In the following single partition Commercial Performance Workload runs, the average VIOS CPU stayed under 40%. So VIOS resource appears to be available; but in many customer environments communications and other resources are also running, and these resources will also be routed through VIOS.
The following chart shows two IBM i operating system partitions using 14GB of memory and 1.7 processors each served by 1 VIOS partition using 2GB of memory and .6 processors. The Commercial Performance Workload was running the same amount of transactions on each of the partitions for the same time intervals.
14.5.1.3.2 BladeCenter S and JS12 Express
The IBM i operating system is now supported on a JS12 Express in a BladeCenter S. The system is limited to 12 SAS DASD, and the following charts try to characterize the performance we achieved during experiments with the Commercial Performance Workload in the IBM lab. Using a JS22 Express in a BladeCenter H connected to a DS4800, we limited the resources in order to get a comparison to the SAS DASD used in the BladeCenter S.
14.5.1.3.3 JS12 Express and JS22 Express Configuration Considerations
1. The aggregate total of virtual disks (LUNs) will be sixteen at most. Many customers will want to deploy between 12 and 16 LUNs and maximize symmetry. Consult carefully with your support team on the choices here. This is the most important consideration as it is difficult to change later.
14.5.1.3.4 DS3000/DS4000 Storage Subsystem Performance Tips
Physical disks can be configured various ways with RAID levels, number of disks in each array, and number of LUNs created over those arrays. There are also various reasons for the configurations that are chosen. One end user might be looking for ease of use and choose to create one array with multiple LUNs, while another end user might consider performance to be a more critical issue and choose to create multiple arrays.
14.6 IBM i operating system 5.4 Virtual SCSI Performance
The primary goal of virtualization is to lower the total cost of ownership of equipment by improving utilization of the overall system resources and reducing the labor requirements to operate and manage many servers. With virtualization, the IBM Power Systems can now be used similar to the way mainframes have been used for decades, sharing the hardware between many programs, services, applications, or users.
The test results that follow show the CPU required by the IBM i operating system Virtual SCSI server; the benefits of the IBM i operating system Virtual SCSI implementation should be assessed for each given environment. Simultaneous multithreading should be enabled in a virtual hosted disk environment.
14.6.1 Introduction
In general, applications are functionally isolated from the exact nature of their storage subsystems by the operating system. An application does not have to be aware of whether its storage is contained on one type of disk or another when performing I/O. But different I/O subsystems have subtly different performance qualities, and virtual SCSI is no exception.
All measurements were completed on a POWER5 570+ 4-Way (2.2 GHz). Each system is configured as an LPAR, and each virtual SCSI test was performed between two partitions on the same system with one CPU for each partition. IBM i operating system 5.4 was used on the virtual SCSI server and AIX 5.3 was used on the client partitions.
14.6.2.1 Native vs. Virtual Performance
Figure 1 shows a comparison of measured bandwidth using virtual SCSI and local attached DASD for reads with varying block sizes of operations. The difference in the reads between virtual I/O and native I/O in these tests is attributable to the increased latency using virtual I/O. The difference in writes is caused by misalignment, which causes a read for every write.
14.6.2.3 Virtual SCSI Bandwidth - Network Storage Description (NWSD) Scaling
Figure 3 shows a comparison of measured bandwidth while scaling network storage descriptions with varying block sizes of operations. Each of the network storage descriptions has a single network storage space attached to it. The difference in the scaling of these tests is attributable to the performance gain which can be achieved by adding multiple network storage descriptions.
14.6.2.4 Virtual SCSI Bandwidth - Disk Scaling
Figure 4 shows a comparison of measured bandwidth while scaling disk drives with varying block sizes of operations. Each of the network storage descriptions has a single network storage space attached to it. The difference in the scaling of these tests is attributable to the performance gain which can be achieved by adding disk drives and IO adapters.
14.6.3 Sizing
Sizing methodology is based on the observation that the processor time required to perform an I/O on the IBM i operating system Virtual SCSI server is fairly constant for a given I/O size. The I/O devices supported by the Virtual SCSI server are sufficiently similar to provide good recommendations.
To calculate IBM i operating system Virtual SCSI CPU requirements, the following formula is provided. The number of transactions per second can be collected with the IBM i operating system command WRKDSKSTS. Based on the average transaction size reported by WRKDSKSTS, select a number from the table.
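As a hedged illustration of how such a formula is applied (the cycles-per-operation coefficient below is hypothetical, not a value from the table): CPU entitlement is approximately (I/O operations per second x processor cycles per I/O) / processor frequency. For example, 5,000 I/O operations per second at an assumed 45,000 cycles per operation on a 2.2 GHz processor would require roughly (5,000 x 45,000) / 2,200,000,000, or about 0.10 of a processor.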
14.6.3.2 Sizing when using Micro-Partitioning
Defining Virtual SCSI servers in micro-partitions enables much better granularity of processor resource sizing and potential recovery of unused processor time by uncapped partitions. Tempering those benefits, use of micro-partitions for Virtual SCSI servers slightly increases I/O response time and creates somewhat more complex processor entitlement sizing.
14.6.3.3 Sizing memory
The IBM i operating system Virtual SCSI server supports data read caching on the virtual hosted disk server partition. Thus all I/Os that it services could benefit from the effects of caching heavily used data. Read performance can vary depending upon the amount of memory which is assigned to the server partition.
14.6.4 AIX Virtual IO Client Performance Guide
The following link provides more in-depth performance tuning information for the AIX virtual SCSI client:
Advanced POWER Virtualization on IBM System p (SG24-7940): http://www.redbooks.ibm.com/abstracts/sg247940.html
14.6.5 Performance Observations and Tips
• In order to achieve best performance, 1 network storage description should be used for every 2-4 disks within an ASP.
Chapter 15. Save/Restore Performance
This chapter's focus is on the IBM i operating system platform. For legacy system models, older device attachment cards, and the lower performing backup devices, see the V5R3 performance capabilities reference. Many factors influence the observable performance of save and restore operations. These factors include:
The backup device models, number of DASD units the data is spread across, processors, LPAR configurations, and the IOA used to attach the devices.
15.2 Save Command Parameters that Affect Performance
Use Optimum Block Size (USEOPTBLK)
The USEOPTBLK parameter is used to send a larger block of data to backup devices that can take advantage of the larger block size. Every block of data that is sent has a certain amount of overhead that goes with it.
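For example (the library and device names are illustrative only), a save that lets the system use the optimum block size could be issued as:
SAVLIB LIB(APPLIB) DEV(TAP01) USEOPTBLK(*YES)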
15.3 Workloads
The following workloads were designed to help evaluate the performance of single, concurrent and parallel save and restore operations for selected devices. Familiarization with these workloads can help in understanding differences in the save and restore rates.
Database File related Workloads:
The following workloads are designed to show some possible customer environments using database files.
15.4 Comparing Performance Data
When comparing the performance data in this document with the actual performance on your system, remember that the performance of save and restore operations is data dependent. If the same backup device were used on data from three different systems, three different rates might result. The performance of save and restore operations is also dependent on the system configuration, most directly affected by the number and type of DASD units on which the data is stored and by the type of storage IOAs being used.
15.5 Lower Performing Backup Devices
With the lower performing backup devices, the devices themselves become the gating factor, so the save rates are approximately the same regardless of system CPU size (DVD-RAM).
[Table 15.5.1: approximate loss from workload type for lower performing backup devices (save operations), by workload type: Large Database File; User Mix / Domino / Network Storage Space.]
15.8 The Use of Multiple Backup Devices
Concurrent Saves and Restores - The ability to save or restore different objects from a single library/directory to multiple backup devices, or different libraries/directories to multiple backup devices, at the same time from different jobs. The workloads that were used for the testing were Large Database File and User Mix from libraries.
15.9 Parallel and Concurrent Library Measurements
This section discusses parallel and concurrent library measurements for tape drives, while sections later in this chapter discuss measurements for virtual tape drives.
15.9.1 Hardware (2757 IOAs, 2844 IOPs, 15K RPM DASD)
Hardware Environment. This testing consisted of an 840 24-way system with 128 GB of memory.
15.9.2 Large File Concurrent
For the concurrent testing, 16 libraries were built, each containing a single 320 GB file with 80 4 GB members. The file size was chosen to sustain a flow across the HSL, system bus, processors, memory and tape drives for about an hour.
15.9.3 Large File Parallel
For the measurements in this environment, BRMS was used to manage the save and restore, taking advantage of the ability built into BRMS to split an object between multiple tape drives. We started with a 320 GB file in a single library and built it up to 2.1 TB for tape drive tests 1 - 4 and 8. For tape drive tests 12 - 16 the file was duplicated in the library, so a single library with two 2.1 TB files was used.
15.9.4 User Mix Concurrent
User Mix will generally portray a fair population of customer systems, where the real data is a mixture of programs, menus, and commands along with their database files. The new ultra tape drives are in their glory when streaming large file data, but a lot of other factors play a part when saving and restoring multiple smaller objects.
15.10 Number of Processors Affects Performance
With the Large Database File workload, it is possible to fully feed two backup devices with a single processor, but with the User Mix workload it takes more than one processor to fully feed a backup device. A recommendation might be 1 1/3 processors for each backup device you want to feed with User Mix data.
15.11 DASD and Backup Devices Sharing a Tower
The system architecture does not require that DASD and backup devices be kept separated. In testing in the IBM Rochester Lab for the 3580 002 measurements, we attached one backup device to each tower, and all towers had 45 DASD units in them.
15.12 Virtual Tape
Virtual tape drives are introduced in iV5R4 so that customers can make use of the speed of saving to DASD, then use DUPTAP to copy the data to tape drives, reducing the backup window during which the system is unavailable to users. There are a lot of pieces to consider in setting up and using virtual tape drives.
The following measurements were done on a system with newer hardware, including a 3580 Ultrium 3 4Gb Fiber Channel Tape Drive, 571E storage adapters, and 4327 70GB (U320) DASD.
[Figure: Save to Tape vs. Save to Virtual Tape then DUPTAP to Tape; 570 8-way, 96GB memory, 305 DASD units for virtual tape drives, restricted state.]
15.13 Parallel Virtual Tapes
NOTE: Virtual tape is reading and writing to the same DASD, so the maximum throughput of our concurrent and parallel measurements is different from our tape drive tests, where we were reading from DASD and writing to tape.
[Figure: parallel virtual tape throughput; 800 DASD units for virtual tape drives.]
15.14 Concurrent Virtual Tapes
NOTE: Virtual tape is reading and writing to the same DASD, so the maximum throughput of our concurrent and parallel measurements is different from our tape drive tests, where we were reading from DASD and writing to tape.
[Figure: concurrent virtual tape throughput for Large File; 800 DASD units for virtual tape drives.]
15.15 Save and Restore Scaling using a Virtual Tape Drive
A 570 8-way System i was used for the following tests. A user ASP was created using up to 3 571F IOAs with up to 36 U320 70 GB DASD on each IOA. The chart shows the number of DASD in each test, and the virtual tape drive was created using that DASD.
15.16 Save and Restore Scaling using 571E IOAs and U320 15K DASD units to a 3580 Ultrium 3 Tape Drive
A 570 8-way System i was used for the following tests. A user ASP was created with the number of DASD listed in each test.
15.17 High-End Tape Placement on System i
The current high-end tape drives (ULTRIUM-2 / ULTRIUM-3 and 3592-J / 3592-E) need to be placed carefully on the System i buses and HSLs in order to avoid bottlenecks. The following rules of thumb will help optimize performance in a large-file save environment and help position the customer for future growth in tape activity:
Limit the number of drives per fibre tape adapter as follows:
15.18 BRMS-Based Save/Restore Software Encryption and DASD-Based ASP Encryption
The Ultrium-3 was used in the following experiments, which attempt to characterize the effects of BRMS-based save/restore software encryption and DASD-based ASP encryption. Some of the newer tape drives offer hardware encryption as an option, but for those who are not looking to upgrade or invest in these tape units at this time, software encryption can be a fair solution.
[Figure: RSTLIBBRM restore rates and %CPU used for a 1 GB source file on 9406-MMA 4-way and 9406-570 4-way systems, comparing encrypted and non-encrypted ASPs, with and without software encryption.]
15.19 5XX Tape Device Rates
Note: Measurements for the high speed devices were completed on a 570 4-way system with 2844 IOPs, 2780 IOAs, and 180 15K RPM RAID5 DASD units. The smaller tape device tests were completed on a 520 2-way with 75 DASD units.
[Table 15.19.2: iV5R4M0 measurements on a 5XX 1-way system with 8 RAID5 protected DASD units and 8 GB memory, all 8 DASD in the system ASP; save (S) and restore (R) rates in GB/HR for Source File 1GB, User Mix 12GB, and Large File 32GB workloads.]
15.20 5XX Tape Device Rates with 571E & 571F Storage IOAs and 4327 (U320) Disk Units
Save/restore rates of 3580 Ultrium 3 (2Gb and 4Gb Fiber Channel) tape devices and of virtual tape devices were measured on a 570 8-way system with 571E and 571F storage adapters and 714 type 4327 70GB (U320) disk units.
15.21 5XX DVD RAM and Optical Library
[Table 15.21.1: iV5R3 measurements on a 520 2-way system with 53 RAID protected DASD units and 16 GB memory; rates in GB/HR. ASP 1 is the system ASP with 23 DASD; ASP 2 has 30 DASD; workload data saved and restored from user ASP 2; S = Save, R = Restore.]
15.22 Software Compression
The rates a customer will achieve will depend upon the system resources available. This test was run in a very favorable environment to try to achieve the maximum rates. Software compression rates were gathered using the QSRSAVO API. The CPU used in all compression schemes was near 100%. The compression algorithm cannot span CPUs, so the fact that measurements were performed on a 24-way system doesn't affect the software compression scenario.
15.23 9406-MMA DVD RAM
[Table 15.23.1: iV5R4M5 measurements on a 9406-MMA 4-way system with 6 mirrored DASD in the CEC and 24 RAID5 protected DASD units attached, 32 GB memory; rates in GB/HR with all 30 DASD in the system ASP; S = Save, R = Restore.]
15.24 9406-MMA 576B IOPLess IOA
[Table 15.24.1: iV6R1M0 measurements on a 9406-MMA 4-way system with 200 RAID5 protected DASD units in the system ASP attached via 571F IOAs, 40 GB memory; rates in GB/HR. Two virtual tape experiments: 60 RAID5 DASD in ASP2 and 120 RAID5 DASD in ASP2; S = Save, R = Restore.]
15.25 What's New and Tips on Performance
What's New
iV6R1M0 (March 2008):
BRMS-Based Save/Restore Software Encryption and DASD-Based ASP Encryption
576B IOPLess Storage IOA
iV5R4M5 (July 2007):
3580 Ultrium 4 - 4Gb Fiber Channel Tape Drive
6331 SAS DVD RAM for 9406-MMA system models
iV5R4 (January 2007):
571E and 571F storage IOAs (see DASD Performance chapter for more information)
Chapter 16. IPL Performance
Performance information for Initial Program Load (IPL) is included in this section. The primary focus of this section is to present observations from IPL tests on different System i models. The data for both normal and abnormal IPLs are broken down into phases, making it easier to see the detail.
16.3 9406-MMA System Hardware Information
16.3.1 Small system Hardware Configuration
9406-MMA 7051 4-way - 32 GB mainstore
DASD: 30 70GB 15K RPM arms; 6 DASD in the CEC, mirrored; 24 DASD in a #5786 EXP24 Disk Drawer attached with a 571F IOA, RAID5 protected
Software Configuration:
100,000 spool files (100,000 completed jobs with 1 spool file per job)
500 jobs in job queues (inactive)
16.4 9406-MMA IPL Performance Measurements (Normal)
The following tables provide a comparison summary of the measured performance data for a normal and abnormal IPL. Results provided do not represent any particular customer environment. Measurement units are in minutes and seconds.
[Table 16.4.1: iV5R4M5 normal IPL - power-on (cold start), iV5R4M5 GA1 firmware.]
16.6 NOTES on MSD
MSD is Mainstore Dump. General IPL phases as they relate to the SRCs posted on the operation panel: Processor MSD includes the D2xx xxxx and C2xx xxxx SRCs right after the system is forced to terminate. The SLIC MSD IPL with Copy follows with the next series of C6xx xxxx; see the next heading for more information on the SLIC MSD IPL with Copy.
16.7 5XX System Hardware Information
16.7.1 5XX Small system Hardware Configuration
520 7457 2-way - 16 GB mainstore
DASD: 23 35GB 15K RPM arms, RAID protected
Software Configuration:
100,000 spool files (100,000 completed jobs with 1 spool file per job)
500 jobs in job queues (inactive)
500 active jobs in system during Mainstore dump
1000 user profiles
16.8 5XX IPL Performance Measurements (Normal)
The following tables provide a comparison summary of the measured performance data for a normal and abnormal IPL. Results provided do not represent any particular customer environment. Measurement units are in minutes and seconds.
[Table 16.8.1: normal IPL - power-on (cold start), V5R3 versus iV5R4.]
16.10 5XX IOP vs IOPLess effects on IPL Performance (Normal)
Measurement units are in minutes and seconds.
[Table 16.10.2: normal IPL - power-on (cold start), iV5R4 GA7 firmware, 16-way IOP, 256 GB, 924 DASD; hardware phase 17:44, SLIC and OS/400 phases, total 26:59.]
16.11 IPL Tips
Although IPL duration is highly dependent on hardware and software configuration, there are tasks that
Integrated xSeries Adapter (IXA) extend the utility of the System i solution by integrating x86 and AMD based servers with the System i platform. Selected models of Intel based servers may run Windows® 2000 Server editions, Windows Server 2003 editions, Red Hat® Enterprise Linux®, or SUSE®...
Integrated xSeries Servers (IXS) An Integrated xSeries Server is an Intel processor-based server on a PCI-based interface card that plugs into a host system. This card provides the processor, memory, USB interfaces, and in some cases, a built-in gigabit Ethernet adapter. There are several hardware versions of the IXS: The 2.0 GHz Pentium®...
Write Cache Property When the disk device write cache property is disabled, disk operations have similar performance characteristics to shared disks. You may examine or change the “Write Cache” property on Windows by selecting disk “properties” and then the “Hardware tab”. Then view “Properties” for a selected disk and view the “Disk Properties”...
With iSCSI, there are some Windows side disk configuration rules you must take into account to enable efficient disk operations. Windows disks should be configured as: 1 disk partition per virtual drive. File system formatted with cluster sizes of 4 kbyte or 4 kbyte multiples. 2 gigabyte or larger storage spaces (for which Windows creates a default NTFS cluster size of 4kbytes).
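As an illustration (the drive letter is hypothetical), a Windows volume can be formatted with a 4 kbyte NTFS cluster size from a command prompt with:
format E: /FS:NTFS /A:4096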
2. Vary on any Network Server Description (NWSD) with a Network server connection type of *ISCSI. During the iSCSI network server vary on processing the QFPHIS subsystem is automatically started if necessary. The subsystem will activate the private memory pool. iSCSI network server descriptions that are varied on will then utilize the first private memory pool configured with at least the minimum (4MB) size for virtual disk I/O operations.
IXS and IXA I/O operations (disk, tape, optical and virtual Ethernet) communications occur through the individual IXS and IXA IOP resource. This IOP imposes a finite capacity. The IOP processor utilization may be examined via the iSeries Collection Services utilities. The performance results presented in the rest of this chapter are based on measurements and projections using standard IBM benchmarks in a controlled environment.
[Table: memory rules of thumb for the machine pool, base pool, QFPHIS private pool, and total.]
Warning: To ensure expected performance and continuing machine operation, it is critical to allocate sufficient memory to support all of the devices that are varied on. Inadequate memory pools can cause unexpected machine operation.
17.4 Disk I/O CPU Cost
Disk Operation Rules of Thumb
iSCSI linked disks
[Figure: CPW per 1,000 disk operations for iSCSI versus IXS/IXA with caching disabled or shared.]
The charts show the relative cost when performing 5 different types of operations:
Random write operations of a uniform size (512, 1k, ... 64k).
Random read operations of a uniform size (512, 1k, ...
A storage space which is linked as shared, or a disk with caching disabled, requires more CPU to process write operations (approximately 45% more). Sequential operations cost approximately 10% less than the random I/O results shown above. Even though a Windows disk driver may have write cache enabled, some Windows applications may request to bypass the cache for some operations (extended writes), and these operations incur the higher CPW cost.
The blue square line shows an iSCSI connection with a single target iSCSI HBA to a single initiator iSCSI HBA, configured to run with standard frames. The pink circle line is a single target iSCSI HBA to multiple servers and initiators, also running with standard frames. With the initiators and switches configured to use 9k jumbo frames, a 15% to 20% increase in upper capacity is demonstrated.
than an IXS or IXA attached VE connection. "Stream" means that the data is pushed in one direction, with only the TCP acknowledge packets running in the other direction.
[Figure: TCP stream throughput from i5/OS to Windows by transaction size, for iSCSI with jumbo frames, iSCSI with standard frames, IXS/IXA, and external NICs.]
The chart above shows the CPW efficiency of operations (larger is better). Note the CPW per Mbits/sec scale on the left, as it is different for each chart. For an IXS or IXA, the port-based VE has the least CPW cost for smaller packets due to consolidation of transfers available in Licensed Internal Code.
Windows 2000 Server, Windows Server 2003 or with Intel Linux editions. They provide flexible consolidation of System i solutions and Windows or Linux services, in combination with improved hardware control, availability, and reduced maintenance costs.
Choose V5R4. In the “Contents” panel choose “iSeries Information Center”. Expand “Integrated operating environments” and then “Windows environment on iSeries” for Windows environment information or “Linux” and then “Linux on an integrated xSeries solution for Linux Information on an IXS or attached xSeries server. Microsoft Hardware Compatibility Test URL: http://www.microsoft.com/whdc/hcl/search.mspx search on IBM for product types Storage/SCSI Controller and System/Server Uniprocessor.
Chapter 18. Logical Partitioning (LPAR)
18.1 Introduction
Logical partitioning (LPAR) is a mode of machine operation where multiple copies of operating systems run on a single physical machine. A logical partition is a collection of machine resources that are capable of running an operating system. The resources include processors (and associated caches), main storage, and I/O devices.
Allocate fractional CPUs wisely. If your sizing indicates two partitions need 0.7 and 0.4 CPUs, see if there will be enough remaining capacity in one of the partitions with 0.6 and 0.4 or else 0.7 and 0.3 CPUs allocated. By adding fractional CPUs up to a "whole" processor, fewer physical processors will be used.
The reasons for the LPAR overhead can be attributed to contention for the shared memory bus on a partitioned system, to the aggregate bandwidth of the standalone systems being greater than the bandwidth of the partitioned system, and to a lower number of system resources configured for a system partition than on a standalone system.
Also note that part of the performance increase of a larger system may have come about because of a reduction in contention within the CPW workload itself. That is, the measurement of the standalone 12-way system required a larger number of users to drive the system's CPU to 70 percent than what is required on a 4-way system.
[Figure 18.2: 12-way LPAR throughput example - total increase in CPW capacity of an LPAR system.]
To illustrate the impact that varying the workload in the partitions has on an LPAR system, the CPW workload was run at an extremely high utilization in the stand-alone 12-way.
18.4 LPAR Measurements The following chart shows measurements taken on a partitioned 12-way system with the system’s CPU utilized at 70 percent capacity. The system was at the V4R4M0 release level. Note that the standalone 12-way CPW value of 4700 in our measurement is higher than the published V4R3M0 CPW value of 4550.
The following chart shows projected LPAR capacities for several LPAR configurations. The projections are based on measurements on 1 and 2 way measurements when the system’s CPU was utilized at 70 percent capacity. The LPAR overhead was also factored into the projections. The system was at the V4R4M0 release level.
Chapter 19. Miscellaneous Performance Information
19.1 Public Benchmarks (TPC-C, SAP, NotesBench, SPECjbb2000, VolanoMark)
iSeries systems have been represented in several public performance benchmarks. The purpose of these benchmarks is to give an indication of relative strength in a general field of computing. Benchmark results can give confidence in a system's capabilities, but should not be viewed as a sole criterion for the purchase or upgrading of a system.
The most commonly run of these is the SAP-SD (Sales and Distribution) benchmark. It can be run in a 2-tier environment, where the application and database reside on the same system, or on a 3-tier environment, where there are many application servers feeding into a database server. Care must be taken to ensure that the same level of software is being run when comparing results of SAP benchmarks.
This web site is primarily focused on results for systems that the Volano company measures themselves. These results tend to be for much smaller, Intel-based systems that are not comparable with iSeries servers. The web site also references articles written by other groups regarding their measurements of the benchmark, including AS/400 and iSeries articles.
of relatively lower delay cost.
Waiting Time
The waiting time is used to determine the delay cost of a job at a particular time. The waiting time of a job which affects the cost is the time the job has been waiting on the TDQ for execution.
Delay Cost Curves
The end-user interface for setting job priorities has not changed.
Priority 47-51
Priority 52-89
Priority 90-99
Jobs in the same group will have the same resource (CPU seconds and Disk I/O requests) usage limits. Internally, each group will be associated with one set of delay cost curves. This would give some preferential treatment to jobs of higher user priorities at low system utilization.
less CPU utilization resulting in slightly lower transaction rates and slightly longer response times. However, the batch job gets more CPU utilization and consequently shorter run time. It is recommended that you run with Dynamic Priority Scheduling for optimum distribution of resources and overall system performance.
of printers in the configuration. 70% of the remaining memory is allocated to the interactive pool; 30% to the base pool. A QPFRADJ value of 1 ensures that memory is allocated on the system in a way that the system will perform adequately at IPL time.
files of differing characteristics are being accessed. The pool attribute can be changed from *FIXED to *CALC and back at any time, so making a change and evaluating its effect over a period of time is a fairly safe experiment. More information about Expert Cache can be found in the Work Management guide.
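For example (the shared pool name is illustrative), Expert Cache can be enabled on a shared storage pool by changing its paging option:
CHGSHRPOOL POOL(*SHRPOOL1) PAGING(*CALC)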
To determine a reasonable level of page faulting in user pools, determine how much the paging is affecting the interactive response time or batch throughput. These calculations will show the percentage of time spent doing page faults. The following steps can be used: (all data can be gathered w/STRPFRMON and printed w/PRTSYSRPT). The following assumes interactive jobs are running in their own pool, and batch jobs are running in their own pool.
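As a hedged illustration of the kind of calculation involved (all numbers here are hypothetical): if an interactive pool averages 20 page faults per second and each fault costs roughly 5 milliseconds of disk wait, faulting accounts for about 0.1 second of wait per elapsed second, or roughly 10 percent of the time. Whether that level is acceptable depends on the response time targets for the workload in that pool.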
NOTE: It is very difficult to predict the improvement of adding storage to a pool, even if the potential gain calculated above is high. There may be instances where adding storage may not improve anything because of the application design. For these circumstances, changes to the application design may be necessary.
Conclusions/Recommendations for NetFinity
1. The time to collect hardware or software information for a number of clients is fairly linear.
2. The size of the AS/400 CPU is not a limitation. Data collection is performed at a batch priority. CPU utilization can spike quite high (ex.
Chapter 20. General Performance Tips and Techniques
This section covers a variety of useful topics that "don't fit" in the document as a whole, but that describe useful things customers might do, or that deal with special problems customers might run into, on iSeries.
Problem
It is too easy to use the overall pool's value of MAXACT as a surrogate for controlling the number of jobs. That is, you can forget the distinction between jobs and threads and use MAXACT to control the activity in a storage pool. But you are not controlling jobs; you are controlling threads. It is also too easy to have your existing MAXACT set too low if your existing QBATCH subsystem suddenly sees lots of new Java threads from new Java applications.
20.2 General Performance Guidelines -- Effects of Compilation
In general, the higher the optimization, the less easy the code will be to debug. It may also be the case that the program will do things that are initially confusing.
In-lining
For instance, suppose that ILE Module A calls ILE Module B.
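Optimization level is requested at compile time. For example (the program and source file names are illustrative only), an ILE RPG program could be compiled at full optimization with:
CRTBNDRPG PGM(MYLIB/MYPGM) SRCFILE(MYLIB/QRPGLESRC) OPTIMIZE(*FULL)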
20.3 How to Design for Minimum Main Storage Use (especially with Java, C, C++)
The iSeries family has added popular languages whose usage continues to increase -- Java, C, C++. These languages frequently use a different kind of storage -- heap storage. Many iSeries programmers, with a background in RPG or COBOL, are unaware of the influence this may have on storage consumption.
The total storage consumed is roughly: total storage = a + (b x N), where N is the number of data base records and a and b are constants. "a" is determined by adding up things like the static storage taken up by the application program. "b" is the size of the data base record plus the size of anything else, such as a Java object, that is created one entity per data base record.
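As a worked illustration (numbers hypothetical): if a is 10 MB of fixed program storage, the record plus its per-record Java object make b about 600 bytes, and a job touches N = 100,000 records, the job consumes roughly 10 MB + (600 x 100,000) = 70 MB. Cutting b to 300 bytes halves the variable portion and saves about 30 MB.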
Main storage consumers, grouped by growth order:
Order(1) - cost paid once per system: ILE and OS/400 programs; Subsystem Descriptions; Direct Execution Java programs; System values.
Order(j) - cost paid once per job: Just In Time compiled programs (Java *JIT); Total job storage; Static storage from RPG and COBOL, and static final storage in Java; the Java Virtual Machine and most WebSphere storage.
A Brief Example
To show these concepts, consider a simple example.
How practical this change would be, if it represented a large, existing data base, would be a separate question. If this is at the initial design, however, this is an easy change to make.
Boundary considerations
In Java, we are done because Java will order the three entities such that the least amount of space is wasted.
One thing easily misunderstood is variable length character fields. At first, one would think every character field should be variable length, especially if one codes in Java, where variable length data is the norm. However, when one considers the internals of data base, a field ought to be ten to twenty bytes long before variable length is even considered.
20.4 Hardware Multi-threading (HMT)
Hardware multi-threading is a facility present in several iSeries processors. The eServer i5 models instead have the Simultaneous Multi-threading (SMT) facility, which is discussed in the SMT white paper at the following website: http://www-1.ibm.com/servers/eserver/iseries/perfmgmt/pdf/SMT.pdf. HMT is mentioned here primarily to compare and contrast it with SMT. Moreover, several system facilities operate slightly differently on HMT machines than on SMT machines, and these differences need some highlighting.
HMT and SMT Compared and Contrasted
Some key similarities and differences are:
HMT Feature
· HMT can be turned on and off only by a whole-system IPL.
· All partitions have the same value for HMT.
· HMT executes only one instruction stream at a time.
20.5 POWER6 520 Memory Considerations
Because of the design of the POWER6 520 system, there are some key factors in the memory subsystem that one should keep in mind when sizing this system. The POWER6 520, unlike the POWER6 570, has no L3 cache, which has an effect on memory-sensitive workloads such as Java applications.
activation time. This means that a partition that requires 4 GB of memory could be assigned 2 GB from the quad with 4 GB DIMMs and the other 2 GB from the quad with 8 GB DIMMs. This too can cause an application to have different performance characteristics on partitions configured with exactly the same amount of resources.
floating-point data may be copied using floating-point loads and stores, resulting in an alignment interrupt. As an example, consider the following structures, one specifying "packed" and the other allowed to be aligned by the compiler. For example:

    _Packed struct FPAlignmentStruct      /* _Packed is the ILE C packing qualifier */
    {
        long   FloatingPointOp1;
        double FloatingPointOp2;          /* may not start on an 8-byte boundary */
    };
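For contrast, a version of the structure left to the compiler's default alignment keeps the double on an 8-byte boundary and avoids the interrupt. A minimal sketch (the structure name is illustrative):

    struct FPAlignmentStructNormal        /* default alignment */
    {
        double FloatingPointOp2;          /* placed on an 8-byte boundary */
        long   FloatingPointOp1;          /* any padding follows the smaller field */
    };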
Chapter 21. High Availability Performance
The primary focus of this chapter is to present data that compares the effects of high availability scenarios using different hardware configurations. The data for the high availability tests are broken down into two categories: switchable IASPs and geographic mirroring.
High Availability Switchable Resources Considerations
Switchable IASPs are the physical resources that can be switched between systems in a cluster.
· Inactive switchover - The switching time is measured from the point at which the CHGCRGPRI command is issued on the primary system, which has no active work, until the IASP is available on the new primary system.
· Partition - An active partition is created by starting the database workload on the IASP. Once the workload is stabilized, an option 22 (force main storage dump) is issued on the panel.
Switchover Measurements
NOTE: The information that follows is based on performance measurements and analysis done in the Server Group Division laboratory. Actual performance may vary significantly from these tests.
Switchable IASPs using Hardware Resources
Time Required to Switch the IASP using Hardware Resources
Inactive Switchovers    Time (minutes)    4:31...
Active State: In geographic mirroring, the configuration state of a mirror copy that indicates geographic mirroring is being performed while the IASP is online.
Workload Description
Synchronization: This workload is performed by starting the synchronization process on the source side from an unsynchronized geographic mirrored IASP.
Workload Configuration
The wide variety of hardware configurations and software environments available makes it difficult to characterize a 'typical' high availability environment and predict the results. The following section provides a simple description of the high availability test.
Large System Configuration
Hardware Configuration
System A Model...
Geographic Mirroring Measurements
NOTE: The information that follows is based on performance measurements and analysis done in the IBM Server Group Division laboratory. Actual performance may vary significantly from this test.
Synchronization on an idle system: The following data shows the time required to synchronize 1 terabyte of data. This test case could vary greatly depending on the speed and latency of communication between the two systems.
Geographic Mirroring Tips
• For a quicker switchover time, keep the user ID (UID) and group ID (GID) of user profiles that own objects on the IASP the same between nodes of the cluster group. Having different UIDs lengthens vary-on times.
•
Chapter 22. IBM Systems Workload Estimator
22.1 Overview
The IBM Systems Workload Estimator (also known as the Estimator or WLE), located at http://www.ibm.com/systems/support/tools/estimator, is a web-based sizing tool for System i, System p, and System x. You can use this tool to size a new system, to size an upgrade to an existing system, or to size a consolidation of several systems.
The typical disclaimers that go with any performance estimate ("your experience might vary...") are especially true. We provide these sizing estimates as general guidelines only.
22.2 Merging PM for System i data into the Estimator
The Measured Data workload of the Estimator is designed to accept data from various data sources. The most common ones are the PM for System i™...
The estimates do not take into account features like detailed journaling, resource locking, single-threaded applications, time-limited batch job windows, or poorly tuned environments. The Estimator is a capacity sizing tool. Even though it does not represent actual transaction response times, it does adhere to the policy of giving recommendations that abide by generally accepted utilization thresholds.
Appendix A. CPW and CIW Descriptions
"Due to road conditions and driving habits, your results may vary." "Every workload is different." These are two hallmark statements of measuring performance in two very different industries. They are both absolutely correct. For iSeries and AS/400 systems, IBM has provided a measure called CPW to represent the relative computing power of these systems in a commercial environment.
CPW Application Description
The CPW application simulates the database server of an online transaction processing (OLTP) environment. Requests for transactions are received from an outside source and are processed by application service jobs on the database server. It is based, in part, on the business model from benchmarks owned and managed by the Transaction Processing Performance Council.
A.2 Compute Intensive Workload - CIW
Unlike CPW values, CIW values are not derived from specific measurements of a single workload. They are modeled projections based upon the characteristics of internal workloads such as Domino workloads and application server environments such as those found with SAP or JD Edwards applications.
One category that often fits into the CIW-like classification is overnight batch. Even though batch jobs often process a great deal of database work, there are relatively few jobs, which means there is little switching of jobs from processor to processor. As a result, overnight batch data processing jobs sometimes act more like compute-intensive jobs.
Appendix B. System i Sizing and Performance Data Collection Tools
The following section presents some of the alternative tools available for sizing and capacity planning. (Note: There are products from vendors not included here that perform similar functions.) All of the tools discussed here support the current range of System i products, and include the capability to model logical partitions, partial processors (micropartitions) and server workload consolidation.
B.1 Performance Data Collection Services
Collection Services is an operating system function designed to run continuously that collects system- and job-level performance data at regular intervals, which can be set from 15 seconds to 1 hour. It runs a number of collection routines called probes, which collect data from many system resources including jobs, disk units, IOPs, buses, pools, and communication lines.
predefined profile containing commonly used categories. For example, if you do not have a need to monitor the performance of SNADS transaction data on a regular basis, you can choose to turn that category off so that SNADS transaction data is not collected. Since Collection Services is intended to be run continuously and trace mode is not, trace mode was not integrated into the start options of Collection Services.
http://www.ibm.com/servers/eserver/iseries/perfmgmt/batch.html Unzip this file, transfer to your System i platform as a save file and restore library QBCHMDL. Add this library to your library list and start the tool by using the STRBCHMDL command. Tips, disclaimers, and general help are available in the QBCHMDL/README file. It is recommended that you work closely with your IBM Technical Support Representative when using this tool.
Appendix C. CPW and MCU Relative Performance Values for System i
This chapter details the relative system performance value Commercial Processing Workload (CPW). For a detailed description, refer to Appendix A, "CPW and CIW Descriptions". CPW values are relative system performance metrics and reflect the relative system capacity for the CPW workload.
C.1 V6R1 Additions (October 2008)
C.1.1 CPW values for the IBM Power Systems
Table C.1.1. CPW values for Power System Models
Model             Feature    Chip Speed    Processor
570 (9117-MMA)    7387
570 (9117-MMA)    7388
*Note:
1. These models have a dedicated L2 cache per processor core, and share the L3 cache between two processor cores.
2. Memory speed differences account for some slight variations in performance between models.
3. CPW values for Power System models introduced in October 2008 were based on IBM i 6.1 plus enhancements in post-release PTFs.
C.1.4 CPW values for IBM Power Systems
Table C.1.4.
Table C.3.1. CPW values for Power System Models
Model
520 (9407-M15)
520 (9408-M25)
550 (9409-M50)
*Note:
1. These models have a dedicated L2 cache per processor core, and share the L3 cache between two processor cores.
2. The range of the number of processor cores per system.
C.3.2 CPW values for IBM BladeCenter JS12 - IBM i operating system
Table C.3.2.
Table C.4.1. IBM BladeCenter models
Server Blade Model    Feature
JS22 (7998-61X)
JS22 (7998-61X)
*Note:
1. These models have a dedicated L2 cache per processor core, and no L3 cache.
2. CPW value is for a 3-core dedicated partition and a 1-core VIOS.
3. CPW value is for a 3.7-core partition with shared processors and a 0.3-core VIOS partition.
C.5 V5R4 Additions (July 2007)
IBM System i...
8. The 64-way is measured as two 32-way partitions since i5/OS does not support a 64-way partition.
9. IBM stopped publishing CIW ratings for iSeries after V5R2. It is recommended that the IBM Systems Workload Estimator be used for sizing guidance, available at: http://www.ibm.com/eserver/iseries/support/estimator
C.8 V5R2 Additions (February, May, July 2003)
New for this release is a product line refresh of the iSeries hardware which simplifies the model structure...
C.8.2 Model 810 and 825 iSeries for Domino (February 2003)
Table C.8.2.1. iSeries for Domino 8xx Servers
Model                Chip Speed
825-2473 (7416)      1100
825-2473 (7416)      1100
810-2469 (7428)
810-2467 (7410)
810-2466 (7407)
*Note:
1. 5250 OLTP CPW - With a rating of 0, adequate interactive processing is available for a single 5250 job to perform system administration functions.
C.10.4 Capacity Upgrade on-demand Models
New in V4R5 (December 2000), Capacity Upgrade on Demand (CUoD) capability offered for the iSeries Model 840 enables users to start small, then increase processing capacity without disrupting any of their current operations. To accomplish this, six processor features are available for the Model 840. These new processor features offer a Startup number of active processors;...
C.11 V4R5 Additions For the V4R5 hardware additions, the tables show each new server model characteristics and its maximum interactive CPW capacity. For previously existing hardware, the tables show for each server model the maximum interactive CPW and its corresponding CPU % and the point (the knee of the curve) where the interactive utilization begins to increasingly impact client/server performance.
C.11.4 SB Models
Table C.11.4.1 SB Models
Model       Chip Speed
SB2-2315
SB3-2316
SB3-2318
* Note: The "Processor CPW" values listed for the SB models are identical to the 830-2403-1531 (8-way), the 840-2418-1540 (12-way) and the 840-2420-1540 (24-way). However, due to the limited disk and memory of the SB models, it would not be possible to measure these values using the CPW workload.
CPU % used by Processor @ Knee = 100 - (CPU % used by Interactive @ Knee)
CPU % used by Interactive @ Max = (Max Interactive CPW / Processor CPW) * 100
Table C.12.2.1 Current Model 170 Servers
Chip...
Note that a constrained (c) CPW rating means the maximum memory or DASD configuration is the constraining factor, not the processor. An unconstrained (u) CPW rating means the processor is the first constrained resource.
Table C.12.2.3 Previous Model 170 Servers
Constrain / Client / Server...
C.13 AS/400e Model Sxx Servers
For AS/400e servers the knee of the curve is about 1/3 the maximum interactive CPW value.
Table C.13.1 AS/400e Servers
Model    Feature    # CPUs
2118
2119
2161
2163
2165
2166
2257
2258
2259
2260
2207
2208
2256
2261...