Saturday, February 6, 2010
Misconceptions About Accuracy Requirements and the Value of Modeling
We often hear questions about the accuracy of models and the value of modeling. Hardware is cheap, so why bother spending time and effort on modeling? The accuracy of modeling results depends on many factors. If modeling cannot guarantee 100% accuracy, why bother with data collection, workload characterization, workload forecasting, and building and calibrating models?
My typical answer is: “How can you manage your system effectively if you do not know what to expect?”
Let’s review the factors affecting model accuracy, the role of modeling, the limitations of commercial modeling tools, and success stories from our customers, where modeling results helped to compare options, justify decisions, reduce the risk of surprises and establish continuous proactive performance management.
There is no doubt that benchmark tests can provide more accurate results than modeling, but benchmark-test preparation time and its cost prohibit use of the benchmarks to justify every decision. Modeling complements benchmarks and expands the horizon of performance evaluation.
Indeed, if you are at a fork in the road and have to make a decision about how to reach the destination sooner, you do not need 100% accuracy in measurements of the distance to the destination to make a decision. You just need the answer to the question “which path is shorter?”
Building a model and evaluating its results creates a collaborative environment among DBAs, capacity planners, architects, managers, and other representatives of the business and IT as they describe business requirements and workloads, set realistic SLOs and evaluate predictions. It helps everyone understand the environment and avoid mistakes in decision making. Modeling results also make it possible to compare actual results with expected results.
Modeling Overhead
Commercial modeling tools perform similar functions, including data collection, workload characterization, workload forecasting, scenario planning, model calibration and model evaluation. Tools include:
• BEZVision
• BMC Perform Predict
• Hyperformix
• TeamQuest
• Metron
• OptNet
• VMware
Typical Overhead of Modeling Tools: 1-3%
Typical Cost of Modeling Tool: $100K - $500K
Typical Manpower Required to Support Modeling Tool: 0.2 – 1 FTE
Differences between Modeling Tools Affecting Modeling Accuracy
• Accuracy of measurement data
• Workload characterization and ability to represent non-exponential distributions of interarrival times and service times
• Ability to represent current and planned IT Infrastructure, including hardware configuration, software configuration, database design, parallel processing
• Ability to take into consideration the interdependence between workloads and servers in a multi-tier distributed environment
• Ability to correlate workload performance, resource utilization and data usage profiles in multi-tier environment
• Use of the open queueing network models assuming that the arrival rate is constant, regardless of server utilization
• Use of the closed queueing network models
• Ability to predict how change in database design and hardware upgrade will affect the level of parallelism within DBMS servers
• Ability to identify users, programs, tables and SQL requests that will cause performance problems in the future
• Ability to take into consideration the impact, not only of hardware configuration, but also changes/tuning of software parameters affecting the level of concurrency within application tier and DBMS tier.
• Ability to predict the impact of the database tuning
• Ability to predict the impact of the expected growth and change in hardware and software configurations, not only on usage of resources, but also on response time and throughput.
• Ability to model virtualized environments and predict the impact of planned growth and changes on Hypervisor overhead and scalability of the virtualized environment
• Ability to predict how change in pattern of usage of data and level of parallel processing will affect the performance of different workloads
• Ability to predict how change in priority and level of concurrency will affect workload performance
• Accuracy of workload forecasting
• Ability to extract performance measurement data during stress testing and predict the impact of new application implementation in production environment
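The difference between the open and closed queueing network models mentioned in the list above can be illustrated with a minimal sketch. It assumes a single server with exponential service times and, for the closed case, a fixed number of users with a think time between requests; the function names and the numbers are illustrative assumptions, not taken from any of the tools listed.

```python
def open_response_time(arrival_rate, service_time):
    """Open M/M/1 model: R = S / (1 - U), where U = arrival_rate * service_time.

    The arrival rate is fixed and does not react to how busy the server is.
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return float("inf")  # no steady state at or above 100% utilization
    return service_time / (1.0 - utilization)


def closed_response_time(users, think_time, service_time):
    """Closed model: exact Mean Value Analysis for one server plus think time.

    Users wait for a reply before submitting the next request, so the
    arrival rate drops as the server slows down.
    """
    queue_length = 0.0
    response = service_time
    for n in range(1, users + 1):
        response = service_time * (1.0 + queue_length)  # arrival theorem
        throughput = n / (think_time + response)        # Response Time Law
        queue_length = throughput * response            # Little's Law
    return response


if __name__ == "__main__":
    S = 0.1  # seconds of service per request (assumed)
    Z = 1.0  # seconds of think time per user (assumed)
    for users in (5, 10, 20):
        offered_rate = users / Z  # rate an open model assumes if users never slow down
        print(users,
              open_response_time(offered_rate, S),
              closed_response_time(users, Z, S))
```

In this sketch the open model predicts unbounded response time once the offered load reaches the server capacity, while the closed model predicts a finite, steadily growing response time, which is one reason the choice of model type affects prediction accuracy.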
How Modeling Tools Support Virtualization
The VMware capacity planning tool focuses on the application tier. It predicts the resource consumption of virtualization and helps determine the size of the host server required to support expected workload growth. It does not predict how virtualization will affect the response time and throughput of individual workloads, or how expected workload growth will increase contention for DBMS server resources and affect response time. Nor does it take into consideration the interdependence between workloads, the interdependence between servers of the application tier and the DBMS tier, or the impact of changes to software parameters and database design.
TeamQuest uses a simplified approach in representing workloads and modeling. It does not take into consideration the details of architecture and workloads affecting the parallel processing and concurrency in workload management of DBMS server.
BEZVision, in contrast to other modeling tools, uses closed queueing network models and focuses on detailed representation of both application tier and DBMS tier distributed environments, enabling it to predict the impact of different changes, not only on resource usage, but also on response time and throughput for individual workloads. BEZVision workload characterization automates building hourly performance, resource utilization and data usage profiles for each workload. BEZVision has been modeling complex systems based on parallel processing for years. BEZVision takes into consideration the impact of workload growth and increase in the number of VMs on Hypervisor overhead and scalability of the application servers.
Value of Modeling
Accuracy of modeling is limited but the value is high. It can help:
• Justify strategic capacity planning, tactical performance management and operational workload management decisions throughout the application and system life cycle, from the feasibility study through performance management, capacity planning, workload management and disaster recovery to server consolidation
• Organize collaborative process
• Reduce uncertainty and risk of surprises
• Answer “what if” questions, compare alternatives
• Predict the impact of the workload growth and increase in volume of data
• Justify hardware upgrade required to support SLO of multiple workloads in multi-tier distributed environment
• Predict the impact of virtualization
• Predict the impact of server consolidation
• Predict the impact of new application implementation
• Enable organization of continuous proactive performance management by comparing actual results with expected and developing corrective measures
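Comparing actual results with expected ones, the last point in the list above, can be as simple as flagging workloads whose measured response time drifts too far from the prediction. The sketch below is only an illustration: the function name, the dictionary layout and the 15% tolerance are assumptions.

```python
def compare_with_expected(expected, actual, tolerance=0.15):
    """Return workloads whose measured response time deviates from the prediction.

    expected, actual: dicts mapping workload name -> response time in seconds.
    tolerance: allowed relative deviation (0.15 means 15%), an assumed threshold.
    The result maps each flagged workload to its relative error.
    """
    deviations = {}
    for workload, predicted in expected.items():
        measured = actual.get(workload)
        if measured is None or predicted <= 0:
            continue  # nothing meaningful to compare
        relative_error = (measured - predicted) / predicted
        if abs(relative_error) > tolerance:
            deviations[workload] = relative_error
    return deviations
```

A flagged workload is a prompt to look for the cause (unexpected growth, a new application, a changed access path) and to recalibrate the model or plan a corrective action.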
BEZ Customers Success Stories
Companies in different industries have successfully used our modeling technology since 1992 to justify strategic capacity planning, tactical performance management and operational workload management decisions.
• Retail: Immediately after the announcement of our commercial modeling tools in 1992, major retailers Wal-Mart and Kmart started using the modeling tools to justify hardware configuration upgrades for complex systems supporting multiple workloads based on Massively Parallel Processing architecture.
• Wal-Mart—Rick Dolzel, VP of Operations, brought BEZPlus tools in to evaluate the impact of the expected workload and database size growth; justify hardware configurations; balance the utilization of the MPP environments and justify hardware upgrades.
• Kmart: used BEZPlus to justify multiple multi-million-dollar configuration upgrades. John Hootman, Manager of Operations, wanted to incorporate database performance tuning functionality. The products not only demonstrated high accuracy in performance prediction, but also helped analysts understand their complex environment and the impact of the expected workload and database growth. According to Dennis Cooper, Manager of DBAs, “Using BEZPlus, system analysts at Kmart were able to convince management that a two-phase, multi-million-dollar upgrade of its existing data warehouse was necessary. By going ahead with the upgrade to our data warehouse, we got an overall boost in performance of about 22%, exactly what the BEZ analysis had indicated we would get.” According to John Hootman, the cost of the modeling tool is a drop in the bucket compared with its value.
• JC Penney—According to Barry Hicks, Manager of Corporate Capacity Planning, "Using BEZPlus, we were able to identify architectural bottlenecks that needed to be corrected. In the process, we balanced the utilization of our high capacity EMC2 storage disks and improved the throughput of the system.” He continues, “BEZ has incorporated our Statistical Process Control (SPC) approach into the CorpView product. This function gives us a strategic advantage that insures the efficient use of our corporate resources."
• Gap, Lowe’s, Sears, American Stores, Safeway, Kroger, The Limited and several other retail stores used BEZPlus and BEZVision to plan and manage their data warehouse environment supporting mixed workload environment.
• Communication: AT&T uses BEZVision for both strategic and tactical purposes in planning and managing distributed environments and server consolidation to support growing business needs. Mike Bankowsky requested the addition of SQL table usage analysis and of root cause analysis identifying the SQL and tables that will cause problems in the future.
• VIVO is using BEZVision for capacity planning and performance management of the distributed environment
• Lucent Technologies, Sprint, SBC, NYNEX, Bell Canada, AT&T Canada—used BEZPlus for strategic planning, performance management and tuning, and for justification of hardware configuration upgrades.
• Companies in other industries also used our modeling solutions:
• Finance: Visa, Bank of America, Canadian Imperial Bank of Commerce, The Royal Bank of Canada, Fidelity Investments, Hewitt Associates
• Insurance: Blue Cross/Blue Shield, Nationwide Insurance, Prudential Insurance, CNA Insurance
• Transportation: Delta and Canadian Airlines
• Manufacturing: FedEx, Xerox, Packaging Corporation of America, 3M and others
• Government: US Internal Revenue Service, Public Work Supply and Service, Canada, US Senate and US Department of Agriculture, for capacity planning and performance management
• Energy: South California Edison, Public Services Company of Colorado and American Electric Power
• Health Care: Intermountain Health Care and Medco, for capacity management, performance tuning and evaluation of new application implementation impact
Monday, December 14, 2009
CMG Conference 2009
During CMG 2009 I presented a half-day workshop, "Hands on Workshop on Modeling and Optimization in Virtualized Multi-tier Environment",
presented a paper on "Capacity Management Opportunities for Oracle Database Machine Exadata v2"
and participated in a panel discussion on the "Role of modeling".
On Sunday I presented the half-day hands-on workshop. Each year I include in this workshop new examples reflecting the latest announcements and challenges in planning and managing IT resources.
Outline of the workshop this year included:
Introduction and Objectives
Simple Analytical Queueing Network Model
Modeling Inputs
How to Predict the Impact of Expected Workload Growth and Hardware Upgrade
How Modeling Helps to Evaluate Performance of New Oracle, IBM DB2 and Teradata Data Warehouse Appliances
How Modeling Helps to Set Realistic SLO
How to Justify Performance Management and Tuning Measures
How to Predict Impact of New Application Implementation
How to Verify Accuracy of Modeling Results and Organize Continuous Proactive Service Level Management
Summary and Next Steps
Objectives of the workshop were:
Learn how simple open and closed queueing network models and commercial modeling tools can be used for proactive performance management of multi-tier virtualized distributed environments
Learn how to predict:
The impact of the expected workload and database size growth
The impact of implementation of new applications
The impact of virtualization and server consolidation
The impact of database, application and software tuning
Learn how modeling can help to justify:
Strategic capacity planning decisions
Tactical performance management and database tuning decisions
Operational workload management decisions
Learn how to organize proactive performance management:
How to compare actual results with expected
How to compare options and develop timely corrective actions
The workshop is based on examples presented in Excel spreadsheets.
Participants answered the following questions:
Hardware’s cheap, so why do modeling?
Can you give an example of challenges related to strategic capacity planning, tactical performance management and operational workload management?
What is the Utilization Law?
What is the Response Time Law?
What is Little’s Law?
What is the objective of data collection and workload characterization?
What is a workload performance profile?
What is a workload resource utilization profile?
What is a workload data usage profile?
What are the typical workload aggregation rules?
How do you predict the impact of the expected workload growth?
How do you justify hardware upgrade?
How do you predict the impact of virtualization?
How do you compare scalability of the different DBMS and hardware platforms?
How do you set up realistic SLO?
How do you negotiate SLA?
How do you justify tuning efforts?
How does a change in the number of JVM threads affect performance?
How does an increase in connection pool size affect performance?
How does reduction in level of concurrency for one workload affect performance of other workloads?
How do you collect new application performance data?
How do you predict new application performance prior to implementation on a production system?
How do you predict what impact implementing a new application will have on current production workload?
How do you improve modeling accuracy?
How do you verify modeling accuracy?
How do you set up realistic SLO and negotiate SLA?
How do you organize proactive SLM?
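The three operational laws asked about in the questions above are short enough to write down directly. The following is a minimal sketch with illustrative function and parameter names; it is not the workshop spreadsheet itself.

```python
def utilization_law(throughput, service_demand):
    """Utilization Law: U = X * D.

    throughput X in requests per second, service demand D in seconds per request.
    """
    return throughput * service_demand


def littles_law(throughput, response_time):
    """Little's Law: N = X * R, the mean number of requests in the system."""
    return throughput * response_time


def response_time_law(users, throughput, think_time):
    """Interactive Response Time Law: R = N / X - Z.

    users N, throughput X in requests per second, think time Z in seconds.
    """
    return users / throughput - think_time
```

For example, a server completing 50 requests per second with a CPU demand of 0.012 seconds per request is 60% busy, and 100 users with a 1.5 second think time at that throughput see a response time of 0.5 seconds.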
At the end of the workshop, participants had enough material to present a formal capacity management report with findings and recommendations.
On Tuesday morning I presented a joint paper with Charlie Garry on "Capacity Management Opportunities for Oracle Database Machine Exadata v2"
On September 15, 2009 Oracle announced the world’s first database appliance designed to run both OLTP and data warehousing workloads. Oracle’s Database Machine V2 is based on Sun hardware utilizing commodity components and x86 processors.
In this presentation we reviewed the architecture and functionality of the Database Machine that enable high performance and scalability, and discussed the challenges of strategic capacity planning, tactical performance management and operational workload management for this environment.
In this paper we discussed:
Intro to the Oracle Database Machine V2
How to measure performance?
Challenges of the workload characterization
How to predict performance and justify capacity planning, performance management and workload management solutions
Case study, including:
How to define the best strategy, tactics and workload management to support SLOs effectively?
What will be the impact of the expected workload growth and changes, and when will the current system be out of capacity?
What will be the impact of implementing Oracle DB Machine v2?
What will be the impact of a hardware upgrade?
What will be the impact of performance tuning?
What will be the impact of a new application?
What will be the impact of limiting CPU utilization?
What will be the impact of changing workload concurrency?
What will be the impact of changing workload priority?
What is the best plan of action?
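The question of when the current system will run out of capacity can be approximated with a very simple model. The sketch below assumes a single-server open model (R = S / (1 - U)) and compound monthly growth of utilization; the function name, the parameters and the model itself are simplifying assumptions, far cruder than the analysis in the case study.

```python
def months_until_slo_breach(utilization, service_time, monthly_growth, slo, horizon=60):
    """Return the first month in which predicted response time exceeds the SLO.

    utilization: current utilization (0..1); service_time and slo in seconds;
    monthly_growth: 0.05 means 5% workload growth per month.
    Returns None if the SLO is still met at the end of the horizon.
    """
    for month in range(horizon + 1):
        projected = utilization * (1.0 + monthly_growth) ** month
        if projected >= 1.0:
            return month  # saturated: the model predicts unbounded response time
        if service_time / (1.0 - projected) > slo:
            return month
    return None
```

Note that the SLO is typically breached well before utilization reaches 100%, which is why response time, not utilization alone, should drive the upgrade date.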
We emphasized that:
Oracle Database Machine includes smart scan, columnar data compression and flash memory, which significantly improve the performance of the storage subsystem and enable concurrent support of OLTP and BI/DSS workloads
It includes a mechanism for changing priority, resource allocation and concurrency for individual workloads
We demonstrated how BEZVision modeling and performance optimization results can be used to justify strategic capacity planning, tactical performance management and operational workload management decisions
This makes it possible to organize a continuous proactive performance management process.
Wednesday, October 28, 2009
Our Paper at Teradata Partners Conference
According to Scott Gnau, Teradata Chief Development Officer, the new appliance based on multi-core Intel processor technology and the 64-bit SLES operating system will allow scaling from seven to 200 terabytes of user data.
Teradata announced versions of its Teradata Express software for Amazon's Elastic Compute Cloud (EC2) and VMware Player. Teradata Express provides Teradata developers and testers access to a database at no charge. This announcement directly competes with Greenplum’s "Enterprise Data Cloud" strategy.
Planning and managing a DW environment is difficult. Teradata TASM simplifies creation of rules to manage performance of mixed workload environments, but it can still be difficult to select the right TASM parameters capable of satisfying SLO for each workload. In our joint paper “Capacity Management and Optimization in TASM Environments” co-authored with Doug Brown we demonstrated that:
• It is easy to change TASM settings, but it is difficult to decide how to change values to satisfy SLGs for each workload.
• Modeling and optimization technology can be used to justify strategic capacity planning and tactical performance management decisions and to set TASM rules that satisfy workload SLGs
• Workload characterization and performance prediction results can be used to justify realistic SLGs, set throttling, priorities and resource allocation TASM rules and organize a continuous process of proactive Service Level Management.
Progress in technology provides many options to decision makers for planning, managing and controlling performance of critical applications supporting business processes. Even with the currently available tools, it is still difficult to evaluate different options and make the right decisions. The role of modeling and optimization is to automate the process of evaluation and provide information to justify capacity planning, performance management and workload management decisions and enable verification of actual results with expectations while helping define a process for continuous proactive performance management.
Our Presentation at Oracle Open World 2009
On Tuesday, I presented a paper that I co-authored with Alex Lupersolsky on “Modeling and Optimization for Multi-tier Virtualized Oracle Environments”. In this paper, we reviewed the challenges of planning and managing a complex environment in which it is easy to add hardware and to change the software parameters controlling workload concurrency, priorities, and allocation of CPU and memory resources, but difficult to decide which changes will satisfy SLOs effectively.
We also reviewed several case studies illustrating the impact of workload growth and evaluating different options, including creation of RAC and an Oracle OLTP Database Machine V2.
We analyzed modeling results predicting the impact of migrating ETL, OLTP, BI and archiving workloads to Oracle DB Machine v2. We used the following measurement data to build the model:
1) For each RAC node we used performance measurement data contained in GV$ views and Oracle OEM Grid Control:
• total physical CPU utilization
• the number of CPUs used
• total I/O rate in IOps and KBps
• the number of OS-visible disks
• read/write ratio
• average I/O operation response time
2) For each database instance per workload element (User/Program/Machine/Module):
• #executions
• total or average server response time per execution
• average number of parallel sessions (client sessions existing at the same time)
• parsing and execution CPU time consumed
• # physical IO operations with storage
• GV$ views contain information about master and slave sessions running in the same or different instances, allowing us to estimate average intra-request parallelism and the average amount of data transferred between master and slave sessions through a "node interconnect"
3) For each Exadata cell:
• arrival rate/throughput in number of SQL requests/hour,
• average response time,
• CPU utilization,
• number of logical and physical I/Os per hour per User/Program/Machine/Module
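Measurement data like that listed above is turned into model inputs by computing, for each workload, an arrival rate and a resource demand per request. The sketch below shows the arithmetic under stated assumptions: the dictionary keys (`executions`, `cpu_seconds`, `physical_ios`) and the function name are hypothetical and do not correspond to actual GV$ column names.

```python
def workload_profiles(measurements, interval_seconds, cpu_count):
    """Derive per-workload model inputs from raw counters for one interval.

    measurements: workload name -> {"executions", "cpu_seconds", "physical_ios"}
    (hypothetical field names). Workloads with no executions are skipped.
    """
    profiles = {}
    for name, m in measurements.items():
        executions = m["executions"]
        if executions == 0:
            continue  # no activity in this interval, nothing to characterize
        profiles[name] = {
            # requests per hour, the unit used for arrival rate in the text
            "arrival_rate_per_hour": executions * 3600.0 / interval_seconds,
            # CPU service demand: CPU seconds consumed per execution
            "cpu_demand_seconds": m["cpu_seconds"] / executions,
            # I/O demand: physical I/O operations per execution
            "ios_per_execution": m["physical_ios"] / executions,
            # share of total CPU capacity used by this workload
            "cpu_utilization": m["cpu_seconds"] / (interval_seconds * cpu_count),
        }
    return profiles
```

Summing `cpu_utilization` over all workloads and comparing it with the measured total CPU utilization of the node gives a quick check of how much activity the workload breakdown fails to capture.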
We discussed how modeling and optimization can be used to compare alternatives, justify and verify operational workload management, tactical performance tuning and strategic capacity planning decisions to ensure SLO support for the critical workloads.
We illustrated the importance of workload management. Without any constraints, low priority ETL workloads can monopolize resources. Workload management, database tuning and hardware configuration changes can all improve performance for one workload, but they also carry the risk of moving rather than eliminating bottlenecks and negatively affecting other workloads. Strategic capacity planning, tactical performance management and operational workload management decisions should take into consideration the interdependence between servers and workloads and virtualization overhead.
It is impossible to manually evaluate all of the possible permutations of changes in concurrency, priority or resource allocation, database tuning or hardware upgrade options. We demonstrated how comparing the actual with expected results, based on modeling and optimization, enables organizations to practice continuous, proactive service level management.
Monday, August 31, 2009
Challenges of Teradata Workload Management in TASM Environment
Modeling and optimization technology can be used to justify not only strategic capacity planning and tactical performance management decisions, but also the operational workload management TASM parameters needed to satisfy workload SLGs.
Let's review how workload characterization and performance prediction results can be used to justify realistic SLGs, set Concurrency/Throttling, Priorities and Resource Allocation TASM rules and organize continuous proactive Service Level Management.
Reducing the level of concurrency (throttling) reduces the number of concurrently processed requests, the Multi Programming Level (MPL), but it increases the number of requests waiting for a thread, as shown in the figure below:
One of the challenges is to find, for each workload, an approximation of the probability distribution of the number of requests in the system, the number of requests waiting for service and the number of requests being processed.
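The trade-off between requests being processed and requests waiting can be sketched with a textbook M/M/c queue, where the number of servers c plays the role of the throttle limit (MPL). This is a deliberately simplified illustration, not how TASM or any modeling tool computes it: it assumes Poisson arrivals, exponential service times and, importantly, a service time that does not change with the MPL, whereas in a real DBMS concurrent requests compete for CPU and I/O. The function name is an assumption.

```python
from math import factorial


def throttle_queue(arrival_rate, service_time, mpl):
    """Mean number of requests in service and waiting for an M/M/c queue, c = mpl.

    Returns (in_service, waiting). Raises ValueError when the throttle is too
    low for the offered load, because no steady state exists in that case.
    """
    offered_load = arrival_rate * service_time  # mean number of busy threads
    if offered_load >= mpl:
        raise ValueError("offered load must be below the MPL")
    rho = offered_load / mpl
    # Erlang C formula: probability that an arriving request has to wait
    tail = offered_load ** mpl / factorial(mpl) / (1.0 - rho)
    head = sum(offered_load ** k / factorial(k) for k in range(mpl))
    p_wait = tail / (head + tail)
    waiting = p_wait * rho / (1.0 - rho)  # mean number of requests waiting
    return offered_load, waiting
```

With the same offered load, lowering `mpl` leaves the mean number of requests in service unchanged in this model but increases the mean number waiting, which is the effect described above.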
Below are performance prediction results illustrating how throttling the Batch workload can improve the performance of other workloads.

Changing the priority of one workload can improve its performance, but negatively affect the performance of other workloads.
One approach is to reduce the priority of non-critical workloads that use an excessive amount of resources.
A hardware upgrade or a change of the DBMS or OS release can change the balance in resource usage, and it requires reevaluation of the workload management TASM parameters.
Below are performance prediction results illustrating the impact of the proposed hardware upgrade and of changes to the workload management TASM parameters.
- As we can see the challenge in Teradata workload management is to coordinate selection of TASM parameters to satisfy SLGs for each workload.
- Modeling and optimization technology can be used to justify strategic capacity planning and tactical performance management decisions and to set TASM rules that satisfy workload SLGs
- Workload characterization and performance prediction results can be used to justify realistic SLGs, set Throttling, Priorities and Resource Allocation TASM rules and organize continuous proactive Service Level Management.
Sunday, August 23, 2009
Hot Summer
During the last several months, we've seen a significant burst of activity. Many customers, in spite of the budget cuts, are starting to evaluate how to streamline and optimize their IT operations. In the next couple of postings, I will describe several examples illustrating how analytic modeling technology is used to justify movement of workloads and data from one system to another, how modeling technology is used to reduce the risk of performance surprises during implementation of new applications, and why planning of hardware upgrades, changes of OS and migration to a new release of the DBMS should include reevaluation of the workload management rules. Next week I will be on vacation and plan to finish several papers.
I am working on a paper for Oracle World 2009: "Modeling and Optimization of Virtualized Multi-Tier Distributed Environment". We will review the challenges of planning and managing complex multi-tier virtualized distributed environments with many interdependent servers supporting multiple workloads.
Any change in workload management, database tuning, or hardware can improve performance for one workload while also moving one or more bottlenecks to another server on another tier and negatively affecting other workloads, for a variety of reasons:
- There are many parameters you can change, including concurrency, priority or resource allocation by workload; you can also change the database design, create new indexes or materialized views, or upgrade the hardware configuration
- It is impossible to evaluate all possible permutations of parameters
- We will discuss how modeling technology can answer specific "what if" questions
- We will also review how optimization technology iteratively and intelligently generates "what if" questions for the modeling engine to find what should be changed within workload management, performance tuning and hardware upgrades to satisfy SLOs for critical workloads
- We will also review how comparison of the actual results (after the change) with expected results enables organizations to implement a continuous proactive performance management process
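The idea of generating "what if" questions for a modeling engine can be shown on a toy scale. The sketch below simply enumerates every combination of a few upgrade and tuning options, evaluates each with a single-server open model, and keeps the cheapest one that meets the SLO. It is an exhaustive search over a tiny grid, not the optimization technology discussed in the paper, and every name, option and cost in it is hypothetical.

```python
from itertools import product


def response_time(arrival_rate, service_time):
    """Single-server open model: R = S / (1 - U)."""
    utilization = arrival_rate * service_time
    return float("inf") if utilization >= 1.0 else service_time / (1.0 - utilization)


def cheapest_plan(arrival_rate, service_time, slo, upgrades, tunings):
    """Evaluate every upgrade/tuning combination and return the cheapest that meets the SLO.

    upgrades, tunings: lists of (label, service_time_multiplier, cost).
    Returns (upgrade_label, tuning_label, total_cost, predicted_response_time),
    or None if no combination satisfies the SLO.
    """
    best = None
    for (u_name, u_factor, u_cost), (t_name, t_factor, t_cost) in product(upgrades, tunings):
        predicted = response_time(arrival_rate, service_time * u_factor * t_factor)
        cost = u_cost + t_cost
        if predicted <= slo and (best is None or cost < best[2]):
            best = (u_name, t_name, cost, predicted)
    return best
```

With many workloads, tiers and parameters the number of combinations grows too quickly for enumeration, which is why a guided search that chooses the next "what if" question based on previous results becomes necessary.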
Another paper I am preparing, for the upcoming Teradata Partners Conference, is about the application of modeling and optimization to workload management and the creation of continuous, closed loop proactive performance management. It is titled "Teradata Infrastructure Optimization in TASM Environment". In this paper we will discuss:
- Challenges of setting workload management Teradata TASM parameters
- The role of modeling and optimization in finding optimal workload management parameters to meet Service Level Goals for each workload
- Workload characterization in TASM environment
- Strategic capacity planning in a TASM environment
- Tactical performance management in a TASM environment
- Operational workload management in a TASM environment
- How to optimize the selection of TASM throttling, priority and resource allocation rules based on SLG for each workload
- How to use performance prediction results to organize a continuous, closed loop proactive performance management process in a TASM environment
For CMG 2009 I am preparing a half-day session titled "Hands on Workshop on Modeling and Optimization in Virtualized Multi-tier Environments". This is an intensive "hands on" workshop for performance management professionals who would like to learn how to build and apply analytic models to proactively manage the performance of applications in virtualized multi-tier environments based on VMware, WebLogic and WebSphere Application Servers as well as Oracle, DB2, Teradata and SQL Server Database Servers. During the workshop, attendees will learn how to build and apply analytic models to predict the impact of workload and database size growth, of implementing new applications, of adding or moving VMs and of upgrading hardware. We will not use our commercial modeling tool, BEZVision, but instead I will teach attendees how to use an Excel spreadsheet with prepared exercises to illustrate how to perform workload characterization, build simple analytic queueing network models, and apply modeling results to justify strategic capacity planning, tactical performance management and operational workload management recommendations. At the end of the workshop, participants will summarize results and will be ready to present a report with capacity management recommendations.
In addition, Tim R. Norton invited me to participate in a Panel Discussion at CMG titled "Hardware’s Cheap so Why Do Modeling?". I will be preparing some materials for this panel as well. The cost of hardware is rapidly trending down while other costs are rising even more rapidly. The result of this interplay is that cost saving opportunities are shrinking while the analysis takes increasingly more time, effort and money. This panel of world-renowned experts in application and systems modeling will candidly discuss this and other questions related to the future of modeling as a tool to achieve business objectives. Panel discussion areas:
- Hardware’s cheap so why do modeling at all?
- Does analysis cost more than just buying the hardware?
- How close is good enough?
- Hardware’s evermore powerful so why try for prediction precision?
- Business vs. Math: What’s the trade-off between political costs and technical value?
- How can the modeler find the tipping-point?
- How does application and infrastructure complexity affect the value of modeling?
- Is there such a thing as a “simple” model anymore?
- Is modeling headed to the clouds?
- Is traditional modeling at odds with the utility model of cloud computing?
- Where’s the value as datacenters move toward commodity pricing and “on demand” capacity?
- What’s driving the costs du jour?
- Can a modeling analysis effort be successful before it is superseded by the next management priority?
- How can modeling optimize multiple mutually exclusive objectives?
My wife does not know yet, but if I have time left between hiking and finishing papers, I have an obligation to prepare an abstract for CMG on a Late Breaking paper with Charlie Garry on "New application infrastructure modeling and optimization"
Wednesday, July 15, 2009
Predicting New Application Implementation Impact
Oracle Real Application Testing (RAT) allows you to capture, analyze and replay production transactions on a small test system to evaluate the impact of upgrades and system changes, including a new OS or DBMS patch or version, performance tuning, parameter and schema changes, and configuration changes such as conversion from a single instance to RAC or ASM.
DBAs can test and upgrade data center infrastructure components. In fact, the goal of RAT is to assist DBAs in testing and identifying the full impact of upgrades and system changes and include them in a certification process.
Value of new application certification
· Identify potential problems with a new application and justify the changes required to ensure that the new application will perform well and that existing applications will still meet their SLOs after it is implemented
· Organize collaboration between business people, application developers and IT management in setting realistic SLO, negotiating SLA and organizing proactive SLM
· Provide a basis for comparing actual with expected results and for organizing a continuous Proactive Performance Management (PPM) process during the application life cycle