Dell Configuration Information
Application tier scalability is achieved using a highly flexible, horizontally scaled infrastructure. Multiple instances of Blackboard Learn can be deployed on a single Dell server using either a virtualization package or bare-metal configuration to isolate Blackboard application instances from each other and to manage allocation of CPU and memory resources.
Each Dell server in the application tier can run multiple VMs, each with its own instance of the Blackboard Learn software. To achieve a high volume of concurrent user sessions, the architecture employs load balancing to distribute requests across all of the Blackboard Learn instances deployed on the virtualized servers.
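Distributing requests across the Blackboard Learn instances can be illustrated with a minimal round-robin sketch. The instance addresses are hypothetical. A production load balancer, whether hardware or software, would also perform health checks and typically session affinity, and this sketch omits both.

```python
from itertools import cycle

# Hypothetical instance addresses; a real deployment would list the VMs
# running Blackboard Learn behind the load balancer.
INSTANCES = ["vm1:8080", "vm2:8080", "vm3:8080", "vm4:8080"]


def assign(requests, instances):
    """Map each request ID to the next instance in strict rotation."""
    rotation = cycle(instances)
    return {req: next(rotation) for req in requests}


if __name__ == "__main__":
    for req, backend in assign(range(6), INSTANCES).items():
        print(req, "->", backend)
```

Round-robin is the simplest policy to show. Real load balancers often weight instances by capacity or current connections instead.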
The architecture takes advantage of 64-bit computing by using Java heap sizes ranging from 2GB to 16GB for each Blackboard application instance. In rare cases, customers have deployed Java Virtual Machines (JVMs) with heaps as large as 32GB. In a 32-bit environment, the Java heap is limited to about 1.7GB; the 64-bit environment removes this limitation. Because 64-bit support is available on all platforms with Blackboard Learn Release 9.1, Blackboard strongly recommends that customers adopt a 64-bit OS for their primary deployments.
The architecture recommends 8GB of memory for each VM, assuming a JVM heap size of 4GB. This leaves up to 2GB for the Apache Web server and up to 2GB for the OS and monitoring tools. A larger JVM heap requires correspondingly more memory for the VM. The larger memory footprint enables each Blackboard Learn application instance to service a high volume of user requests more efficiently. In other words, each application instance can scale vertically, while the architecture as a whole scales horizontally through additional VMs.
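The per-VM memory budget works out as simple addition. The sketch below uses the figures from the text (4GB heap, up to 2GB for Apache, up to 2GB for the OS and monitoring tools). It assumes those two allowances stay fixed as the heap grows and does not model non-heap JVM memory such as thread stacks.

```python
# Allowances (GB) from the sizing guidance; assumed fixed as the heap grows.
APACHE_GB = 2
OS_AND_MONITORING_GB = 2


def vm_memory_gb(jvm_heap_gb):
    """Return the VM memory needed for a given JVM heap size.

    Non-heap JVM memory (thread stacks, code cache) is not modeled here.
    """
    return jvm_heap_gb + APACHE_GB + OS_AND_MONITORING_GB


for heap in (4, 8, 16):
    print(f"{heap}GB heap -> {vm_memory_gb(heap)}GB VM")
```

With a 4GB heap this reproduces the recommended 8GB VM. Larger heaps add to the total one for one.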
This “best of both worlds” approach uses many application threads in the application tier to service thousands of requests per minute. The recommended servers for the application tier are Dell PowerEdge™ R710 servers, chosen for their small footprint, low cost, and strong performance with Intel Xeon 5500-series processors. The application tier could also be built on other Dell servers, such as Dell PowerEdge R610 rack-mountable servers or Dell PowerEdge M610 or M710 blade servers, all of which feature the latest Intel 5500-series processors.
The highly flexible and scalable application tier can help customers achieve the following:
- Lower costs: Increased utilization of servers enables good performance on a low-cost, consolidated infrastructure.
- Increased flexibility: VMs can be moved easily to other physical servers to redistribute workloads or recover quickly from a hardware failure.
- Faster deployment: With fewer physical systems to set up and configure, deployment takes less time.
Will Hyper-Threading Make a Difference?
Hyper-threading has been a feature of the Intel processor architecture for about a decade. In previous processor models, hyper-threading offered only limited performance improvements. With the 5500-series processor, enabling hyper-threading yielded acceptable performance improvements when moving from 8 CPU threads (2 sockets x 4 cores) to 16 CPU threads (2 sockets x 4 cores x 2 logical threads per core). Hyper-threading is a viable configuration for production environments.
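The 8-thread and 16-thread figures follow from multiplying sockets, cores per socket, and threads per core. A minimal sketch, assuming two logical threads per core when hyper-threading is enabled (as on the 5500 series):

```python
import os


def logical_cpu_threads(sockets, cores_per_socket, hyper_threading):
    """Logical CPU threads exposed to the OS: sockets x cores x threads per core."""
    threads_per_core = 2 if hyper_threading else 1
    return sockets * cores_per_socket * threads_per_core


# The 5500-series configuration described above: 2 sockets x 4 cores.
print(logical_cpu_threads(2, 4, hyper_threading=False))
print(logical_cpu_threads(2, 4, hyper_threading=True))

# os.cpu_count() reports logical CPUs on the running host (or None if unknown),
# which is one way to confirm whether hyper-threading is enabled.
print("This host reports", os.cpu_count(), "logical CPUs")
```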
Will Tomcat Clustering Make a Difference?
Tomcat clustering is deprecated as of Blackboard Learn 9.1 Service Pack 6 and will be removed in Blackboard Learn 9.1 Service Pack 8. For alternative solutions, see Optimizing Performance.
Customers should begin migrating away from Tomcat clusters. Tomcat clustering was introduced for scalability when the Blackboard Learn architecture was 32-bit and customers wanted to use more of the memory on a single server. Now that virtualization is an option on both 32-bit and 64-bit platforms, Blackboard’s benchmarking efforts have moved away from Tomcat cluster deployments. Multiple virtual instances on one physical server can achieve performance similar to a bare-metal configuration running many Tomcat cluster instances. The main difference between the two configurations is the smaller demand placed on the Apache or IIS web server fronting the Tomcat instance(s). The option to deploy a 64-bit JVM with a larger heap has also reduced the need for customers to run a cluster.
Blackboard recommends that customers consider a deployment of larger 64-bit JVMs distributed across physical servers, with the option to virtualize the hardware to take advantage of the CPU and memory capacity of these systems.
Sizing by CPU and Memory
The configuration table in this section includes options to size based on 2 CPU threads versus 4 CPU threads as well as an option to size based on a 32-bit OS versus 64-bit OS. As noted above, Blackboard recommends deploying Blackboard Learn in a 64-bit configuration.
CPU capacity should be chosen based on two inputs:
- CPU utilization: The average utilization of the CPUs between two points in time
- Run queue: The number of runnable processes or threads waiting for a CPU to service them
The optimal CPU utilization percentage for a production system is widely debated. The goal should be to use as much of the available CPU as possible: identify a workload that fully utilizes the CPU, but not at the expense of a high run queue. Every request stuck in the run queue incurs added latency until the CPU can service it. CPU utilization of 100% is only one of the factors that can force a request into a wait state.
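Because CPU utilization is an average between two points in time, it can be measured by sampling cumulative CPU counters twice and comparing them. The sketch below assumes a Linux host and the standard field layout of the aggregate `cpu` line in `/proc/stat`. It counts idle plus iowait as idle time, which is one common convention, not the only one.

```python
import os
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of Linux /proc/stat into (idle, total) jiffies.

    Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    Guest time is already included in user and nice, so only the first
    eight fields are summed.
    """
    fields = [int(x) for x in line.split()[1:9]]
    idle = fields[3] + fields[4]  # idle + iowait
    return idle, sum(fields)


def utilization(before, after):
    """Average CPU utilization (0.0 to 1.0) between two (idle, total) snapshots."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta == 0:
        return 0.0
    return 1.0 - idle_delta / total_delta


def read_snapshot(path="/proc/stat"):
    """Read the first line of /proc/stat, which aggregates all CPUs."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_snapshot()
    time.sleep(1)
    second = read_snapshot()
    print(f"CPU utilization over 1s: {utilization(first, second):.1%}")
```

A longer sampling interval smooths out short spikes but can hide brief saturation, so the interval is a trade-off.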
A 2-CPU-thread configuration is acceptable for a 32-bit deployment and is the minimum requirement for a 2GB JVM in a 64-bit address space. Customers seeking larger JVMs in the range of 4GB to 16GB should consider a 4-CPU-thread configuration, because the additional threads let the JVM process garbage collection in parallel with better throughput. Response times improve with a larger heap and more CPU threads available for processing. Allocations of more than 4 CPUs did not yield major performance improvements and are therefore not recommended.
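The CPU-thread guidance can be condensed into a small rule. This is a sketch of the text's recommendations, not a Blackboard tool. The text does not cover heaps between 2GB and 4GB, so treating them as 4 threads is an assumption, and the cap at 4 reflects the observation that more CPUs did not yield major gains.

```python
def recommended_cpu_threads(heap_gb, is_64bit=True):
    """Rough CPU-thread recommendation following the sizing guidance.

    - 32-bit deployments (heap capped near 1.7GB): 2 threads.
    - 64-bit with a heap up to 2GB: 2 threads minimum.
    - 64-bit with a larger heap (4GB to 16GB in the guidance): 4 threads,
      so garbage collection can run in parallel. Capped at 4.
    Heaps between 2GB and 4GB are not covered by the guidance; mapping them
    to 4 threads is an assumption of this sketch.
    """
    if not is_64bit or heap_gb <= 2:
        return 2
    return 4


for heap in (2, 4, 16):
    print(f"{heap}GB heap -> {recommended_cpu_threads(heap)} CPU threads")
```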
Customers should monitor their run queue for capacity management. A run queue equal to or slightly above the CPU count is acceptable during both normal and peak periods. When the run queue reaches double the CPU count or more, customers should strongly consider allocating more CPUs to the configuration or adding application instances. Pay special attention to the database before introducing additional application servers: more application servers mean more remote connections and processes, which increase both the database's memory footprint and the system load average.
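The run-queue thresholds above can be expressed as a simple check. The text does not quantify "slightly above" the CPU count, so the 1.25x threshold below is hypothetical. The usage section reads the 1-minute load average via `os.getloadavg()` as a rough proxy for the run queue; on Linux that figure also counts tasks in uninterruptible sleep, so it can overstate CPU contention.

```python
import os

# Hypothetical threshold for "slightly above" the CPU count.
SLIGHTLY_ABOVE = 1.25


def run_queue_status(run_queue, cpu_count):
    """Classify a run-queue length relative to the number of CPUs."""
    ratio = run_queue / cpu_count
    if ratio <= SLIGHTLY_ABOVE:
        return "acceptable"
    if ratio < 2.0:
        return "monitor"
    return "add CPUs or instances"


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # The load average is only a proxy for the run queue (see lead-in).
    load1, _, _ = os.getloadavg()
    print(run_queue_status(load1, os.cpu_count() or 1))
```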
Memory allocation should be based on need and on the rate of garbage collection. The application is 100% Java-based and therefore places a greater demand on memory within the JVM than previous versions did. A 32-bit, 1.7GB JVM simply cannot meet the performance demands of the user community as it could in past releases, whose hybrid technology stack included Perl. Customers deploying a 32-bit configuration will therefore need more hardware to deliver faster response times and support higher concurrency. A 64-bit JVM sized between 4GB and 16GB will provide serviceable response times with less latency. Because memory tends to be the primary resource constraint, Blackboard recommends using a larger JVM with the performance options defined in the Release Notes for the release you are installing.
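Heap size is set with the standard HotSpot `-Xms` (initial) and `-Xmx` (maximum) flags. The helper below is a hypothetical sketch that builds those two flags and rejects heaps above the approximately 1.7GB 32-bit ceiling cited in the text. Setting the initial and maximum heap to the same value is a common practice and an assumption here. The authoritative performance options remain those in the Release Notes for your release.

```python
# Approximate 32-bit heap ceiling cited in the sizing guidance.
MAX_32BIT_HEAP_GB = 1.7


def heap_flags(heap_gb, is_64bit=True):
    """Build HotSpot -Xms/-Xmx flags for a fixed-size heap (hypothetical helper).

    Raises ValueError when a 32-bit JVM is asked for more than ~1.7GB.
    """
    if not is_64bit and heap_gb > MAX_32BIT_HEAP_GB:
        raise ValueError(f"{heap_gb}GB exceeds the ~{MAX_32BIT_HEAP_GB}GB 32-bit limit")
    size = f"{int(heap_gb * 1024)}m"  # express the size in megabytes
    return [f"-Xms{size}", f"-Xmx{size}"]


print(heap_flags(4))
print(heap_flags(1.5, is_64bit=False))
```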
Bare-Metal or Virtual Configurations
The sizing table in this section makes no assumption about whether a deployment is virtual or bare-metal. The term bare-metal refers to a single operating system installed on a single physical server. Customers can deploy their environments in either a virtual or a bare-metal configuration. The CPU and memory capacity of the recommended Dell systems is well suited to virtualized deployments, and customers can save substantially year after year by deploying in a virtualized configuration.
In the paper From Vision to Reality—Online Learning in a Completely Digital World, Blackboard deployed up to four installations of Blackboard Learn on a single R710 server. Each VM provided four virtual CPUs and 8GB of memory to its guest OS. The Blackboard Learn JVM was configured for 64-bit addressable memory with a 4GB maximum heap size. Consolidating four VMs onto one physical server with 48GB of memory, instead of purchasing four physical servers with 8GB to 12GB of memory each, can yield savings of more than 300% relative to the cost of the consolidated server. The savings from purchasing fewer systems, rather than several under-sized ones, justify the cost of the additional memory.
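A quick way to sanity-check a consolidation plan like the one above is to total the VMs' memory and virtual CPUs against the host. In this sketch, the 4GB hypervisor reserve is a hypothetical allowance, because actual overhead depends on the virtualization package. The 16 logical CPUs assume a 2-socket, 4-core 5500-series R710 with hyper-threading enabled; the paper's exact processor configuration is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Host:
    memory_gb: int
    logical_cpus: int


@dataclass
class VM:
    memory_gb: int
    vcpus: int


# Hypothetical memory allowance for the hypervisor itself.
HYPERVISOR_RESERVE_GB = 4


def fits(host, vms):
    """Return (memory fits?, vCPU-to-logical-CPU ratio) for a set of VMs on a host."""
    mem_needed = sum(vm.memory_gb for vm in vms) + HYPERVISOR_RESERVE_GB
    vcpus = sum(vm.vcpus for vm in vms)
    return mem_needed <= host.memory_gb, vcpus / host.logical_cpus


# Four VMs with 4 vCPUs and 8GB each on a 48GB host with 16 logical CPUs.
r710 = Host(memory_gb=48, logical_cpus=16)
print(fits(r710, [VM(8, 4)] * 4))
```

A ratio of 1.0 means each virtual CPU maps to one logical CPU. Ratios above 1.0 overcommit the host and make the run-queue guidance above especially relevant.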
Preparing for the Westmere EP-Processor and R910 Nehalem
At the time this document was published, Dell had not yet released the 5600-series (Westmere-EP) processors for the R610, R710, M610, and M710 systems. In addition, the R910 Nehalem system, which uses an 8-core processor (Intel® Xeon® X7560), was not yet ready for production. This information will be updated to reflect the new processors and models recommended for sizing purposes. The following changes will be made after these new CPUs and models become available:
- Intel® Xeon® 5520 series (2.26GHz, 8M cache, Turbo, HT, 1066MHz max memory) will be replaced by the Intel® Xeon® E5620 series (2.40GHz, 12M cache, Turbo, HT, 1066MHz max memory)
- Intel® Xeon® X5550 (2.66GHz, 8M cache, Turbo, HT, 1333MHz max memory) will be replaced by the 6-core Intel® Xeon® X5650 (2.66GHz, 12M cache, Turbo, HT, 1333MHz max memory)
- Intel® Xeon® X7460 (2.67GHz, 16M cache, 1066MHz FSB) will be replaced by the 8-core Intel® Xeon® X7560 (2.26GHz, 24M cache, 1066MHz FSB)