Dell Configuration Information
Application tier scalability is achieved using a highly flexible, horizontally scaled infrastructure. Multiple instances of Blackboard Learn can be deployed on a single Dell server using either a virtualization package or bare-metal configuration to isolate Blackboard application instances from each other and to manage allocation of CPU and memory resources.
Each Dell server in the application tier can run multiple VMs, each with its own instance of the Blackboard Learn software. To achieve a high volume of concurrent user sessions, the architecture employs load balancing to distribute requests across all of the Blackboard Learn instances deployed on the virtualized servers.
The architecture takes advantage of 64-bit computing by using Java heap sizes that range from 2GB to 16GB for each Blackboard application instance. On rare occasions, customers have deployed Java Virtual Machines (JVMs) as large as 32GB.
The architecture includes a recommended 8GB of memory for each VM. This assumes a JVM heap size of 4GB, leaving up to 2GB for the Apache Web server and up to 2GB for the OS and monitoring tools. A larger JVM heap size requires correspondingly more memory for the VM. The larger memory footprint for Blackboard Learn enables each application instance to service a high volume of user requests more efficiently. In other words, each application instance can scale vertically while the architecture can also be scaled horizontally by deploying additional VMs.
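The memory budget described above can be sketched as simple arithmetic. This is a minimal illustration, not a Blackboard or Dell tool: the function name `vm_memory_gb` and the constants are hypothetical, and the model treats the Apache and OS allowances as fixed at their 2GB upper bounds while ignoring JVM memory used outside the heap.

```python
# Reservations taken from the text: up to 2GB for Apache and up to 2GB for
# the OS and monitoring tools. Treating them as fixed values is an assumption.
APACHE_GB = 2
OS_AND_MONITORING_GB = 2


def vm_memory_gb(jvm_heap_gb: int) -> int:
    """Estimate the guest memory (in GB) for one Blackboard Learn VM.

    Hypothetical helper: adds the JVM heap to the fixed reservations above.
    """
    if jvm_heap_gb <= 0:
        raise ValueError("JVM heap size must be positive")
    return jvm_heap_gb + APACHE_GB + OS_AND_MONITORING_GB


if __name__ == "__main__":
    for heap in (4, 8, 16):
        print(f"{heap}GB heap -> {vm_memory_gb(heap)}GB VM")
```

With a 4GB heap the sketch reproduces the recommended 8GB VM. Larger heaps raise the VM size by the same amount.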
This “best of both worlds” approach takes advantage of many application threads in the application tier to service thousands of requests per minute. The recommended servers for the application tier are Dell PowerEdge™ R710 servers, which were chosen for their small footprint, low cost, and powerful performance with Intel 5500 series processors. The application tier could also be built using other Dell servers such as Dell PowerEdge R610 rack mountable servers or Dell PowerEdge M610 or M710 blade servers, all of which feature the latest Intel 5500 series processors.
The highly flexible and scalable application tier can help customers achieve the following:
- Lower costs: Increased utilization of servers enables good performance on a low-cost, consolidated infrastructure.
- Increased flexibility: VMs can be moved easily to other physical servers to redistribute workloads or recover quickly from a hardware failure.
- Faster deployment: With fewer physical systems to set up and configure, less time is required for deployment.
Will Hyper-Threading Make a Difference?
Hyper-threading has been a feature of the Intel processor architecture for about a decade. In previous processor models, hyper-threading offered limited performance advantages. With the 5500-series processor, enabling hyper-threading yielded acceptable performance improvements when moving from 8 CPU threads (2 sockets x 4 cores) to 16 CPU threads (2 sockets x 4 cores x 2 logical hyper-threads). Hyper-threading is a viable configuration for production environments.
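The thread counts quoted above follow from a simple product. The sketch below assumes two hardware threads per core when hyper-threading is enabled, as on the 5500-series processors described here. The function name `logical_threads` is hypothetical.

```python
def logical_threads(sockets: int, cores_per_socket: int, hyper_threading: bool) -> int:
    """Return the number of CPU threads the operating system would see.

    Assumes two logical threads per core with hyper-threading enabled
    and one per core with it disabled.
    """
    if sockets <= 0 or cores_per_socket <= 0:
        raise ValueError("sockets and cores_per_socket must be positive")
    threads_per_core = 2 if hyper_threading else 1
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    # Two-socket, quad-core server, as in the text.
    print(logical_threads(2, 4, hyper_threading=False))  # 8
    print(logical_threads(2, 4, hyper_threading=True))   # 16
```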
Sizing by CPU and Memory
The configuration table in this section includes options to size based on 2 CPU threads versus 4 CPU threads.
Choosing the CPU capacity should be a decision based on two inputs:
- CPU utilization: The average utilization of the CPUs between two points in time
- Run queue: The queue that holds CPU operations while they wait for the CPU to process them
There is much debate about what CPU utilization percentage is optimal for a production system. The goal should be to use as much of the available CPU as possible, but not at the expense of a high run queue. Every request that is stuck in the run queue accrues latency until the CPU can service it. CPU utilization of 100% is only one of the factors that can force a request into a wait state.
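To illustrate CPU utilization as an average between two points in time, the sketch below computes it from two samples of cumulative counters. The `CpuSample` type and its fields are hypothetical. On Linux, comparable counters could be derived from `/proc/stat`, but that mapping is an assumption and is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSample:
    """Cumulative CPU counters at one point in time (hypothetical type)."""
    busy_ticks: int   # ticks spent doing work since boot
    total_ticks: int  # all ticks (busy + idle) since boot


def average_utilization(start: CpuSample, end: CpuSample) -> float:
    """Return the average CPU utilization (0-100) between two samples."""
    total_delta = end.total_ticks - start.total_ticks
    if total_delta <= 0:
        raise ValueError("end sample must be taken after the start sample")
    busy_delta = end.busy_ticks - start.busy_ticks
    return 100.0 * busy_delta / total_delta


if __name__ == "__main__":
    earlier = CpuSample(busy_ticks=100, total_ticks=1000)
    later = CpuSample(busy_ticks=400, total_ticks=1500)
    print(average_utilization(earlier, later))  # 60.0
```

Because the result is an average over the interval, short bursts at 100% can be hidden by a long sampling window. This is one reason to watch the run queue alongside utilization.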
A 2 CPU thread deployment is the minimum acceptable configuration for a 2GB JVM in a 64-bit address space. Customers seeking larger JVMs in the range of 4GB to 16GB should consider a 4 CPU thread deployment because Java garbage collection achieves better throughput when it runs in parallel. Response times will improve with a larger heap and more available CPU threads for processing. Allocations of more than 4 CPU threads did not yield major performance improvements and are therefore not recommended.
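The guidance above can be expressed as a small lookup. This is a sketch under stated assumptions: the function name `recommended_cpu_threads` is hypothetical, heaps below 2GB are rejected because the text gives no guidance for them, and heaps between 2GB and 4GB are mapped to the 2-thread minimum, which the text does not state explicitly.

```python
def recommended_cpu_threads(jvm_heap_gb: float) -> int:
    """Map a JVM heap size (GB) to the CPU thread count suggested in the text.

    Assumptions: heaps under 4GB use the 2-thread minimum. Heaps of 4GB and
    above use 4 threads, since larger allocations showed no major gains.
    """
    if jvm_heap_gb < 2:
        raise ValueError("the text gives no sizing guidance below a 2GB heap")
    if jvm_heap_gb < 4:
        return 2
    return 4


if __name__ == "__main__":
    for heap in (2, 4, 8, 16):
        print(f"{heap}GB heap -> {recommended_cpu_threads(heap)} CPU threads")
```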
Customers should monitor their run queue for capacity management. A run queue equal to or slightly above the CPU count is acceptable both at normal times and during peak. When the run queue is double the CPU count or greater, customers should strongly consider allocating more CPUs to the configuration or increasing the number of instances of the application. Take special care with the database when deciding to introduce additional application servers. The additional remote connections and processes can increase the memory footprint of the database as well as the load average of the system.
Allocations of memory should be based on need and on the rate of garbage collection. The application is 100% Java-based and thus has a greater demand for memory within the JVM than previous versions of the application. A 64-bit JVM sized between 4GB and 16GB provides serviceable response times with minimal latency. Memory tends to be the primary resource constraint. As a result, Blackboard recommends using a larger JVM with the performance options defined in the Release Notes for the release that you are installing.
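The run queue thresholds above can be sketched as a simple classifier. The text does not quantify "slightly above", so the 1.25x cutoff below is an assumption, and the function name and return labels are hypothetical. On Linux, the `r` column of `vmstat` is one common source for the run queue length.

```python
# Assumption: "slightly above the CPU count" is taken to mean up to 1.25x.
SLIGHTLY_ABOVE_RATIO = 1.25
# From the text: double the CPU count or greater calls for more capacity.
ADD_CAPACITY_RATIO = 2.0


def assess_run_queue(run_queue: float, cpu_count: int) -> str:
    """Classify a run queue length relative to the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    ratio = run_queue / cpu_count
    if ratio >= ADD_CAPACITY_RATIO:
        return "add capacity"   # more CPUs or more application instances
    if ratio <= SLIGHTLY_ABOVE_RATIO:
        return "acceptable"
    return "elevated"           # between the two thresholds: keep watching


if __name__ == "__main__":
    for queue in (4, 5, 6, 8):
        print(f"run queue {queue} on 4 CPUs: {assess_run_queue(queue, 4)}")
```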
Bare-Metal or Virtual Configurations
The sizing table in this section makes no assumptions about whether a deployment is virtual or bare-metal. The term bare-metal implies a single operating system installed on a single physical server. Clients have the option to deploy their environments in either a virtual or bare-metal configuration. The CPU and memory capacity of the recommended Dell systems is well suited to virtualized deployments, and customers can reduce their costs year after year by choosing to virtualize.
In the paper From Vision to Reality—Online Learning in a Completely Digital World, Blackboard deployed up to four installations of Blackboard Learn on a single R710 server. Each VM provided four virtual CPUs and 8GB of memory to the guest OS. The Blackboard Learn JVM was configured for 64-bit addressable memory and a 4GB maximum heap size. Consolidating four VMs onto one physical server loaded with 48GB of memory, instead of purchasing four physical servers with 8GB to 12GB of memory each, can result in a more than 300% cost savings. The added cost of memory is justified by the overall savings from purchasing fewer systems rather than several under-sized ones.
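The consolidation described above can be checked with a short capacity calculation. This sketch assumes no CPU or memory overcommitment and an optional hypervisor memory reserve. Both are assumptions, and `vms_per_host` is a hypothetical name. The 16 host threads correspond to a two-socket, quad-core server with hyper-threading enabled.

```python
def vms_per_host(host_memory_gb: int, host_threads: int,
                 vm_memory_gb: int, vm_vcpus: int,
                 hypervisor_reserve_gb: int = 0) -> int:
    """Return how many identical VMs fit on one host without overcommitment.

    The result is limited by whichever resource runs out first.
    """
    if vm_memory_gb <= 0 or vm_vcpus <= 0:
        raise ValueError("VM memory and vCPU count must be positive")
    by_memory = (host_memory_gb - hypervisor_reserve_gb) // vm_memory_gb
    by_cpu = host_threads // vm_vcpus
    return max(0, min(by_memory, by_cpu))


if __name__ == "__main__":
    # 48GB host with 16 CPU threads. Each VM has 8GB and four virtual CPUs.
    print(vms_per_host(48, 16, vm_memory_gb=8, vm_vcpus=4))  # 4
```

In this configuration CPU threads are the limiting resource: memory alone would allow six 8GB VMs, but four VMs with four virtual CPUs each consume all 16 threads.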
Preparing for the Westmere EP-Processor and R910 Nehalem
At the time this document was published, Dell had yet to release the 5600-series (Westmere EP-Processor) for the R610, R710, M610 and M710 systems. In addition, the R910 Nehalem, which leverages an 8-core processor (Intel® Xeon® X7560), was not ready for production. After these new CPUs and models become available, this information will be updated to reflect the processors and models recommended for sizing purposes. The processors affected are:
- Intel® Xeon® 5520 series (2.26GHz, 8M cache, Turbo, HT, 1066MHz max memory)
- Intel® Xeon® E5620 series (2.40GHz, 12M cache, Turbo, HT, 1066MHz max memory)
- Intel® Xeon® X5550 (2.66GHz, 8M cache, Turbo, HT, 1333MHz max memory)
- Intel® Xeon® X5650 (2.66GHz, 12M cache, Turbo, HT, 1333MHz max memory), 6-core
- Intel® Xeon® X7460 (2.67GHz, 16M cache, 1066MHz FSB)
- Intel® Xeon® X7560 (2.26GHz, 24M cache, 1066MHz FSB), 8-core