The HPE Performance Cluster Manager (HPCM) integration provides centralized monitoring, discovery, and management of large-scale HPC (High-Performance Computing) environments. Designed for scientific, AI, ML, and enterprise HPC workloads, HPCM helps IT teams efficiently monitor thousands of nodes with high availability and performance.
With this integration, you can:
- Automatically discover hardware, nodes, network interfaces, and firmware details.
- Monitor critical metrics, track performance, and generate real-time alerts.
- Organize resources hierarchically and visualize the cluster structure within OpsRamp.
Here is the high-level architecture of HPE Performance Cluster Manager (HPCM):
Supported Target Version
HPE Performance Cluster Manager (HPCM) v1.12
Resource Hierarchy of HPCM
- HPE Performance Cluster Manager
  - HPC Nodes
    - HPC NICs
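
As an illustration only, the three resource levels can be modeled as nested records. This minimal sketch assumes the levels nest in the order listed (cluster manager, then nodes, then NICs). The class names, field names, and sample resource names are hypothetical and are not part of any HPCM or OpsRamp API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Nic:
    """Hypothetical record for an HPC NIC."""
    name: str


@dataclass
class Node:
    """Hypothetical record for an HPC node and the NICs attached to it."""
    name: str
    nics: List[Nic] = field(default_factory=list)


@dataclass
class ClusterManager:
    """Hypothetical record for the top-level HPCM resource."""
    name: str
    nodes: List[Node] = field(default_factory=list)


def render_tree(cluster: ClusterManager) -> List[str]:
    """Return the hierarchy as indented lines, one line per resource."""
    lines = [cluster.name]
    for node in cluster.nodes:
        lines.append("  " + node.name)
        for nic in node.nics:
            lines.append("    " + nic.name)
    return lines


# Made-up sample data, used only to exercise the sketch.
cluster = ClusterManager(
    "hpcm-admin",
    [
        Node("n001", [Nic("eth0"), Nic("ib0")]),
        Node("n002", [Nic("eth0")]),
    ],
)

for line in render_tree(cluster):
    print(line)
```

Printing the result shows each NIC indented under its node, and each node indented under the cluster manager.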
Key Use Cases
Discovery Use Cases
Gain visibility into all HPCM components with relationship mapping:
- Automatically detects compute, management, and storage nodes.
- Identifies network interfaces, switches, and fabric topology.
- Discovers hardware (CPU, memory, GPU, disk), BIOS, and firmware versions.
- Groups system components for streamlined management.
Monitoring Use Cases
Track system health and performance with actionable insights:
- Collects time-series metric data to detect anomalies and threshold breaches.
- Triggers alerts to support fast issue resolution and reduce downtime.
- Monitors job scheduling time, status, and related performance indicators.
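
To make the idea of a threshold breach concrete, here is a minimal sketch of how an alert might be derived from time-series samples. It is not the integration's actual alerting logic. The function name `find_breaches`, the `min_consecutive` parameter, and the sample CPU values are all assumptions made for illustration.

```python
from typing import List, Tuple

# One sample: (timestamp in seconds, metric value). Hypothetical shape.
Sample = Tuple[int, float]


def find_breaches(
    samples: List[Sample], threshold: float, min_consecutive: int = 1
) -> List[int]:
    """Return the timestamps at which an alert would fire.

    An alert fires once per run of breaching samples, at the moment the
    value has exceeded `threshold` for `min_consecutive` samples in a row.
    """
    alerts: List[int] = []
    run = 0  # length of the current run of breaching samples
    for timestamp, value in samples:
        if value > threshold:
            run += 1
            if run == min_consecutive:
                alerts.append(timestamp)
        else:
            run = 0  # value recovered, so the next breach starts a new run
    return alerts


# Made-up CPU utilization samples (percent), one per minute.
cpu_samples = [(0, 42.0), (60, 91.5), (120, 93.0), (180, 88.0), (240, 95.0), (300, 97.0)]

print(find_breaches(cpu_samples, threshold=90.0, min_consecutive=2))
```

Requiring several consecutive breaching samples is one common way to avoid alerting on a single transient spike. With `min_consecutive=1`, every new breach run would alert immediately.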
Related Topics
- Learn about the metrics collected from HPCM resources and how to configure the integration in OpsRamp.
- Find steps to troubleshoot common issues, and review the risks and limitations of the integration.
Version History
| Application Version | Bug Fixes / Enhancements |
|---|---|
| 1.0.0 | Initial version with discovery and monitoring features. |