In this blog post we will walk through the process of creating a collectd plugin in C for monitoring NVIDIA GPUs.
In a previous blog post we introduced the monitoring objectives of the TANGO solution. One of these objectives is the monitoring of different kinds of processors, including NVIDIA GPUs, in order to obtain energy measurements from them, among other metrics. For this purpose, we decided to rely on collectd and integrate it into the stack of tools responsible for the monitoring tasks. Collectd is a UNIX daemon, written in C, which periodically collects system and application performance metrics. One of the main advantages of this tool is its modular design, which allows custom plugins to extend its monitoring capabilities.
Typical applications used in HPC are based on the SPMD model (SPMD meaning Single Program, Multiple Data). These are the applications usually associated with HPC: the same binary (program) is started on several HPC devices, but each instance uses different input (data). The devices are processors, including their threads, which can be enhanced by accelerators such as GPUs (graphics processing units) or other specialized processing units.
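The SPMD idea described above can be illustrated with a minimal sketch. It assumes that threads stand in for HPC devices, and the names `program` and `run_spmd` are hypothetical, chosen only for this illustration; a real HPC job would typically launch separate processes, for example through MPI.

```python
from concurrent.futures import ThreadPoolExecutor


def program(rank, data):
    """The single 'program': every worker runs this same function."""
    # Each worker only sees its own slice of the input (the 'multiple data').
    return rank, sum(x * x for x in data)


def run_spmd(inputs, workers):
    """Split `inputs` into `workers` chunks and run `program` on each chunk."""
    # Round-robin split: worker r gets elements r, r + workers, r + 2 * workers, ...
    chunks = [inputs[r::workers] for r in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(program, range(workers), chunks))
    # Combine the partial results, as a reduction step would in a real job.
    return sum(partial for _, partial in results)


total = run_spmd(list(range(10)), workers=4)
print(total)  # sum of squares of 0..9
```

In an MPMD setting, by contrast, the workers would not all run the same `program`; each device could be given a different one.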
This article shows the potential of the TANGO Device Supervisor for MPMD (Multiple Program, Multiple Data) models.
OpenCL is now one of the preferred “non-proprietary” ways of programming the “acceleration”
part of a heterogeneous platform. OpenCL is supported by a wide range of manufacturers, which makes it
usable on various GPUs, including SoC accelerators, as well as on FPGAs.
OpenCL makes it very simple to write computing kernels in a C-like language.
A drawback of OpenCL, however, is the code required to access these kernels from the CPU. The provided
APIs offer “buffer-oriented” data transfers, and execution control requires multiple statements.
This blog article presents how Placer can be used on a real-world use case. In particular, it helps
an embedded application developer quickly explore the optimal placement and
schedule of the tasks computed by an embedded application named AquaScan. Placer
can also handle additional placement constraints, letting the developer study the impact on time
performance when the optimal placement and schedule are not followed.
In the last 10 years, the IT world has undergone the virtualization and Cloud revolution. It started with Virtual Machines (VMs), and nowadays the trend is to move to container-based execution of applications. The former, VMs, have had no impact on the High-Performance Computing (HPC) world, but with containers we can see a significant effort to start using them on supercomputers. To see why, let’s start with the differences between VMs and containers.
In the emerging era of the Internet of Things and Big Data processing, the paradigm of software development is shifting. In particular, many IoT and Big Data problems can be solved using vector and matrix operations, and these operations can be drastically accelerated by massively parallel processing units such as GPUs and FPGAs, which are now integrated into many recent heterogeneous platforms on the market.
One of the recent trending topics in computer science research is the reduction of the energy and power consumed by computers. Part of this research focuses on providing low-power architectures and accelerators which can achieve good performance at low power. However, an application has to be modified to take advantage of such heterogeneous architectures.
The TANGO project aims to deliver the best way to run applications with low power consumption. TANGO helps define the right hardware (CPU, GPU, FPGA …) according to the application's needs. The Device Supervisor (defined in the global TANGO architecture) is in charge of managing the available resources, from their allocation to their usage. One of the key parameters to monitor is energy consumption.
Two decades ago, parallel programming was a technology restricted to large-scale specialised applications running in computing centres. Today’s picture is very different, with the generalisation of multi-core processing devices, not only inside desktop computers but inside embedded and mobile devices as well.
Image processing, the Internet of Things (IoT) and cloud computing are examples of domains in which concurrent cooperative programming, that is to say parallel programming, is developing widely.