PFN’s Supercomputers

Overview

Computing infrastructure powering problem-solving capabilities of deep learning

The core technologies of Preferred Networks (PFN), especially deep learning, require enormous computing power. To perform a vast number of computations efficiently, we operate our own computer clusters (more commonly known as supercomputers). The clusters are named MN followed by a series number: MN-1, MN-2 and MN-3.

Infrastructure

MN-3

Operating since May 2020, MN-3 is PFN’s third-generation computer cluster. It uses MN-Core, a highly efficient custom processor co-developed by PFN and Kobe University specifically for deep learning. PFN is currently working to increase MN-3's computational speed for practical deep learning workloads. MN-3 has topped the Green500 list of the world's most energy-efficient supercomputers three times: in June 2020, June 2021 and November 2021.

Systems used for Green500 measurement and their respective performance

	November 2021	June 2021	November 2020	June 2020
Nodes	32			40
MN-Core processors	128			160
CPU (Intel Xeon) cores	1,536			1,920
Peak performance (theoretical, under given standard)	3.390 PFlops	3.138 PFlops		3.92 PFlops
HPL benchmark	2.181 PFlops	1.822 PFlops	1.653 PFlops	1.621 PFlops
Energy efficiency	39.38 GFlops/W	29.70 GFlops/W	26.04 GFlops/W	21.11 GFlops/W
Green500 ranking	#1	#1	#2	#1
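
The table's figures can be combined into two derived quantities: the average power implied by dividing HPL performance by energy efficiency, and the fraction of theoretical peak reached on HPL. The following is a minimal sketch of that arithmetic, not an official PFN calculation. The function names are illustrative, and the November 2020 peak value is left as None because the table leaves that cell blank.

```python
# Figures copied from the Green500 table (performance in PFlops, efficiency in GFlops/W).
# A blank cell in the table is represented as None rather than guessed.
RUNS = {
    "November 2021": {"peak_pflops": 3.390, "hpl_pflops": 2.181, "gflops_per_watt": 39.38},
    "June 2021": {"peak_pflops": 3.138, "hpl_pflops": 1.822, "gflops_per_watt": 29.70},
    "November 2020": {"peak_pflops": None, "hpl_pflops": 1.653, "gflops_per_watt": 26.04},
    "June 2020": {"peak_pflops": 3.92, "hpl_pflops": 1.621, "gflops_per_watt": 21.11},
}


def implied_power_kw(hpl_pflops, gflops_per_watt):
    """Power implied by HPL performance divided by energy efficiency, in kW."""
    hpl_gflops = hpl_pflops * 1e6  # 1 PFlops = 1,000,000 GFlops
    watts = hpl_gflops / gflops_per_watt
    return watts / 1e3


def hpl_fraction_of_peak(hpl_pflops, peak_pflops):
    """Share of theoretical peak achieved on HPL, or None if the peak is unknown."""
    if peak_pflops is None:
        return None
    return hpl_pflops / peak_pflops


for label, run in RUNS.items():
    power = implied_power_kw(run["hpl_pflops"], run["gflops_per_watt"])
    fraction = hpl_fraction_of_peak(run["hpl_pflops"], run["peak_pflops"])
    fraction_text = "n/a" if fraction is None else f"{fraction:.1%}"
    print(f"{label}: ~{power:.1f} kW implied, HPL/peak = {fraction_text}")
```

Read this way, the efficiency gain between June 2020 and November 2021 comes from both a higher HPL result and a lower implied power draw on a smaller system.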

Green500 certificate for November 2021

PFN plans to expand MN-Core-powered computer clusters in multiple phases. The first of these, MN-3a, was completed with the following configuration in May 2020.

MN-3a is made up of 1.5 “zones,” each of which consists of 32 compute nodes (MN-Core Servers) tightly coupled by two MN-Core DirectConnect Switches.

Configuration of the MN-3a cluster:

48 compute nodes (MN-Core Servers)
Network between MN-Core servers

MN-Core DirectConnect (interconnect developed specifically for MN-Core processors)
5 100GbE Ethernet

Configuration of each MN-3a node:

MN-Core Server

MN-Core	MN-Core Board x 4
CPU	Intel Xeon 8260M two-way (48 physical cores)
Memory	384GB DDR4
Storage Class Memory	3TB Intel Optane DC Persistent Memory
Network	MN-Core DirectConnect (112Gbps) x 2, Mellanox ConnectX-6 (100GbE) x 2, On board (10GbE) x 2
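
The zone and per-node figures above determine the cluster-wide totals. The sketch below works through that arithmetic and checks that the same per-node values reproduce the 128 MN-Core processors and 1,536 CPU cores listed for the 32-node Green500 system. It assumes one MN-Core processor per MN-Core Board, and the function and variable names are illustrative.

```python
NODES_PER_ZONE = 32  # compute nodes per zone
ZONES = 1.5          # MN-3a is made up of 1.5 zones


def cluster_totals(nodes, boards_per_node=4, cores_per_node=48, memory_gb_per_node=384):
    """Aggregate per-node MN-Core Server figures over a given number of nodes."""
    return {
        "nodes": nodes,
        "mn_core_boards": nodes * boards_per_node,
        "cpu_cores": nodes * cores_per_node,
        "memory_gb": nodes * memory_gb_per_node,
    }


mn3a = cluster_totals(int(ZONES * NODES_PER_ZONE))  # 1.5 zones x 32 nodes = 48 nodes
green500_system = cluster_totals(32)                # node count from the Green500 table

print(mn3a)
print(green500_system)
```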

MN-2

MN-2, operating since July 2019, is the first GPU cluster built and managed solely by PFN.

Specifications of the MN-2 cluster:

128 GPU servers (compute nodes)
32 CPU servers (compute nodes)
24 storage servers
18 100GbE Ethernet switches

Specifications of each compute node on MN-2:

GPU server

GPU	NVIDIA V100 SXM x 8
CPU	Intel Xeon 6254 two-way (36 physical cores)
Memory	384GB DDR4
Network	Mellanox ConnectX-4 (100GbE) x 4, On board (10GbE) x 2

CPU server

CPU	Intel Xeon 6254 two-way (36 physical cores)
Memory	384GB DDR4
Network	Mellanox ConnectX-4 (100GbE) x 2, On board (10GbE) x 2
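
As a rough illustration, the MN-2 specifications above can be tallied into cluster-wide totals for the compute nodes. The class and field names below are hypothetical. The counts come from the lists above, and storage servers are excluded because their per-node specifications are not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    count: int         # number of servers of this type
    gpus: int          # NVIDIA V100 GPUs per server
    cpu_cores: int     # physical CPU cores per server
    memory_gb: int     # DDR4 memory per server, in GB
    ports_100gbe: int  # 100GbE ports per server


MN2 = {
    "gpu_server": NodeType(count=128, gpus=8, cpu_cores=36, memory_gb=384, ports_100gbe=4),
    "cpu_server": NodeType(count=32, gpus=0, cpu_cores=36, memory_gb=384, ports_100gbe=2),
}


def total(field):
    """Sum one per-server field over every compute node in the cluster."""
    return sum(node.count * getattr(node, field) for node in MN2.values())


for field in ("gpus", "cpu_cores", "memory_gb", "ports_100gbe"):
    print(field, total(field))
```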

MN-1, MN-1b

MN-1 is a GPU computer cluster that NTT Communications operates exclusively for PFN. The MN-1 cluster has two generations: MN-1 (operating since September 2017) and MN-1b (operated between July 2018 and July 2021).

The configuration of each MN-1 cluster is as follows:

MN-1
MN-1b

64 GPU servers (NVIDIA V100 x 8, EDR 100Gbps InfiniBand x 2)

Middleware

PFN’s computer clusters use Kubernetes, open-source software for managing containerised applications, as a core technology. Combined with PFN’s own schedulers and front-end tools, it provides a computing platform for efficient machine learning and deep learning research.
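
To give a sense of what a GPU job on a Kubernetes cluster can look like, the sketch below builds a generic Pod manifest that requests eight GPUs, matching the GPU count of one MN-2 GPU server. It is not PFN's actual configuration. The pod name, image and command are hypothetical, and the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed on the cluster.

```python
import json


def build_gpu_pod(name, image, command, gpus):
    """Return a Kubernetes Pod manifest (as a dict) requesting `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    "command": command,
                    # Assumption: GPUs are exposed by the NVIDIA device plugin
                    # under the extended resource name "nvidia.com/gpu".
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical job: the image and command are placeholders, not real PFN artifacts.
manifest = build_gpu_pod(
    name="example-training-job",
    image="example.com/trainer:latest",
    command=["python", "train.py"],
    gpus=8,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.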