Preferred Networks (PFN) has developed the Chainer™ open-source deep learning framework and has been working to build large-scale clusters that support its research and development activities with the aim of applying deep learning technology in the real world.
To promote this initiative further, PFN is developing MN-Core™, a processor dedicated to the acceleration of deep learning research.
At SEMICON Japan 2018, PFN will exhibit its independently developed hardware for deep learning, including the MN-Core chip, board, and server.
Deep learning requires an enormous amount of computation, so high-speed computing is one of its biggest challenges.
The MN-Core chip is optimized for the training phase of deep learning. Unlike a general-purpose chip, it implements only a limited set of functions, which lets it deliver excellent processing performance. Beyond this minimal design, PFN’s proprietary MN-Core has dedicated circuits for matrix operations, which are essential to deep learning, to make deep learning much faster.
Performance per watt is becoming increasingly important in processor development, mainly because cooling capacity is reaching its limits. MN-Core is expected to achieve 1 TFLOPS/W in half-precision floating point, which would place it among the world's best in performance per watt.
(Table) MN-Core Chip
| Item | Specification |
|---|---|
| Fabrication process | TSMC 12 nm |
| Estimated power consumption (W) | 500 |
| Peak performance (TFLOPS) | 32.8 (DP) / 131 (SP) / 524 (HP) |
| Estimated performance per watt (TFLOPS/W) | 0.066 (DP) / 0.26 (SP) / 1.0 (HP) |
(Notes) DP: double precision, SP: single precision, HP: half precision.
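The per-watt row follows directly from dividing each peak figure by the estimated 500 W power consumption. A quick check in Python, using only the numbers in the table above (the function name is illustrative):

```python
# Peak figures (TFLOPS) and estimated power (W) from the MN-Core Chip table.
PEAK_TFLOPS = {"DP": 32.8, "SP": 131.0, "HP": 524.0}
ESTIMATED_POWER_W = 500.0


def tflops_per_watt(peak_tflops: float, power_w: float) -> float:
    """Return peak throughput per watt (TFLOPS/W)."""
    return peak_tflops / power_w


if __name__ == "__main__":
    for precision, peak in PEAK_TFLOPS.items():
        print(f"{precision}: {tflops_per_watt(peak, ESTIMATED_POWER_W):.3f} TFLOPS/W")
```

This prints 0.066, 0.262, and 1.048 TFLOPS/W, which round to the table's values. The peaks also scale by roughly 4× at each step down in precision (32.8, 131, 524 TFLOPS).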
MN-Core's hardware architecture packs matrix arithmetic units (MAUs) extremely densely.
The architecture is simple, fully SIMD with no conditional branches, so it can process a large amount of data at once.
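Because the hardware has no conditional branches, per-element decisions are typically expressed as a mask applied uniformly across all lanes rather than as an if/else. The following is a generic illustration of that branchless style in plain Python, not MN-Core code; the function names are hypothetical, and each list stands in for a vector of SIMD lanes.

```python
from typing import List


def simd_select(mask: List[int], a: List[float], b: List[float]) -> List[float]:
    """Per-lane select: a[i] where mask[i] == 1, else b[i], using arithmetic only."""
    if not (len(mask) == len(a) == len(b)):
        raise ValueError("all lane vectors must have the same length")
    # Every lane evaluates the same expression; no lane follows a different path.
    return [m * x + (1 - m) * y for m, x, y in zip(mask, a, b)]


def relu_branchless(xs: List[float]) -> List[float]:
    """ReLU written as compare-then-select instead of if/else."""
    mask = [int(x > 0.0) for x in xs]  # 1 where the lane is positive, else 0
    zeros = [0.0] * len(xs)
    return simd_select(mask, xs, zeros)
```

For example, `relu_branchless([-1.0, 2.0, 0.0, 3.5])` returns `[0.0, 2.0, 0.0, 3.5]`. Real SIMD instruction sets usually provide a dedicated select or blend operation instead of the multiply-add used here, which also avoids cases such as 0 × ∞ producing NaN.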
One MAU and four processor elements (PEs) form a matrix arithmetic block (MAB), in which the PEs supply data to the MAU.
Each PE has an integer arithmetic unit, and instructions frequently used in deep learning are also implemented in hardware.
One package comprises four dies with 512 MABs each, for a total of 2,048 MABs. The MABs are arranged hierarchically, and data can move between levels of the hierarchy in multiple modes, such as distribution, combination, broadcasting, and reduction, which enables flexible programming.
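The counts multiply out as 4 dies × 512 MABs = 2,048 MABs, and with four PEs per MAB, 8,192 PEs per package. The article does not define the exact semantics of the data-movement modes, so the sketch below gives one common interpretation of each between a parent level and its children (broadcasting copies, distribution splits, combination concatenates, reduction sums). The function names and the choice of summation for reduction are assumptions, not MN-Core's documented behavior.

```python
from typing import List

# Counts from the text; PES_PER_MAB comes from the MAB description (one MAU + four PEs).
DIES_PER_PACKAGE = 4
MABS_PER_DIE = 512
PES_PER_MAB = 4

TOTAL_MABS = DIES_PER_PACKAGE * MABS_PER_DIE  # 2,048
TOTAL_PES = TOTAL_MABS * PES_PER_MAB          # 8,192


def broadcast(value: float, n_children: int) -> List[float]:
    """Copy one parent value to every child."""
    return [value] * n_children


def distribute(data: List[float], n_children: int) -> List[List[float]]:
    """Split parent data into equal contiguous chunks, one per child."""
    if n_children <= 0 or len(data) % n_children != 0:
        raise ValueError("data length must be a multiple of a positive n_children")
    size = len(data) // n_children
    return [data[i * size:(i + 1) * size] for i in range(n_children)]


def combine(parts: List[List[float]]) -> List[float]:
    """Concatenate the children's chunks back into one parent array."""
    return [x for part in parts for x in part]


def reduce_sum(values: List[float]) -> float:
    """Collapse one value per child into a single value (summation assumed)."""
    return sum(values)
```

For example, `combine(distribute(xs, 4))` returns `xs` unchanged, and `reduce_sum(broadcast(1.0, 4))` returns `4.0`.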
MN-Core Board is a PCI Express board on which the MN-Core chip is mounted. A purpose-designed heatsink with a blower fan keeps MN-Core from running hot so that it can deliver its best performance.
(Table) MN-Core Board
| Item | Specification |
|---|---|
| Chip | 1 MN-Core chip |
| Interface | PCI Express Gen3 x16 |
| Memory size | 32 GB |
| Power consumption | 600 W (estimated) |
MN-Core Server is a 7U rack-mount server designed to house four MN-Core Boards.
In addition to high-performance CPUs and large-capacity memory, it has a purpose-designed internal structure that, together with 12 powerful built-in fans, air-cools the heat generated by the four MN-Core Boards.
With four MN-Core Boards, the server's computation speed per node is expected to be about 2 PFLOPS in half precision.
(Table) MN-Core Server
| Item | Specification |
|---|---|
| Mounted MN-Core Boards | 4 |
| CPU | Dual socket, up to 200 W TDP |
| Memory | DDR4 up to 2666 MHz; up to 3 TB ECC 3DS LRDIMM or 1 TB ECC RDIMM |
| Storage | Up to 24 SAS/SATA drive bays; 8 × 2.5" SAS/SATA and 2 × 2.5" NVMe supported natively |
| Power supply | 4 × 2,000 W (2+2 redundant), Titanium Level |
| Size | H 311 mm × W 437 mm × D 737 mm (7U, rack-mountable) |
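The roughly 2 PFLOPS per node is the chip's half-precision peak multiplied by the four boards, and the same tables allow a rough power check. The sketch below assumes that "2+2 redundant" means any two of the four supplies can carry the full load, and it uses only the estimated board power, so the headroom figure is an illustrative budget for everything else in the chassis (CPUs, memory, drives, fans), not a measured value. The function names are hypothetical.

```python
# Values taken from the MN-Core Chip, Board, and Server tables.
HP_PEAK_TFLOPS_PER_CHIP = 524.0
BOARDS_PER_SERVER = 4          # each board carries one MN-Core chip
BOARD_POWER_W = 600.0          # estimated power per board
PSU_COUNT = 4
PSU_RATING_W = 2000.0
REDUNDANT_PSUS = 2             # assumption: "2+2" means two supplies carry the full load


def node_peak_pflops() -> float:
    """Half-precision peak of one server node, in PFLOPS."""
    return HP_PEAK_TFLOPS_PER_CHIP * BOARDS_PER_SERVER / 1000.0


def psu_headroom_w() -> float:
    """Power left after the boards' estimated draw, counting only non-redundant supplies."""
    usable_w = (PSU_COUNT - REDUNDANT_PSUS) * PSU_RATING_W
    return usable_w - BOARDS_PER_SERVER * BOARD_POWER_W


if __name__ == "__main__":
    print(f"Node peak (HP): {node_peak_pflops():.3f} PFLOPS")
    print(f"PSU headroom: {psu_headroom_w():.0f} W")
```

With the table's figures this gives 2.096 PFLOPS per node and 1,600 W of headroom, which exceeds the 400 W needed by two CPUs at the 200 W TDP ceiling.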
PFN plans to further develop this computing infrastructure to build a large-scale cluster, named MN-3, consisting of over 1,000 dedicated server nodes.
The objective is to eventually raise the computation speed of MN-3 to 2 EFLOPS.
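As a rough consistency check on the 2 EFLOPS target, the node count it implies can be computed from the per-node half-precision peak. This treats the target as a half-precision peak figure and ignores interconnect and utilization losses, both of which are assumptions; the names are illustrative.

```python
import math

# Half-precision peak per node: 4 boards x 524 TFLOPS, expressed in PFLOPS.
NODE_PEAK_PFLOPS = 4 * 524.0 / 1000.0  # 2.096
TARGET_EFLOPS = 2.0
PFLOPS_PER_EFLOPS = 1000.0


def nodes_for_target(target_eflops: float, node_peak_pflops: float) -> int:
    """Smallest whole number of nodes whose combined peak reaches the target."""
    return math.ceil(target_eflops * PFLOPS_PER_EFLOPS / node_peak_pflops)


def cluster_peak_eflops(n_nodes: int, node_peak_pflops: float) -> float:
    """Combined peak of n_nodes nodes, in EFLOPS."""
    return n_nodes * node_peak_pflops / PFLOPS_PER_EFLOPS


if __name__ == "__main__":
    print(nodes_for_target(TARGET_EFLOPS, NODE_PEAK_PFLOPS))
    print(f"{cluster_peak_eflops(1000, NODE_PEAK_PFLOPS):.3f} EFLOPS")
```

About 955 nodes reach 2 EFLOPS at peak, so a cluster of over 1,000 nodes (about 2.096 EFLOPS at peak) is consistent with the target, with some margin.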
For MN-3 and subsequent clusters, PFN aims to build more efficient computing environments by using MN-Core and GPGPU (general-purpose computing on GPUs) for the workloads each is best suited to.
PFN will also continue developing the Chainer deep learning framework so that MN-Core can be selected as a backend. Together, MN-3 and Chainer will accelerate distributed deep learning, enabling PFN to tackle various unsolved problems.
MN-3 (Conceptual Image)
A research group led by Prof. Junichiro Makino of Kobe University has played a key role in developing the specifications for MN-Core. Thanks to their support, PFN was able to design and develop hardware backed by proven technology.
Prof. Kei Hiraki, professor emeritus at The University of Tokyo, has also kindly provided guidance on evaluating the high-speed transmission board.
The development of MN-Core originated when PFN was entrusted with a public project by the New Energy and Industrial Technology Development Organization (NEDO) of Japan. In this project, PFN worked with members of Prof. Makino’s research team, including Takayuki Muranushi and Miyuki Tsubouchi, both of RIKEN, Japan’s national research institute, to create a processor. The knowledge gained through this project was fully utilized in designing and developing MN-Core.
(From left: Prof. Makino and Prof. Hiraki. Photo courtesy of Mari Inaba, associate professor at The University of Tokyo)
* MN-Core™ and Chainer™ are trademarks or registered trademarks of Preferred Networks, Inc. in Japan and elsewhere.