Spinpack using FPGA

Idea

Setting up test environment

Adaptations for FPGA

Overview of data flow

The most data-intensive object is the sparse matrix (held in memory or on disk), followed by the vectors and the configuration space (both held in memory). The symmetries (permutations) normally fit into the CPU cache. The matrix is read sequentially, so latency is not a problem; for big systems it resides on disk, so disk bandwidth becomes the limit. The space computation is mainly integer and bit driven, but because CPUs lack a single-instruction (atomic) bit permutation, it is very CPU intensive.
code and data flow
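
To illustrate why bit permutations are expensive on a CPU, here is a minimal sketch of applying one site permutation to a configuration stored as a bit mask. The function name permute_bits and the convention that perm[i] is the destination position of bit i are assumptions made for this example, not taken from the Spinpack sources. The loop needs one shift, mask and OR per site, which is the work an FPGA could replace by fixed wiring.

```c
#include <assert.h>
#include <stdint.h>

/* Apply a site permutation to a spin configuration stored as a bit mask.
 * Assumed convention (hypothetical): perm[i] is the destination of bit i.
 * Cost on a CPU: one shift/mask/or per site, for every symmetry tested. */
uint64_t permute_bits(uint64_t config, const unsigned char *perm, unsigned nsites)
{
    assert(nsites <= 64);
    uint64_t out = 0;
    for (unsigned i = 0; i < nsites; i++) {
        assert(perm[i] < nsites);
        out |= ((config >> i) & 1u) << perm[i];   /* move bit i to perm[i] */
    }
    return out;
}
```

With the cyclic shift perm = {1, 2, 3, 0} on four sites, the configuration with binary value 1010 is mapped to 0101.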


As a first test, the space generation could be done entirely within the FPGA, replacing the numsymconf() function and writing the minimum symmetric configurations out to memory (as a byte packet or a long array).
base space generation
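
The idea of keeping only minimum symmetric configurations can be sketched in software as follows. This is not the numsymconf() implementation; it assumes, purely for illustration, that the only symmetry is the cyclic translation of a ring of n sites, and the names rotate1, min_translate and generate_representatives are hypothetical. A configuration is written out only if it is the smallest member of its symmetry orbit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate the lowest n bits of s by one position (ring translation). */
static uint64_t rotate1(uint64_t s, unsigned n)
{
    uint64_t mask = (UINT64_C(1) << n) - 1;
    return ((s << 1) | (s >> (n - 1))) & mask;
}

/* Smallest configuration among all n translations of s. */
uint64_t min_translate(uint64_t s, unsigned n)
{
    assert(n >= 1 && n < 64);
    uint64_t best = s, t = s;
    for (unsigned k = 1; k < n; k++) {
        t = rotate1(t, n);
        if (t < best) best = t;
    }
    return best;
}

/* Write every configuration that is its own minimum to out[] (at most
 * cap entries are stored) and return how many representatives exist. */
size_t generate_representatives(unsigned n, uint64_t *out, size_t cap)
{
    assert(n >= 1 && n <= 24);   /* brute force over 2^n states */
    size_t count = 0;
    for (uint64_t s = 0; s < (UINT64_C(1) << n); s++) {
        if (min_translate(s, n) == s) {
            if (count < cap) out[count] = s;
            count++;
        }
    }
    return count;
}
```

For n = 4 this yields six representatives (0, 1, 3, 5, 7, 15) out of 16 configurations. In an FPGA version the inner loop over symmetries would be unrolled into parallel hardware.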


A second test would be to implement part or all of the Hamiltonian matrix generation on the FPGA. If the speedup is about 100, the matrix could be generated on the fly in every iteration, without the need to store it. This would reduce the disk bandwidth problem for bigger spin systems. At present we are limited by disk bandwidth (100 MB/s) and could move to FPGA streams of about 1 GB/s per node (a speedup of 10, with no need for disks and with better scaling).
matrix generation
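
What "generating the matrix on the fly" means for an iteration can be sketched as a matrix-free product y = H x. The example below assumes a spin-1/2 Heisenberg ring, H = J * sum_i S_i . S_(i+1), in the full 2^n basis without symmetry reduction; the model choice and the name heisenberg_ring_apply are assumptions for illustration only. Every matrix element is recomputed from the bit pattern of the configuration, so nothing is read from disk.

```c
#include <assert.h>
#include <stddef.h>

/* y = H x for a spin-1/2 Heisenberg ring of n sites (hypothetical example).
 * x and y hold 2^n entries each; matrix elements are never stored. */
void heisenberg_ring_apply(unsigned n, double J, const double *x, double *y)
{
    assert(n >= 3 && n <= 24);
    size_t dim = (size_t)1 << n;
    for (size_t s = 0; s < dim; s++) y[s] = 0.0;

    for (size_t s = 0; s < dim; s++) {
        double diag = 0.0;
        for (unsigned i = 0; i < n; i++) {
            unsigned j = (i + 1) % n;                 /* ring neighbour */
            unsigned bi = (unsigned)(s >> i) & 1u;
            unsigned bj = (unsigned)(s >> j) & 1u;
            if (bi == bj) {
                diag += 0.25 * J;                     /* parallel spins */
            } else {
                diag -= 0.25 * J;                     /* antiparallel spins */
                size_t flipped = s ^ (((size_t)1 << i) | ((size_t)1 << j));
                y[flipped] += 0.5 * J * x[s];         /* spin-flip term */
            }
        }
        y[s] += diag * x[s];
    }
}
```

For n = 4 and J = 1, applying this to the basis vector of the fully polarized state 1111 returns the same vector scaled by 1.0 (energy n*J/4), which is a quick sanity check. Whether such a scheme pays off depends on the generation speedup mentioned above.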


Estimation of the FPGA logic needed to compare 40-bit configurations and find the minimum. Are permutations free of cost (just wires)?
Logic needs for comparison
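
A rough way to reason about the comparison logic is a pairwise reduction tree: finding the minimum of k candidates takes k - 1 two-input comparators arranged in ceil(log2 k) levels, each comparator as wide as the configuration (40 bits here). The sketch below is a software model of such a tree under that assumption; the name tree_min is hypothetical, and it makes no statement about actual LUT counts on a given device. The candidates would be the permuted configurations, which in hardware come from fixed wiring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum of k candidates by pairwise reduction, as a comparator tree
 * would compute it. buf is overwritten. Reports the number of two-input
 * comparisons and the number of tree levels. */
uint64_t tree_min(uint64_t *buf, size_t k, size_t *comparators, unsigned *levels)
{
    assert(k >= 1);
    *comparators = 0;
    *levels = 0;
    while (k > 1) {
        size_t half = k / 2;
        for (size_t i = 0; i < half; i++) {
            uint64_t a = buf[2 * i], b = buf[2 * i + 1];
            buf[i] = (a < b) ? a : b;      /* one comparator plus mux */
            (*comparators)++;
        }
        if (k & 1) {                       /* odd element passes through */
            buf[half] = buf[k - 1];
            half++;
        }
        k = half;
        (*levels)++;
    }
    return buf[0];
}
```

For five candidates the model uses four comparisons in three levels; for k symmetry images of a 40-bit configuration it would be k - 1 comparators of 40 bits each.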