Schematics: Libre Chip Execution Unit — Discussion & Suggestions

I addressed this some more further down.

The GPL requires you to provide the corresponding source, i.e. the preferred form for editing (here that would be the SVG source, assuming that's how you edit it), for whatever files you redistribute, so the GPL does actually qualify as "free documentation".

Your latest diagram [1] seems pretty good to me, though I would leave out the amplifiers, since it's supposed to show the high-level logical layout, not the individual components needed. Also, FPGAs don't have separate amplifiers that you can wire up to run an SRAM; the SRAM is an indivisible block.

One thing we could do that you would have a hard time showing in that diagram: some replica physical registers are only wired to an ALU input that reads just some of the bits (e.g. an ALU input that's only ever used for carry-in, or a memory-store pipeline that only ever reads the 64 data bits). Each of those replicas only needs to store the bits that would actually be read. So the registers on a memory-store pipeline's inputs could be only 64 bits wide, and if there was an input that only ever read flags, its registers could hold just those flag bits and be only something like 5 bits wide.
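To make the saving concrete, here is a rough sketch of the bookkeeping. The field split (64 data bits plus 5 flag bits) follows the numbers above, but the port names and the idea of tracking widths per named field are my own assumptions for illustration, not part of any actual design.

```python
# Hypothetical layout of one full-width physical register.
# 64 data bits and ~5 flag bits are the figures from the discussion above.
FULL_FIELDS = {"data": 64, "flags": 5}

# Hypothetical read ports and the fields each one actually reads.
PORTS = {
    "alu_operand": {"data", "flags"},  # reads everything
    "store_data": {"data"},            # memory-store pipeline: data only
    "carry_in": {"flags"},             # flags only
}


def replica_width(fields_read):
    """Bits a replica register must store if its port reads only these fields."""
    return sum(FULL_FIELDS[name] for name in fields_read)


full_width = sum(FULL_FIELDS.values())
widths = {port: replica_width(fields) for port, fields in PORTS.items()}

# Bits saved per register entry, summed over all replicas.
saved = sum(full_width - w for w in widths.values())

for port, width in widths.items():
    print(f"{port}: {width} of {full_width} bits")
print(f"bits saved per register entry: {saved}")
```

The saving scales with the number of physical registers, since every entry in a narrowed replica is narrower.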

Assuming accesses to the L2 register file are sufficiently rare, it's fine if the CPU is much slower in pathological cases where it needs to move stuff in/out of the L2 registers (e.g. 100 div instructions in a row will almost certainly fill up the div unit's registers, but that's extremely uncommon code).

Increasing the number of physical registers is a trade-off. The general idea I'm following is that we have a standard register-renaming CPU, except that the μOps have a lot of architectural registers (128 for now [2]; most of the time only a few of them (16-64) are in use, because PowerISA and x86 don't actually have all that many integer/fp registers). So we add the L2 register file, which lets us actually store all of the architectural registers without wasting a huge number of physical registers on values that are rarely used (e.g. an architectural register that stores the rarely accessed parts of XER or FPSCR or EFLAGS).
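A toy model of that idea, just to show why it's cheap when L2 accesses are rare. Everything here is an assumption made for illustration: the class and method names are made up, the LRU eviction policy is my own choice, and it deliberately ignores real renaming (multiple in-flight versions of a register), modelling only where each architectural register's value currently lives.

```python
from collections import OrderedDict


class RegisterMap:
    """Toy model: each architectural register lives either in a small
    physical register file or in a larger, slower L2 register file."""

    def __init__(self, num_arch=128, num_phys=16):
        self.num_phys = num_phys
        self.l2 = {r: 0 for r in range(num_arch)}  # everything starts in L2
        self.phys = OrderedDict()                  # arch reg -> value, LRU first
        self.l2_transfers = 0                      # slow-path moves to/from L2

    def _make_room(self):
        # Assumed policy: spill the least recently used register to L2.
        if len(self.phys) >= self.num_phys:
            victim, value = self.phys.popitem(last=False)
            self.l2[victim] = value
            self.l2_transfers += 1

    def read(self, reg):
        if reg in self.phys:
            self.phys.move_to_end(reg)
        else:
            # Slow path: fetch the value from L2 into a physical register.
            self._make_room()
            self.phys[reg] = self.l2.pop(reg)
            self.l2_transfers += 1
        return self.phys[reg]

    def write(self, reg, value):
        if reg in self.phys:
            self.phys.move_to_end(reg)
        else:
            # The old value is overwritten, so nothing is fetched from L2.
            self._make_room()
            self.l2.pop(reg, None)
        self.phys[reg] = value


regs = RegisterMap(num_arch=128, num_phys=16)
for i in range(1000):
    regs.write(i % 8, i)   # hot loop touching 8 registers: stays in phys
hot_transfers = regs.l2_transfers
regs.read(100)             # rarely used register: one fetch from L2
print(hot_transfers, regs.l2_transfers)
```

Code that sticks to a working set smaller than the physical file never touches L2 after warm-up; only the rarely used registers pay the slow path.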

If we're building a design that has more than around 10 or so units, we'd likely want to reduce how much the units are interconnected. When there isn't a direct connection between the source and destination unit, the register renaming stage can just insert a copy instruction (which takes one or more cycles to move the data to the other side of the CPU). This scheme is actually much more general than just having separate integer and floating-point physical register files: you can do things like splitting the units into 3 groups, (1) floating-point multiply, (2) load/store and bitwise and/or/xor, and (3) integer ops, or use any arbitrary strongly connected graph you please, as long as you have enough copy units connecting them to handle moving values from any unit to any unit.
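Here's a minimal sketch of how the renaming stage could work out which copies to insert, treating the units as a directed graph and doing a breadth-first search for the shortest route. The unit names, the `LINKS` table and both function names are hypothetical; an edge `a -> b` is assumed to mean "unit `b` can read results produced by unit `a` directly".

```python
from collections import deque

# Hypothetical connectivity: producer unit -> units that can read its results.
LINKS = {
    "fp_mul": ["fp_mul", "copy_fp_to_int"],
    "copy_fp_to_int": ["int_alu", "load_store"],
    "int_alu": ["int_alu", "load_store", "copy_int_to_fp"],
    "load_store": ["int_alu", "load_store", "copy_int_to_fp"],
    "copy_int_to_fp": ["fp_mul"],
}


def copy_path(links, src, dst):
    """Shortest chain of units from src to dst, found by breadth-first search."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        unit = queue.popleft()
        if unit == dst:
            path = []
            while unit is not None:
                path.append(unit)
                unit = prev[unit]
            return path[::-1]
        for nxt in links.get(unit, ()):
            if nxt not in prev:
                prev[nxt] = unit
                queue.append(nxt)
    raise ValueError(f"no route from {src} to {dst}: graph is not strongly connected")


def copies_to_insert(links, src, dst):
    """Units that must each execute one copy to get a value from src to dst.

    An empty list means dst can read src's result directly.
    """
    return copy_path(links, src, dst)[1:-1]


print(copies_to_insert(LINKS, "int_alu", "load_store"))  # direct connection
print(copies_to_insert(LINKS, "fp_mul", "load_store"))   # needs a copy unit
```

The `ValueError` case is exactly the situation the strongly connected requirement rules out: as long as every unit can reach every other unit through some chain of copy units, the search always finds a route.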


  1. ↩︎

  2. some of the architectural and physical registers are hardwired as constants, e.g. the zero register on RISC-V ↩︎