Congratulations — and an ISA proposal

I just found this project and I love it. Congratulations on creating it and securing the grant.

I haven't explored the project much yet, but I will over the coming days. I'm a programmer interested in microkernels.

I’d expect the first ISA choice to be RISC‑V — which would also be my first choice. If you want to be more adventurous, I have an ISA proposal: a mix of EPIC, RISC, and CISC that I call ESEPIC (Extensible Simplified EPIC).

Overview

  • 32‑bit fixed instruction length; big‑endian instruction encoding; bi‑endian data

    • Optional 16‑bit “compressed” subset
  • Four separate integer units, each with 16 registers (registers are 64‑bit)

  • Optional hardware stack pointer

  • ISA includes flags (optional)

  • Optional separate integer “front unit”

  • Reserved space for an FP unit

  • Reserved space for SIMD, vector, and additional extensions

Registers and groups

  • 4 × 16 main registers and a program counter

  • All main registers are 64‑bit

  • Integer groups: k, l, m, n (also addressed as group g)

  • Floating groups: fk, fl, fm, fn (also addressed as group fg)

Per-group integer registers:

  • a, b, c, d, e, s, u, v — aliases r0 … r7 (main registers 0..7)

  • aa, bb, cc … uu, vv — aliases r8 … r15 (main registers 8..15)

Per-unit floating-point registers (extension):

  • fa, fb, fc, fd, fe, fs, fu, fv — aliases f0 … f7

  • faa, fbb, fcc … fss, fuu, fvv — aliases f8 … f15

Per-group flags (optional):

  • flagsA, flagsB (4 bits each: ca, za, va, ha and cb, zb, vb, hb)

    • c = carry; z = zero; v = overflow/other; h = stream mask

Bit decoding legend
.     dot   any (0 or 1)  (unspecified, not described)
x y         any (0 or 1), opcode, bit used up for decoding, some combinations might be invalid
0    zero   0  (where setting this bit to 1 produces another valid instruction)
1    one    1  (where setting this bit to 0 produces another valid instruction)
-    minus  0, (when 1 is INVALID or otherwise reserved, perhaps for future additions)
+    plus   1, (when 0 is INVALID or otherwise reserved, perhaps for future additions)
*    star   code given by an optional standard feature (ISA extension)
c           condition code
uu          group select  (00:k,  01:l,  10:m,  11:n)
<<          shift of immediate value, bit position
_ = | ,     decorative only, no meaning

imm         immediate value
i2          2-bit immediate value
cond3       3-bit condition code
S           sign; top bit of immediate
Z           1: sign extend
E           1: big endian
fp          floating point
P           fp precision: 14/24/53/113 bits
T           data type
WW          width in bytes  (00:8,  10:4,  01:2,  11:1)
A a         hint: prefetch flow, announcement
H H H       hint  (111: taken,  000: straight)
B           1: speculation barrier
cnt         count = 01, 10 or 11; number of registers to push
r           1: read
bStart6     shift left by this many bits
bEnd6       mask by ((uint64(-1) << bStart6) & (uint64(-1) >> (63 - bEnd6)))

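To make the bStart6/bEnd6 mask formula from the legend concrete, here is a minimal Python sketch. The function name `bitfield_mask` and the constant `MASK64` are made up for illustration, and the sketch assumes both fields are plain unsigned 6-bit values with uint64 wrap-around semantics.

```python
MASK64 = (1 << 64) - 1  # stands in for uint64(-1); Python ints do not wrap on their own


def bitfield_mask(b_start6: int, b_end6: int) -> int:
    """Return the 64-bit mask with bits b_start6..b_end6 (inclusive) set.

    Mirrors the legend formula:
        (uint64(-1) << bStart6) & (uint64(-1) >> (63 - bEnd6))
    """
    if not (0 <= b_start6 <= 63 and 0 <= b_end6 <= 63):
        raise ValueError("6-bit fields must be in the range 0..63")
    low = (MASK64 << b_start6) & MASK64   # clears the bits below b_start6
    high = MASK64 >> (63 - b_end6)        # clears the bits above b_end6
    return low & high


if __name__ == "__main__":
    print(hex(bitfield_mask(8, 15)))  # 0xff00
```

With this formula, a start position above the end position yields an empty mask (0); whether the ISA gives that case another meaning is not stated here.
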
        big endian:   |  byte 0  |     byte 1       |      byte 2       |     byte 3      | ISA
 type  name      regs |31..27.25.|23.21 20 .18 17 16|15 .13 12 11 10 9 8| 7 6 5 4 3 2 1 0 |exte.
  --------------------|----------+------------------+-------------------+-----------------+-----
 CC-RC compare      2r|11uu 00 0c|1 1 0  T T -| cond3  | MODC |_ sregA _|_ sregB_| CZVEF  |flags
 NOP-H nop, hint    2r|11uu 00 0.|0 1 0  . . .| .  .  .|. . . | . .  . .| . . . .|. . . . |
 R-R   reg-reg      3r|11uu 00 0c|* 0 0  T T 1| x  x  x|y y y |__sregA _|_ sregB_|_ dreg _|
 R-RC  reg-reg pred.3r|11uu 00 0c|* 0 0  T T 0| cond3  |y y y |__sregA _|_ sregB_|_ dreg _|
 R-C0  single r. p. 2r|11uu 00 0c|* 0 1  T T 0| cond3  |0|y y   y y - - |_ sreg _|_ dreg _|
 R-0   single reg.  2r|11uu 00 0c|* 0 1  T T 0| -  x  x|1|y y   y y - - |_ sreg _|_ dreg _|
 R-6   width extend 2r|11uu 00 0c|* 0 1  T T 1| Z  -  -|0|____ imm6 _<<0|_ sreg _|_ dreg _|
 R-6   shifts,rots  2r|11uu 00 0c|* 0 1  T T 1| x  x  x|1|____ imm6 _<<0|_ sreg _|_ dreg _|
 R-8   addi, subi . 2r|11uu 00 0c|* 1 1  T T 0| -| S,_______ imm8 ___<<0|_ sreg _|_ dreg _|
 R-8   mul imm.     2r|11uu 00 0c|* 1 1  T T 1| -| S,_______ imm8 ___<<0|_ sreg _|_ dreg _|
 R-8   bitwise imm. 2r|11uu 00 10|* 1 x  T T x| x| S,_______ imm8 ___<<0|_ sreg _|_ dreg _|
 R-8   set less     2r|11uu 00 10|* 0 1  T T x| -| S,_______ imm8 ___<<0|_ sreg _|_ dreg _|less
 Y-20  CUSTOM        ?|11uu 00 10|* 0 0  . . -|    ==== custom extensions ====            |
 R-12  ins shift    2x|11uu 00 11|* 0 x  x|___ bEnd6 ___|___ bStart6 ___|_ sreg _| sdreg _|
 R-12  ext shift    2x|11uu 00 11|* 1 x  x|___ bEnd6 ___|___ bStart6 ___|_ sreg _| sdreg _|
    
 F-RR  flpoint FMA  4r|11uu 01 1x|* x -  P P round3 | sregfC  | sregfB  | sregfA |  dregf |float
 F-R   flpoint ari. 3r|11uu 01 0-|* 0 -  P P round3 |x x  x  x| sregfB  | sregfA |  dregf |float
 F-0   flpoint ari. 3r|11uu 01 0-|* 1 -  P P round3 |x x  x  x| y y - - | sregfA |  dregf |float
    
 MF-12 setfi64        |11uu 11 11|- 0 0  0|____________ imm12 _______<<0|-|srego |  dregf |float
 MF-12 setfi32        |11uu 11 11|- 0 0  1|____________ imm12 _______<<0|-|srego |  dregf |float
 M-6   addi.r.r     2r|11uu 11 11|* 0 1  0 1 0 srcg | T|S,____ imm6 _<<0|_ sreg _|_ dreg _|
 MF-0  mov.f.r      2r|11uu 11 11|* 0 1  0 0 0 srcg | T P  P P round3 . |_ sreg _|_ dregf |float
 M-F   mov.r.f      2r|11uu 11 11|* 0 1  0 1 1 srcg | T P  P P round3 . |  sregf |_ dreg _|float
 MF-F  mov.f.f      2r|11uu 11 11|* 0 1  0 0 1 srcg | - P  P P round3 . |  sregf |  dregf |float
 Y-18  CUSTOM        ?|11uu 11 11|* 0 1  1 . .  .  +|      ==== custom extensions ====    |
 M-O12 addi/seti/xori |11uu 11 11|- 1 0 |S,____________ imm12 _______<<0|x|srego |_ dreg _|
 M-M   mov.m        2r|11uu 11 11|* 1 1  0 1 +  0  -|                   |+ +|srgm|_ dreg _|LR
 M-0   mov.ret      2r|11uu 11 11|* 1 1  0 1 +  1  -|                   |-|sregr |_ dreg _|
 M-0   mov.ret      2r|11uu 11 11|* 1 1  0 0 1  1  -|                   |_ sreg _|-|dregr |
 MO-0  mov.o        2r|11uu 11 11|* 1 1  0 0 0  a  a| - -  - -  - - - - |_ sreg _|a|drego |
 MM-0  mov.m        2r|11uu 11 11|* 1 1  0 0 1  0| pri2|-  - -  - - - - |_ sreg _|cah|drgm|LR
 PP-0  pop          3r|11uu 11 11|* 1 1  1 1 0|fp| -|cnt?| - -|_ dreggC_|_ dreggB|_ dreggA|stack
 PU-0  push         3r|11uu 11 11|- 1 1  1 0 0|fp| -|cnt | - -|_ sreggA_|_ sreggB|_ sreggC|stack
 PP-R  deep read    2r|11uu 11 11|* 1 1  1 + 1|fp| 0|                   |_ sreg _|_ dregg_|stack
 PP-6  deep read    2r|11uu 11 11|* 1 1  1 + 1|fp| 1| - -|____ imm6 _<<0|        |_ dregg_|stack
    
 A-M   atomics      2r|11uu 11 00|- + Z  W W E|fp| -|instCount|         |+ 0|srgm|  sregg |atom
 L-M8  load front   2r|11uu 11 00|- 1 Z  W W E|fp| S,______ imm8 ____<<0|+ 1|srgm|  dregg |LR
 S-M8  store front  2r|11uu 11 00|- 0 P  W W E|fp| S,______ imm8 ____<<0|+ +|srgm|  dregg |LR
 L-8   load back    2r|11uu 11 01|* 1 Z  W W E|fp| S,______ imm8 ____<<0|_ sreg _|  dregg |
 S-8   store back   2r|11uu 11 01|- 0 P  W W E|fp| S,______ imm8 ____<<0|_ sreg _|  sregg |
 IO-0  port in/out   ?|11uu 11 10|*|r|-  + + E| -| +| opcode  |__sregA _|_ sregB_|_ dreg _|port
    
 LO-M8 load ptr.    2r|1011 11 00|* - -  T T E| a| S,______ imm8 ____<<0|a +|srgm|a| drego|LR
 LM-M8 load ptr.    2r|1011 11 01|* pr2  T T E| .| S,______ imm8 ____<<0|+ +|srgm|cah|drgm|LR
 MM-12 addi/seti/xori |1011 11 1x|* pr2 |S,____________ imm12 _______<<0|  sregx |cah|drgm|LR
    
 B-FO  branchF rego 1r|11uu 10 0c|* H H H |B|0| cond3 |- |co2 | . . . . |-|srego |- -| |- |flags
 B+F15 branchF PC   0r|11uu 10 0c|* H H H |B|1| cond3 | imm3<12,________ imm10____<<2|.|S |flags
 B+C10 branchC PC   0r|11uu 10 1c|* H H H |B|.|i2<<8|_ sregB _|_ sregA _|__ imm6 _<<2|c|S |cmpb
     
 B-CO  branchC rego 1r|1011 10 1c|* H H H |B|0| A A  A|sregoB |.|sregoA |-|sregoD|- -|c|- |cmpbo
 EMU   emulation req. |1011 10 1.|* . . .  .|1| 0 0  0|  reserved & emulation feature req.|
 NOP-A branch announce|1011 10 1-|* H H H |B|1| a a  a| imm3<12,________ imm10____<<2|.|S |
 B+C10 branchC PC   0r|1011 10 0c|* H H H |B|.|i2<<8|.|sregoB |.|sregoA |__ imm6 _<<2|c|S |cmpbo
     
 C-O0  CALL rego      |1011 01 01|* A -  . . 0  0 . | . . . .  . . .  . |a| srego|ajaj a a|
 C-M8  CALL regm+offs |1011 01 01|* A -  T T 0  1| S,______ imm8 ____<<0|a +|srgm|ajaj a a|LR
 R-D   RET retreg     |1011 01 01|* A -  . . 1  . . | .        . . .    |a depth3|ajaj a -|
 C+21  CALL PC    L+0r|1011 01 10|* A|.|imm3<<18, ____________ imm16 _____________<<2| ?|S|
 J+21  JUMP PC      0r|1011 01 00|* A|.|imm3<<18, ____________ imm16 _____________<<2| .|S|
 Y-19  CUSTOM         |1011 01 11|* - 0  1 0|      ==== custom extensions ====            |
 EMU   emulation req. |1011 01 11|- - 0  1 1 -|    reserved & emulation feature req.      |
 E-0   environment  FF|1011 01 11|- - 0  0 0 .  . . | . . . . .  . .  . | OS vec.|R - 0 - |
 E-8   environment  FF|1011 01 11|- - 0  0 0 .  . . |______ imm8 ____<<0| OS vec.|R - 1 - |
 E-0   environment  FF|1011 01 11|- - 0  0 1 .  . . | . . . . .  . .  . |Hyper v.|R - 0 - |
 E-8   environment  FF|1011 01 11|- - 0  0 1 .  . . |______ imm8 ____<<0|Hyper v.|R - 1 - |
 E-F   fences       0r|1011 01 11|- - 1  - 1 0  0 - | - +|- - - |     RISC-V fences       |
 E-F   fences       0r|1011 01 11|- - 1  - 1 0  1 - | - +|- - - |  ESEPIC fences |op|itag |
 E-F   fences       0r|1011 01 11|- - 1  - 1 1  - - |       feature support query         |

 PC-20 ADDUI ria(+PC)o|1011 00 10|* x sh|S,________________ imm19 _____________<<12or32| 0|
 IB-20 SETUI rib     o|1011 00 10|* x sh|S,________________ imm19 _____________<<12or32| 1|rib
 MO-12 addi/seti/xori |1011 00 11|* x sh|S,____________ imm12 ___<<0or52|  sregx |-|drego |
 O-XX                 |1011 00 0-|* 0  arith. | a a | . . . . |  sregxA |  sregxB|a|drego |
 O-X                  |1011 00 0-|* 1  arith. | a a | . . . . | . . . . |  sregxB|a|drego |
                            || ||  \
                  1 0 = flow/  \|   ^flags write
                                ^ 0 1 = (predicated)
    
 __ standard feature _|31..27.25.|23.21 20 .18 17 16|15 .13 12 11 10 9 8| 7 6 5 4 3 2 1 0 |exte.
  flow control        |11uu 10 ..|.|     conditional flow control       |        |        |
  flow control        |1011 10 ..|.|     conditional flow control,  front unit            |
  flow                |1011 01 ..|.|   unconditional flow, environment; fences            |
  flags write         |11uu .. ..|1|   flags write            |         |        |
  predicated          |11uu 00 01|*|   predicated (? on not flag Ca ?)  |        |        |
  predicated          |11uu 10 .1|*|   predicated (? on not flag Ca ?)  |        |        |
  SIMD 4x, masked     |1010 .. ..|.|   all 4 int+fp groups, masked by Ha         |        |simd
  SIMD 4x             |1000 .. ..|.|   all 4 int+fp groups, ignores Ha           |        |simd
  SIMD 4x, any        |10.0 10 .1|.|     conditional flow control,  on all groups true    |simd
  SIMD flags write    |10.0 .. ..|1|   all 4 int+fp groups, flags write          |        |simd
   
   
 _____ unit name _____|31..27.25.|23.21 20 .18 17 16|15 .13 12 11 10 9 8| 7 6 5 4 3 2 1 0 |
  packed extension    |0... .. ..|.| packed 16-bits |                                      pack
  int+fp units        |11uu .. ..|.| int+fp single unit|      |    reg  |  reg   |  reg   |
  int+fp group SIMD   |10.0 .. ..|.| int+fp all units 4× SIMD |    reg  |  reg   |  reg   |simd
  SIMD unit 64-bit    |1001 00 0.|.| dedicated masked SIMD unit,   64 bits wide           |s64
  SIMD unit multiwidth|1001 00 1.|.| dedicated masked SIMD unit,  128 - 512 bit operands  |smw
  loop SIMD unit      |1001 01 0.|.| dedicated masked loop SIMD,  pow(2,n)×64 bits wide   |sloop
  vector unit         |1001 01 1.|.|    vector    unit        |     ultralong width       |vec
 Y-26  CUSTOM         |1001 10 ..|.|          custom ISA, custom units                    |
 Z-26  INVALID        |1001 11 ..|.|          reserved        |        unknown            |
  memory unit         |1011 1+ ..|.| target    mem     |  AGU |  immediate       |        |
  environment, jumps  |1011 01 ..|.|  environment, jumps, emulation, fences , privileges  |
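
As a rough illustration of the "unit name" table above, the top four bits of a big-endian instruction word are enough to pick the unit. This is only a sketch: the function name `classify` and the returned strings are invented here, and only bits 31..28 are looked at, nothing of the finer-grained encodings.

```python
GROUPS = ("k", "l", "m", "n")  # group select uu = 00, 01, 10, 11


def classify(insn_bytes: bytes) -> str:
    """Classify a 32-bit instruction by its top four bits (bits 31..28)."""
    if len(insn_bytes) != 4:
        raise ValueError("expected exactly 4 bytes")
    word = int.from_bytes(insn_bytes, "big")  # instruction encoding is big-endian
    top4 = word >> 28
    if (top4 & 0b1000) == 0:        # 0... : packed 16-bit extension
        return "packed 16-bit extension"
    if (top4 & 0b1100) == 0b1100:   # 11uu : one int+fp group
        return "int+fp single unit, group " + GROUPS[top4 & 0b11]
    if top4 == 0b1011:              # 1011 : memory unit, flow, environment
        return "memory, flow and environment"
    if top4 == 0b1001:              # 1001 : dedicated SIMD/vector/custom space
        return "dedicated unit (SIMD, vector, custom, reserved)"
    return "int+fp all groups, 4x SIMD"  # 1000 or 1010 (the 10.0 pattern)


if __name__ == "__main__":
    for first_byte in (0x00, 0x80, 0x90, 0xB0, 0xD0):
        print(hex(first_byte), classify(bytes([first_byte, 0, 0, 0])))
```
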

Thanks! We're planning to make PowerPC the first instruction set we support, because it's the open instruction set with the most software support. Later, we plan to add a programmable decoder to our CPU, so it will be able to run RISC-V, x86, SPARC, and more, as well as PowerPC with more extensions. Inventing a new instruction set is a nice idea, but realistically it takes something like 5-10 years of work by hundreds to thousands of people before the software ecosystem catches up, so that's why we're not doing that. That said, once the programmable decoder is working, it will be relatively simple (still quite complex, but simple compared to designing a whole CPU) to program it to run your instruction set if you want to do that.

Inventing a new instruction set is a nice idea, but realistically it takes something like 5-10 years of work by hundreds to thousands of people before the software ecosystem catches up, so that's why we're not doing that.

Yes, I agree. So your goal is to make a CPU that is immediately useful because it runs existing software and works with existing tools.

Regarding L1 cache, here’s a suggestion.
Add a speculative buffer alongside the cache to record pending writes. That only covers writes to the cache, though; if you don't want to support speculative writes, you don't need such a buffer.

Now, let’s consider just the cache reads.
For speculative reads (no writes), perform two cache reads:

  1. Speculative read — fetch the data from the cache without modifying cache metadata (no LRU updates).

  2. Confirmation read — issued only if the speculative path is actually taken; its return value can be ignored. Its purpose is to update cache metadata (LRU position) as if the access were real.
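
The two-read idea can be sketched as a toy software model. This is not a hardware design: the class `LruCache` and its method names are invented for illustration, the cache is fully associative for simplicity, and speculative misses are not modelled at all (a miss just returns `None`).

```python
from collections import OrderedDict


class LruCache:
    """Toy fully associative cache with LRU replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # least recently used entry comes first

    def fill(self, addr, data):
        """Non-speculative fill; evicts the LRU line when the cache is full."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
        elif len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # drop the least recently used line
        self.lines[addr] = data

    def speculative_read(self, addr):
        """Step 1: return the data without touching the LRU order."""
        return self.lines.get(addr)

    def confirm_read(self, addr):
        """Step 2: once the access is known to be real, update the LRU order."""
        if addr in self.lines:
            self.lines.move_to_end(addr)


if __name__ == "__main__":
    cache = LruCache(capacity=2)
    cache.fill("A", 1)
    cache.fill("B", 2)
    cache.speculative_read("A")   # cancelled speculation: no trace in the LRU state
    cache.fill("C", 3)            # so "A" is still the LRU line and gets evicted
    print(list(cache.lines))      # ['B', 'C']
```

If `confirm_read("A")` had been issued before filling "C", the line "B" would have been evicted instead, which is the behaviour of an ordinary read.
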

Practical issues

  • a) Bandwidth: this requires up to roughly twice the cache read bandwidth, since every confirmed speculative read is issued twice.
  • b) Timing interference: speculative reads still contend for cache resources with non-speculative accesses and can change access timing. The cache is shared between speculative and non‑speculative activity, so care is needed to prevent speculation from altering timing behaviour (especially for real-time-sensitive or timing-attack scenarios). Designing arbitration or prioritization that preserves non-speculative timing while allowing speculative confirmation reads is nontrivial.

Just my 2 cents.

It might be better to use random cache eviction instead of LRU. That avoids needing to update per-cache-element state, so you can have just one LFSR that you update, and roll back when cancelling instructions.
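
A minimal sketch of such an LFSR, assuming a 16-bit Galois LFSR with the commonly used tap mask 0xB400. The function names, the width, and the way a victim is picked from the state are all illustrative choices, not part of any actual design. Because each step is invertible, rolling back N cancelled evictions only needs N reverse steps, with no per-line state to restore.

```python
TAPS = 0xB400  # tap mask for a 16-bit Galois LFSR (assumed width and polynomial)


def lfsr_step(state: int) -> int:
    """Advance the LFSR by one step; state must be a nonzero 16-bit value."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def lfsr_unstep(state: int) -> int:
    """Undo one lfsr_step.

    After a step, bit 15 is set only if the taps were XORed in,
    which means the bit that was shifted out was 1.
    """
    if state & 0x8000:
        return ((state ^ TAPS) << 1) | 1
    return state << 1


def victim_way(state: int, ways: int) -> int:
    """Pick the way to evict; assumes ways is a power of two."""
    return state & (ways - 1)


if __name__ == "__main__":
    state = 0xACE1
    checkpoint = state
    for _ in range(3):           # three speculative evictions
        state = lfsr_step(state)
        print("evict way", victim_way(state, ways=4))
    for _ in range(3):           # instructions cancelled: roll back
        state = lfsr_unstep(state)
    print(state == checkpoint)   # True
```
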

yup, the plan is that whenever there's a capacity issue, earlier instructions always have higher priority.

OK, random eviction looks like a good idea, although it might be a lot slower than LRU. Note that you would then have to use random eviction always, for all the data in the cache.
I'm not sure about this, so I would check whether such an approach yields sufficiently high performance.

I forgot to mention:

In general, the key problem to solve in preventing Spectre is always the same: speculative execution competes with non-speculative execution for resources.
So you always have to make sure that speculative execution uses only the extra (otherwise idle) resources of the CPU, so that it cannot affect non-speculative execution.
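
As a toy illustration of that principle, combined with the "earlier instructions have higher priority" rule mentioned above, here is a per-cycle port arbiter in which speculative requests only ever receive ports that non-speculative requests leave unused. The names `Request` and `arbitrate` are invented, and the sketch models port allocation only; it says nothing about other timing channels.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    age: int           # lower value = earlier in program order
    speculative: bool


def arbitrate(requests: List[Request], ports: int) -> List[Request]:
    """Grant up to `ports` requests for one cycle.

    Non-speculative requests are served first, oldest first; speculative
    requests get only the ports that are left over, also oldest first.
    """
    ordered = sorted(requests, key=lambda r: (r.speculative, r.age))
    return ordered[:ports]


if __name__ == "__main__":
    pending = [
        Request(age=3, speculative=True),
        Request(age=5, speculative=False),
        Request(age=1, speculative=True),
        Request(age=4, speculative=False),
    ]
    for granted in arbitrate(pending, ports=3):
        print(granted)
```

Here the two non-speculative requests are granted first, and only the single remaining port goes to the oldest speculative request.
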