I decided I needed a short break from working on the HDL, so I wrote libre-chip/parse_powerisa_pdf, a parser for the OPF PowerISA 3.1C PDF.
Currently it reports no errors for the first 304 pages of the PDF. That is enough to get into the vector instructions and past all the base integer, FP, and decimal FP instructions. I haven't manually verified that all of the extracted instructions are 100% correct, though, since that's a lot of manual work.
I wrote it in Python 3.11 (the default version on Debian Bookworm) because IIRC Dmitry wants something like this for binutils. I figured they'd be much happier with something written in Python (or C, but I don't want to use C) than with something written in Rust.
It generates XML, mostly because it emits HTML tags to record where text is subscript/superscript/bold/italic/etc. Those tags are critical for preserving the pseudo-code's semantics, and that kind of markup seemed to fit XML better than something like JSON.
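The reason mixed-content markup matters is that a consumer can walk an element's text and children in order and keep `BO<sub>2</sub>` distinct from a literal `BO2`. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes only the tag names visible in the excerpts below (`sub`, `sup`, `br`). The `to_text` helper and its `_{...}`/`^{...}` output notation are hypothetical and not part of parse_powerisa_pdf.

```python
import xml.etree.ElementTree as ET

def to_text(elem):
    """Flatten mixed content to plain text, marking sub/superscripts (hypothetical helper)."""
    parts = [elem.text or ""]
    for child in elem:
        inner = to_text(child)
        if child.tag == "sub":
            parts.append("_{" + inner + "}")   # e.g. BO<sub>2</sub> -> BO_{2}
        elif child.tag == "sup":
            parts.append("^{" + inner + "}")
        elif child.tag == "br":
            parts.append("\n")                 # <br /> marks a line break
        else:
            parts.append(inner)                # <b>, <i>, etc.: keep the text only
        parts.append(child.tail or "")         # text following the child element
    return "".join(parts)

sample = "<code>if ¬BO<sub>2</sub> then CTR ← CTR - 1<br />ctr_ok ← BO<sub>2</sub></code>"
print(to_text(ET.fromstring(sample)))
```

A flat JSON string would either lose the subscripts or need its own ad-hoc escaping to carry them. XML keeps them as ordinary elements that any standard parser can see.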
Here's an example excerpt from the generated XML for the bc[l][a] instructions. The pseudo-code uses subscripts a whole bunch:
<instruction>
<header>
<title>Branch Conditional B-form</title>
<mnemonics>bc BO,BI,target_addr (AA=0 LK=0)<br />
bca BO,BI,target_addr (AA=1 LK=0)<br />
bcl BO,BI,target_addr (AA=0 LK=1)<br />
bcla BO,BI,target_addr (AA=1 LK=1)</mnemonics>
<bit-fields>
<fields>
<field>
<name>16</name>
<bit-number>0</bit-number>
</field>
<field>
<name>BO</name>
<bit-number>6</bit-number>
</field>
<field>
<name>BI</name>
<bit-number>11</bit-number>
</field>
<field>
<name>BD</name>
<bit-number>16</bit-number>
</field>
<field>
<name>AA</name>
<bit-number>30</bit-number>
</field>
<field>
<name>LK</name>
<bit-number>31</bit-number>
</field>
</fields>
</bit-fields>
</header>
<code>if (64-bit mode)<br />
then M ← 0<br />
else M ← 32<br />
if ¬BO<sub>2</sub> then CTR ← CTR - 1<br />
ctr_ok ← BO<sub>2</sub> | ((CTR<sub>M:63</sub> ≠ 0) ⊕ BO<sub>3</sub>)<br />
cond_ok ← BO<sub>0</sub> | (CR<sub>BI+32</sub> ≡ BO<sub>1</sub>)<br />
if ctr_ok & cond_ok then<br />
if AA then NIA ←<sub>iea</sub> EXTS(BD || 0b00)<br />
else NIA ←<sub>iea</sub> CIA + EXTS(BD || 0b00)<br />
if LK then LR ←<sub>iea</sub> CIA + 4</code>
<description>BI+32 specifies the Condition Register bit to be tested.<br />
The BO field is used to resolve the branch as described<br />
in Figure 2.5. <i>target_addr</i> specifies the branch target ad-<br />
dress.<br />
<br />
If AA=0 then the branch target address is the sum of<br />
BD || 0b00 sign-extended and the address of this instruc-<br />
tion, with the high-order 32 bits of the branch target ad-<br />
dress set to 0 in 32-bit mode.<br />
<br />
If AA=1 then the branch target address is the value<br />
BD || 0b00 sign-extended, with the high-order 32 bits of<br />
the branch target address set to 0 in 32-bit mode.<br />
<br />
If LK=1 then the effective address of the instruction fol-<br />
lowing the <i>Branch</i> instruction is placed into the Link Reg-<br />
ister.</description>
<special-registers-altered>
<title><b>Special Registers Altered:</b></title>
<table-header-register><b><i>Register</i></b></table-header-register>
<table-header-fields><b><i>Field(s)</i></b></table-header-fields>
<entry>
<register>CTR</register>
<fields />
<conditions>(if BO<sub>2</sub>=0)</conditions>
</entry>
<entry>
<register>LR</register>
<fields />
<conditions>(if LK=1)</conditions>
</entry>
</special-registers-altered>
</instruction>
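As an illustration of how a tool like binutils might consume the `<bit-fields>` data, here is a minimal sketch that turns the bc[l][a] `<fields>` element above into field ranges and then packs an instruction word. The width rule is an assumption inferred from the excerpts, not something the parser documents. `<bit-number>` holds either `start` or `start end` (inclusive). A field with no explicit end runs up to the bit before the next field's start, or to bit 31 if it is the last field. Bit 0 is the most significant bit, following PowerISA numbering. The `field_layout` and `encode` names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <fields> element from the bc[l][a] excerpt above.
BC_FIELDS = """<fields>
<field><name>16</name><bit-number>0</bit-number></field>
<field><name>BO</name><bit-number>6</bit-number></field>
<field><name>BI</name><bit-number>11</bit-number></field>
<field><name>BD</name><bit-number>16</bit-number></field>
<field><name>AA</name><bit-number>30</bit-number></field>
<field><name>LK</name><bit-number>31</bit-number></field>
</fields>"""

def field_layout(fields_elem):
    """Return [(name, first_bit, last_bit), ...] with bit 0 as the MSB (assumed rules)."""
    raw = [(f.findtext("name"), [int(b) for b in f.findtext("bit-number").split()])
           for f in fields_elem.findall("field")]
    layout = []
    for i, (name, bits) in enumerate(raw):
        if len(bits) > 1:
            last = bits[1]               # explicit inclusive end, e.g. "16 31"
        elif i + 1 < len(raw):
            last = raw[i + 1][1][0] - 1  # runs up to the next field's first bit
        else:
            last = 31                    # assumed: final field ends the 32-bit word
        layout.append((name, bits[0], last))
    return layout

def encode(layout, **operands):
    """Pack a 32-bit word; purely numeric field names are treated as fixed opcode values."""
    word = 0
    for name, first, last in layout:
        width = last - first + 1
        value = int(name) if name.isdigit() else operands[name]
        # Mask to the field width (negative values end up two's-complement encoded).
        word |= (value & ((1 << width) - 1)) << (31 - last)
    return word

layout = field_layout(ET.fromstring(BC_FIELDS))
print(layout)
print(hex(encode(layout, BO=12, BI=2, BD=4, AA=0, LK=0)))
```

With BO=12 ("branch if the CR bit is set"), BI=2 (CR bit 34, CR0's EQ bit), and BD=4 (a displacement of 16 bytes, since BD || 0b00), this packs to 0x41820010, which is the usual encoding of a `beq` to CIA+16.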
And here's the excerpt for the addi/paddi instructions, which have both unprefixed and prefixed forms:
<instruction>
<header>
<title>Add Immediate D-form</title>
<mnemonics>addi RT,RA,SI</mnemonics>
<bit-fields>
<fields>
<field>
<name>14</name>
<bit-number>0</bit-number>
</field>
<field>
<name>RT</name>
<bit-number>6</bit-number>
</field>
<field>
<name>RA</name>
<bit-number>11</bit-number>
</field>
<field>
<name>SI</name>
<bit-number>16 31</bit-number>
</field>
</fields>
</bit-fields>
</header>
<header>
<title>Prefixed Add Immediate MLS:D-form</title>
<mnemonics>paddi RT,RA,SI,R</mnemonics>
<bit-fields>
<prefix>
<prefix-text>Prefix:</prefix-text>
<fields>
<field>
<name>1</name>
<bit-number>0</bit-number>
</field>
<field>
<name>2</name>
<bit-number>6</bit-number>
</field>
<field>
<name>0</name>
<bit-number>8</bit-number>
</field>
<field>
<name>//</name>
<bit-number>9</bit-number>
</field>
<field>
<name>R</name>
<bit-number>11</bit-number>
</field>
<field>
<name>//</name>
<bit-number>12</bit-number>
</field>
<field>
<name>si0</name>
<bit-number>14 31</bit-number>
</field>
</fields>
<suffix-text>Suffix:</suffix-text>
</prefix>
<fields>
<field>
<name>14</name>
<bit-number>0</bit-number>
</field>
<field>
<name>RT</name>
<bit-number>6</bit-number>
</field>
<field>
<name>RA</name>
<bit-number>11</bit-number>
</field>
<field>
<name>si1</name>
<bit-number>16 31</bit-number>
</field>
</fields>
</bit-fields>
</header>
<code>if ‘‘addi’’ then<br />
RT ← (RA|0) + EXTS64(SI)<br />
if ‘‘paddi’’ & R=0 then<br />
RT ← (RA|0) + EXTS64(si0||si1)<br />
if ‘‘paddi’’ & R=1 then<br />
RT ← CIA + EXTS64(si0||si1)</code>
<description>For <b><i>addi</i></b>, the sum of the contents of register <code>RA</code>, or the<br />
value 0 if <code>RA=0</code>, and the value <code>SI</code>, sign-extended to 64<br />
bits, is placed into register <code>RT</code>.<br />
<br />
For <b><i>paddi</i></b> with <code>R=0</code>, the sum of the contents of register<br />
<code>RA</code>, or the value 0 if <code>RA=0</code>, and the value <code>si0</code>||<code>si1</code>, sign-<br />
extended to 64 bits, is placed into register <code>RT</code>.<br />
<br />
For <b><i>paddi</i></b> with <code>R=1</code>, the sum of the address of the instruc-<br />
tion and the value <code>si0</code>||<code>si1</code>, sign-extended to 64 bits, is<br />
placed into register <code>RT</code>.<br />
<br />
For <b><i>paddi</i></b>, if <code>R</code> is equal to <code>1</code> and <code>RA</code> is not equal to 0, the<br />
instruction form is invalid.</description>
<special-registers-altered>
<title><b>Special Registers Altered:</b></title>
<special-text>None</special-text>
</special-registers-altered>
</instruction>
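To make the addi/paddi pseudo-code above concrete, here is a minimal Python sketch of its semantics. It is not part of parse_powerisa_pdf, and the function names are hypothetical. It assumes si0 is the 18-bit prefix field at bits 14-31 and si1 is the 16-bit suffix field at bits 16-31, so si0||si1 is a 34-bit immediate. `(RA|0)` means "the value 0 if RA=0, otherwise the contents of RA". The ISA leaves the R=1, RA≠0 form invalid rather than defining a trap, so this sketch simply raises an error for it.

```python
MASK64 = (1 << 64) - 1

def exts64(value, bits):
    """Sign-extend the low `bits` bits of value to a 64-bit two's-complement pattern."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign) - sign) & MASK64

def addi(gpr, rt, ra, si):
    # (RA|0): RA=0 reads as the value 0, not the contents of GPR0.
    base = 0 if ra == 0 else gpr[ra]
    gpr[rt] = (base + exts64(si, 16)) & MASK64

def paddi(gpr, cia, rt, ra, si0, si1, r):
    if r == 1 and ra != 0:
        # Invalid instruction form per the description; this sketch just rejects it.
        raise ValueError("paddi with R=1 requires RA=0")
    imm = exts64(((si0 & 0x3FFFF) << 16) | (si1 & 0xFFFF), 34)  # si0 || si1
    base = cia if r == 1 else (0 if ra == 0 else gpr[ra])
    gpr[rt] = (base + imm) & MASK64

gpr = [0] * 32
gpr[5] = 100
addi(gpr, 3, 5, 0xFFFF)                      # SI = -1, so r3 = 99
paddi(gpr, 0x1000, 4, 0, 0, 0x10, 1)         # R=1: r4 = CIA + 16
paddi(gpr, 0, 6, 0, 0x3FFFF, 0xFFFF, 0)      # all-ones 34-bit immediate = -1
print(gpr[3], hex(gpr[4]), hex(gpr[6]))
```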