On choosing the Z80 over the 6502 (2014)

dwarman · on Dec 20, 2015

I spent severaal years of my life in the latter 70's and early 80's living and dreaming in Z80, for my work at MICOM using it for hard real time data communictions (specifically, Statistical Multiplexors). A critical component to achieving HRT is interrupt response time. Z80 has two complete sets of hardware registers and a one byte instruction to toggle which is active. So with the basic 2.5MHz clock time from raising the interrupt to execution of the first instruction of the handler was 2 usecs. With zero additional context saving to do. Add to that hardware vectored interrupts and it was golden - you landed directly in the handler for the specific peripheral. No need to poll status registers.

Not only way ahead of its time, this kind of performance feature is still apparently unknown. Even in GHz SOC systems and modern DSPs. I've seen some really arcane and baroque interrupt status trees in my time since. All take significant code to parse and dispatch. One really egregious offender was the i960 CPU, sold specifically into the data communications market by Intel in the mid 90's. Sold into a necessarily HRT application domain. Even at a 50 MHz clock it still took 6 - 10 usecs to get started doing actual work. I was very surprised, but the hardware choice at that time was not open to me and I had to live with it.

nibnib · on Dec 21, 2015

I think the market has splintered a little bit, low-latency "real-time" stuff tends to be targeted more by simpler devices, whereas the GHz SOCs and DSPs are more often used for heavy number-crunching or "less real-time" applications where they may be running a full OS. Some of the delay is probably due to hardware support for things like scheduling, multi-threading etc.

As for current tech that can match the Z80, I know the MSP430 can have an interrupt latency of about 10 cycles. This can be below 1us with the kind of clock speeds many of the parts will run at.

Apparently (according to the manual) an ARM Cortex M3 can enter an interrupt in 12 cycles. This can be much faster as the parts can be run above 100MHz.

ChuckMcM · on Dec 20, 2015

The Z80 was always considered the more "commercial" processor, and businesses ran CP/M and various business applications like dBase on it. CP/M as an OS was more "OS like" than anything available for the 6502 at the time, I believe that RT-11 inspired CP/M which was a DEC operating system (a "real" computer company) and commands like PIP were ported over with almost identical syntax.

In contrast there was a number of "proprietary" vendors with 6502 systems, Apple and its OS, SWTP and its OS, Sphere and its OS, Commodore and its OS, all different all with minimal amounts of market share in business. Contrast that to CP/M which ran on Altair machines, IMSAI, Morrow, Polywell, Sol20's, Heathkits, Etc. Even the Cromemco once my BIOS had gotten reasonable distribution.

The I/O address space provided another feature on Z80 (and 8080) machines, which was "shadow rom". Basically you booted with some "massive 8K" ROM and then you could write out a byte which would swap it in memory space for a 1/2K very simple jump table/switcher. That let you use most of the address space for user programs. Later when MP/M came out it let you do wild things with "swapping" processes.

Perhaps the saddest part of the Z80 was that the Z800 (which only made it to market in a more limited Z280 form) was still born. It had a sorts of great ideas that folks wanted to use but the momentum of the 8086/8088 was unbeatable by that time. Given chip geometries it would be interesting to build a Z800 today but for the same reasons that it is interesting to build the difference engine, to prove that it would have worked and had some nice features that were ahead of its time.

thwarted · on Dec 20, 2015

The I/O address space provided another feature on Z80 (and 8080) machines, which was "shadow rom". Basically you booted with some "massive 8K" ROM and then you could write out a byte which would swap it in memory space for a 1/2K very simple jump table/switcher.

The 6502 had a similar feature, that let you turn off the ROM mapping into the 64k of address space so you could access the RAM "underneath" for storage of your own programs. Especially useful for programs that didn't need the BASIC interpreter.

https://www.youtube.com/watch?v=fe1-VVXIEh4#t=26m41s

pvg · on Dec 20, 2015

That's really a feature of the 6502-based computers, not of the CPU itself, though.

pvg · on Dec 20, 2015

Sure, but these early CP/M systems ended up being a blip - Z80 CP/M cards for the Apple ][ outsold them all, most likely combined and then with some factor of N. ZX Spectrums, Trash 80s, various arcade and home game machines probably moved way more Z80s as well.

cmrdporcupine · on Dec 20, 2015

Take a look at the eZ80 from Zilog. 50mhz (equiv of a 200mhz 80s-era Z80 apparently) with 256KB onboard flash, and can address 16MB (24-bits) of RAM, as well as full of onboard peripherals and GPIO.

Too bad only available in surface mount, not really hobbyist friendly.

Sanddancer · on Dec 20, 2015

I've found that surface mount is more bark than bite, to be honest. Boards that'll make it a pth device aren't terribly much, and I've done tons of surface mount chips using the toaster oven technique. Even there, you can get a low end smd oven off ebay for not a terrible amount of money if/when you want to do any sort of volume work. Hell, with my crappy fine motor skills, I've nuked more stuff with the soldering iron than I have with smd, so things definitely have changed on that front.

tonyarkles · on Dec 20, 2015

There's two techniques that I find work very well for SMD ICs for hand assembled boards:

- hot air rework wand. I got mine from China for $100. It works awesome both for removing parts and for melting paste.

- flood & wick. You start by tacking down two corners of the chip, and then drag the tip with some solder along the full edge. You'll probably get some bridging between pins. Sometimes, when the moon is aligned just right, you just get a perfect solder. For the times when you don't, the next step is to take a bit of fresh wick and clean it up.

Between these two tools (cheap Chinese reflow wand $100, and a nice Hakko iron $100), I've probably hand-assembled 75-100 prototypes and I don't recall ever destroying one. My next trick is going to be trying to do some BGA parts... new experiences!

userbinator · on Dec 20, 2015

Hot air is definitely the right way to do it for SMD/BGA, and I find it much easier than through-hole because you don't have to solder each pin separately -- just aim the gun at the part and wait for the solder to melt. The surface tension means the part will actually self-align into place over the pads if you get it close enough.

BGA is not that much more difficult; look online for videos of the phone repair shops in China which can swap BGA chips in literally minutes.

cmrdporcupine · on Dec 20, 2015

Well maybe I should give it a try. We have the toaster oven set up in our 'maker space' at work, but I haven't tried it yet. There's a few people there well versed in it.

There's a lot of things I would probably play with if I could do SMD.

jjoonathan · on Dec 20, 2015

Looks like a QFP not a BGA so you could get away with a home-etched board.

nibnib · on Dec 21, 2015

Or any of the cheap panel-sharing services that will do small-run PCBs for a few bux.

userbinator · on Dec 20, 2015

Another advantage of the Z80 if you ever want to assemble (or disassemble) programs manually is that its instruction encoding is very consistently arranged in an octal, 2-3-3 pattern:

http://www.z80.info/decoding.htm

This design was likely influenced by the 8080/8085, and x86 followed the same organisation too:

http://reocities.com/SiliconValley/heights/7052/opcode.txt

Due to how it decodes, the 6502 has a 3-3-2 pattern, which doesn't look quite as structured:

http://www.oxyron.de/html/opcodes02.html

http://www.llx.com/~nparker/a2/opcodes.html

http://www.pagetable.com/?p=39

pvg · on Dec 20, 2015

Not likely but for sure - the Z80 was designed to be binary compatible with the 8080. Although I'm not sure it is that big a practical advantage, you start recognizing the instructions fairly quickly after staring at enough dumps.

melted · on Dec 20, 2015

Z80 has beautiful assembly. Super intuitive and user friendly. I had a Spectrum growing up, and 13 year old me had no issues writing complex programs in Z80 assembly. In retrospect, it's kind of incredible that those programs worked, seeing how I only had less than 48K to work with and didn't know shit about programming.

Theodores · on Dec 20, 2015

In the home computing sector the 6502 occupied the high end in the UK with the BBC Micro being the flagship. Meanwhile Z80 was popularised by Sinclair with the ZX Spectrum (which built on the hugely popular ZX81).

Personally as a newbie to programming (as everyone was at the time, to varying degrees), I preferred the simplicity of the 6502 because it only had three registers. Meanwhile, the extra registers on the Z80 also came with programming concepts that were beyond me at the time. I always knew the Z80 was better!!!

Despite knowing the Z80 to be better it was the underdog at the time in the UK home computing sector, more accomplished but used in the cheaper Sinclair style machines with the 6502 in more expensive machines.

Why were the 6502 machines more expensive? The extra chips that went with the 6502 such as the 6522 IO chip and commonly found chips like the AY-3-8910 sound chip provided useful features that were handled very well by these support chips. They cost money. Meanwhile, in Z80 land with the likes of Clive Sinclair, some ULA chip of some sort would be put together. This chip would work with the CPU to do all of these extra functions badly.

On the ZX81 the CPU did everything. This included the screen. In 'fast mode' you would have no screen. With sound on the spectrum, the same story. Rather than have some neat electronics do the sound properly the CPU would stop everything it was doing to make some 'beep'.

So, in 2015, which to go for? If you are not going to be making millions of these boards there is no point saving money on the support logic. Therefore the Z80 - the better CPU - with a proper set of support chips is what to go for.

cmrdporcupine · on Dec 20, 2015

The 6502 has ridiculously good interrupt responsiveness and very tight cycle timing. And it was cheap. It was the best choice for gaming/graphics type systems because of this.

cmrdporcupine · on Dec 20, 2015

It's neat to imagine an alternate history where z80 CP/M machines 'won' the 80s PC wars, and Gary Kildall and DRI became the rich and deservedly powerful software house of the 80s and 90s, instead of Gates/Microsoft; with ascendancy for GEM and a multitasking multiuser CP/M instead of Windows and MS-DOS.

rasz_pl · on Dec 20, 2015

To be fair Kildall believed he deserved $240 for every copy of the os, and was angry at Gates selling dos at $40. He didnt see the big picture.

cmrdporcupine · on Dec 20, 2015

Do you have a source for that?

rasz_pl · on Dec 21, 2015

wiki, books, interviews of IBM employees

cmrdporcupine · on Dec 21, 2015

I ask because there's been a cottage industry of rumours around this stuff, including the (not supported) claim that Kildall was "busy flying his plane" instead of meeting with IBM, etc.

rasz_pl · on Dec 22, 2015

Afair Computer History Museum has an interview with one of IBMers that was present at this meeting.

avifreedman · on Dec 20, 2015

6809 for the win over both, by far!

jejones3141 · on Dec 20, 2015

Amen to that. Alas, I'm not aware of it still being made, though there are FPGA implementations. (And look up "CoCo on a Chip"; it's a project to recreate and eventually enhance the Tandy Color Computer on an FPGA. The fellow doing it has started work to recreate the CoCo 3's GIME chip, which is its MMU and graphics hardware.)

If only Hitachi had publicized the capabilities of the 6309...

Sanddancer · on Dec 20, 2015

Rochester Electronics got the license to start making the 6809 again a few years ago, and they're apparently not available now, though should be shortly.

http://www.embedded.com/electronics-news/4437243/Rochester-b...

[Link to irrelevant part removed, thanks tacos]

tacos · on Dec 20, 2015

Doesn't seem to be in production yet. The part numbers you reference are SOT23-3 voltage regulators.

tacos · on Dec 20, 2015

68HC11 versions are still in production, though not nearly as fun as the 6809.

yuhong · on Dec 20, 2015

Which also reminds me of: https://en.wikipedia.org/wiki/MOS_Technology_6502#Origins_at...

cmrdporcupine · on Dec 20, 2015

Yep if only the 6309 or 6809 was still being made. Beautiful processor.

pflanze · on Dec 20, 2015

I thought that the page that was used for the stack on the C64, or at least the C128 (which I had, i.e. the MOS 8502, which is almost identical to the 6502) could be reconfigured. I remember some code that would use stack pushes to achieve faster filling of memory, and that would of course only have been useful if the location of the stack could be changed. But I just can't find anything on the web that confirms this (all resources say the stack was fixed at page 01, I just think that was the default, not really fixed; as I recall it, the zero page location was really fixed, but the stack wasn't).

to3m · on Dec 20, 2015

As a longtime 6502 programmer I was impressed by the Z80 after reading the data sheet - but actually trying to program it left me nonplussed. The main omissions as I recall were: lack of immediate addressing for the 16-bit arithmetic instructions; immediate instructions generally are more expensive; (IX+n)/(IY+n) addressing modes are heinously slow; no indirect-with-register-offset addressing mode. These all seemed to eat away at the Z80's apparent advantages.

As a somewhat representative example - suppose you have a bunch of objects that you want to work on. On the Z80 you'd probably adopt a modern-style struct system, whereby each object is represented by a block of memory. Offset 0 is X coordinate, offset 1 is Y coordinate, offset 2 is flags, blah de blah. Suppose each object is 8 bytes, and you want to set a flag for each object that's on screen.

    LD DE,(MAX_X<<8)|MIN_X ; 10 10
    LD IY,8                ; 14 24
    LD IX,OBJECT0          ; 14 38
    LD B,NUMOBJECTS        ; 7  45

    .LOOP
    LD A,(IX+0)            ; 19 19
    CP D                   ; 4  23
    JP C,NOTVISIBLE        ; 10 33
    CP E                   ; 4  37
    JP NC,NOTVISIBLE       ; 10 47
 
    .VISIBLE
    LD A,(IX+2)            ; 19 19
    OR VISFLAG             ; 7  26
    LD (IX+2),A            ; 19 45
    ADD IX,IY              ; 15 60
    DJNZ LOOP              ; 13 73
    JP DONE                ; 10

    .NOTVISIBLE
    LD A,(IX+2)            ; 19 19
    AND ~VISFLAG           ; 7  26
    LD (IX+2),A            ; 19 45
    ADD IX,IY              ; 15 60
    DJNZ LOOP              ; 13 73
    .DONE

(For readers born after 1985 :) - numbers in the comments are cycle counts: instruction's count, then cumulative total for this block)

So, for N objects: (45 + N * 120). Plus 10 if the last one was visible.

For the 6502 you'd probably adopt a striped layout, so rather than having each object as a block of data, you'd have an X table, a Y table, a flags table, and so on. So the same again:

    LDX NUMOBJECTS         ; 3  3
    .LOOP
    LDA XS-1,X             ; 4  4
    CMP MINX               ; 3  7
    BCC NOTVISIBLE         ; 3  10
    CMP MAXX               ; 3  13
    BCS NOTVISIBLE         ; 3  16

    .VISIBLE
    LDA FLAGS-1,X          ; 4  4
    ORA #VISFLAG           ; 2  6
    STA FLAGS-1,X          ; 5  11
    DEX                    ; 2  13
    BNE LOOP               ; 3  16
    BEQ DONE               ; 3

    .NOTVISIBLE
    LDA FLAGS-1,X          ; 4  4
    AND #~VISFLAG          ; 2  6
    STA FLAGS-1,X          ; 5  11
    DEX                    ; 2  13
    BNE LOOP               ; 3  16
    .DONE

So, for N objects: (3 + N * 32). Plus 3 if the last one was visible. (I've been a bit scrappy with the cycle counts here, by counting each branch as taken, even though that makes the totals invalid. This is done to favour the Z80, which doesn't appear to execute untaken branches any quicker.)

That's pretty much the 4:1 improvement that's commonly claimed. 1MHz 6502 will beat 3.5MHz Z80; 2MHz 6502 will beat 4MHz Z80.

This might seem like a synthetic benchmark, designed to make the 6502 look better, but this sort of thing crops up fairly often. (I noticed it initially after noting how crappy-looking the code was for a couple of Z80 games I was disassembling; I only realised why after trying to rewrite the snippets in question myself!)

One thing to note in particular here is that in the Z80 case you're two registers down - because IY has been used for the array stride (no immediate 16-bit addition...), and DE has been used for the min/max constants (immediate instructions are more expensive).

And this is all very well as the code is presented here, but in practice, in many cases you'll probably need to use DE in your code. You can work around this by using EXX and having your constants in the shadow register bank, but you're losing 8 T-states per iteration (EXX = 4 T-states). For many purposes you'd be better off using self-modifying code - but now you lose 3 T-states per addition (immediate instructions are more expensive).

So a bit of a disappointment for me. Initial excitement at seeing just how much crazy stuff the Z80 can do rapidly turned into disillusionment as I realised just how long it takes to do anything. It's like the 68000 in this respect.

Good luck beating LDIR with a 6502 at quarter the clock rate, though...

Z80 code is also generally much easier to follow...

Z80 has a 16-bit stack pointer too...

hyperpallium · on Dec 20, 2015

\oblig HN pedantry: In this specific simplified problem, I think you could use stripes for Z80, with x in (HL), flag in (DE), and use literals for max/min (as in 6502) instead of D/E? But it would still be much slower than the 6502, and you run out of registers if need more than two fields of an object...

I'd forgotten (IX+n) instructions were so expensive! It's a shame, because your example is their purpose.

I guess Z80 is a higher-level micro-coded language: easier but slower. I also always found the orthogonal mnenonic style of Z80 much clearer, which I think 6502 could have used. e.g. "LDA $F453,X" could be "LD A,($F453+X)"

PS some references I found: 6502 addressing modes: http://www.dwheeler.com/6502/oneelkruns/asm1step.html

  LDA $F453,X  where X contains 3

Absolute Indexed Addressing - Load the accumulator with the contents of address $F453 + 3 = $F456.

Z80 timings: http://map.grauw.nl/resources/z80instr.php

to3m · on Dec 20, 2015

Hmm... in this case, you might be right. You'd then need to advance 2 registers by 8 each time, and you'd be paying an extra 3 T-states per compare for the fetch of the immediate operand, but you'd be saving 4+ cycles in various places from never needing a DD prefix...

(Actually, MIN_X and MAX_X are supposed to be values you fetch from memory, as you can see from the 6502 code. I have no idea why I wrote what I did for the Z80 version. You could use self-modifying code to put constants in the right place, though - which, thinking about it, is actually what I should have done for the 6502 version.)

yuubi · on Dec 20, 2015

    add ix iy

I'm pretty sure that isn't a thing (the z80 has prefix opcodes for "hl in next instruction is really ix (or (hl) is (ix+dd)" and one for iy, but not one for both in the same instruction like that). Source: worked on a product with a z80 in this decade.

to3m · on Dec 20, 2015

Yes, good point. Sorry. This is not going to help the Z80 much in this particular comparison, is it :(

rdc12 · on Dec 20, 2015

Why would you favour a AOS (Array of Structures) or the SOA (Structure of Arrays), for the different architectures?

to3m · on Dec 20, 2015

The basic operation for accessing a value from a structure is to take the address of the structure, add a compile-time constant offset to it, and use that address. There's an addressing mode specifically for this in the Z80: you put the struct address in a register, and have the constant offset as part of the instruction. So you can do stuff like this: (numbers in comments are: <bytes for this instruction> <bytes total> / <cycles for this instruction> <cycles total>)

    <<put address in IX>>
    LD A,(IX+0)            ; A=SELF.X              3 3  / 19 19
    ADD A,(IX+2)           ; A=SELF.X+SELF.DX      3 6  / 19 38
    LD (IX+0),A            ; SELF.X=SELF.X+SELF.DX 3 9  / 19 57
    XOR A                  ; A=0                   1 10 / 4  61
    LD (IX+2),A            ; SELF.DX=0             3 13 / 19 80

The 6502 doesn't have anything exactly like this. You do have an indirect addressing mode, but the offset goes in a register. So for every access, you need to have the offset into the index register. (There's no point adding the offset to the address; the offset is mandatory, so you'd then just have to load zero into the index register every time.) So the above in 6502:

    <<put address in $70>>
    LDY #2                                         2 2  / 2 2
    LDA ($70),Y            ; A=SELF.DX             2 4  / 5 7
    LDY #0                                         2 6  / 2 9
    CLC                                            1 7  / 2 11
    ADC ($70),Y            ; A=SELF.DX+SELF.X      2 9  / 5 16
    STA ($70),Y            ; SELF.X=SELF.X+SELF.DX 2 11 / 6 22
    TYA                                            1 12 / 2 24
    LDY #2                                         2 14 / 2 26
    STA ($70),Y            ; SELF.DX=0             2 16 / 6 32

Which is rather unwieldy, takes up more space, and is a bit slow (the 4:1 ratio I mentioned in my original past is here closer to 2:1) - even after I rearranged the code a bit so the Y value didn't need reloading so much. You also have the issue that the struct address is a 16-bit quantity, something that is inconvenient to handle on the 6502. There are no instructions that operate on 16-bit data.

One thing the 6502 does have though is an addressing mode where the operand's address is a fixed 16-bit value (from the instruction) plus an 8-bit index register. So, with this in mind, what you could do is have a separate table for each struct field. So this switches things around, and instead of a runtime 16-bit address (of the struct) and a compile-time 8-bit offset (of the field), you have a compile-time 16-bit address (of the field table) and a runtime 8-bit offset (for the row in the table). Which is a perfect fit. Best of all, the indexing is free if you arrange things correctly, which is not that hard to do. Then the code would look more like this:

    <<put row in X>>
    CLC                     ;                        1 1  / 2 2
    LDA XS,X                ; A = SELF.X             3 4  / 4 6
    ADC DXS,X               ; A = SELF.X+SELF.DX     3 7  / 4 10
    STA XS,X                ; SELF.X=SELF.X+SELF.DX  3 10 / 4 14
    LDA #0                  ;                        2 12 / 2 16
    STA DXS,X               ; SELF.DX=0              3 15 / 4 20

And there's that 4:1 ratio again...

Each object's "address" is now an 8-bit quantity (it's just an index into the tables), so much easier to deal with. And moving from one object to the next is very quick, since it's just incrementing a register.

There doesn't appear to be anything analogous to this alternative approach on the Z80.

rasz_pl · on Dec 20, 2015

microcode < state machine

watching z80 in action is actually painful https://www.youtube.com/watch?v=DHeS6nQ_dmY

squeakynick · on Dec 20, 2015

Didn't "The Terminator" use 6502?

Case closed :)

vardump · on Dec 20, 2015

Terminator? Bah.

Bender used... err will use a 6502.

http://spectrum.ieee.org/semiconductors/processors/the-truth...

GPGPU · on Dec 20, 2015

I like the 6502 instruction set better. But it does stink to have to chop up your limited address space for I/O.

yuhong · on Dec 20, 2015

It is sad that they then screwed up the Z80 by doing only 128 cycle refresh.