The first thing I wanted to set up was the basics of my build pipeline. This is based on a blog post by Quinn Dunki, but things have changed a bit since then, so I wanted to verify the steps.

To get what we need, we will need to install the following:

  1. cc65 – To compile the code
  2. AppleCommander – To put the compiled executable onto a floppy disk image
  3. Virtual ][ – To boot and test the image

Installing cc65

Installing cc65 on the Mac is pretty easy, as there is a bottle under Homebrew to install it directly. If you don’t have or have never used Homebrew, it’s a great way to install extra software, similar to MacPorts or Fink. Homebrew is easy to install; just follow the directions on their site.

Then to install cc65, simply do:

That’s it.  Pretty simple.  But, let’s try a simple test compile:

So far, so good.

Installing AppleCommander

For the build pipeline, we also need AppleCommander.  Luckily, we can also install this with Homebrew by using this Apple II homebrew repository.  First we need to “tap” the repository then we can install AppleCommander from there.

But, let’s make sure this part is working as well.  Let’s put the “helloworld” executable on a DOS 3.3 bootable disk and try it out.  You can get ”

Again, looks good so far.  Next!


  • The image created by -dos140 will not be bootable, so, as you will see below, we will boot the DOS 3.3 System Master disk and then run our program off the other disk. In the final pipeline, I’ll INIT the test disk beforehand and reuse it as needed to boot.
  • When putting the executable on the disk, you’ll need to use the -as (AppleSingle) flag with AppleCommander. This replaces the -cc65 flag mentioned in Quinn’s post.

Installing Virtual ][

So far, we’ve been able to ride the build pipeline for free, but here is where we need to get off that train and pay the piper (see what I did there?). To me, Virtual ][ is the go-to emulator for the Mac and is worth every penny. It costs 44 USD for the full license. Like I said, worth every penny. You can get Virtual ][ here.

Once you install it, you’ll need to get the correct ROM for the machine you want to run. I run an Apple //e as my physical machine, so I also like to run a //e as my virtual testing environment. You can find the ROM you need without a lot of digging, so I’ll leave that as an exercise for the reader.

The important thing about Virtual ][, and something I’m not sure other emulators offer, is its AppleScript support, so it can be controlled from scripts. This is important to the build pipeline, because it lets the build load and boot the disk image.

To verify Virtual ][, let’s boot the “Apple_DOS_3.3_Master.dsk” (found on Archive.org) in drive 1 and the image we created above in drive 2.

Everything looks good.

Notes/updates compared to Quinn’s post (i.e. TL;DR)

  • cc65 can be installed via Homebrew.
  • AppleCommander can be installed via Homebrew after adding the Apple II homebrew repository.
    • The AppleCommander command line executable is “applecommander”, not “ac”
    • The “-cc65” flag has been removed and you need to use “-as” or “-dos” as appropriate. In our case, it’s “-as”


Next, I’ll be looking to generate a CMakeLists.txt file (which CLion uses) to do similar work and link these together.

In order to help motivate me to get some retro-computing work done, I signed up for RetroChallenge 2018/04.  If you don’t know what RetroChallenge is, well it’s an informal contest (that’s not really a contest) for doing retro-computing related projects.  Basically, it’s a way to help incentivize retro-computing enthusiasts to get off their butts and do something cool.

For my project, I have a main goal and some sub-goals, mainly so I can feel like I accomplished something even if I don’t completely finish it in April.

Here is what I’m planning on doing for this challenge: writing a game for the Apple ][ that is similar to an iPad game called CargoBot (iTunes link), a box-sorting game in which you program a crane to sort the boxes using as few commands as possible.

Here are my goals as part of this project.  Mostly in order, but who knows. Feel free to keep score, if you like.

  1. Develop a build pipeline for use with the JetBrains tools (IntelliJ/CLion), similar to (and borrowing from) the work referenced by Quinn Dunki in this blog post. As she mentions in the post, she is standing on the shoulders of (bald) giants, so I guess I will be standing on (the shoulders of giants)². And, yes, I’m (mostly) bald as well, so there is that.
  2. Build the core engine while comparing the tradeoffs between memory usage and performance by trying compact/verbose level formats, because the geek in me wants to make it as small as possible but the gamer in me wants to actually play it.
  3. Make the game’s rendering modular so I can render it in any of the following modes:
    1. Text – Easiest way to vet out the engine.
    2. LoRes – Because I can.
    3. HiRes – Because I should.
    4. Double-HiRes – Because I shouldn’t, but I’m gonna anyways.

Some things you might see along the way, so don’t be frightened:

  1. Banging my head on my desk in frustration, because impact-maintenance is a real thing.
  2. Me pulling my hair out, which is a challenge in-and-of itself (see reference to being (mostly) bald above).
  3. Goofy graphics issues.  I’ve played with some HiRes stuff earlier and it’s “interesting” (Minnesota slang for “sucky”).  If you want to point and laugh early, you can look back and see the struggles I’m probably going to have to go through again.
  4. Swearing.  Well, only if you are nearby when this is all happening.  I’ll keep the blog family-friendly but I’ll make no such promises for real-life.

I encourage you to follow along.  We can laugh together (or you can laugh at me), we can cry together (as there may be tears) and hopefully we can play together (well, not together, but you know what I mean).

Let the challenge begin! (Well, tomorrow)

I mentioned in an earlier post that I would post about getting timing routines into my PLASMA test code as well as the C code. Needless to say, it took me way more head-banging and a late night to get it in and working. But that was all me. I failed to RTFM and then tried to figure out why it wasn’t working.

But let’s back up a bit to give credit where credit is due. I stole/borrowed/adapted the clock routines I’m using in my code (both for PLASMA and C). I finally found a post on comp.sys.apple2.programmer from Bill Buckels that had the code in raw opcode format, which was memcpy()’d into the right location in memory (in this case $0260) and then called from inline assembly with a JSR to the right spot. Brilliant!

But, with my lack of understanding of how PLASMA is laid out, I figured I had better do something a little more portable. I tried several ways of converting it to inline assembly on my own. One was taking the disassembly from the monitor and converting the raw memory locations to logical offsets, which involved using Virtual ][ to print the listing from the monitor to the virtual printer, saving it as a PDF and copy/pasting from there. That was a nightmare, as the output in the PDF is not sequenced the way I would have expected.


Thanks to a tip from David Schmidt, I took a look at the code in ADTPro for the clock routines.  That’s all in assembly with logical offsets!  Woohoo!  I converted it into the assembly style that PLASMA wants and gave it a shot.  No go.  Time to figure out why.

At this point, I wish I had taken some screenshots of what I was doing, as it would be nice to have them. I’ll be better about that in the future.

I had my PLASMA code print out the memory location ($4047, I think it was) of the function that has the inline assembly in it, then went into the monitor and took a look. In the original routine, the first STA instruction stores to $02DE, which is $7E past the start of the routine ($0260). You’d expect the STA in the new code to point $7E past $4047, right? Nope! It pointed at $10B2. Well, there’s your problem. I could get the offsets in that code to come out right if I passed “--setpc 16401” to the ACME assembler, but then the entry point of my PLASMA code was off and nothing would run.

After hours of digging around and trying various things, I decided I needed to reach out to see if I had hit a bug (unlikely) or was doing something wrong (very likely). After I posted to comp.sys.apple2.programmer, David Schmenk got me straightened out.

Here is where the RTFM failure part comes in.  Here is a section from the PLASMA readme about Native Assembly Functions:

Lastly, PLASMA modules are re-locatable, but labels inside assembly functions don’t get flagged for fix-ups. The assembly code must use all relative branches and only accessing data/code at a fixed address.
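The readme’s rule is easier to see with the numbers from this routine. The raw image starts with $a9,$00,$8d,$de,$02, i.e. LDA #$00 followed by STA $02DE, an absolute address $7E bytes into the code loaded at $0260. Here is a small host-side C sketch (the helper names relocate and needs_fixup are mine, purely illustrative) of the fix-up arithmetic that would be needed if such a routine moved, plus the check that separates operands pointing inside the routine from fixed hardware addresses like $C300, which the same routine touches while talking to the clock. If I read the numbers above right, the same arithmetic also explains the --setpc result: with an origin of 4094 ($0FFE) the operand came out as $10B2, and moving the origin to 16401 ($4011) shifts it to exactly $4047 + $7E = $40C5. Relative branches never need any of this, which is why the readme asks for them.

```c
#include <stdint.h>

/* Hypothetical helper: where an absolute operand that pointed at `addr` in
   code assembled for `old_base` must point once the code runs at `new_base`.
   PLASMA does NOT apply this kind of fix-up inside asm functions. */
uint16_t relocate(uint16_t addr, uint16_t old_base, uint16_t new_base)
{
    return (uint16_t)(new_base + (uint16_t)(addr - old_base));
}

/* Hypothetical helper: nonzero if an absolute operand points inside the
   routine [base, base + len), i.e. it would break if the routine moved.
   Operands outside that range (I/O, ROM, fixed buffers) stay valid. */
int needs_fixup(uint16_t operand, uint16_t base, uint16_t len)
{
    return operand >= base && operand < (uint32_t)base + len;
}
```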

Then I set off on a “damn fool idealistic crusade” to implement the clock code in C (and then PLASMA) directly. I tried. Boy, did I try. But apparently my reading of the assembly and my attempts to redo it in something else were failing miserably. I tend to do that: wanting to do things the “right” or “best” way instead of the “working way”. Sometimes it’s best to just use the “working way”, especially since I only wanted this for some performance testing.

Back to using the raw code and memcpy()’ing it in.  That was working fine, except my loop from 1 to 10 in my test program ran way more than 10 times.  I realize now, this was RTFM failure #2:

Data passed in on the PLASMA evaluation stack is readily accessed with the X register and the zero page address of the ESTK. The X register must be properly saved, incremented, and/or decremented to remain consistent with the rest of PLASMA. Parameters are popped off the evaluation stack with INX, and the return value is pushed with DEX.
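To see why a missing DEX makes a loop run away, here is a toy C model of the evaluation stack. It is a simulation under stated assumptions, not PLASMA’s actual VM: the struct, its depth of 16, and the function names are hypothetical, and it only models what the excerpt says (push with DEX, pop with INX, indexed by X, with separate low/high byte arrays assumed). It also assumes, as I understand PLASMA, that every function returns a value and the caller drops it when unused, and that a for loop keeps its limit on the evaluation stack while it runs. Under those assumptions, an asm function that skips its DEX lets the caller’s drop eat one of the caller’s own values.

```c
#include <stdint.h>

#define ESTK_SIZE 16  /* assumed depth, for illustration only */

/* Toy evaluation stack: low/high byte arrays indexed by x, growing downward. */
struct estk {
    uint8_t lo[ESTK_SIZE];
    uint8_t hi[ESTK_SIZE];
    uint8_t x;            /* plays the role of the 6502 X register */
};

void estk_init(struct estk *s) { s->x = ESTK_SIZE; }

/* Push: DEX, then store at ESTK,X */
void estk_push(struct estk *s, uint16_t v)
{
    s->x--;
    s->lo[s->x] = (uint8_t)(v & 0xFF);
    s->hi[s->x] = (uint8_t)(v >> 8);
}

/* Pop: read at ESTK,X, then INX */
uint16_t estk_pop(struct estk *s)
{
    uint16_t v = (uint16_t)(s->lo[s->x] | (s->hi[s->x] << 8));
    s->x++;
    return v;
}

/* Hypothetical zero-argument asm function. Doing the DEX leaves one
   (dummy) return value for the caller to drop; skipping it does not. */
void asm_fn(struct estk *s, int does_dex)
{
    if (does_dex)
        estk_push(s, 0);
}
```

With the DEX, X ends up where it started after the caller drops the result; without it, X ends up one slot too high and the value underneath (a loop limit, say) is gone.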

David to the rescue again. I added the code to save/restore X and do a DEX, and it was good to go!

Here is the code for the timers. It’s basically a simple stopwatch with one lap timer included. Start the timer then you can ask for the elapsed time. You can do a lap reset to get individual times while the main timer is unaffected.


PLASMA Code

import cmdsys
    predef memcpy
end

const nscdata = $303

byte timer_year, timer_month, timer_date, timer_day, timer_hour, timer_minute, timer_second, timer_hundredth
byte lap_year, lap_month, lap_date, lap_day, lap_hour, lap_minute, lap_second, lap_hundredth
byte tmp_year, tmp_month, tmp_date, tmp_day, tmp_hour, tmp_minute, tmp_second, tmp_hundredth

// READ.TIME clock routine as raw opcodes ($16E bytes, copied to $0260)
byte nsccode[] = $a9,$00,$8d,$de,$02,$a9,$03,$09,$c0,$8d,$1f,$03,$8d,$22,$03,$8d,$31,$03,$8d,$3f,$03,$a9,$03,$8d,$df,$02,$d0,$16,$00,$00,$00,$00
byte           = $00,$00,$2f,$00,$00,$2f,$00,$00,$20,$00,$00,$3a,$00,$00,$3a,$00,$00,$8d,$20,$0b,$03,$a2,$07,$bd,$03,$03,$dd,$e0,$02,$90,$0f,$dd
byte           = $e8,$02,$b0,$0a,$ca,$10,$f0,$ce,$df,$02,$d0,$e6,$18,$60,$ee,$de,$02,$ad,$de,$02,$c9,$08,$90,$af,$d0,$1d,$a9,$c0,$a0,$15,$8d,$1b
byte           = $03,$8c,$1a,$03,$a0,$07,$8d,$1f,$03,$8c,$1e,$03,$88,$8d,$6f,$03,$8c,$6e,$03,$a9,$c8,$d0,$95,$a9,$4c,$8d,$16,$03,$38,$60,$00,$00
byte           = $00,$01,$01,$01,$00,$00,$00,$00,$64,$0d,$20,$38,$98,$3c,$3c,$64,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00
byte           = $18,$90,$09,$00,$00,$00,$00,$00,$00,$00,$00,$38,$08,$78,$a9,$00,$8d,$04,$03,$8d,$80,$02,$ad,$a3,$03,$ad,$ff,$cf,$48,$8d,$00,$c3
byte           = $ad,$04,$c3,$a2,$08,$bd,$bf,$03,$38,$6a,$48,$a9,$00,$2a,$a8,$b9,$00,$c3,$68,$4a,$d0,$f4,$ca,$d0,$ec,$a2,$08,$a0,$08,$ad,$04,$c3
byte           = $6a,$66,$42,$88,$d0,$f7,$a5,$42,$9d,$7f,$02,$4a,$4a,$4a,$4a,$a8,$a5,$42,$c0,$00,$f0,$08,$29,$0f,$18,$69,$0a,$88,$d0,$fb,$9d,$02
byte           = $03,$ca,$d0,$d7,$ad,$80,$02,$8d,$83,$02,$68,$30,$03,$8d,$ff,$cf,$a0,$11,$a2,$06,$bd,$c7,$03,$99,$80,$02,$bd,$80,$02,$48,$29,$0f
byte           = $09,$30,$88,$99,$80,$02,$68,$4a,$4a,$4a,$4a,$d0,$0c,$e0,$01,$f0,$04,$e0,$04,$d0,$04,$a9,$20,$d0,$02,$09,$30,$88,$99,$80,$02,$88
byte           = $ca,$d0,$d1,$28,$b0,$19,$20,$be,$de,$20,$e3,$df,$20,$6c,$dd,$85,$85,$84,$86,$a9,$80,$a0,$02,$a2,$8d,$20,$e9,$e3,$20,$9a,$da,$60
byte           = $5c,$a3,$3a,$c5,$5c,$a3,$3a,$c5,$2f,$2f,$20,$3a,$3a,$8d

// X is PLASMA's evaluation stack index: save/restore it around the call,
// then DEX to push a (dummy) return value
asm _initnsc
        txa
        pha
        jsr $0260
        pla
        tax
        dex
        rts
end

asm _readnsc
        txa
        pha
        jsr $030B
        pla
        tax
        dex
        rts
end

export def loadnsccode
    memcpy($0260, @nsccode, $16e)
end

export def initnsc
    _initnsc()
end

// Read the clock, then copy its 8 bytes from $303 into timedata
export def gettime(timedata)
    _readnsc()
    memcpy(timedata, nscdata, 8)
end

export def timer_start
    gettime(@timer_year)
    memcpy(@lap_year, @timer_year, 8)
end

export def timer_elapsed
    word d, h, m, s, hd
    gettime(@tmp_year)
    d = tmp_date - timer_date; h = tmp_hour - timer_hour; m = tmp_minute - timer_minute; s = tmp_second - timer_second; hd = tmp_hundredth - timer_hundredth;

    return (((d*24+h)*60+m)*60+s)*100+hd
end

export def timer_lap_reset
    gettime(@lap_year)
end

export def timer_lap_elapsed
    word d, h, m, s, hd
    gettime(@tmp_year)
    d = tmp_date - lap_date; h = tmp_hour - lap_hour; m = tmp_minute - lap_minute; s = tmp_second - lap_second; hd = tmp_hundredth - lap_hundredth;

    return (((d*24+h)*60+m)*60+s)*100+hd
end

done


C Code

Adapted from a post by Bill Buckels

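#include <stdio.h>
#include <string.h>
#include <conio.h>
#include "realtime.h"

#define READ_TIME_ADDR 0x260
#define READ_TIME_LEN  366

/* The READ.TIME program Version 1.4 (C) Copyright Craig Peterson 1991 */
/* (same 366 bytes as nsccode[] in the PLASMA version) */
char _read_time[READ_TIME_LEN] = {
    0xa9,0x00,0x8d,0xde,0x02,0xa9,0x03,0x09,0xc0,0x8d,0x1f,0x03,0x8d,0x22,0x03,0x8d,0x31,0x03,0x8d,0x3f,0x03,0xa9,0x03,0x8d,0xdf,0x02,0xd0,0x16,0x00,0x00,0x00,0x00,
    0x00,0x00,0x2f,0x00,0x00,0x2f,0x00,0x00,0x20,0x00,0x00,0x3a,0x00,0x00,0x3a,0x00,0x00,0x8d,0x20,0x0b,0x03,0xa2,0x07,0xbd,0x03,0x03,0xdd,0xe0,0x02,0x90,0x0f,0xdd,
    0xe8,0x02,0xb0,0x0a,0xca,0x10,0xf0,0xce,0xdf,0x02,0xd0,0xe6,0x18,0x60,0xee,0xde,0x02,0xad,0xde,0x02,0xc9,0x08,0x90,0xaf,0xd0,0x1d,0xa9,0xc0,0xa0,0x15,0x8d,0x1b,
    0x03,0x8c,0x1a,0x03,0xa0,0x07,0x8d,0x1f,0x03,0x8c,0x1e,0x03,0x88,0x8d,0x6f,0x03,0x8c,0x6e,0x03,0xa9,0xc8,0xd0,0x95,0xa9,0x4c,0x8d,0x16,0x03,0x38,0x60,0x00,0x00,
    0x00,0x01,0x01,0x01,0x00,0x00,0x00,0x00,0x64,0x0d,0x20,0x38,0x98,0x3c,0x3c,0x64,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
    0x18,0x90,0x09,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x38,0x08,0x78,0xa9,0x00,0x8d,0x04,0x03,0x8d,0x80,0x02,0xad,0xa3,0x03,0xad,0xff,0xcf,0x48,0x8d,0x00,0xc3,
    0xad,0x04,0xc3,0xa2,0x08,0xbd,0xbf,0x03,0x38,0x6a,0x48,0xa9,0x00,0x2a,0xa8,0xb9,0x00,0xc3,0x68,0x4a,0xd0,0xf4,0xca,0xd0,0xec,0xa2,0x08,0xa0,0x08,0xad,0x04,0xc3,
    0x6a,0x66,0x42,0x88,0xd0,0xf7,0xa5,0x42,0x9d,0x7f,0x02,0x4a,0x4a,0x4a,0x4a,0xa8,0xa5,0x42,0xc0,0x00,0xf0,0x08,0x29,0x0f,0x18,0x69,0x0a,0x88,0xd0,0xfb,0x9d,0x02,
    0x03,0xca,0xd0,0xd7,0xad,0x80,0x02,0x8d,0x83,0x02,0x68,0x30,0x03,0x8d,0xff,0xcf,0xa0,0x11,0xa2,0x06,0xbd,0xc7,0x03,0x99,0x80,0x02,0xbd,0x80,0x02,0x48,0x29,0x0f,
    0x09,0x30,0x88,0x99,0x80,0x02,0x68,0x4a,0x4a,0x4a,0x4a,0xd0,0x0c,0xe0,0x01,0xf0,0x04,0xe0,0x04,0xd0,0x04,0xa9,0x20,0xd0,0x02,0x09,0x30,0x88,0x99,0x80,0x02,0x88,
    0xca,0xd0,0xd1,0x28,0xb0,0x19,0x20,0xbe,0xde,0x20,0xe3,0xdf,0x20,0x6c,0xdd,0x85,0x85,0x84,0x86,0xa9,0x80,0xa0,0x02,0xa2,0x8d,0x20,0xe9,0xe3,0x20,0x9a,0xda,0x60,
    0x5c,0xa3,0x3a,0xc5,0x5c,0xa3,0x3a,0xc5,0x2f,0x2f,0x20,0x3a,0x3a,0x8d
};

struct nsctm timer, lap, tmp;

#pragma optimize (push,off)
void initnsc(void)
{
    char *brunptr = (char *)READ_TIME_ADDR;

    /* bload read.clock to $260 */
    memcpy(brunptr, _read_time, READ_TIME_LEN);

    asm("JSR $260"); /* call init clock */
}
#pragma optimize (pop)

/* read the current date and time from the NSC */
#pragma optimize (push,off)
void gettime(struct nsctm *output)
{
    asm("JSR $30B"); /* call read clock */

    memcpy(output, (char *)0x303, 8);
}
#pragma optimize (pop)

void timer_start(void)
{
    gettime(&timer);
    memcpy(&lap, &timer, 8);
}

int timer_elapsed(void)
{
    int d, h, m, s, hd;
    gettime(&tmp);
    d = tmp.date - timer.date; h = tmp.hour - timer.hour; m = tmp.minute - timer.minute; s = tmp.second - timer.second; hd = tmp.hundredth - timer.hundredth;

    return (((d*24+h)*60+m)*60+s)*100+hd;
}

void timer_lap_reset(void)
{
    gettime(&lap);
}

int timer_lap_elapsed(void)
{
    int d, h, m, s, hd;
    gettime(&tmp);
    d = tmp.date - lap.date; h = tmp.hour - lap.hour; m = tmp.minute - lap.minute; s = tmp.second - lap.second; hd = tmp.hundredth - lap.hundredth;

    return (((d*24+h)*60+m)*60+s)*100+hd;
}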
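For reference, here is a host-side C sketch of the elapsed-time arithmetic both versions share. The struct name nsctm_sketch and its field order are assumptions taken from the PLASMA variable declarations. It differs from the code above in one deliberate way: it uses long, because cc65’s int (like PLASMA’s word) is 16 bits, so any result above 32767 cs (roughly five and a half minutes) would overflow. Signed subtraction per field is what makes borrows work, e.g. 10:00:59.90 to 10:01:00.10 gives 1 minute, -59 seconds, -80 hundredths, which folds to 20 cs. Month and year rollover are ignored, as in the original.

```c
#include <stdint.h>

/* Assumed layout of the 8 bytes copied from $303. */
struct nsctm_sketch {
    uint8_t year, month, date, day, hour, minute, second, hundredth;
};

/* Elapsed hundredths of a second from `start` to `now`.
   Per-field differences may be negative; the nested fold handles borrows. */
long elapsed_cs(const struct nsctm_sketch *start, const struct nsctm_sketch *now)
{
    long d  = (long)now->date      - start->date;
    long h  = (long)now->hour      - start->hour;
    long m  = (long)now->minute    - start->minute;
    long s  = (long)now->second    - start->second;
    long hd = (long)now->hundredth - start->hundredth;

    return (((d * 24 + h) * 60 + m) * 60 + s) * 100 + hd;
}
```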

Again, strikingly similar, but that is what I was after: comparing apples to apples (pun intended!).

Next I’m going to take a look at some more comparisons. Things I want to look at (some based on suggestions) include timings for different routines, cycles for different operations and size comparisons.

I wanted to get some timings for PLASMA vs C for a few operations. I’m sticking with my “moving monster” theme and tracked the time for doing two different operations.

  1. Drawing a frame of the monster (100 times), which involves
    • Flipping HGR pages
    • Getting the page address, getting Y address (lookup),  the byte for X (lookup) and adding them together
    • Getting frame for X (lookup) and calculating the offset to get to the correct frame
    • Memcpy() the data to memory
  2. Do a simple no op for loop from 1 to 500 (100 times)
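The Y-address lookup in step 1 isn’t shown in these posts. Since putImage() adds hgrpage[page] + yToAddr[y], I’m assuming yToAddr holds each row’s offset from the start of a hi-res page. A sketch of building such a table from the standard interleaved hi-res row layout (the function name is hypothetical):

```c
#include <stdint.h>

#define HGR_ROWS 192

/* Fill table[y] with the offset of hi-res row y from the start of its page
   (add $2000 for page 1 or $4000 for page 2). */
void build_y_to_addr(uint16_t table[HGR_ROWS])
{
    int y;
    for (y = 0; y < HGR_ROWS; ++y) {
        table[y] = (uint16_t)((y & 7) * 0x400         /* line within a group of 8 */
                              + ((y >> 3) & 7) * 0x80 /* group within a third */
                              + (y >> 6) * 0x28);     /* which third of the screen */
    }
}
```

For example, row 1 starts $400 bytes after row 0, row 8 is at $80, and the last row (191) lands at offset $1FD0, i.e. $3FD0 on page 1.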

I fully admit that this is not an exhaustive test, but I just wanted to get an idea of how they compare. Again, this is not a “C is faster/better” post. PLASMA is impressive tech regardless of the times. It’s just out of my pure curiosity.

I added pretty much identical timing routines on the PLASMA and C side (after much time spent banging my head), but I’ll post on that later.


Note: Times are in cs (centiseconds, i.e. 100ths)

100 Frames

C: 147 cs
PLASMA: 228 cs (155%)

Loop 500

C: 530 cs
PLASMA: 1368 cs (258%)
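The percentages in parentheses are PLASMA’s time as a share of C’s, truncated to a whole percent. A trivial C helper (name is mine) reproduces them:

```c
/* PLASMA time as a whole-number percentage of the C time (truncated). */
unsigned long percent_of(unsigned long plasma_cs, unsigned long c_cs)
{
    return plasma_cs * 100UL / c_cs;
}
```

So 228 cs against 147 cs comes out at 155%, and 1368 cs against 530 cs at 258%.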


Because, I like videos.



While playing around with PLASMA and working on some timing routines (more on that later), I found I needed to expand my build chain to be able to include multiple PLASMA modules on one disk when booting.

I also didn’t like having to specify an environment variable to set the source file for simple builds. For building a single .pla file and running it, I wanted something easier. This new makefile satisfies both of those requirements.

I did end up moving away from generating the # style files that I think CiderPress wants. Mainly because I’m using AppleCommander to build my disk images. I decided to use .mod (PLASMA “module” was the inspiration) as the intermediary file extension.



DSKTYPE = rel
ADDR = 1000

.PRECIOUS: %.dsk

clean:
	-rm -f *.a *.mod

%.run: %.dsk
	osascript plasma_run.scpt `pwd` $*

%.dsk: %.mod $(patsubst %,%.mod,$(EXTRA))
	cp template.dsk $@
	java -jar AppleCommander.jar -d $@ $*
	java -jar AppleCommander.jar -p $@ $* $(DSKTYPE) 0x$(ADDR) < $*.mod
	-if [ ! -z "$(EXTRA)" ]; then \
		for o in $(EXTRA); \
		do \
			java -jar AppleCommander.jar -d $@ $$o ;\
			java -jar AppleCommander.jar -p $@ $$o $(DSKTYPE) 0x$(ADDR) < $$o.mod ;\
		done ;\
	fi

%.mod: %.a
	acme --setpc 4094 -o $@ $?

%.a: %.pla
	plasm -AM < $? > $@


This can be used in a few different ways. The simplest is to run make, passing the name of your .pla file with “.pla” replaced by “.dsk” to build the disk image, or by “.run” to build the disk image and boot it in Virtual ][.

You can technically even run make with “.a” and get the .a file out of PLASMA. It’s all generic, so any of the intermediaries will work. Use “.mod” to get the compiled binary file; you can then use that with whatever tool you’d like to put it on a disk.

To have it build and include additional PLASMA modules, set the EXTRA environment variable to the list of files to include, without the .pla extension.

Note: Besides the .dsk (which is marked as .PRECIOUS) all intermediaries are removed.


Make .dsk
% ls -l hello.pla
-rw-r--r--  1 mfinger  staff  65 Apr  8 21:42 hello.pla
% make hello.dsk
plasm -AM < hello.pla > hello.a
acme --setpc 4094 -o hello.mod hello.a
cp template.dsk hello.dsk
java -jar AppleCommander.jar -d hello.dsk hello
hello: No match.
java -jar AppleCommander.jar -p hello.dsk hello rel 0x1000 < hello.mod
if [ ! -z "" ]; then \
		for o in ""; \
		do \
			java -jar AppleCommander.jar -d hello.dsk $o ;\
			java -jar AppleCommander.jar -p hello.dsk $o rel 0x1000 < $o.mod ;\
		done ;\
rm hello.mod hello.a
% java -jar AppleCommander.jar -ll hello.dsk

  PRODOS  Destroy Read Rename Write SYS  035 09/19/2007 05/06/1993 17,128 $0000 0002 0008 Sapling Changed 0 4
  CMD  Destroy Read Rename Write SYS  010 04/01/2016 04/01/2016 4,141 A=$2000 0002 0029 Sapling Changed 0 0
  HELLO  Destroy Read Rename Write REL  001 04/08/2016 04/08/2016 55 $2000 0002 0037 Seedling Changed 0 0
  PLASMA.SYSTEM  Destroy Read Rename Write SYS  007 04/01/2016 04/01/2016 2,901 A=$2000 0002 002F Sapling Changed 0 0
ProDOS format; 112,640 bytes free; 30,720 bytes used.
Make .run
% make hello.run
plasm -AM < hello.pla > hello.a
acme --setpc 4094 -o hello.mod hello.a
cp template.dsk hello.dsk
java -jar AppleCommander.jar -d hello.dsk hello
hello: No match.
java -jar AppleCommander.jar -p hello.dsk hello rel 0x1000 < hello.mod
if [ ! -z "" ]; then \
		for o in ""; \
		do \
			java -jar AppleCommander.jar -d hello.dsk $o ;\
			java -jar AppleCommander.jar -p hello.dsk $o rel 0x1000 < $o.mod ;\
		done ;\
osascript plasma_run.scpt `pwd` hello
rm hello.mod hello.a
Including EXTRA
% ls -l timer.pla test.pla
-rw-r--r--  1 mfinger  staff   598 Apr  8 21:32 test.pla
-rw-r--r--  1 mfinger  staff  3710 Apr  8 13:26 timer.pla
% EXTRA=timer make test.dsk
plasm -AM < test.pla > test.a
acme --setpc 4094 -o test.mod test.a
plasm -AM < timer.pla > timer.a
acme --setpc 4094 -o timer.mod timer.a
cp template.dsk test.dsk
java -jar AppleCommander.jar -d test.dsk test
test: No match.
java -jar AppleCommander.jar -p test.dsk test rel 0x1000 < test.mod
if [ ! -z "timer" ]; then \
		for o in "timer"; \
		do \
			java -jar AppleCommander.jar -d test.dsk $o ;\
			java -jar AppleCommander.jar -p test.dsk $o rel 0x1000 < $o.mod ;\
		done ;\
timer: No match.
rm test.mod test.a timer.mod timer.a
% java -jar AppleCommander.jar -ll test.dsk

  PRODOS  Destroy Read Rename Write SYS  035 09/19/2007 05/06/1993 17,128 $0000 0002 0008 Sapling Changed 0 4
  CMD  Destroy Read Rename Write SYS  010 04/01/2016 04/01/2016 4,141 A=$2000 0002 0029 Sapling Changed 0 0
  TEST  Destroy Read Rename Write REL  001 04/08/2016 04/08/2016 423 $2000 0002 0037 Seedling Changed 0 0
  TIMER  Destroy Read Rename Write REL  003 04/08/2016 04/08/2016 927 $2000 0002 0039 Sapling Changed 0 0
  PLASMA.SYSTEM  Destroy Read Rename Write SYS  007 04/01/2016 04/01/2016 2,901 A=$2000 0002 002F Sapling Changed 0 0
ProDOS format; 111,104 bytes free; 32,256 bytes used.


Here is a video showing it using the “.dsk” and “.run” versions:

I wanted to compare PLASMA with CC65 on several different points. At this point, with my limited experience with PLASMA, I’ll just start with:

  • Ease of understanding/similarity
  • Speed

I took my “moving monster” test program and rewrote it using PLASMA to compare it to how I had it written in C. Having read that PLASMA took some inspiration for its structure from modern languages, I was pleasantly surprised at how similar the code for each is and how easy the port was. It actually helped me improve my C code a bit as well.

C code

#include <string.h>    /* memcpy, memset */
#include <peekpoke.h>  /* cc65's POKE() macro */

/* xToByte[], xToFrame[], yToAddr[], hgrpage[], showpage[], page and image
   are defined elsewhere (not shown). */

// Put image on screen
void putImage(imageData *image, char page, char x, char y) {
    char b, f, r;
    // Convert X to byte offset
    b = xToByte[x];
    // Convert X to needed shift frame
    f = xToFrame[x] * image->height * image->width;
    // Draw frame line by line
    for (r = 0; r < image->height; r++) {
        memcpy((char *)(hgrpage[page] + yToAddr[y + r] + b), image->data + f + (r * image->width), image->width);
    }
}

int main() {
    int x = 0;
    int count = 0;
    // Clear both Hi-Res pages (Bad: Clearing holes too!)
    memset((char *)0x2000, 0, 0x2000);
    memset((char *)0x4000, 0, 0x2000);
    // Activate graphics
    POKE(-16304, 0);
    // Full screen graphics
    POKE(-16302, 0);
    // Hi-Res graphics
    POKE(-16297, 0);
    // Put initial image on non-displayed page so when we flip it's there
    putImage(&image, !page, 0, 30);
    // Move across the screen by 2
    for (x = 2; x <= 200; x += 2) {
        // Flip page
        page = !page;
        POKE(showpage[page], 0);
        // Draw new image on non-displayed page
        putImage(&image, !page, x, 30);
        // Pause (empty busy-wait loop)
        for (count = 1; count <= 500; count++) ;
    }

    // Go back to page 0 (1)
    POKE(showpage[0], 0);

    // Text mode
    POKE(-16303, 0);

    return 0;
}



PLASMA code

// Lookup tables (xToByte, xToFrame, yToAddr, hgrpage), the image data/height/width,
// hgr1/hgr2, page, x and count are declared elsewhere (not shown).

// Put image on screen
def putImage(imgdata, imgheight, imgwidth, page, x, y)
    byte b, f, r

    // Convert X to byte offset
    b = xToByte[x]

    // Convert X to needed shift frame
    f = xToFrame[x] * imgwidth * imgheight

    // Draw frame line by line
    for r = 0 to imgheight-1
        memcpy(hgrpage[page] + yToAddr[y + r] + b, imgdata + f + (r * imgwidth), imgwidth)
    next
end

// Clear both Hi-Res pages (Bad: Clearing holes too!)
memset(hgr1, 0, $2000)
memset(hgr2, 0, $2000)

// Activate graphics
^$C050 = 0

// Full screen graphics
^$C052 = 0

// Hi-Res graphics
^$C057 = 0

// Put initial image on non-displayed page so when we flip it's there
putImage(@data, height, width, (!page&$01), 0, 30)

// Move across screen by 2
for x = 2 to 200 step 2

    // Flip page ($C054 shows page 1, $C055 shows page 2)
    page = (!page&$01)
    ^($C054 + page) = 0

    // Draw new image on non-displayed page
    putImage(@data, height, width, (!page&$01), x, 30)

    // Pause
    for count = 1 to 500
    next
next

// Go back to page 0 (1)
^$C054 = 0

// Text mode
^$C051 = 0
done

As you can see, they are very similar. Should be an easy move over for people familiar with C/Java and languages of that ilk. Very impressive.

Next I took a look at performance. When I originally started looking at comparing performance, I was shocked at the speed difference between the two (which I’ll show shortly). That was before I realized that I was wrong about PLASMA.

I was thinking that PLASMA was more of a “pre-assembler” or “pre-compiler” that took high level structures and generated 6502 assembly for the corresponding code. It actually produces byte-code that is then run under the PLASMA VM. This can be sped up by writing raw assembly for routines that need more power. Silly me.

Now, I don’t consider that a bad thing for the same reason I don’t consider it a bad thing for Java vs C. It’s just a different approach and both have their merits.

C Performance

PLASMA Performance

As you can see in the above videos, without some native assembly to do some of the heavy lifting where needed, the C compiled code runs much faster than the PLASMA code. With a byte-code VM, that is to be expected.

Again, I want to reiterate, this is not a bash on PLASMA at all. On the contrary, even with the little I’ve worked with it, I’m very impressed with it; it’s an amazing piece of engineering, especially doing a byte-code VM on an 8-bit platform. Well done, well done.

I’m working on getting some timing routines in both the C side and the PLASMA side that will read from the No-Slot Clock, since it gives hundredths of seconds resolution. Then I’ll publish some exact numbers comparing the two. Again, not as a “C is faster/better” but just to show some of the trade-offs.

I decided, as part of my efforts to get back into programming on my Apple ][s, that I’d also explore other newer technologies that are available on the development side.

Thanks to a recent issue of Juiced.GS (Vol 21, Issue 1), I thought I’d try out PLASMA (Proto Language AsSeMbler for Apple) from David Schmenk. It is (as the name says) a proto-assembly language with a lot of features of modern languages that are normally not available in assembly. I’ve not dug into the language much beyond reading the article (“Programming with PLASMA: Developing a chat client”) in Juiced.GS and reading through some of the sample code, but it does look very interesting.

But, thanks to the great work on the Xcode build pipeline for C[AC]65 that I mentioned in an earlier post, I’m spoiled by having a quick build pipeline: write code, click build, watch it run. So I figured, by “standing on the shoulders of giants”, I’d put together a proof-of-concept way to do something similar with PLASMA.

Requirements are simple:  Write code, run a build, watch it run.

Digging into the work Quinn Dunki posted about here, I took the AppleScript code and the makefile and adapted them to work for what I needed. I did it outside of Xcode in this case for a couple of reasons. First, Xcode won’t really understand PLASMA code in a way that is beneficial (no completion or highlighting), and second, I don’t really like Xcode very much. So, vi and make it is. Makes me all nostalgic.

Here is my adapted AppleScript code (I really only changed a - to a +):

-- Stolen/Adapted from: Blondihacks Makefile script for Virtual ][ (http://www.quinndunki.com/blondihacks)
-- Boots the disk image for the program and runs it inside PLASMA

on run argv
	set TARGETPATH to item 1 of argv
	set PGM to item 2 of argv

	tell application "Virtual ]["

		tell front machine
			eject device "S6D1"
			insert TARGETPATH & "/" & PGM & ".dsk" into device "S6D1"
			delay 0.5
			delay 0.5
			type line "+" & PGM
		end tell
	end tell
end run

Here is my makefile:

PGM?=$(shell basename $(SRC) .pla)



all: disk

run: disk
	osascript plasma_run.scpt `pwd` $(PGM)

disk: $(PGM).dsk

	-rm -f $(OBJ) $(PGM).a $(PGM).dsk

	-rm -f *.a *\#*

$(PGM).dsk: $(OBJ)
	cp template.dsk $(PGM).dsk
	java -jar AppleCommander.jar -d $(PGM).dsk $(PGM)
	java -jar AppleCommander.jar -p $(PGM).dsk $(PGM) $(DSKTYPE) 0x$(ADDR) < $(OBJ)

%\#$(TYPE)$(ADDR): %.a
	acme --setpc 4094 -o $@ $?

$(PGM).a: $(SRC)
	$(PLASM) -AM < $? > $@

Again, this may be too limited at the moment, as I don’t have a deep understanding of PLASMA project structure, linking, etc. But for this case, simply set the SRC environment variable to point to your PLASMA code and run make.

Here is an example (Note: I’ve tweaked the makefile a bit since the video):

Now it’s time to start writing some of my own code and experiment with the language.

I’ve been using Go (golang.org) for the last several months and really like the language, which I can go into at another time.

Lately, one of the processes I’ve written seems to get into a state where its CPU usage is extremely high even though the process is basically idle:

top - 13:06:53 up 152 days,  4:04,  1 user,  load average: 11.99, 11.30, 11.25
Tasks: 348 total,   1 running, 347 sleeping,   0 stopped,   0 zombie
%Cpu(s): 48.4 us,  2.4 sy,  0.0 ni, 49.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  32900140 total, 32371752 used,   528388 free,       44 buffers
KiB Swap: 33509372 total,  2151948 used, 31357424 free. 22511692 cached Mem

16115 mfinger   20   0 2637812 1.105g   5972 S 616.7  3.5   3883:11 xxxxxx
16134 mfinger   20   0 2504232 728572   6128 S 610.1  2.2   2909:37 xxxxxx

After looking around, I remembered that Go has profiling built in. I added a few lines to my code, namely:

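import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
)

// ... and early in main():
go func() {
	log.Println(http.ListenAndServe(":6060", nil))
}()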

Then I ran the profiling tool built into Go:

% go tool pprof -png http://host:6060/debug/pprof/profile > cpu.png
Fetching profile from http://host:6060/debug/pprof/profile
Please wait... (30s)
Saved profile in /Users/Mfinger/pprof/pprof.host:6060.samples.cpu.008.pb.gz

Let’s look at the results:


Let’s look at the heap, as well:

% go tool pprof -png  http://host:6060/debug/pprof/heap > heap.png
Fetching profile from http://host:6060/debug/pprof/heap
Saved profile in /Users/Mfinger/pprof/pprof.host:6060.inuse_objects.inuse_space.006.pb.gz

Very nice. Now to try to fix the issue.

Reading about odd/even frames and bytes was pretty, well, confusing at the beginning.  Took me a few times to get through it and experiment, but I figured it out.

Firstly, there are two choices:

  1. Move one pixel at a time, which really turns into only moving every other cycle.
  2. Move two pixels at a time, which actually looks okay.

There really is no way to move 1 pixel and not change colors which, looking back (again), makes total sense.

Secondly, I figured out the odd/even frames and bytes logic.

In the book, he generates the frames at bit shifts 0, 2, 4 and 6 as frames 0, 1, 2 and 3, then generates the frames at bit shifts 1, 3 and 5 as frames 4, 5 and 6. The idea is to show even-offset frames in even bytes and odd-offset frames in odd bytes. The tricky part is right in the middle, at frame 3.

If we plot this out over 14 shifts (enough to get through both an even and odd byte)

  • X = 0/1 we show frame 0 (even byte, even offset frame)
  • X = 2/3 we show frame 1 (even byte, even offset frame)
  • X = 4/5 we show frame 2 (even byte, even offset frame)
  • X = 6/7 we show frame 3
    • Except 7 is in the odd byte, but if we move it to the first odd frame (frame 4), then we show frame 3 for 1 cycle and frame 4 for 3 cycles.
    • And we can’t put an even offset frame in an odd byte or the color will change.
    • The fix is that for X = 7, we actually just do X = 6 again: put frame 3 in the even byte.
  • X = 8/9 we show frame 4 (odd byte, odd offset frame)
  • X = 10/11 we show frame 5 (odd byte, odd offset frame)
  • X = 12/13 we show frame 6 (odd byte, odd offset frame)

Not sure that made it any clearer, maybe some code will.  I have a lookup table that you index into with your X value and you get back byte # and frame #.  Unlike the book, I generate the frames in bit-shift order to keep even/odd consistent between byte, offset and frame #.

char xToByteFrame[280][2] = {
{ 0, 0 },
{ 0, 0 },
{ 0, 2 },
{ 0, 2 },
{ 0, 4 },
{ 0, 4 },
{ 0, 6 },
{ 0, 6 },
{ 1, 1 },
{ 1, 1 },
{ 1, 3 },
{ 1, 3 },
{ 1, 5 },
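{ 1, 5 },
/* ... entries for X = 14..279 continue the same 14-column pattern, with the byte offset advancing by 2 ... */
};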

Notice there are 8 entries that use byte offset 0 and 6 that use byte offset 1. The second { 0, 6 } handles the fix for X = 7.

You can see that we have 2 at each X offset. This is a move from 0 to 13, moving by 1 pixel. I put each frame below the previous one for comparison.
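The excerpt of xToByteFrame only shows the first 14 of its 280 entries. Assuming the pattern simply repeats every 14 columns (one even byte plus one odd byte), the whole table can be generated rather than typed in. This is my own sketch of that rule (the function name is hypothetical), not the code that produced the table above:

```c
#define HGR_COLS 280

/* Fill table[x] = { byte, frame }, where frame is the bit shift:
   columns 0..7 of each 14-column pair use the even byte with shifts 0,2,4,6
   (X = 7 reuses X = 6), columns 8..13 use the odd byte with shifts 1,3,5. */
void build_x_to_byte_frame(unsigned char table[HGR_COLS][2])
{
    int x;
    for (x = 0; x < HGR_COLS; ++x) {
        int pair   = x / 14;         /* which even/odd byte pair */
        int within = (x % 14) & ~1;  /* round down to an even column */
        if (within < 8) {
            table[x][0] = (unsigned char)(pair * 2);
            table[x][1] = (unsigned char)within;
        } else {
            table[x][0] = (unsigned char)(pair * 2 + 1);
            table[x][1] = (unsigned char)(within - 7);
        }
    }
}
```

The last column, X = 279, comes out as byte 39 (the last byte of the row) with shift 5.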

Here is the final product of the little monster man moving across the screen. I opted to move 2 pixels at a time; I could have halved the delay between moves and moved by 1, but why copy unneeded data around?

Here is my main() code.


putImage() takes care of figuring out which frame needs to be displayed based on the X value passed in.

More progress in the right direction.

Apparently, moving a (mostly) white object is as easy as I thought it was. My tool generated the 7 needed frames and they progressed nicely across the screen. The only tricky part is that I need to turn an X value into two different values:

  1. Byte # in row
  2. Frame # to display

This was pretty easy (or so I thought, more on that below). Take X, divide it by 7 and round down to get the byte # in the row. Take X mod 7 (i.e. the remainder) and you get the bit offset within the byte, which corresponds to the frame. I’m worried that math is also too much work to do on every movement, so I generated a lookup table for X to byte/bit, but it’s 2 bytes for each column, so that’s another 560 bytes for lookup tables. Remember, we’re working with memory on the order of 32-48K. So that’s a total of 944 bytes for lookups, almost a whole K. I’ll need to do some testing to figure out which is better; for now, lookup table it is.
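Here is a C sketch (hypothetical function name) of the straightforward X to byte/bit mapping just described, along with the size check behind the 560-byte figure: 280 columns at 2 bytes each.

```c
#define HGR_COLS 280

/* Naive mapping: byte = X / 7, bit offset (= frame) = X % 7. */
void build_x_to_byte_bit(unsigned char table[HGR_COLS][2])
{
    int x;
    for (x = 0; x < HGR_COLS; ++x) {
        table[x][0] = (unsigned char)(x / 7);
        table[x][1] = (unsigned char)(x % 7);
    }
}
```

For example, X = 13 maps to byte 1, bit 6, and X = 279 maps to byte 39, bit 6.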

We’re good, right? For non-white objects, not so much:

Reading further in the book (yes, I end up working ahead when perhaps I shouldn’t), looks like I need to handle odd/even frames for odd/even bytes differently.  Oh, the fun never ends.

Which, when I think about it, makes total sense. Here is the first frame as a bitmap:


And here is the second:


The first one is all on green pixels (the G at the bottom) and the second is all on violet pixels (the V). I could just move two bits at a time, and then the second (displayed) frame would be:


So, it’s back to green like I want.  That seems cheap like a 2-bit suit (Ok, I had to).  But, it does feel like cheating.  Maybe that is what I’ll need to do and what games do and we don’t know it, but I don’t think so.
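The color flip comes down to column parity. A hi-res byte holds 7 pixels, so bit `bit` of byte `byte` lands in screen column 7*byte + bit, and because 7 is odd, the same bit position falls on opposite parities in even and odd bytes. This C sketch assumes the usual behavior with the palette (high) bit clear, where even columns show violet and odd columns show green; the names are mine:

```c
/* Screen column (0..279) of bit `bit` (0..6) in byte `byte` (0..39) of a row. */
int hgr_column(int byte, int bit)
{
    return byte * 7 + bit;
}

enum hgr_color { HGR_VIOLET, HGR_GREEN };

/* Assumption: palette bit clear, so even columns are violet, odd are green. */
enum hgr_color hgr_color_at(int column)
{
    return (column % 2 == 0) ? HGR_VIOLET : HGR_GREEN;
}
```

Shifting a shape by 2 columns keeps every pixel’s color, while shifting by 1 swaps green and violet, which matches the second frame above. It also shows why bit 0 of an odd byte (column 7) has the opposite color to bit 0 of an even byte.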

Time to read more and see. Looks like the parts I’ve worked through so far are the “easy parts”. Figures.