HyperHelper by Azarona Software is a shareware hypertext development system.

The program creates a standalone DOS binary from text, with search and hyperlink capabilities similar to HTML. The only information I could find about it was an ad in the February 1992 edition of PC Magazine.

While the built-in browser works well enough, having to load it in DOSBox is a little inconvenient, so I wanted to try to extract the text from the program, which doesn't offer any way to do that itself.

strings is a command-line program that extracts printable text strings from a binary file, but it only turned up some of the information I was looking for. Most of what it found related to HyperHelper itself rather than the internal text I wanted.
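For reference, a typical strings run looks something like the sketch below. The function name dump_strings is hypothetical. With GNU strings, -a scans the whole file and -n sets the minimum length of a printable run, so raising it above the default of 4 filters out short fragments of noise.

```shell
#!/bin/bash

# Hypothetical helper: print every run of at least 6 printable
# characters found anywhere in a binary file (GNU binutils strings).
dump_strings () {
	# -a  scan the entire file, not just selected sections
	# -n  minimum run length to report (the default is 4)
	strings -a -n 6 "$1"
}
```

Usage, with a hypothetical file name: dump_strings HELP.EXE > strings.txt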

The quick solution I came up with was a few short shell scripts built around xte, gnome-screenshot, ImageMagick's convert, and tesseract.

The first script uses xte to send the keystrokes that page through the program and gnome-screenshot to save an image of each page.

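#!/bin/bash

# Save each DOSBox page as page1.png, page2.png, ... until stopped with Ctrl+C.
# Writes to the images folder that the conversion script reads from.
outdir=~/Documents/ac94/images
mkdir -p "$outdir"

screenshots () {
	x=0
	while true
	do
		x=$((x+1))
		# -w grabs the focused window, -B removes the border, -f sets the file name
		gnome-screenshot -w -B -f "$outdir/page$x.png"
		# Ctrl+Page Down moves to the next page; sleep 1 gives DOSBox time to redraw
		xte "keydown Control_L" "key Page_Down" "keyup Control_L" "sleep 1"
	done
}

# Alt+Tab switches focus from the terminal to DOSBox, then waits a moment
xte "keydown Alt_L" "key Tab" "keyup Alt_L" "sleep 1"

screenshots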

To keep things simple, the keystrokes that switch focus to DOSBox are hardcoded, and the screenshots function runs an infinite loop that captures each page in turn; I stopped the script manually with Ctrl+C. The counter x is used in the file names to keep the pages in order.
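One caveat with names like page10.png is that a plain alphabetical listing puts them before page2.png, so anything that relies on a glob or ls has to rebuild the order from the counter. If you would rather have the names sort correctly on their own, a sketch like the one below pads the counter with zeros. The function name page_name and the three-digit width are assumptions; three digits covers up to 999 pages.

```shell
#!/bin/bash

# Hypothetical helper: build a zero-padded file name such as page007.png
# so that alphabetical order matches page order.
page_name () {
	printf 'page%03d.png' "$1"
}
```

Inside the screenshot loop, passing "$(page_name "$x")" to gnome-screenshot -f would then produce page001.png, page002.png, and so on.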

The next script uses convert to create enlarged, grayscale TIFF versions of the PNG images that gnome-screenshot produced.

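#!/bin/bash

# Enlarge each screenshot 5x as a grayscale TIFF, then OCR it with tesseract.
for ((x=1;x<=379;x++))
do
	convert ~/Documents/ac94/images/page$x.png -resize 500% -type Grayscale ~/Documents/ac94/images/page$x.tiff
	# tesseract appends .txt to the output base, producing text1.txt, text2.txt, ...
	tesseract ~/Documents/ac94/images/page$x.tiff ~/Documents/ac94/text$x
done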

This script uses another simple counter loop that runs through page 379, the number of pages in the program.

After convert creates each new image, the script runs it through tesseract to produce a text file.
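Because the page count is hardcoded, the loop has to be edited whenever the number of screenshots changes. A sketch that instead keeps going until the next pageN.png is missing might look like this. The function name ocr_pages and its directory argument are hypothetical, and the convert and tesseract options are the same ones used in the script above.

```shell
#!/bin/bash

# Hypothetical variant: OCR page1.png, page2.png, ... in the given
# directory, stopping at the first page number with no PNG file.
ocr_pages () {
	local dir=$1
	local x=1
	while [ -f "$dir/page$x.png" ]
	do
		# enlarge 5x and convert to grayscale, as in the script above
		convert "$dir/page$x.png" -resize 500% -type Grayscale "$dir/page$x.tiff"
		# tesseract appends .txt to the output base name
		tesseract "$dir/page$x.tiff" "$dir/text$x"
		x=$((x+1))
	done
}
```

Usage: ocr_pages ~/Documents/ac94/images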

The last script simply combines the individual text files, one per page, into a single document.

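#!/bin/bash

# Append each page's OCR text to master.txt in page order.
# >> appends, so remove any old master.txt before running this again.
for ((x=1;x<=379;x++))
do
	cat ~/Documents/ac94/text$x.txt >> ~/Documents/ac94/master.txt
done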

Again, a loop with a counter set to the maximum number of pages; this script just uses cat to read each file and append it to a single file, master.txt.
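Since bash brace expansion generates the numbers in ascending order, the same result can probably be had with a single cat call and no loop, as in the sketch below. The function name combine_pages is hypothetical. Brace expansion happens before variable expansion, so the page count has to be written literally rather than stored in a variable. Using > instead of >> also means that running it a second time overwrites the output file rather than appending a duplicate copy.

```shell
#!/bin/bash

# Hypothetical one-call version of the combine step: text1.txt through
# text379.txt are concatenated in numeric order into the output file.
combine_pages () {
	local dir=$1
	local out=$2
	# {1..379} expands to 1 2 3 ... 379 before cat runs; the range must be
	# literal, since brace expansion happens before variable expansion
	cat "$dir"/text{1..379}.txt > "$out"
}
```

Usage: combine_pages ~/Documents/ac94 ~/Documents/ac94/master.txt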

Afterwards, I used the spell checker in LibreOffice to clean up the text file. The majority of the text made it through, with some formatting issues. Most of the OCR errors were relatively easy to find with the spell checker, and it was interesting to see how two separate tools handled a similar task.

Overall, it wasn't the most elegant solution, but it got the job done in a reasonable amount of time and rescued the data from an old, outdated format.