Introduction
Gek is a simple programming language for manipulating text and files. It's used on the command-line and in scripts to get what you want. This tutorial covers the language like a little book, but it's aimed at the Unix user already familiar with the command-line and regular expressions.
Installation
Gek has only just been released and might not be in your distro's package manager yet. If it's not there, it's easy to install from source. All you need is Git and the Go compiler, which are very likely in your package manager. Then run:
$ git clone https://git.sr.ht/~geb/gek && gek/install.sh
Big First Steps
Gek programs usually take in input and spit out output. A program is a sequence of words that each do an action, and unless we do something special, the words are executed for each line of input. Here's a program that prints each line in UPPERCASE, prefixed with the line number:
n ! r upper $
It's got five words: n gives us the line number, ! prints it, r gives us the line, upper makes it uppercase and $ prints it plus a newline. The r stands for record.
We can run it on the command-line like this:
$ gek 'n ! r upper $'
That'll have gek read from stdin, but we can pass input files like this:
$ gek 'n ! r upper $' myfile myfile2
It's too smooshed, so let's have a colon and a space after the line number:
n ! ": " ! r upper $
Before I explain what on earth's going on, I'd like to introduce the words B and E. We can execute code at the beginning of the program with B and at the end of the program with E. The code to execute comes after them, terminated by a ;, so here's our beloved program now printing "hello" at the beginning and "bye" at the end:
B "hello" $ ; n ! ": " ! r upper $ E "bye" $ ;
Programs that are just B blocks terminate without reading input, so here's a Hello World:
B "hi" $ ;
Gek is what's called a stack-based language. A stack in programming is a list of elements where the most recently added element is the first to be removed, like a stack of plates where you add to and take from the top. All the words take their arguments from "the stack" and put their return values onto it. When we go r $, the r puts the current line on the stack and $ just takes and prints what's on the top of the stack. If we felt so inclined, we could write n ! r $ as r n ! $.
Say the fourth input line is "ham". Here's what happens with n ! r $, showing the stack after each word:
n puts "4" on the stack: "4"
! takes the top element off the stack and prints it: <empty>
r puts "ham" on the stack: "ham"
$ takes the top element off the stack and prints it and a newline: <empty>
Here's what happens with r n ! $:
r puts "ham" on the stack: "ham"
n puts "4" on the stack: "ham" "4"
! takes the top element off the stack and prints it: "ham"
$ takes the top element off the stack and prints it and a newline: <empty>
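To make the stack mechanics concrete, here's a toy model in Python. It is not Gek itself, and the function name run is made up for illustration. It handles only the words n, r, upper, ! and $, keeps every value as a string, and returns whatever would be printed:

```python
def run(program: str, line_number: int, line: str) -> str:
    """Toy model of a Gek program running on one input line (not Gek itself).

    Only the words n, r, upper, ! and $ are modelled; string literals are not.
    """
    stack: list[str] = []     # every value is a string, as in Gek
    printed: list[str] = []
    for word in program.split():
        if word == "n":
            stack.append(str(line_number))       # push the line number
        elif word == "r":
            stack.append(line)                   # push the current record
        elif word == "upper":
            stack.append(stack.pop().upper())    # take a string, return it uppercased
        elif word == "!":
            printed.append(stack.pop())          # take the top and print it
        elif word == "$":
            printed.append(stack.pop() + "\n")   # take the top and print it plus a newline
        else:
            raise ValueError(f"word not modelled: {word}")
    return "".join(printed)


print(run("n ! r $", 4, "ham"), end="")   # 4ham
print(run("r n ! $", 4, "ham"), end="")   # 4ham
```

Both orders print the same thing because ! always takes whatever is on top at the moment it runs.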
We could use this program to achieve nothing:
r
It just puts each line on the stack and doesn't print anything. We can see what's happening with ?, which shows us the stack:
r ?
This program prints the number of input lines:
E n $ ;
But for illustrative purposes, so does this:
B 0 ; 1 + E $ ;
When literal numbers are encountered they put their value on the stack, and the same goes for the literal text we've seen. The + word takes two arguments and returns them added together. Here zero is our initial value, 1 + increments it every line and $ prints the accumulation at the end. See how 1 + is like a little pipeline that takes a number and increments it by one.
This program sums the numbers on each line:
fields sum $
So running:
$ printf %s\\n '3 4' '2 8 0.5' '-1 9' | gek 'fields sum $'
We get:
7
10.5
8
The word fields puts the non-whitespace parts of the line on the stack plus a count of how many, and sum takes a count, then takes and sums that many elements. Let's run through it.
On the first line:
fields puts "3" and "4" on the stack plus a count of how many: "3" "4" "2"
sum takes the top value, "2", and sums that many elements: "7"
$ takes the top element and prints it and a newline: <empty>
On the second line:
fields puts "2", "8" and "0.5" on the stack plus a count of how many: "2" "8" "0.5" "3"
And so on.
We say words like sum take a list, meaning they take a count argument specifying how many more arguments to take. If we wanted to sum the top three elements we'd use 3 sum:
B 4 5 6 3 sum $ ;
15
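The list convention can be modelled the same way. In this Python sketch, take_list, fields, sum_ and depth are hypothetical stand-ins for the Gek words. The number formatting in fmt is an assumption, chosen to match the outputs shown above:

```python
def fmt(x: float) -> str:
    """Print whole numbers without a trailing .0 (an assumption about Gek's output)."""
    return str(int(x)) if x.is_integer() else repr(x)


def take_list(stack: list[str]) -> list[str]:
    """Pop a count, then pop that many elements, keeping their order."""
    count = int(stack.pop())
    start = len(stack) - count
    items = stack[start:]
    del stack[start:]
    return items


def fields(stack: list[str], line: str) -> None:
    parts = line.split()             # the non-whitespace parts
    stack.extend(parts)
    stack.append(str(len(parts)))    # plus a count of how many


def sum_(stack: list[str]) -> None:
    stack.append(fmt(sum(float(x) for x in take_list(stack))))


def depth(stack: list[str]) -> None:
    stack.append(str(len(stack)))    # how many elements are on the stack


stack = ["10", "4", "5", "6", "3"]   # like B 10 4 5 6 3 sum ...
sum_(stack)
print(stack)                         # ['10', '15']
```

Because sum only takes as many elements as its count says, the "10" underneath survives for the next word.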
The fact that we specify the count lets us leave other values on the stack:
B 10 4 5 6 3 sum + $ ;
25
And if we wanted to use the whole stack, there's depth, which returns how many elements there are on the stack:
B 4 5 6 depth sum $ ;
15
The word #$ prints multiple elements, each on their own line:
B "+-------------+" "|Stud Lad John|" "+-------------+" 3 #$ ;
+-------------+
|Stud Lad John|
+-------------+
The word keep takes a list and another count specifying how many of the top elements to keep:
B "a" "b" "c" "d" "e" 5 3 keep depth #$ ;
c
d
e
Here's a program that prints the last ten lines of input:
r depth 10 keep E depth #$ ;
It puts each line on the stack, limiting it to ten elements, and prints what's there at the end.
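Here's a Python model of keep and of the tail program, as a sketch with made-up helper names rather than Gek's implementation. It takes keep's behaviour from the example above: it pops the keep count, then the list, and pushes back only the topmost elements:

```python
def take_list(stack: list[str]) -> list[str]:
    """Pop a count, then pop that many elements, keeping their order."""
    count = int(stack.pop())
    start = len(stack) - count
    items = stack[start:]
    del stack[start:]
    return items


def keep(stack: list[str]) -> None:
    """Keep only the top `how_many` elements of a list, dropping the rest."""
    how_many = int(stack.pop())
    items = take_list(stack)
    stack.extend(items[max(len(items) - how_many, 0):])


def tail(lines: list[str], n: int = 10) -> list[str]:
    """Model of `r depth 10 keep E depth #$ ;`: returns what E would print."""
    stack: list[str] = []
    for line in lines:
        stack.append(line)              # r
        stack.append(str(len(stack)))   # depth
        stack.append(str(n))            # 10
        keep(stack)                     # keep
    return stack


s = ["a", "b", "c", "d", "e", "5", "3"]
keep(s)
print(s)   # ['c', 'd', 'e']
```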
What if we give the + word text?
All the values are actually strings, that is, strings of characters, but a word might take them to represent text, a number or a regular expression. For example, the + word treats its arguments as numbers:
B "123" "456" + $ ;
579
Whereas the word , (which joins two strings) treats them as characters:
B "123" "456" , $ ;
123456
The values have no intrinsic type; it's the words that matter. There's also no difference between B 123 456 + $ ; and B "123" "456" + $ ; but the number literals are prettier.
Alright, but what if we give + "ab%3gOt%4!"?
All the words that take a number argument determine the number it represents the same way: they skip leading whitespace, then stop reading at the first character that's not a sign, point or digit. So "ab%3gOt%4!" counts as 0. You can see what happens by running 0 + $ on a string:
"123abc" 0 + -> "123"
"abc123" 0 + -> "0"
" +-+--+123.456 " 0 + -> "-123.456"
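The number-reading rule can be sketched in Python. This model is built only from the examples above. Two things are assumptions: the handling of a second decimal point (this sketch stops at it), and the exact text Gek prints for the result (here a whole number is printed without a decimal point):

```python
import re

# Optional leading whitespace, a run of signs, then digits with at most one point.
_NUMBER = re.compile(r"\s*([+-]*)(\d*(?:\.\d*)?)")


def as_number(s: str) -> str:
    """Model of how a word reads a number from a string (a sketch, not Gek's code)."""
    m = _NUMBER.match(s)                       # always matches, possibly empty
    signs, digits = m.group(1), m.group(2)
    if not any(ch.isdigit() for ch in digits):
        return "0"                             # nothing numeric before the first other character
    value = float(digits)
    if signs.count("-") % 2:                   # an odd number of minus signs makes it negative
        value = -value
    return str(int(value)) if value.is_integer() else repr(value)


for s in ["123abc", "abc123", " +-+--+123.456 ", "ab%3gOt%4!"]:
    print(repr(s), "->", as_number(s))
```

The run "+-+--+" has three minus signs, which is why the third example comes out negative.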
And for completeness, string literals can be in `backticks` as well as "double quotes". It's nice having the two so we can quote the quote characters:
B `"sup" quoth John` $ ;
String literals are truly literal: there's no expansion and there are no escape sequences like in other languages (but we'll see alternatives and Unicode later).
To recap, Gek programs consist of words and string literals, and loop over input.
The word r returns the current line and n returns the line number.
The word ! prints a string and $ prints a string plus a newline.
The words B and E execute code at the beginning/end of the program.
The word #$ prints strings on their own lines.
The word fields returns a list of the non-whitespace parts of the line.
The word sum sums a list.
The word depth returns the number of elements on the stack.
The word + adds two numbers, and there are also -, * and /, which I didn't mention.
Importantly, the word ? shows the stack.
There are no types, which makes sense since the input data doesn't have any types either.
The stack keeps things short and simple. Can you write a program that sums all the numbers of the entire input?
Programs in Files
Instead of passing our programs on the command-line, we can store them in files and use the -f option:
$ gek -f myprog.gek
On Unix we can go further by making the file executable and adding a shebang line like:
#! /usr/bin/env -S gek -f
To gek, #! is just a word that ignores the rest of the line, so the space after it is important.
Gekdoc and Notation
If you ever want info on a word, Gek comes with a handy program called gekdoc. You can pass a word as an argument like
$ gekdoc '!'
or a topic like
$ gekdoc :printing
and list the topics with
$ gekdoc :topics
or just
$ gekdoc
Each word has a pictorial showing its effect on the stack:
The word swap has ( s1 s2 -- s2 s1 ), showing it swaps two strings.
The word + has ( n1 n2 -- n3 ), showing it takes two numbers and returns a number.
The word #$ has ( s... c -- ), showing it takes zero or more strings plus a count of how many and doesn't return anything.
To get more on the notation you can see
$ gekdoc :notation
I'm going to start showing each word's stack effect when I introduce it.
If Statements
If we want our program to make a decision, we have if statements that look like this:
if <code to run if true> then
The word if ( f -- ) takes a flag, which is just a number argument representing true if it's non-zero or false if it's zero. If the flag is true, the words between if and then are executed; otherwise they're skipped.
Here's a program that only prints the first three lines:
n 4 < if r $ then
The word < ( n1 n2 -- f ) returns whether the first number argument is less than the second by returning "1" or "0". There's also <=, >, >=, = and !=.
To do something else if the flag was false, there's the word else:
if <code to run if true> else <code to run if false> then
This program prints the first three lines and then has a breakdown:
n 4 < if r $ else "oh no" $ then
We could factor out the $:
n 4 < if r else "oh no" then $
Ok ok ok, the condition coming at the start kinda goes with the whole stack thing, but why's then at the end? How's that meant to read?
The if else then syntax actually harks back to the 1970s and FORTH, the classic stack-based language. I've always read the then to mean "in that case", like "If so, why don't you do the thing then?" or simply "If so, do the thing then".
The words and ( n1 n2 -- n1 | n2 ) and or ( n1 n2 -- n1 | n2 ) let us express complex conditions. Technically, and returns n2 if n1 is non-zero, else n1 (zero), and or returns n1 if n1 is non-zero, else n2. But what really matters is that and returns true if they're both true, and or returns true if either of them is true. This program prints lines two and four:
n 2 = n 4 = or if r $ then
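Written out in Python, the technical definitions of and and or are just a pair of conditional expressions. They happen to match Python's own and/or on numbers. The names gek_and and gek_or are illustrative, and the loop models n 2 = n 4 = or if r $ then, with flags as 1.0 or 0.0:

```python
def gek_and(n1: float, n2: float) -> float:
    """and ( n1 n2 -- n1 | n2 ): n2 if n1 is non-zero, else n1 (which is zero)."""
    return n2 if n1 != 0 else n1


def gek_or(n1: float, n2: float) -> float:
    """or ( n1 n2 -- n1 | n2 ): n1 if n1 is non-zero, else n2."""
    return n1 if n1 != 0 else n2


lines = ["one", "two", "three", "four", "five"]
for n, line in enumerate(lines, start=1):
    if gek_or(float(n == 2), float(n == 4)):   # n 2 = n 4 = or if ... then
        print(line)                            # prints two, then four
```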
Index and Integer Arguments
We've met fields ( -- s... c ), which returns all the fields; there's also f ( i -- s ), which takes an index argument and returns the field at that index.
Indices always start at zero, so 0 for the first field, 1 for the second field, and so on.
And if they're negative they count from the other end, so -1 for the last field, -2 for the second last field, and so on.
Here's an example to ignite your zeal; it prints each line's second field quadrupled:
1 f 4 * $
Both index arguments and integer arguments are an i in stack effects. An integer is a whole number (positive, negative or zero), and we call it an index if it specifies the ith something. Like counts and flags, they're just a common idea and make stack effects more descriptive.
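In Python terms, the field index behaves like list indexing on the whitespace-split line. Here's a minimal sketch with a hypothetical function f. The text doesn't say what Gek does with an out-of-range index, so this sketch simply lets Python raise IndexError:

```python
def f(line: str, i: int) -> str:
    """The field at index i: 0 is the first, -1 the last (Python indexing agrees)."""
    return line.split()[i]


line = "ham 3 eggs"
print(f(line, 0), f(line, 1), f(line, -1))   # ham 3 eggs
print(float(f(line, 1)) * 4)                 # 12.0 (Python's float formatting, not Gek's)
```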
String Words
I'll rattle through some of the basic words for working with text; they'll show up enough in our examples.
is ( s1 s2 -- f ) returns whether two strings are the same, and aint ( s1 s2 -- f ) whether they're different. Note the difference between "0.0" "0" = and "0.0" "0" is: the first is true because = compares the strings as numbers, while the second is false because is compares their characters.
starts and ends ( s1 s2 -- f ) return whether s1 starts/ends with s2.
says ( s1 s2 -- f ) returns whether s1 contains an occurrence of s2:
B "waffle" "ff" says $ ;
Like a couple of words we'll see, says sets where ( -- i ) to give the position of the first occurrence:
B "waffle" "ff" says if where $ else "nowhere" $ then ;
nl ( -- s ) returns a string that's just a newline, and tab ( -- s ) a string that's just a tab. It's common to print a lone newline with nl !.
len ( s -- c ) returns the number of characters in a string.
cut ( s1 i -- s2 ) cuts off characters from a string. It removes i characters from the start if i is positive:
"hello" 2 cut -> "llo"
Or i characters from the end if i is negative:
"hello" -2 cut -> "hel"
CUT ( s1 i -- s2 ) cuts out i characters, keeping only what cut would remove:
"hello" 2 CUT -> "he"
"hello" -2 CUT -> "lo"
Using them together you can get any substring:
"0123456789" 2 cut 4 CUT -> "2345"
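Since cut and CUT are mirror images, a Python model of each is a single slice. The function names mirror the Gek words but are just illustrations. Behaviour when i is zero, or larger than the string, follows Python slicing and is an assumption:

```python
def cut(s: str, i: int) -> str:
    """Remove i characters from the start (i >= 0) or -i from the end (i < 0)."""
    return s[i:] if i >= 0 else s[:i]


def CUT(s: str, i: int) -> str:
    """Keep only the characters that cut would remove."""
    return s[:i] if i >= 0 else s[i:]


print(cut("hello", 2), cut("hello", -2))   # llo hel
print(CUT("hello", 2), CUT("hello", -2))   # he lo
print(CUT(cut("0123456789", 2), 4))        # 2345
```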
, ( s1 s2 -- s3 ) joins two strings:
"sup" "lads" , -> "suplads"
#, ( s... c -- s1 ) joins the strings of a list:
"ba" "na" "na" 3 #, -> "banana"
join ( s... c -- s ) joins the strings of a list with a space between each:
"a" "b" "c" 3 join -> "a b c"
$join ( s... c s1 -- s2 ) joins the strings of a list with the string s1 between each:
"a" "b" "c" 3 "::" $join -> "a::b::c"
split ( s1 -- s... c ) returns a list of s1 split at whitespace:
"a b c" split -> "a" "b" "c" 3
$split ( s1 s2 -- s... c ) returns a list of s1 split at occurrences of s2:
"a::b::c" "::" $split -> "a" "b" "c" 3
SUB ( s1 s2 s3 -- s4 ) replaces the first occurrence of s2 in s1 with s3:
r "day" "night" SUB $
We'll meet the more powerful sub ( s1 re s2 -- s3 ) when we get to regular expressions.
Reading and Writing Files and Handling Errors
The most important file operations are reading and writing. The word read ( s1 -- s2 ) returns the contents of the file at filepath s1, so this program replaces lines like "include myfile" with the file's contents:
r "include" starts if 1 f read $ else r $ then
The word WRITE ( s1 s2 -- ) writes the string s1 to the file at filepath s2, creating the file if it doesn't exist. The word write ( s1 s2 -- ) is the same but writes s1 plus a newline, which is usually what you want, as non-empty text files should end with a newline. There's also APPEND and append, which append to files rather than overwriting them.
Say we've made a speech recognition thingy that continuously prints possible transcriptions of sentences like this:
1:don't cremate you're nutritious
2:don't crummy your new trashes
1:vets make exchange in sync with the fishes
2:lots make change and cheap with efficient
3:let's make a change and sleep with the fishes
We could write a filter to keep a file up to date with the most recent sentence's possibilities and print the top ones:
r "1:" starts if "" "transcriptions" WRITE r 2 cut $ then
r 2 cut "transcriptions" append
The project that inspired this example pipes the output to a command that types it, and uses the file for a menu to switch to another possibility.
What if words no work?
Well, some words like read throw an error terminating the program if they're unsuccessful, but there's usually a variant of them to handle the unsuccessful case. For example, the word read? ( s1 -- s2 1 | 0 ) returns the file's contents and a "1" if it's successful, or just a "0" on its own if it's unsuccessful. The variable number of return values means there's no guff value in the unsuccessful case.
Here's our "include myfile" example now skipping bad files:
0 f "include" is if
1 f read? if $ then
else
r $
then
Like a lot of its friends, read? also has the word apologize ( -- ), which prints a message explaining why it failed:
0 f "include" is if
1 f read? if $ else apologize then
else
r $
then
(The else clause isn't really necessary because apologize prints nothing in the successful case.)
Exiting
If we want to throw our own errors or just end the program early, we've got exit ( -- ). This program stops printing the lines if one is "quit":
E "bye" $ ; r "quit" is if exit then r $
Remember E is just a word and needs to execute, so this wouldn't say farewell if the first line was "quit":
r "quit" is if exit then r $ E "bye" $ ;
I normally put E at the end only because it looks cool.
The word code ( i -- ) lets us change the return code gek gives on termination. As you know, a return code of zero indicates success and non-zero failure, so this program returns successfully only if it finds happiness:
B 1 code ; r "happiness" says if 0 code then
You could try:
$ gek 'B 1 code ; r "happiness" says if 0 code then' && echo success || echo failure
Shell Words
Gek is very charismatic and good at working with other programs. The word shell ( s -- ) executes commands through the system shell (such as sh, dash or bash), printing any output and waiting for it to finish.
This program shells lines that start with an exclamation mark:
r "!" starts if r 1 cut shell else r $ then
Try giving it a line like:
!date
We can retrieve the return code with rc ( -- i ), like this program that uses the test command to find out whether it's connected to a terminal:
B "test -t 1" shell rc if "NOT" . then "connected to a terminal" $ ;
You could try running it connected to another program like this:
$ gek 'B "test -t 1" shell rc if "NOT" . then "connected to a terminal" $ ;' | rev
The word shellout ( s1 -- s2 ) is the same as shell but returns the output as a string. This one shouts your name:
B "whoami" shellout upper $ ;
The word $shell ( s1 s2 -- ) is the same as shell but the command gets the string s1 as its input. This one runs the Unix bc calculator for each line that starts with an exclamation mark:
r "!" starts if r 1 cut "bc" $shell else r $ then
Try giving it a line like:
!(1 + 2) * 2 ^ 2
(You might need to install bc; some distros don't come with it although it's specified by POSIX.)
Finally, there's shelljoin ( s... c -- s1 ) for shell-escaping strings. It removes any special meaning of the characters so a Unix shell will take them literally. It takes a list so you can conveniently make a string that represents the list in the eyes of a Unix shell. Here's an example that creates directories for each field:
"mkdir " fields shelljoin , shell
If it were join, the user would be able to run something nefandous with a line like "; echo willy".
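Python's standard library has the same idea in shlex.join, which quotes each element so a POSIX shell treats it literally. This isn't Gek's shelljoin, and its exact quoting style may differ. It does show why the plain join version is dangerous. Nothing here runs a shell:

```python
import shlex

fields = "; echo willy".split()          # the fields of a nefandous line

unsafe = "mkdir " + " ".join(fields)     # like join: a shell would treat ; as ending mkdir
safe = "mkdir " + shlex.join(fields)     # like shelljoin: each field is quoted if needed

print(unsafe)   # mkdir ; echo willy
print(safe)     # mkdir ';' echo willy
```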
Bump
The word bump ( -- ) has the next input read in prematurely.
This joins pairs of lines:
r ! ":" ! bump r $
And this prefixes the first line to the rest:
B bump r ; dup ! r $
The word dup ( s -- s s ) duplicates the top element on the stack.
Everyone agrees bump is the best word.
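For comparison, here's how those two bump programs might look in Python, reading the next line early from the same iterator. The function names are made up. Treating a missing second line at the end of input as empty is an assumption:

```python
def join_pairs(lines: list[str]) -> list[str]:
    """Model of `r ! ":" ! bump r $`: each line is joined to the one after it."""
    out = []
    it = iter(lines)
    for first in it:
        second = next(it, "")   # bump: read the next line early (empty at end: an assumption)
        out.append(first + ":" + second)
    return out


def prefix_rest(lines: list[str]) -> list[str]:
    """Model of `B bump r ; dup ! r $`: the first line prefixes the rest."""
    it = iter(lines)
    prefix = next(it, "")       # B bump r ; consumes the first line up front
    return [prefix + line for line in it]


print(join_pairs(["a", "b", "c", "d"]))   # ['a:b', 'c:d']
print(prefix_rest(["> ", "x", "y"]))      # ['> x', '> y']
```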
Loops
Loops let us repeat something or loop over something, just like how Gek loops over the lines of input. Together with if statements, loops let us write any algorithm, any set of instructions. The word for ( n -- ) takes a number argument and executes the words up to loop that many times. This program prints the first line once, the second twice, the third thrice, and so on:
n for r $ loop
The word i gives the loop's so-called index, which is a counter that increments with every iteration:
B 10 for i ! loop nl ! ;
As an aside, the word . ( s -- ) prints a string and a space:
B 10 for i . loop nl ! ;
The word foreach ( s... c -- ) takes a list of elements and makes the next element accessible with each iteration, terminating when they've all had their turn. The element is accessible with x ( -- s ), so here's one that prints each field in parentheses:
fields foreach "(" ! x ! ")" . loop nl !
If we wanted more fields per pair of parentheses, we've got X ( -- s ), which bumps the loop's element along before returning it:
fields foreach "(" ! x . X . X ! ")" . loop nl !
If we just want to loop, we've got forever loops. This will greet you indefinitely:
B forever "hello" $ 1 sleep loop ;
The word while ( f -- ) jumps out of a loop if its flag is false. This program squashes each paragraph onto a line:
forever r len while r . bump loop nl !
Try it on a file; it's too confusing seeing what you're typing and what it's printing.
The same goes for this one that joins lines which end with a backslash:
forever r "\" ends while r -1 cut ! bump loop r $
The words i and while work with any kind of loop.
Individual Input File Words
We're now more than just friends with B and E, which run at the beginning and end of the program, but there's also FB and FE for the beginning and end of each input file. This program prints the last ten lines per file:
r depth 10 keep FE depth #$ ;
The word N ( -- n ) returns the line number like n but of the current input file, so this prints the line counts of each file:
FE N $ ;
We could print the filepaths alongside using file ( -- s ):
FE file . N $ ;
Variables and Mappings
The word set <name> ( s -- ) takes an argument and defines a word that returns a copy of it. The <name> notation shows set steals the next token, in this case for the word's name. We call the new word a variable, and it has the stack effect <Variable name> ( -- s ).
Here's the prefixing example rewritten using a variable:
B bump r set Prefix ; Prefix ! r $
Here's one showing we can keep redefining them; it prints the longest line:
B "" set Longest ; r len Longest len > if r set Longest then E Longest $ ;
At the top of bigger programs you'll often name some filepaths or the like for ease and clarity. I name my variables with a capital letter but it's just convention.
Along with variables we have collections of keys and corresponding values called mappings. They too let us assign values to names, but unlike variables the keys are strings and can come from anywhere. A mapping first needs to be created with the word mapping <name> ( -- ), after which map <Mapping name> ( s1 s2 -- ) assigns the value s1 to the key s2 of the mapping. The name of the mapping becomes a word <Mapping name> ( s1 -- s2 ) that returns the corresponding value of the key s1, or an empty string if the key is unassigned.
Here's an example to figure out:
B
mapping Action
"press a" "alpha" map Action
"press b" "bravo" map Action
"press c" "charlie" map Action
"transcribe" "transcribe" map Action
"run ./menu" "transcriptions" map Action
;
0 f "said" is if 1 f Action $ then
I usually name my mappings so the keys sound good before the word. In the example, the alpha action is "press a", which you could get with "alpha" Action. If I had a mapping of animals and foods, I'd call it Food so you could go "cat" Food.
Mappings have a little lexicon dedicated to working with them:
in <Mapping name> ( s -- f ) returns whether a mapping has a key s.
keysof <Mapping name> ( -- s... c ) returns the keys of a mapping.
valuesof <Mapping name> ( -- s... c ) returns the values of a mapping.
tot <Mapping name> ( -- c ) returns the number of keys a mapping has.
Here's an example that prints lines if they haven't been seen before:
B mapping Seen ;
r in Seen 0 = if r $ then
"guff" r map Seen
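Here's the same first-sighting filter in Python, with a dict standing in for the mapping. The function name is hypothetical. The "guff" value is kept only to mirror the Gek program, since only the key matters:

```python
def first_sightings(lines: list[str]) -> list[str]:
    """Keep a line only the first time it appears, like the Seen program."""
    seen: dict[str, str] = {}      # mapping Seen
    printed: list[str] = []
    for line in lines:
        if line not in seen:       # r in Seen 0 = if r $ then
            printed.append(line)
        seen[line] = "guff"        # "guff" r map Seen
    return printed


print(first_sightings(["a", "b", "a", "c", "b"]))   # ['a', 'b', 'c']
```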
Extra Stacks
We can have additional stacks for collecting elements or holding a buffer of elements to work through. We create them with stack <name> ( -- ), after which push <Stack name> ( s -- ) puts an element on it and pop <Stack name> ( -- s ) takes an element from it. The stack name returns copies of all its elements and has the stack effect <Stack name> ( -- s... c ).
This program prints any lines starting with FIXME, then any starting with TODO:
B stack Todos ;
r "FIXME" starts if r $ then
r "TODO" starts if r push Todos then
E Todos #$ ;
This one prints the lines side by side with those of a file:
B stack Lines "myfile.txt" read nl $split #push Lines ;
r ! ":" ! POP Lines $
The word #push <Stack name> ( s... c -- ) puts multiple elements on the top of a stack, and the word POP <Stack name> ( -- s ) takes an element from a stack, but from the bottom.
Similar to mappings, the word in <Stack name> ( s -- f ) returns whether a stack has an element s, and tot <Stack name> ( -- c ) the number of elements.
"The stack" we know and love has the name Stack ( -- s... c ). Like all stack names it returns copies of all its elements, so we could print the stack without losing the elements using Stack #$.
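A double-ended queue is a handy Python model for an extra stack, since it can take from either end: pop for Gek's pop and popleft for POP. The side_by_side function is a hypothetical sketch of the side-by-side program. It assumes the file has at least as many lines as the input:

```python
from collections import deque


def side_by_side(lines: list[str], file_text: str) -> list[str]:
    """Model of the side-by-side program using a deque as the extra stack."""
    stack = deque(file_text.split("\n"))   # "myfile.txt" read nl $split #push Lines
    # POP Lines takes from the bottom, i.e. the file's first remaining line.
    return [line + ":" + stack.popleft() for line in lines]


s = deque(["x", "y", "z"])   # "x" is at the bottom, "z" on top
print(s.pop())               # z  (like pop: from the top)
print(s.popleft())           # x  (like POP: from the bottom)
```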
Command-line Arguments
The command-line arguments passed to gek are stored on a stack called Args.
You can probably guess what this prints:
$ gek 'B Args sum $ ;' 1 2 3 4
These arguments are consumed as each file is begun, and we can even thieve arguments or add files to process.
This program prints the last lines of each file, with the first argument specifying how many lines:
B POP Args set Num ;
r depth Num keep
FE depth #$ ;
This one recursively prints dependencies based on lines saying "require <filepath>":
r "require" starts if 1 f push Args then FE file $ ;
Regular Expressions
Regular expressions feature in lots of utilities, languages and text editors; they describe patterns of text. If you're not already familiar, I'd recommend learning them with the grep and sed commands. Gek's regular expressions have some of the commonplace additions that first appeared in the Perl programming language. These are the ones I care about:
\s whitespace character
\S non-whitespace character
\b word boundary
x*? zero or more x, prefer fewer
x+? one or more x, prefer fewer
x?? zero or one x, prefer zero
(?:mypattern) non-capturing group
(?i) make patterns case-insensitive
(?m) multi-line mode so ^ and $ match the start/end of lines (in addition to the start/end of the text)
(?s) have . match newlines
But I'll recap if an example uses any of them except the top three. You can find everything here: https://github.com/google/re2/wiki/Syntax
The word meets ( s re -- f ) returns whether s contains a match for the regular expression re. This program prints lines containing stack effects:
r "\(.*--.*\)" meets if r $ then
As we saw with says, it sets where ( -- i ) to the position of the first occurrence:
B "drop ( s -- ) discards an element" "\(.*--.*\)" meets if where $ else "nowhere" $ then ;
It also sets match ( -- s ) to return the first matched text:
B "drop ( s -- ) discards an element" "\(.*--.*\)" meets if match $ else "none" $ then ;
The word sub ( s1 re s2 -- s3 ) replaces text matched by a regular expression, much like sed's s/this/that/. It returns s1 with the first match of the regular expression re replaced using s2.
We could replace the first occurrence of day with night:
r "day" "night" sub $
Or the first run of one or more digits with CLASSIFIED:
r "\d+" "CLASSIFIED" sub $
The replacement can refer to the matched text with ${0}, so we could surround our number with some arrows like this:
r "\d+" "->${0}<-" sub $
We can also refer to capturing groups with ${1} for the first, ${2} for the second and so on. This one swaps the first pair of words separated by a colon:
r "(\S+):(\S+)" "${2}:${1}" sub $
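Python's re.sub covers the same ground: count=1 gives the sub behaviour, and the default replaces every match like gsub. Python spells the references differently, with \g<0> and \g<1> where Gek writes ${0} and ${1}. Here's a quick sketch with hypothetical sub and gsub helpers:

```python
import re


def sub(s: str, pattern: str, replacement: str) -> str:
    """Replace the first match only, like Gek's sub."""
    return re.sub(pattern, replacement, s, count=1)


def gsub(s: str, pattern: str, replacement: str) -> str:
    """Replace every match, like Gek's gsub."""
    return re.sub(pattern, replacement, s)


line = "room 101, then room 102"
print(sub(line, r"\d+", "CLASSIFIED"))     # room CLASSIFIED, then room 102
print(sub(line, r"\d+", r"->\g<0><-"))     # room ->101<-, then room 102
print(gsub(line, r"\d+", "N"))             # room N, then room N
print(sub("key:value rest", r"(\S+):(\S+)", r"\g<2>:\g<1>"))   # value:key rest
```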
The variant gsub replaces all the matches and #sub replaces a specific match.
The word extract ( s1 re -- s... c ) returns all the matches of a regular expression:
B "Hi {{ name }}, eat {{ thing }}." "{{\s*\S+\s*}}" extract ? ;
[3] "{{ name }}" "{{ thing }}" "2"
If the regular expression has capturing groups, just their matches are returned:
B "Hi {{ name }}, eat {{ thing }}." "{{\s*(\S+)\s*}}" extract ? ;
[3] "name" "thing" "2"
On the flip side, punch ( s1 re -- s... c ) returns all the non-matched parts:
B "Hi {{ name }}, eat {{ thing }}." "{{\s*\S+\s*}}" punch ? ;
[4] "Hi " ", eat " "." "3"
If the regular expression has capturing groups, their matches are also returned:
B "Hi {{ name }}, eat {{ thing }}." "{{\s*(\S+)\s*}}" punch ? ;
[6] "Hi " "name" ", eat " "thing" "." "5"
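For intuition, extract and punch line up with Python's regex matching and with re.split, which also returns captured groups between the pieces. This sketch ignores the trailing count that Gek pushes. Here extract and punch are illustrative functions, not a Gek API, and the braces are escaped, which is the safer choice in Python:

```python
import re


def extract(s: str, pattern: str) -> list[str]:
    """All matches, or just the captured groups' text if the pattern has groups."""
    rx = re.compile(pattern)
    found: list[str] = []
    for m in rx.finditer(s):
        found.extend(m.groups("") if rx.groups else [m.group(0)])
    return found


def punch(s: str, pattern: str) -> list[str]:
    """The non-matched parts, with any captured groups' text in between."""
    return re.split(pattern, s)


text = "Hi {{ name }}, eat {{ thing }}."
print(extract(text, r"\{\{\s*\S+\s*\}\}"))     # ['{{ name }}', '{{ thing }}']
print(extract(text, r"\{\{\s*(\S+)\s*\}\}"))   # ['name', 'thing']
print(punch(text, r"\{\{\s*\S+\s*\}\}"))       # ['Hi ', ', eat ', '.']
print(punch(text, r"\{\{\s*(\S+)\s*\}\}"))     # ['Hi ', 'name', ', eat ', 'thing', '.']
```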
You can use punch for fancy substitutions where you execute words on the matched parts:
$ echo "where _is_ the _ham_" | gek 'r "_(.*?)_" punch foreach x ! X upper ! loop nl !'
The .*? matches zero or more of any character but prefers fewer; you could try just .* as well to see the difference.
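Here's the same punch-and-alternate trick in Python, where the odd positions of re.split's result are the captured text, much like the x and X pairs above. The function name shout is hypothetical. Passing greedy=True swaps in .*, which shows why the lazy .*? is used:

```python
import re


def shout(line: str, greedy: bool = False) -> str:
    """Uppercase the text between underscores, like the punch/foreach program."""
    pattern = r"_(.*)_" if greedy else r"_(.*?)_"
    parts = re.split(pattern, line)
    # Even positions are the unmatched text (x), odd positions the captures (X).
    return "".join(p.upper() if i % 2 else p for i, p in enumerate(parts))


print(shout("where _is_ the _ham_"))                # where IS the HAM
print(shout("where _is_ the _ham_", greedy=True))   # where IS_ THE _HAM
```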
In Place Editing
How can we actually apply our programs to the input files? If it's one input file, we could do it with our shell:
$ output="$(gek 'n . r $' myfile)"; printf %s\\n "$output" > myfile
(The variable is needed because shell redirection WIPES(!) the file first, as with $ cat myfile > myfile)
But for ease, Gek has an option to edit the input files in place. That means what would have been printed for each file becomes the contents of that file, so be sure to make backups. This would prefix each line of each file with the total line number:
$ gek -i 'n . r $' myfile myfile2
Legend has it, Gek's source code was once an ignominious mix of snake_case and CamelCase, but one day he cleaned himself up with a program like this:
r "(\b(?:[a-z]+_[a-z]+)+\b)" punch
foreach
x ! X "_" " " GSUB title " " "" GSUB !
loop nl !
The regular expression cautiously matches whole tokens that are lowercase and include an underscore. It uses a non-capturing group (?:...) to apply + to a grouping without that group also being punched out separately. The word title ( s1 -- s2 ) capitalizes the first letter of each word in a string.
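Here's roughly the same clean-up as a Python sketch. It uses re.sub with a replacement function instead of punch and a loop, so the outer capturing group isn't needed. The name camel is hypothetical, and str.title stands in for Gek's title on these lowercase words:

```python
import re

SNAKE = re.compile(r"\b(?:[a-z]+_[a-z]+)+\b")


def camel(line: str) -> str:
    """push_string_value -> PushStringValue, leaving other tokens alone."""
    return SNAKE.sub(lambda m: m.group(0).replace("_", " ").title().replace(" ", ""), line)


print(camel("x := push_string_value(y) + other"))   # x := PushStringValue(y) + other
```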
Gekdoc Example
It'd be shameful if gekdoc wasn't written in Gek. We now know enough to have a look at it. Gekdoc scans the documentation files for listings of the words you pass. If we give gekdoc a word like pop, it shows listings for pop, pop?, #pop, POP, POP? and #POP by creating a regular expression. You can also pass multiple words at the same time.
Here's the essence of it:
#! /usr/bin/env -S gek -f
B
stack Patterns
Args foreach
"(?i)^#?\$?" x escape "\??\s.*\(.*--.*\)$" 3 #, push Patterns
loop
POP? Patterns if set Pat else EXIT then
stack WordFiles "/usr/share/doc/gek/words" entries as WordFiles WordFiles as Args
;
# If the current line meets the pattern, print lines until the next stack effect, and repeat.
forever r Pat meets while
forever r $ bump? while r "\(.*--.*\)$" meets until loop
loop
FE
# If finished the files, go through them again looking for the next pattern.
tot Args 1 = if
POP? Patterns if set Pat WordFiles as Args then
then
;
Straight away it makes a stack of the patterns it'll look for, one for each command-line argument. The regular expression is made from two string literals with the command-line argument sandwiched in between. The (?i) makes it case-insensitive so it will match both pop and POP. The #?, \$? and \?? allow optional #, $ and ? symbols, and the rest looks for a stack effect. I've put escape ( s -- re ) after x to remove the meaning of any special characters, so the argument matches itself literally.
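To see the pattern-building step on its own, here's a Python sketch with re.escape in place of escape. The function name and the sample listing lines are made up for illustration; the real documentation files may be laid out differently:

```python
import re


def listing_pattern(word: str) -> str:
    """Mirror gekdoc's pattern: optional # and $, the word, optional ?, then a stack effect."""
    return r"(?i)^#?\$?" + re.escape(word) + r"\??\s.*\(.*--.*\)$"


pattern = listing_pattern("pop")
for listing in ["pop <Stack name> ( -- s )",
                "#POP <Stack name> ( c -- s... c )",
                "popper ( -- )"]:
    print(listing, "->", bool(re.search(pattern, listing)))
```

The \s after the optional ? is what stops a longer word like popper from sneaking in.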
Down a bit, it makes a stack of the filepaths in /usr/share/doc/gek/words and sets them to be its input files. We can see why it makes a stack of them by looking at the FE block. I've been very anal: instead of looping through the patterns for each file, I've made it go through all the files for each pattern, so the order of listings is the same as the order of arguments.
The main loop in the middle prints lines from a match until a line with a stack effect that doesn't match. It uses bump? while instead of bump so it breaks out of the loop if it reaches the end of the file. The word until ( f -- ) is the opposite of while, breaking out of the loop if the flag is true, like 0 = while. The outer forever loop is needed so the lines that trigger until are themselves processed. And that's that.
Macros
We can name a group of words for readability and reusability. This snippet defines a word called increment that increments a number by one:
: increment 1 + ;
More generally the layout is:
: <name> <code to expand to> ;
These new words are called macros.
This example defines a macro called sqr and uses it to print the first and last fields squared:
: sqr dup * ;
0 f sqr . -1 f sqr $
Here's a macro from one of my programs that moves the terminal's cursor:
# ( column row -- ) Moves the cursor to the coordinate.
: at "\033[" es ! ! ";" ! ! "H" ! ;
You can see why I wrapped that up with a name. I'll show you the program at the end and cover es, but you can enter some coordinates into this program to play with the macro:
: at "\033[" es ! ! ";" ! ! "H" ! ;
r snap at "hi" !
The word snap ( s -- s1 s2 ) splits a string in two at the first whitespace.
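The macro builds the standard ANSI cursor-position sequence, ESC [ row ; column H. Because the macro prints the top of the stack first, the row comes before the column even though the stack effect lists column first. In Python the same string could be built like this (the name at is just reused for illustration):

```python
def at(column: int, row: int) -> str:
    """Model of the at macro ( column row -- ): build the cursor-move sequence."""
    # The macro prints ESC "[", then the top of the stack (row), ";", then column, then "H".
    return f"\033[{row};{column}H"


print(at(10, 5) + "hi")   # moves the cursor to row 5, column 10 and prints hi
```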
Brace Example
Let's look at a program I use day to day that formats the braces used in C-like programming languages nicely. Given this:
func pushString(s string) { push(Str(s)) }
func pushStrings(strs []string) { for _, s := range strs { pushString(s) } }
It spits out:
func pushString(s string) {
	push(Str(s))
}
func pushStrings(strs []string) {
	for _, s := range strs {
		pushString(s)
	}
}
This is a deceptively tricky problem because of how the braces can nest in each other. My program repeatedly jumps to the next { and cascades any } that were behind the {. The } decrement a variable holding the nesting/indentation level and the { increment it. The { are printed indented with a newline after them.
Here it is:
#! /usr/bin/env -S gek -f
# ( -- ) Prints indentation.
: indent tab Nesting scale ! ;
# ( s -- ) Prints s indented and a newline, unless s is empty.
: indented$ dup len if indent $ else drop then ;
# ( s1 -- s2 )
# Takes a string like "a } b } c",
# prints:
# a
# }
# b
# }
# and returns "c".
: cascade
forever lstrip "}" $snap? while
swap rstrip indented$
1 -set Nesting "}" indent $
loop
;
# Use the first line's indentation as the starting point.
1 n = if r "^\t+" meets drop match len set Nesting then
r
forever "{" $snap? while
swap cascade
rstrip indent . "{" $ 1 +set Nesting
loop
cascade indented$
At the top I've defined a couple helper macros for printing indentation to match the nesting level. The word Nesting hasn't been defined yet but it will be before the macros are encountered.
The cascade macro helps break the problem down. It deals with the } we skip over when we jump to the next {. The words lstrip
and rstrip
remove whitespace from the left/right of a string.
The main code first sets Nesting to the number of leading tab characters on the first line. This acts as the baseline indentation in case the code is already indented. The word match
returns an empty string if there was no match, so meets drop match len
returns the length of the match, or zero if there wasn't one.
Then the biggie. It snaps each line at each { and passes what was behind each { to cascade. Cascade prints the part up to the last } nicely while decrementing Nesting, and returns the other bit that was between the last } and the {. This bit is then printed indented with a { plus a newline, and Nesting is incremented.
When there are no more { to snap at, it breaks out of the loop, cascades any remaining }, and prints anything after the last }.
Cool beans.
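If it helps to see the same algorithm in a more conventional language, here's a rough Python port of the brace program. It's a sketch of the logic, not how Gek works internally. format_braces, indented and cascade are hypothetical names chosen to mirror the macros. The port also assumes that . prints its argument followed by a space, which is how " {" ends up at the end of each line.

```python
def format_braces(lines):
    """Hypothetical port of the brace formatter: returns the output lines."""
    out = []
    nesting = 0  # plays the role of the Nesting variable

    def indent():
        return "\t" * nesting

    def indented(s):
        # Mirrors indented$: emit s indented, unless s is empty.
        if s:
            out.append(indent() + s)

    def cascade(s):
        # Mirrors the cascade macro: emit each "}" (and whatever precedes
        # it) on its own line, decrementing nesting; return the leftover.
        nonlocal nesting
        while True:
            s = s.lstrip()
            before, brace, after = s.partition("}")
            if not brace:
                return s
            indented(before.rstrip())
            nesting -= 1
            out.append(indent() + "}")
            s = after

    for number, line in enumerate(lines, start=1):
        if number == 1:
            # Baseline: the first line's count of leading tabs.
            nesting = len(line) - len(line.lstrip("\t"))
        rest = line
        # Jump from { to {, cascading any } skipped along the way.
        while True:
            before, brace, after = rest.partition("{")
            if not brace:
                break
            head = cascade(before).rstrip()  # may emit lines and change nesting
            out.append(indent() + head + " {")
            nesting += 1
            rest = after
        indented(cascade(rest))
    return out


if __name__ == "__main__":
    source = [
        "func pushString(s string) { push(Str(s)) }",
        "func pushStrings(strs []string) { for _, s := range strs { pushString(s) } }",
    ]
    print("\n".join(format_braces(source)))
```

Note that cascade has to run before the indentation for the { line is computed, because it can lower the nesting level.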
Defining Fields and Records
Up to now programs have run for every line of input, but more generally programs run for every record of input, and we can define what a record is. The word separator
( re -- ) defines what terminates a record instead of the usual newline. This program treats both semicolons and newlines as record terminators, so each semicolon-separated piece is printed on its own line: B ";|\n" separator ; r $
There's also record
( re -- ) that defines what a record is directly. This program prints how many numbers (groups of consecutive digits) there are in the input: B "\d+" record ; E n $ ;
There's a subtle difference between ";" separator
and "[^;]+" record
, with the former producing an empty record between consecutive separators. If we did "[^\n]+" record
, making a record one or more of any character except newline, a blank line wouldn't be a record.
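Here's a rough Python model of that difference. It uses re.split for separator and re.findall for record. The function names are hypothetical. Dropping the empty piece after a final terminator is an assumption about how Gek treats input that ends with a separator.

```python
import re


def records_by_separator(text, sep):
    """Hypothetical model of `sep separator`: sep terminates each record."""
    parts = re.split(sep, text)
    # Assumption: a terminator at the very end doesn't start an extra record.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


def records_by_pattern(text, pattern):
    """Hypothetical model of `pattern record`: every match is a record."""
    return re.findall(pattern, text)


print(records_by_separator("a;;b;", ";"))        # ['a', '', 'b']
print(records_by_pattern("a;;b;", "[^;]+"))      # ['a', 'b']
print(records_by_separator("x\n\ny\n", "\n"))    # ['x', '', 'y']
print(records_by_pattern("x\n\ny\n", "[^\n]+"))  # ['x', 'y']
```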
It's the same idea for fields with delim
( re -- ) defining what terminates a field and field
( re -- ) defining what a field is directly. By default fields are defined by "\S+" field
so you never get an empty one.
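In the same rough Python terms, the default "\S+" field behaves like re.findall(r"\S+", line), so runs of whitespace never produce empty fields. A delim such as "," behaves more like re.split, so it can produce empty fields. This is a model with hypothetical function names, not Gek's implementation.

```python
import re


def fields_default(line):
    # Hypothetical model of the default "\S+" field:
    # every run of non-whitespace is a field.
    return re.findall(r"\S+", line)


def fields_by_delim(line, delim):
    # Hypothetical model of `delim delim`: delim terminates each field,
    # so two delimiters in a row leave an empty field between them.
    return re.split(delim, line)


print(fields_default("  alpha   beta "))  # ['alpha', 'beta']
print(fields_by_delim("x,,y", ","))       # ['x', '', 'y']
```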
Command-line Parsing
There are two words intended for parsing command-line arguments. The word varparse
<Mapping name> ( s... c -- s... c ) looks for strings like "key=value" in its arguments and assigns the results to a mapping. It returns the strings that didn't have equal signs.
Here's a little templating command that fills in {{ placeholders }}: B mapping Replacement Args varparse Replacement as Args ;
r "{{\s*(\S+)\s*}}" punch foreach x ! X Replacement ! loop nl !
You could run it like: $ printf %s\\n 'Hi {{ name }}!' "Why not do what you love like {{ loves }}!" | gek -f prog.gek -- name=John loves='the smell of the wind'
Or with files like: $ gek -f prog.gek -- name=John loves='the smell of the wind' myfile myfile2
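For comparison, here's a rough Python model of what the templating program does. varparse splits each "key=value" argument at its first equals sign and leaves the other arguments alone. Splitting at the first sign is an assumption. fill swaps each {{ name }} for its value. Both function names are hypothetical, and this sketch raises KeyError for a placeholder with no value, which may not match Gek.

```python
import re


def varparse(args):
    """Hypothetical model of varparse: collect key=value args, return the rest."""
    mapping, rest = {}, []
    for arg in args:
        # Assumption: split at the first "=" only.
        key, equals, value = arg.partition("=")
        if equals:
            mapping[key] = value
        else:
            rest.append(arg)
    return mapping, rest


PLACEHOLDER = re.compile(r"\{\{\s*(\S+)\s*\}\}")


def fill(line, replacements):
    # Same pattern as the Gek program; unknown names raise KeyError here.
    return PLACEHOLDER.sub(lambda m: replacements[m.group(1)], line)


if __name__ == "__main__":
    replacements, files = varparse(
        ["name=John", "loves=the smell of the wind", "myfile"]
    )
    print(files)                                 # ['myfile']
    print(fill("Hi {{ name }}!", replacements))  # Hi John!
```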
The word optparse
<Mapping name> ( s... c s1 -- s... c ) is for parsing command-line options like -v and --verbose. Its s1 argument specifies the allowed options and whether they take an argument. For example, "help? e: f:" would specify an option --help that doesn't take an argument, and options -e and -f that do. The remaining arguments are the options and positional arguments to parse, followed by their count, which in a horrible case might look like: "myfile" "-r" "--verbose" "myfile2" "-aJx" "--level" "3" "--debug=true" "-omyfile" 9
The option results are assigned to the mapping and the positional arguments are returned.
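To make the spec format concrete, here's a rough Python model of the parsing. It reads "help? e: f:" as ? for no argument and : for an argument. Short options can be clustered (-aJx) or carry their value (-omyfile). Long options can take their value as --level 3 or as --debug=true. Parsing stops at --. These conventions follow common getopt-style behavior and are assumptions about Gek's exact rules. Storing True for an option without an argument is this sketch's own choice, and the trailing count c has no equivalent because Python lists know their length. parse_spec and optparse here are hypothetical helpers.

```python
def parse_spec(spec):
    """Hypothetical helper: 'help? e: f:' -> {'help': False, 'e': True, 'f': True}."""
    return {item[:-1]: item.endswith(":") for item in spec.split()}


def optparse(args, spec):
    """Hypothetical model of optparse: return (options dict, positional list)."""
    allowed = parse_spec(spec)
    opts, positional = {}, []
    i = 0
    while i < len(args):
        arg = args[i]
        i += 1
        if arg == "--":
            # Assumption: everything after -- is positional.
            positional.extend(args[i:])
            break
        if arg.startswith("--"):
            name, equals, value = arg[2:].partition("=")
            if name not in allowed:
                raise ValueError(f"unknown option --{name}")
            if not allowed[name]:
                if equals:
                    raise ValueError(f"--{name} takes no argument")
                opts[name] = True
            elif equals:
                opts[name] = value
            elif i < len(args):
                opts[name] = args[i]
                i += 1
            else:
                raise ValueError(f"--{name} needs an argument")
        elif arg.startswith("-") and arg != "-":
            # A cluster of short options such as -aJx or -omyfile.
            j = 1
            while j < len(arg):
                name = arg[j]
                j += 1
                if name not in allowed:
                    raise ValueError(f"unknown option -{name}")
                if not allowed[name]:
                    opts[name] = True
                    continue
                # The rest of the cluster, or else the next arg, is the value.
                if j < len(arg):
                    opts[name] = arg[j:]
                elif i < len(args):
                    opts[name] = args[i]
                    i += 1
                else:
                    raise ValueError(f"-{name} needs an argument")
                break
        else:
            positional.append(arg)
    return opts, positional


if __name__ == "__main__":
    print(optparse(["-n3", "myfile", "myfile2"], "n:"))
    # ({'n': '3'}, ['myfile', 'myfile2'])
```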
Here's a simple version of the Unix tail command: B
mapping Opt
10 "n" map Opt
Args "n:" optparse Opt as Args
;
r depth "n" Opt keep
FE depth #$ ;
You could run it like: $ gek -f tail.gek -- -n3 myfile myfile2
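The heart of the program is r depth "n" Opt keep. It appears to push each record and then keep only the top n items on the stack. In Python, a deque with a maximum length does the same job. Here's a rough model for a single input. tail is a hypothetical name, and n arrives as a string, the way an option value would.

```python
from collections import deque


def tail(lines, n="10"):
    """Hypothetical model of the tail program: keep only the last n lines."""
    # Like `r depth "n" Opt keep`: a bounded deque drops the oldest line
    # whenever a new one arrives and it's full, so at most n are held.
    kept = deque(maxlen=int(n))
    for line in lines:
        kept.append(line)
    return list(kept)


if __name__ == "__main__":
    print(tail(["one", "two", "three", "four"], n="3"))  # ['two', 'three', 'four']
```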
As is the norm, passing -- to gek says to treat the following arguments as positional arguments even if they look like options. We need this to stop gek thinking the options to our programs are for itself. Gek also has an option -F to specify a program file and end the options.
If we made our tail command executable and gave it the shebang: #! /usr/bin/env -S gek -F
We could run it like: $ ./tail -n3 myfile myfile2
Bytes and Unicode
We've been thinking in terms of abstract characters, although deep down we know files and output are just data. It might hurt, but strings are actually sequences of bytes, and some words don't even care what those bytes represent. For example, is
compares the bytes regardless of what they represent, while $
outputs them without remorse.
Words that do treat the bytes as text use the UTF-8 encoding to map the bytes to characters, likely the same encoding your terminal uses to render data. UTF-8 is a superset of ASCII that can represent all the characters of Unicode, and it's the dominant encoding on the web. (Technically the bytes map to atomic units called Unicode code points, but I'm not gonna fret.) Remember though, strings don't need to be valid UTF-8. Besides reading input, we can embed arbitrary bytes with the word
es
( s1 -- s2 ), which substitutes escape sequences: B "H\n\tE\n\t\tL\n\t\t\tL\n\t\t\t\tO" es $ ;
My macro for moving the cursor uses es
to embed the ESC character which terminals interpret specially (ANSI escape sequence): : at "\033[" es ! ! ";" ! ! "H" ! ;
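The escape sequence the at macro builds is ESC [ row ; column H. Here's a rough Python model of es and at. It assumes es handles the same backslash escapes as Python's unicode_escape codec, which covers the \033, \n and \t used in the examples here. The names mirror the Gek words, but this isn't Gek's implementation.

```python
def es(s):
    """Hypothetical model of es: substitute backslash escapes (ASCII input only)."""
    return s.encode("ascii").decode("unicode_escape")


def at(column, row):
    """Hypothetical model of the at macro: the ANSI cursor-move sequence."""
    # Terminals expect ESC [ row ; column H, so row is written first.
    return es("\\033[") + f"{row};{column}H"


if __name__ == "__main__":
    print(repr(at(5, 10)))   # '\x1b[10;5H'
    print(at(5, 10) + "hi")  # in a terminal, prints hi at row 10, column 5
```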
The most important thing is that Gek handles UTF-8 input, the modern standard.
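To see the difference between bytes and characters concretely, here's a short Python illustration. It uses Python rather than Gek because the point is about UTF-8 itself. One accented character is one code point but two bytes, and a lone 0xFF byte isn't valid UTF-8 at all.

```python
text = "naïve"
data = text.encode("utf-8")

print(len(text))  # 5 code points
print(len(data))  # 6 bytes: "ï" (U+00EF) encodes as 0xC3 0xAF
print(data)       # b'na\xc3\xafve'

# Not every byte sequence is valid UTF-8; strict decoding refuses 0xFF.
try:
    b"\xff".decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)  # invalid UTF-8: invalid start byte
```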
Fluidity
We'll wrap up with a few powerful words that change how Gek itself behaves. The words PRENUM
and POSTNUM
let us change what represents a number. They're events that take code up to a ;
like B
and E
.
PRENUM
blocks execute before each number argument is taken from the stack. For example, many countries use a decimal comma (1,23) instead of a decimal point (1.23), so we could replace commas with periods to make both work: PRENUM "," "." SUB ; fields sum $
If we also wanted returned numbers to use commas, we could add POSTNUM
blocks, which execute on each returned number: PRENUM "," "." SUB ; POSTNUM "." "," SUB ; fields sum $
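Here's a rough Python model of what those two blocks do to fields sum $. Each field passes through a prenum hook before becoming a number, and the result passes through a postnum hook on the way out. The hook names are hypothetical. The model assumes SUB replaces the first match, so str.replace with a count of 1 stands in for it.

```python
def prenum(s):
    # Hypothetical stand-in for PRENUM "," "." SUB ; : accept a decimal comma.
    return s.replace(",", ".", 1)


def postnum(x):
    # Hypothetical stand-in for POSTNUM "." "," SUB ; : output a decimal comma.
    return str(x).replace(".", ",", 1)


def sum_fields(line):
    """Hypothetical model of `fields sum $` with both hooks installed."""
    total = sum(float(prenum(field)) for field in line.split())
    return postnum(total)


if __name__ == "__main__":
    print(sum_fields("1,5 2,25"))  # 3,75
    print(sum_fields("1.5 2,25"))  # 3,75
```

Python's float formatting differs from Gek's for whole numbers (3.0 comes out as 3,0 here), so treat the output format as illustrative.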
The TOKEN
event lets us intercept the code itself. Tokens are the individual bits of code, such as a word name, number or string literal, that Gek executes one after another. Before each token is evaluated, the token is put on the stack, the TOKEN
block executes, and the new top value is taken to be the token. This example ignores underscores in tokens: TOKEN "_" "" gsub ; f_ie_lds s_um_ _$_
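Here's a toy Python model of the TOKEN idea. It's a tiny stack evaluator that handles only integers, + and $, and passes every token through a hook before evaluating it. The names are hypothetical. Treating an empty token as doing nothing matches the later game example, which returns "" to skip a token.

```python
def run(source, token_hook=lambda token: token):
    """Hypothetical toy evaluator: integers, + and $ only. Returns printed values."""
    stack, printed = [], []
    for token in source.split():
        # Like a TOKEN block: the hook may rewrite each token first.
        token = token_hook(token)
        if token == "":
            continue  # an empty token is skipped
        if token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "$":
            printed.append(stack.pop())
        else:
            stack.append(int(token))
    return printed


if __name__ == "__main__":
    # Like TOKEN "_" "" gsub ; : underscores vanish before evaluation.
    print(run("1_0 2_0 +_ $_", lambda token: token.replace("_", "")))  # [30]
```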
A big-brained use of this fluidity is creating syntaxes that can become programmatic. Imagine a game that interprets levels from text files. The levels would usually be limited to words declaring the start, obstacles, checkpoints, etc., but, say, the hub level for downloading levels could harness all of Gek.
Here's an example of that without the fun game bit: B
"mylevel" set LevelFile
: beep "BEEP" $ ;
: boop "BOOP" $ ;
: programmatic 1 set Programmatic ;
1 set Programmatic
0 set InMacro
TOKEN
set Token
Programmatic InMacro or if
Token
else
Token "beep" "boop" "programmatic" 3 within assert
1 set InMacro Token eval 0 set InMacro
"" # Skip the token, we've already evaluated it.
then
;
"0 set Programmatic " LevelFile read , eval
;
Here's a valid level: beep boop boop
Another valid level: programmatic beep boop boop 5 for beep loop
But not this: beep boop boop 5 for beep loop
I've defined three macros for the level syntax: the placeholders beep and boop, and programmatic, which enables all of Gek's words. In a real game, you can imagine the programmatic word throwing an error unless --programmatic was passed. The TOKEN
code passes the token through untouched if Programmatic is true or if it's already evaluating a macro's code; otherwise it asserts that the token is one of the macro names. It keeps track of whether it's evaluating a macro's code by setting a variable before and after evaluating macro names with eval
, after which it returns an empty string so the token isn't evaluated twice.
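Stripped of the evaluation, the gatekeeping boils down to a rule you can sketch in Python: until programmatic has run, every token must be one of the level's macros. check_level and MACROS are hypothetical names. The Gek version checks tokens as it runs them, so a bad level still beeps and boops before failing. This sketch only validates.

```python
MACROS = {"beep", "boop", "programmatic"}  # hypothetical: the level's macro names


def check_level(source):
    """Hypothetical validator: reject non-macro tokens before `programmatic`."""
    programmatic = False
    for token in source.split():
        if programmatic:
            continue  # all of the language is allowed from here on
        if token not in MACROS:
            raise ValueError(f"not allowed in a level: {token!r}")
        if token == "programmatic":
            programmatic = True


if __name__ == "__main__":
    check_level("beep boop boop")
    check_level("programmatic beep boop boop 5 for beep loop")
    try:
        check_level("beep boop boop 5 for beep loop")
    except ValueError as err:
        print(err)  # not allowed in a level: '5'
```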
It might be a weird niche but I don't know a language better suited for this kind of thing than Gek.
And finally finally finally, the word yank
( -- s ) is used in macros to steal the next token after the macro, so you can define your own words like set
and map
.
Example 1: : plus yank + ; B 3 plus 4 $ ;
Example 2: : ( forever yank ")" is until loop ; B 1 $ ( 2 $ ) 3 $ ( 4 $ ) 5 $ ;
In our game example, it'd let us define a syntax where arguments come after words, like "background: blue".
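In interpreter terms, yank just means a word can pull the next token from the same token stream the interpreter is reading. Here's a toy Python model that handles integers, $ and the two example words. The hand-coded plus and ( are hypothetical stand-ins for the macros, not how Gek defines them.

```python
def run(source):
    """Hypothetical toy evaluator with two yank-style words. Returns printed values."""
    tokens = iter(source.split())
    stack, printed = [], []
    for token in tokens:
        if token == "plus":
            # Like `: plus yank + ;`: the operand is the *next* token.
            stack.append(stack.pop() + int(next(tokens)))
        elif token == "(":
            # Like `: ( forever yank ")" is until loop ;`: take and
            # discard tokens up to and including ")".
            for skipped in tokens:
                if skipped == ")":
                    break
        elif token == "$":
            printed.append(stack.pop())
        else:
            stack.append(int(token))
    return printed


if __name__ == "__main__":
    print(run("3 plus 4 $"))                   # [7]
    print(run("1 $ ( 2 $ ) 3 $ ( 4 $ ) 5 $"))  # [1, 3, 5]
```

Because both words and the main loop share one iterator, anything a word consumes is never seen by the main loop. That's the essence of yank.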
The End
You made it, that's Gek. Gek's old-school stack approach makes it simple yet expressive; it has the power of a general-purpose language without the drag. It's a bunch of words. Well, time to say goodbye. I hope you've enjoyed this voyage of a tutorial and find Gek useful. $ gek 'E "bye" $ ;'
But seeing as I'm a man of my word, here's my cursor-moving program. It's snake: #! /usr/bin/env -S gek -F
# ( coordinate -- ) Moves the cursor to the coordinate "<row> <column>".
: at "\033[" es ! " " ";" sub ! "H" ! ;
# ( s -- ) A wrapper around the stty command for changing terminal settings.
: stty "stty " swap , shell ;
# On termination, reset the terminal to sane settings.
PE "sane" stty ;
B
# Controls
"w" set Up "a" set Left "s" set Down "d" set Right
# Arena Dimensions
16 set Width 16 set Height
# Characters For Shapes
"#" set Border " " set Blank "@" set Skin "A" set AppleSkin
# Clear the screen and move the cursor to the top left and hide it.
"\033[2J\033[H\033[?25l" es !
# At the end, unhide the cursor and move it below the output.
PE "\033[?25h\033[" es ! Height 3 + ! "H" $ ;
# Print the arena and the controls.
Border Width 2 + scale $
Height for Border ! Blank Width scale ! Border $ loop
Border Width 2 + scale $
"up:" . Up . "left:" . Left . "down:" . Down . "right:" . Right $
# Put the snake coordinates on the stack.
Height 2 / 2 + Width 2 / 1 + 2 join
Height 2 / 1 + Width 2 / 1 + 2 join
Height 2 / Width 2 / 1 + 2 join
# Draw the snake.
Stack foreach x at Skin ! loop
# At the end, draw a pulsing snake animation.
E
Stack foreach x at Blank ! 0.075 sleep loop
Stack foreach x at Skin ! 0.075 sleep loop
;
: SpawnApple
forever
random Height 2 - mod 2 + random Width 2 - mod 2 + 2 join set Apple
Apple in Stack while
loop
Apple at AppleSkin !
;
# Have the terminal send each character rather than lines.
# (and don't echo the characters, but do allow signals like Ctrl+C)
"raw min 1 isig -echo" stty
# Await a keypress.
"dd bs=1 count=1 2>/dev/null" shellout set Key
-1 set Vertical
0 set Horizontal
SpawnApple
# Have the terminal send nothing if a key is not pressed.
"raw min 0 time 2 isig -echo" stty
forever
Key Up is Vertical 0 = and if -1 set Vertical 0 set Horizontal then
Key Down is Vertical 0 = and if 1 set Vertical 0 set Horizontal then
Key Left is Horizontal 0 = and if 0 set Vertical -1 set Horizontal then
Key Right is Horizontal 0 = and if 0 set Vertical 1 set Horizontal then
# Add a new head and game over if it collides with the border or body.
dup split Vertical Horizontal 2 #+ set HeadX set HeadY HeadY HeadX 2 join set Head
HeadY 2 < HeadX 2 < or HeadY Height 1 + > or HeadX Width 1 + > or if exit then
Head in Stack if exit then
Head at Skin !
Head
Apple in Stack if
SpawnApple
else
# Erase the tail.
depth depth #rot at Blank !
then
"dd bs=1 count=1 2>/dev/null" shellout set Key
loop
;