The Gek Tutorial

.=================.
|     \ \ \ \    # `,
|     /_/_/_/      ` =======`
|  (\____!___/S      ._(_(___,
! ` \0\0\0\0\/ _  . /
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Introduction

Gek is a simple programming language for manipulating text and files. It's used on the command-line and in scripts to get what you want. This tutorial covers the language like a little book, but it's aimed at the Unix user already familiar with the command-line and regular expressions.

Installation

Gek has only just been released and might not be in your distro's package manager yet. If it's not there, it's easy to install from source. All you need is Git and the Go programming language compiler, which are very likely in your package manager. Then run: $ git clone https://git.sr.ht/~geb/gek && gek/install.sh

Big First Steps

Gek programs usually take in input and spit out output. The programs are sequences of words that each do an action, and without doing anything special the words are executed for each line of input. Here's a program that prints each line UPPERCASE and prefixed with the line number: n ! r upper $
It's got five words: n gives us the number and ! prints it, r gives us the line, upper makes it uppercase and $ prints it plus a newline. The r stands for record. We can run it on the command-line like this: $ gek 'n ! r upper $'
That'll have gek read from stdin, but we can pass input files like this: $ gek 'n ! r upper $' myfile myfile2
It's too smooshed, let's have a colon and a space after the line number: n ! ": " ! r upper $
Before I explain what on earth's going on, I'd like to introduce the words B and E. We can execute code at the beginning of the program with B and execute code at the end of the program with E. The code to execute comes after them, terminated by a ; so here's our beloved program now printing "hello" at the beginning and "bye" at the end: B "hello" $ ; n ! ": " ! r upper $ E "bye" $ ;
Programs that are just B blocks terminate without reading input, so here's a Hello World: B "hi" $ ;
Gek is what's called a stack-based language. A stack in programming is a list of elements where the most recently added element is the first to be removed, like a stack of plates where you add and take from the top. All the words take their arguments from "the stack" and put their return values onto it. When we go r $, the r puts the current line on the stack and $ just takes and prints what's on the top of the stack. If we felt so inclined, we could write n ! r $ as r n ! $. Say the fourth input line is "ham", here's what happens with n ! r $.
n puts "4" on the stack: "4"
! takes the top element off the stack and prints it: <empty>
r puts "ham" on the stack: "ham"
$ takes the top element off the stack and prints it and a newline: <empty>
Here's what happens with r n ! $:
r puts "ham" on the stack: "ham"
n puts "4" on the stack: "ham" "4"
! takes the top element off the stack and prints it: "ham"
$ takes the top element off the stack and prints it and a newline: <empty>
We could use this program to achieve nothing: r
It just puts each line on the stack and doesn't print anything. We can see what's happening with ? who shows us the stack: r ?
This program prints the number of input lines: E n $ ;
But for illustrative purposes, so does this: B 0 ; 1 + E $ ;
When literal numbers are encountered they put their value on the stack, and the same goes for the literal text we've seen. The + word takes two arguments and returns them added together. Here zero is our initial value, 1 + increments it every line and $ prints the accumulation at the end. See how 1 + is like a little pipeline that takes a number and increments it by one.
This program sums the numbers on each line: fields sum $
So running: $ printf %s\\n '3 4' '2 8 0.5' '-1 9' | gek 'fields sum $'
We get:
7
10.5
8
The word fields puts the non-whitespace parts of the line on the stack and the count of how many, and sum takes a count and takes and sums that many elements. Let's run through it.
On the first line fields puts "3" and "4" on the stack plus a count of how many: "3" "4" "2"
sum takes the top value, "2", and sums that many elements: "7"
$ takes the top element and prints it and a newline: <empty>
On the second line fields puts "2", "8" and "0.5" on the stack plus a count of how many: "2" "8" "0.5" "3"
And so on.
We say words like sum take a list, meaning they take a count argument specifying how many more arguments to take. If we wanted to sum the top three elements we'd use 3 sum: B 4 5 6 3 sum $ ;
15
The fact we specify the count lets us leave values on the stack: B 10 4 5 6 3 sum + $ ;
25
And if we wanted to use the whole stack, there's depth who returns how many elements there are on the stack: B 4 5 6 depth sum $ ;
15
The word #$ prints multiple elements each on their own line: B "+-------------+" "|Stud Lad John|" "+-------------+" 3 #$ ;
+-------------+
|Stud Lad John|
+-------------+
The word keep takes a list and another count specifying how many to keep: B "a" "b" "c" "d" "e" 5 3 keep depth #$ ;
c
d
e
Here's a program that prints the last ten lines of input: r depth 10 keep E depth #$ ;
It puts each line on the stack limiting it to ten elements, and prints what's there at the end. What if we give the + word text?
All the values are actually strings, that is strings of characters, but a word might take them to represent text, a number or regular expression. For example, the + word treats its arguments as numbers: B "123" "456" + $ ; 579 Whereas the word , that joins two strings, treats them as characters: B "123" "456" , $ ; 123456 The values have no intrinsic type, it's the words that matter. There's also no difference between B 123 456 + $ ; and B "123" "456" + $ ; but the number literals are pretty. Alright but what if we give + "ab%3gOt%4!" ?
All the words that take a number argument determine the number it represents the same way: they skip whitespace and stop at the first character that's not a sign, point or digit. You can see what happens running 0 + $ on a string:
"123abc" 0 + -> "123"
"abc123" 0 + -> "0"
" +-+--+123.456 " 0 + -> "-123.456"
And for completeness, string literals can be in `backticks` as well as "double quotes". It's nice having the two so we can quote the quote characters: B `"sup" quoth John` $ ;
String literals are truly literal, there's no expansion or escape sequences like in other languages (but we'll see alternatives and Unicode later).
To recap, Gek programs consist of words and string literals, and loop over input.
The word r returns the current line and n returns the line number.
The word ! prints a string and $ prints a string plus a newline.
The words B and E execute code at the beginning/end of the program.
The word #$ prints strings on their own lines.
The word fields returns a list of the non-whitespace parts of the line.
The word sum sums a list.
The word depth returns the number of elements on the stack.
The word + adds two arguments, and I didn't mention -, * and /.
Importantly the word ? shows the stack.
There are no types, which makes sense since the input data doesn't have any types either.
The stack keeps things short and simple. Can you write a program that sums all the numbers of the entire input?
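If you get stuck on that one, here's one way it could go using only the words we've met so far (an untested sketch that mirrors the line-counting program): keep a running total starting at zero, add each line's sum to it and print it at the end.
B 0 ; fields sum + E $ ;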

Programs in Files

Instead of passing our programs on the command line, we can store them in files and use the -f option: $ gek -f myprog.gek
On Unix we can go further by making the file executable and adding a shebang line like: #! /usr/bin/env -S gek -f
To gek, #! is just a word that ignores the rest of the line, so the space after it is important.
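Putting that together, a full executable script might look like this (a sketch; upper.gek is just a name I made up, and you'd need to chmod +x it first):
#! /usr/bin/env -S gek -f
n ! ": " ! r upper $
Then you could run it like: $ ./upper.gek myfile myfile2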

Gekdoc and Notation

If you ever want info on a word, Gek comes with a handy program called gekdoc. You can pass a word as an argument like $ gekdoc '!' or a topic like $ gekdoc :printing, and list the topics with $ gekdoc :topics or just $ gekdoc. Each word has a pictorial showing its effect on the stack.
The word swap has ( s1 s2 -- s2 s1 ) showing it swaps two strings.
The word + has ( n1 n2 -- n3 ) showing it takes two numbers and returns a number.
The word #$ has ( s... c -- ) showing it takes zero or more strings plus a count of how many and doesn't return anything. To get more on the notation you can see $ gekdoc :notation. I'm going to start showing each word's stack effect when I introduce them.

If Statements

If we want our program to make a decision, we have if statements that look like this: if <code to run if true> then The word if ( f -- ) takes a flag, which is just a number argument representing true if it's non-zero or false if it's zero. If the flag is true, the words between if and then are executed, otherwise they're skipped. Here's a program that only prints the first three lines: n 4 < if r $ then The word < ( n1 n2 -- f ) returns whether the first number argument is less than the second by returning "1" or "0". There's also <=, >, >=, = and !=. To do something else if the flag was false, there's the word else: if <code to run if true> else <code to run if false> then This program prints the first three lines and has a breakdown: n 4 < if r $ else "oh no" $ then We could factor out the $: n 4 < if r else "oh no" then $ Ok ok ok, the condition kinda coming at the start goes with the whole stack thing, but why's it then at the end, how's that meant to read?
The if else then syntax actually harks back to the 1970s and the original stack-based language called FORTH. I've always read the then to mean "in that case" like "If so, why don't you do the thing then?" or simply "If so, do the thing then". The words and ( n1 n2 -- n1 | n2 ) and or ( n1 n2 -- n1 | n2 ) let us express complex conditions. Technically and returns n2 if n1 is non-zero else n1 (which is zero), and or returns n1 if n1 is non-zero else n2. But what really matters is and returns true if they're both true, and or returns true if either of them is true. This program prints lines two and four: n 2 = n 4 = or if r $ then
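For another sketch with and (untested), this one should print lines two through four, since both comparisons have to be true: n 2 >= n 4 <= and if r $ then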

Index and Integer Arguments

We've met fields ( -- s... c ) that returns all the fields, there's also f ( i -- s ) who takes an index argument and returns the field at that index.
Indices always start at zero, so 0 for the first field, 1 for the second field...
And if they're negative they go from the other end, so -1 for the last field, -2 for the second last field, and so on.
Here's an example to ignite your zeal; it prints each line's second field quadrupled: 1 f 4 * $
Both index arguments and integer arguments are an i in stack effects. An integer is a positive or negative whole number, we call it an index if it specifies the i'th something. Like counts and flags, they're just a common idea and make stack effects more descriptive.
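And to see a negative index, this little sketch should print each line's last field: -1 f $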

String Words

I'll rattle through some of the basic words for working with text; they'll show up enough in our examples.
is ( s1 s2 -- f ) returns whether two strings are the same, and aint ( s1 s2 -- f ) whether they're different. Note the difference between "0.0" "0" = and "0.0" "0" is.
starts and ends ( s1 s2 -- f ) return whether s1 starts/ends with s2.
says ( s1 s2 -- f ) returns whether s1 contains an occurrence of s2: B "waffle" "ff" says $ ;
Like a couple words we'll see, says sets where ( -- i ) to give the position of the first occurrence: B "waffle" "ff" says if where $ else "nowhere" $ then ;
nl ( -- s ) returns a string that's just a newline, and tab ( -- s ) a string that's just a tab. It's common to print a lone newline with nl !.
len ( s -- c ) returns the number of characters in a string.
cut ( s1 i -- s2 ) cuts off characters from a string. It removes i characters from the start if i is positive: "hello" 2 cut -> "llo" Or i characters from the end if i is negative: "hello" -2 cut -> "hel"
CUT ( s1 i -- s2 ) cuts out i characters, removing everything but what cut would: "hello" 2 CUT -> "he" "hello" -2 CUT -> "lo"
Using them together you can get any substring: "0123456789" 2 cut 4 CUT -> "2345"
, ( s1 s2 -- s3 ) joins two strings. "sup" "lads" -> "suplads"
#, ( s... c -- s1 ) joins the strings of a list. "ba" "na" "na" 3 -> "banana"
join ( s... c -- s ) joins the strings of a list with a space between each. "a" "b" "c" 3 -> "a b c"
$join ( s... c s1 -- s2 ) joins the strings of a list with a string s1 between each. "a" "b" "c" 3 "::" -> "a::b::c"
split ( s1 -- s... c ) returns a list of s1 split at whitespace. "a b c" -> "a" "b" "c" 3
$split ( s1 s2 -- s... c ) returns a list of s1 split at occurrences of s2. "a::b::c" "::" -> "a" "b" "c" 3
SUB ( s1 s2 s3 -- s4 ) replaces the first occurrence of s2 in s1 with s3: r "day" "night" SUB $
We'll meet the more powerful sub ( s1 re s2 -- s3 ) when we get to regular expressions.
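To see a few of these together, here's an untested sketch that should turn comma-separated lines into pipe-separated ones by splitting at commas and joining back with a different separator: r "," $split " | " $join $
Running: $ printf %s\\n 'a,b,c' | gek 'r "," $split " | " $join $'
Should get us: a | b | c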

Reading and Writing Files and Handling Errors

The most important file operations are reading and writing. The word read ( s1 -- s2 ) returns the contents of the file at filepath s1, so this program replaces lines like "include myfile" with the file's contents: r "include" starts if 1 f read $ else r $ then
The word WRITE ( s1 s2 -- ) writes the string s1 to the file at filepath s2, creating the file if it doesn't exist. The word write ( s1 s2 -- ) is the same but writes s1 plus a newline, which is usually what you want as non-empty text files should end with a newline. There's also APPEND and append that append to files rather than overwriting them. Say we've made a speech recognition thingy that continuously prints possible transcriptions of sentences like this:
1:don't cremate you're nutritious
2:don't crummy your new trashes
1:vets make exchange in sync with the fishes
2:lots make change and cheap with efficient
3:let's make a change and sleep with the fishes
We could write a filter to keep a file up to date with the most recent sentence's possibilities and print the top ones:
r "1:" starts if "" "transcriptions" WRITE r 2 cut $ then
r 2 cut "transcriptions" append
The project that inspired this example pipes the output to a command who types it and uses the file for a menu to switch to another possibility. What if words no work?
Well, some words like read throw an error terminating the program if they're unsuccessful, but there's usually a variant of them to handle the unsuccessful case. For example the word read? ( s1 -- s2 1 | 0 ) returns the file's contents and a "1" if it's successful, or just a "0" on its own if it's unsuccessful. The variable number of return values means there's no guff value in the unsuccessful case. Here's our "include myfile" example now skipping bad files: 0 f "include" is if
1 f read? if $ then
else
r $
then
Like a lot of its friends, read? also has the word apologize ( -- ) print a message explaining why it failed: 0 f "include" is if
1 f read? if $ else apologize then
else
r $
then
(The else clause isn't really necessary because apologize prints nothing in the successful case.)

Exiting

If we want to throw our own errors or just end the program early, we've got exit ( -- ). This program stops printing the lines if one is "quit": E "bye" $ ; r "quit" is if exit then r $ Remember E is just a word and needs to execute, so this wouldn't say farewell if the first line was "quit": r "quit" is if exit then r $ E "bye" $ ; I normally put E at the end only because it looks cool. The word code ( i -- ) lets us change the return code gek gives on termination. As you know, a return code of zero indicates success and non-zero failure, so this program returns successfully if it finds happiness: B 1 code ; r "happiness" says if 0 code then You could try: $ gek 'B 1 code ; r "happiness" says if 0 code then' && echo success || echo failure

Shell Words

Gek is very charismatic and good at working with other programs. The word shell ( s -- ) executes commands through the system shell (sh, dash, bash) printing any output and waiting for it to finish. This program shells lines that start with an exclamation mark: r "!" starts if r 1 cut shell else r $ then
Try giving it a line like: !date
We can retrieve the return code with rc ( -- i ), like this program that uses the test command to know if it's connected to a terminal: B "test -t 1" shell rc if "NOT" . then "connected to a terminal" $ ;
You could try running it connected to another program like this: $ gek 'B "test -t 1" shell rc if "NOT" . then "connected to a terminal" $ ;' | rev
The word shellout ( s1 -- s2 ) is the same as shell but returns the output as a string. This one shouts your name: B "whoami" shellout upper $ ;
The word $shell ( s1 s2 -- ) is the same as shell but the command gets the string s1 as its input. This one runs the Unix bc calculator for each line that starts with an exclamation mark: r "!" starts if r 1 cut "bc" $shell else r $ then
Try giving it a line like: !(1 + 2) * 2 ^ 2
(You might need to install bc, some distros don't come with it although it's specified by POSIX.) Finally, there's shelljoin ( s... c -- s1 ) for shell-escaping strings. It removes any special meaning of the characters so a Unix shell will take them literally. It takes a list so you can conveniently make a string that represents the list in the eyes of a Unix shell. Here's an example that creates directories for each field: "mkdir " fields shelljoin , shell If it was join the user would be able to run something nefandous with a line like "; echo willy".

Bump

The word bump ( -- ) has the next input read in prematurely. This joins pairs of lines: r ! ":" ! bump r $ And this prefixes the first line to the rest: B bump r ; dup ! r $ The word dup ( s -- s s ) duplicates the top element on the stack. Everyone agrees bump is the best word.

Loops

Loops let us repeat or loop over something, just like how Gek loops over the lines of input. Together with if statements, loops let us write any algorithm, any set of instructions. The word for ( n -- ) takes a number argument and executes the words up to loop that many times. This program prints the first line once, the second twice, the third thrice, and so on: n for r $ loop
The word i gives the loop's so-called index, which is a counter that increments with every loop: B 10 for i ! loop nl ! ;
As an aside, the word . ( s -- ) prints a string and a space: B 10 for i . loop nl ! ;
The word foreach ( s... c -- ) takes a list of elements and makes the next element accessible with each iteration, terminating when they've all had their turn. The element is accessible with x ( -- s ), so here's one that prints each field in parentheses: fields foreach "(" ! x ! ")" . loop nl !
If we wanted more fields per parentheses we've got X ( -- s ) who bumps the loop's element along before returning it: fields foreach "(" ! x . X . X ! ")" . loop nl !
If we just want to loop, we've got forever loops. This will greet you indefinitely: B forever "hello" $ 1 sleep loop ;
The word while ( f -- ) jumps out of a loop if its flag is false. This program squashes each paragraph onto a line: forever r len while r . bump loop nl !
Try it on a file, it's too confusing seeing what you're typing and what it's printing. Same goes for this one that joins lines which end with a backslash: forever r "\" ends while r -1 cut ! bump loop r $
The words i and while work with any kind of loop.
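As one more little loop sketch (untested), this should print each field on its own line, indented with a tab: fields foreach tab ! x $ loop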

Individual Input File Words

We're now more than just friends with B and E that run at the beginning and end of the program, but there's also FB and FE for the beginning and end of each input file. This program prints the last ten lines per file: r depth 10 keep FE depth #$ ; The word N ( -- n ) returns the line number like n but of the current input file, so this prints the line counts of each file: FE N $ ; We could print the filepaths alongside using file ( -- s ): FE file . N $ ;
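FB works the same way, so a sketch like this should print a header with the filepath before each file's lines (the ==> prefix is just my own decoration): FB "==> " ! file $ ; r $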

Variables and Mappings

The word set <name> ( s -- ) takes an argument and defines a word that returns a copy of it. The <name> notation shows set steals the next token, in this case for the word's name. We call the new word a variable and it has the stack effect <Variable name> ( -- s ). Here's the prefixing example rewritten using a variable: B bump r set Prefix ; Prefix ! r $
Here's one showing we can keep redefining them: B "" set Longest ; r len Longest len > if r set Longest then E Longest $ ;
At the top of bigger programs you'll often name some filepaths or the like for ease and clarity. I name my variables with a capital letter but it's just convention.
Along with variables we have collections of keys and corresponding values called mappings. They too let us assign values to names but unlike variables the keys are strings and can come from anywhere. A mapping first needs to be created with the word mapping <name> ( -- ), after which map <Mapping name> ( s1 s2 -- ) assigns the value s1 to the key s2 of the mapping. The name of the mapping becomes a word <Mapping name> ( s1 -- s2 ) that returns the corresponding value of the key s1, or an empty string if the key is unassigned. Here's an example to figure out:
B
mapping Action
"press a" "alpha" map Action
"press b" "bravo" map Action
"press c" "charlie" map Action
"transcribe" "transcribe" map Action
"run ./menu" "transcriptions" map Action
;
0 f "said" is if 1 f Action $ then
I usually name my mappings so the keys sound good before the word. In the example, the alpha action is "press a", which you could get with "alpha" Action. If I had a mapping of animals and foods, I'd call it Food so you could go "cat" Food. Mappings have a little lexicon dedicated to working with them:
in <Mapping name> ( s -- f ) returns whether a mapping has a key s.
keysof <Mapping name> ( -- s... c ) returns the keys of a mapping.
valuesof <Mapping name> ( -- s... c ) returns the values of a mapping.
tot <Mapping name> ( -- c ) returns the number of keys a mapping has.
Here's an example that prints lines if they haven't been seen before:
B mapping Seen ;
r in Seen 0 = if r $ then
"guff" r map Seen

Extra Stacks

We can have additional stacks for collecting elements or holding a buffer of elements to work through. We create them with stack <name> ( -- ), after which push <Stack name> ( s -- ) puts an element on it and pop <Stack name> ( -- s ) takes an element from it. The stack name returns copies of all its elements and has the stack effect <Stack name> ( -- s... c ). This program prints any lines starting with FIXME then any starting with TODO: B stack Todos ;
r "FIXME" starts if r $ then
r "TODO" starts if r push Todos then
E Todos #$ ;
This one prints the lines side by side with those of a file: B stack Lines "myfile.txt" read nl $split #push Lines ;
r ! ":" ! POP Lines $
The word #push <Stack name> ( s... c -- ) puts multiple elements on the top of a stack, and the word POP <Stack name> ( -- s ) takes an element from a stack but from the bottom. Similar to mappings the word in <Stack name> ( s -- f ) returns whether a stack has an element s, and tot <Stack name> ( -- c ) the number of elements. "The stack" we know and love has the name Stack ( -- s... c ). Like all stack names it returns copies of all its elements so we could print the stack without losing the elements using Stack #$.
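Stacks being last-in-first-out makes reversing easy; an untested sketch that should print the input lines in reverse order:
B stack Rev ;
r push Rev
E forever tot Rev while pop Rev $ loop ;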

Command-line Arguments

The command-line arguments passed to gek are stored on a stack called Args. You can probably guess what this prints: $ gek 'B Args sum $ ;' 1 2 3 4
These arguments are consumed as each file is begun, and we can even thieve arguments or add files to process. This program prints the last lines of each file, the first argument specifying how many lines: B POP Args set Num ;
r depth Num keep
FE depth #$ ;
This one recursively prints dependencies based off lines saying "require <filepath>": r "require" starts if 1 f push Args then FE file $ ;

Regular Expressions

Regular expressions feature in lots of utilities, languages and text editors; they describe patterns of text. If you're not already familiar, I'd recommend learning them with the grep and sed commands. Gek's regular expressions have some of the commonplace additions that first appeared in the Perl programming language. These are the ones I care about:
\s whitespace character
\S non-whitespace character
\b word boundary
x*? zero or more x, prefer fewer
x+? one or more x, prefer fewer
x?? zero or one x, prefer zero
(?:mypattern) non-capturing group
(?i) make patterns case-insensitive
(?m) multi-line mode so ^ and $ match the start/end of lines (in addition to the start/end of the text)
(?s) have . match newlines
But I'll recap if an example uses any of them except the top three. You can find everything here: https://github.com/google/re2/wiki/Syntax
The word meets ( s re -- f ) returns whether s matches an occurrence of the regular expression re. This program prints lines containing stack effects: r "\(.*--.*\)" meets if r $ then
Like we saw with says, it sets where ( -- i ) to the position of the first occurrence: B "drop ( s -- ) discards an element" "\(.*--.*\)" meets if where $ else "nowhere" $ then ;
It also sets match ( -- s ) to return the first matched text: B "drop ( s -- ) discards an element" "\(.*--.*\)" meets if match $ else "none" $ then ;
The word sub ( s1 re s2 -- s3 ) replaces text matched by a regular expression much like sed's s/this/that/. It returns s1 with the first match of the regular expression re replaced using s2. We could replace the first occurrence of day with night: r "day" "night" sub $
Or the first one or more digits with CLASSIFIED: r "\d+" "CLASSIFIED" sub $
The replacement can refer to the matched text with ${0}, so we could surround our number with some arrows like this: r "\d+" "->${0}<-" sub $
We can also refer to capturing groups with ${1} for the first, ${2} for the second and so on. This one swaps the first words separated by a colon: r "(\S+):(\S+)" "${2}:${1}" sub $
The variant gsub replaces all the matches and #sub replaces a specific match.
The word extract ( s1 re -- s... c ) returns all the matches of a regular expression: B "Hi {{ name }}, eat {{ thing }}." "{{\s*\S+\s*}}" extract ? ;
[3] "{{ name }}" "{{ thing }}" "2"
If the regular expression has capturing groups, just their matches are returned: B "Hi {{ name }}, eat {{ thing }}." "{{\s*(\S+)\s*}}" extract ? ;
[3] "name" "thing" "2"
On the flip side, punch ( s1 re -- s... c ) returns all non-matched parts: B "Hi {{ name }}, eat {{ thing }}." "{{\s*\S+\s*}}" punch ? ;
[4] "Hi " ", eat " "." "3"
If the regular expression has capturing groups, their matches are also returned: B "Hi {{ name }}, eat {{ thing }}." "{{\s*(\S+)\s*}}" punch ? ;
[6] "Hi " "name" ", eat " "thing" "." "5"
You can use punch for fancy substitutions where you execute words on the matched parts: $ echo "where _is_ the _ham_" | gek 'r "_(.*?)_" punch foreach x ! X upper ! loop nl !'
The .*? matches zero or more of any character but prefers fewer, you could try just .* as well.
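As one more sketch combining extract with earlier words, this should sum every number it finds on a line whatever separates them, using a non-capturing group so the whole match is returned: r "-?\d+(?:\.\d+)?" extract sum $
Running it on a line like "x=3, y=4.5" should print 7.5, though I'll admit I haven't battle-tested it.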

In Place Editing

How can we actually apply our programs to the input files? If it's one input file, we could do it with our shell: $ output="$(gek 'n . r $' myfile)"; printf %s\\n "$output" > myfile (The variable is needed because shell redirection WIPES(!) the file first as with $ cat myfile > myfile) But for ease Gek has an option to edit the input files in place. That means, what would have been printed for each file becomes the contents of each file, so be sure to make backups. This would prefix each line of each file with the total line number: $ gek -i 'n . r $' myfile myfile2 Legend has it, Gek's source code was once an ignominious mix of snake_case and CamelCase but one day he cleaned himself up with a program like this: r "(\b(?:[a-z]+_[a-z]+)+\b)" punch
foreach
x ! X "_" " " GSUB title " " "" GSUB !
loop nl !
The regular expression cautiously matches whole tokens that are lowercase and include an underscore, using a non-capturing group (?:...) to apply + to a grouping without having it be punched out. The word title ( s1 -- s2 ) capitalizes the first letter of each word in a string.

Gekdoc Example

It'd be shameful if gekdoc wasn't written in Gek. We now know enough to have a look at it. Gekdoc scans the documentation files for listings of the words you pass. If we give gekdoc a word like pop it shows listings for pop, pop?, #pop, POP, POP? and #POP, by creating a regular expression. You can also pass multiple words at the same time. Here's the essence of it:
#! /usr/bin/env -S gek -f

B
stack Patterns
Args foreach
"(?i)^#?\$?" x escape "\??\s.*\(.*--.*\)$" 3 #, push Patterns
loop
POP? Patterns if set Pat else EXIT then

stack WordFiles "/usr/share/doc/gek/words" entries as WordFiles WordFiles as Args
;

# If the current line meets the pattern, print lines until the next stack effect, and repeat.
forever r Pat meets while
forever r $ bump? while r "\(.*--.*\)$" meets until loop
loop

FE
# If finished the files, go through them again looking for the next pattern.
tot Args 1 = if
POP? Patterns if set Pat WordFiles as Args then
then
;
Straight away it makes a stack of the patterns it'll look for, one for each command-line argument. The regular expression is made from two string literals with the command-line argument sandwiched in between. The (?i) makes it case-insensitive so it will match both pop and POP. The #?, \$? and \?? allow optional # $ ? symbols and the rest looks for a stack effect. I've put escape ( s -- re ) after x to remove the meaning of any special characters so it matches itself literally.
Down a bit it makes a stack of the filepaths in /usr/share/doc/gek/words and sets them to be its input files. We can see why it makes a stack of them looking at the FE block. I've been very anal and instead of looping through the patterns for each file, I've made it go through all the files for each pattern so the order of listings is the same as the order of arguments.
The main loop in the middle prints lines from a match till a line with a stack effect that doesn't match. It uses bump? while instead of bump so it breaks out the loop if it reaches the end of the file. The word until ( f -- ) is the opposite of while, breaking out the loop if the flag is true, like 0 = while. The outer forever loop is needed so the lines that trigger until are themselves processed. And that's that.

Macros

We can name a group of words for readability and reusability. This snippet defines a word called increment that increments a number by one: : increment 1 + ; More generally the layout is: : <name> <code to expand to> ; These new words get called macros. This example defines a macro called sqr and uses it to print the first and last fields squared: : sqr dup * ;
0 f sqr . -1 f sqr $
Here's a macro from one of my programs that moves the terminal's cursor: # ( column row -- ) Moves the cursor to the coordinate.
: at "\033[" es ! ! ";" ! ! "H" ! ;
You can see why I wrapped that up with a name. I'll show you the program at the end and cover es, but you can feed this program some coordinates to play with the macro: : at "\033[" es ! ! ";" ! ! "H" ! ;
r snap at "hi" !
The word snap ( s -- s1 s2 ) splits a string in two at the first whitespace.
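Macros compose nicely with the string words. As a little untested sketch, this defines squash to collapse runs of whitespace into single spaces and applies it to each line:
: squash split join ;
r squash $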

Brace Example

Let's look at a program I use day to day that formats the braces used in C-like programming languages nicely. Given this:
func pushString(s string) { push(Str(s)) } func pushStrings(strs []string) { for _, s := range strs { pushString(s) } }
It spits out:
func pushString(s string) {
	push(Str(s))
}
func pushStrings(strs []string) {
	for _, s := range strs {
		pushString(s)
	}
}
This is a deceptively tricky problem with how the braces can nest in each other. My program repeatedly jumps to the next { and cascades any } that were behind the {. The } decrement a variable holding the nesting/indentation level and the { increment it. The { are printed indented with a newline after them. Here it is:
#! /usr/bin/env -S gek -F

# ( -- ) Prints indentation.
: indent tab Nesting scale ! ;
# ( s -- ) Prints s indented and a newline, unless s is empty.
: indented$ dup len if indent $ else drop then ;

# ( s1 -- s2 )
# Takes a string like "a } b } c",
# prints:
# a
# }
# b
# }
# and returns "c".
: cascade
forever lstrip "}" $snap? while
swap rstrip indented$
1 -set Nesting "}" indent $
loop
;

# Use the first line's indentation as the starting point.
1 n = if r "^\t+" meets drop match len set Nesting then

r
forever "{" $snap? while
swap cascade
rstrip indent . "{" $ 1 +set Nesting
loop
cascade indented$
At the top I've defined a couple helper macros for printing indentation to match the nesting level. The word Nesting hasn't been defined yet but it will be before the macros are encountered. The cascade macro helps break the problem down, it's used to deal with the } skipped when we jump to the next {. The words lstrip and rstrip remove whitespace from the left/right of a string.
It first sets Nesting to the number of leading tab characters on the first line. This acts as the baseline indentation in case the code is already indented. The word match returns an empty string if there was no match, so meets drop match len returns the length of the match or zero if there wasn't one.
Then the biggie. It snaps each line at each { and passes what was behind each { to cascade. Cascade prints the part up to the last } nicely while decrementing Nesting, and returns the other bit that was between the last } and the {. This bit is then printed indented with a { plus a newline, and Nesting is incremented. When there are no { to snap, it breaks out the loop and cascades any remaining } and also prints anything after the last }. Cool beans.

Defining Fields and Records

Up to now programs have run for every line of input, but more generally programs run for every record of input, and we can define what a record is. The word separator ( re -- ) defines what terminates a record instead of the usual newline. This program splits lines at semicolons: B ";|\n" separator ; r $
There's also record ( re -- ) that defines what a record is directly. This program prints how many numbers (groups of consecutive digits) there are in the input: B "\d+" record ; E n $ ;
There's a subtle difference between ";" separator and "[^;]+" record, with the former producing empty records with consecutive separators. If we did "[^\n]+" record, making a record one or more characters except newline, a blank line wouldn't be a record.
It's the same idea for fields with delim ( re -- ) defining what terminates a field and field ( re -- ) defining what a field is directly. By default fields are defined by "\S+" field so you never get an empty one.
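As a quick sketch of delim (assuming it splits fields the way you'd expect, like awk's -F), this should print the first colon-separated field of each line of /etc/passwd, i.e. the usernames: $ gek 'B ":" delim ; 0 f $' /etc/passwd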

Command-line Parsing

There are two words intended for parsing command-line arguments. The word varparse <Mapping name> ( s... c -- s... c ) looks for strings like "key=value" in its arguments and assigns the results to a mapping. It returns the strings that didn't have equal signs. Here's a little templating command that fills in {{ placeholders }}: B mapping Replacement Args varparse Replacement as Args ;
r "{{\s*(\S+)\s*}}" punch foreach x ! X Replacement ! loop nl !
You could run it like: $ printf %s\\n 'Hi {{ name }}!' "Why not do what you love like {{ loves }}!" | gek -f prog.gek -- name=John loves='the smell of the wind'
Or with files like: $ gek -f prog.gek -- name=John loves='the smell of the wind' myfile myfile2
The word optparse <Mapping name> ( s... c s1 -- s... c ) is for parsing command-line options like -v and --verbose. Its s1 argument specifies the allowed options and whether they take an argument. For example, "help? e: f:" would specify an option --help that doesn't take an argument, and options -e and -f that do. The rest are the list of options and positional arguments to parse, which in a horrible case might look like: "myfile" "-r" "--verbose" "myfile2" "-aJx" "--level" "3" "--debug=true" "-omyfile" 9
The option results are assigned to the mapping and the positional arguments are returned. Here's a simple version of the Unix tail command:
B
mapping Opt
10 "n" map Opt
Args "n:" optparse Opt as Args
;
r depth "n" Opt keep
FE depth #$ ;
You could run it like: $ gek -f tail.gek -- -n3 myfile myfile2
As is the norm, passing -- to gek says to treat the following arguments as positional arguments even if they look like options. We need this to stop gek thinking the options to our programs are for itself. Gek also has an option -F to specify a program file and end the options. If we made our tail command executable and gave it the shebang: #! /usr/bin/env -S gek -F
We could run it like: $ ./tail -n3 myfile myfile2

Bytes and Unicode

We've been thinking of abstract characters although deep down we know files and output are data. It might hurt but strings are actually sequences of bytes of data and some words don't even care what they represent. For example, is compares the bytes indifferent to what they represent, while $ outputs them without remorse. Words that do take the bytes to represent text use the UTF-8 encoding to map the bytes to characters, likely the same encoding your terminal uses to render data. UTF-8 is a superset of ASCII able to represent all the characters of Unicode and the dominant encoding on the web.
(*Technically the bytes map to atomic units called Unicode code points but I'm not gonna fret.) Remember though, strings don't need to be valid UTF-8. Besides reading input, we can embed arbitrary bytes with the word es ( s1 -- s2 ) who substitutes escape sequences: B "H\n\tE\n\t\tL\n\t\t\tL\n\t\t\t\tO" es $ ; My macro for moving the cursor uses es to embed the ESC character which terminals interpret specially (ANSI escape sequence): : at "\033[" es ! ! ";" ! ! "H" ! ; The most important thing is Gek can handle UTF-8 input, the modern standard.
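Since es embeds arbitrary bytes, it's also handy for other terminal escape codes; an untested sketch that should print BOLD in bold on most ANSI terminals: B "\033[1mBOLD\033[0m" es $ ;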

Fluidity

We'll wrap up with a few powerful words that change Gek. The words PRENUM and POSTNUM let us change what represents a number. They're events that take code up to a ; like B and E. PRENUM blocks execute before each number argument is taken from the stack. To give an example, many countries use a decimal comma (1,23) instead of a decimal point (1.23), we could replace commas with periods so they both work: PRENUM "," "." SUB ; fields sum $
If we wanted return values to use commas, POSTNUM blocks execute on each returned number: PRENUM "," "." SUB ; POSTNUM "." "," SUB ; fields sum $
The TOKEN event lets us intercept the code itself. Tokens here are what we call each bit of code, maybe a word name, number or string literal, that Gek executes one after another. Before each token is evaluated, the token is put on the stack, the TOKEN block executes, and the new top value is taken to be the token. This example ignores underscores in tokens: TOKEN "_" "" gsub ; f_ie_lds s_um_ _$_
A big brained use of this fluidity is creating syntaxes that can become programmatic. Imagine a game that interprets levels from text files; the levels would usually be limited to words declaring the start, obstacles, checkpoints, etc., but say the hub level for downloading levels could harness all of Gek. Here's an example of that without the fun game bit:
B
"mylevel" set LevelFile

: beep "BEEP" $ ;
: boop "BOOP" $ ;
: programmatic 1 set Programmatic ;

1 set Programmatic
0 set InMacro
TOKEN
set Token
Programmatic InMacro or if
Token
else
Token "beep" "boop" "programmatic" 3 within assert
1 set InMacro Token eval 0 set InMacro
"" # Skip the token, we've already evaluated it.
then
;

"0 set Programmatic " LevelFile read , eval
;
Here's a valid level: beep boop boop
Another valid level: programmatic beep boop boop 5 for beep loop
But not this: beep boop boop 5 for beep loop
I've defined three macros for the level syntax, the placeholders beep and boop, and programmatic who should enable all Gek's words. In the real one you can imagine the programmatic word would throw an error if --programmatic wasn't passed. The TOKEN code just passes the token untouched if Programmatic is true, but otherwise complains if the token isn't a macro name unless it's already evaluating a macro's code. It keeps track of whether it's evaluating a macro's code by setting a variable before and after evaluating macro names with eval, after which it returns an empty string so the token isn't evaluated twice. It might be a weird niche but I don't know a language better suited for this kind of thing than Gek.
And finally finally finally, the word yank ( -- s ) is used in macros to steal the next token after the macro, so you can define your own words like set and map. Example 1: : plus yank + ; B 3 plus 4 $ ;
Example 2: : ( forever yank ")" is until loop ; B 1 $ ( 2 $ ) 3 $ ( 4 $ ) 5 $ ;
In our game example it'd let us define a syntax where arguments come after words like "background: blue".

The End

You made it, that's Gek. Gek's old school stack approach makes it simple yet expressive; it's got the ability of a general-purpose language without the drag. It's a bunch of words. Well, time to say goodbye. I hope you've enjoyed this voyage of a tutorial and find Gek useful.
$ gek 'E "bye" $ ;'
But seeing as I'm a man of my word, here's my cursor moving program, it's snake:
#! /usr/bin/env -S gek -F

# ( coordinate -- ) Moves the cursor to the coordinate "<row> <column>".
: at "\033[" es ! " " ";" sub ! "H" ! ;

# ( s -- ) A wrapper around the stty command for changing terminal settings.
: stty "stty " swap , shell ;

# On termination, reset the terminal to sane settings.
PE "sane" stty ;

B
# Controls
"w" set Up "a" set Left "s" set Down "d" set Right
# Arena Dimensions
16 set Width 16 set Height
# Characters For Shapes
"#" set Border " " set Blank "@" set Skin "A" set AppleSkin

# Clear the screen and move the cursor to the top left and hide it.
"\033[2J\033[H\033[?25l" es !
# At the end, unhide the cursor and move it below the output.
PE "\033[?25h\033[" es ! Height 3 + ! "H" $ ;

# Print the arena and the controls.
Border Width 2 + scale $
Height for Border ! Blank Width scale ! Border $ loop
Border Width 2 + scale $
"up:" . Up . "left:" . Left . "down:" . Down . "right:" . Right $

# Put the snake coordinates on the stack.
Height 2 / 2 + Width 2 / 1 + 2 join
Height 2 / 1 + Width 2 / 1 + 2 join
Height 2 / Width 2 / 1 + 2 join
# Draw the snake.
Stack foreach x at Skin ! loop
# At the end, draw a pulsing snake animation.
E
Stack foreach x at Blank ! 0.075 sleep loop
Stack foreach x at Skin ! 0.075 sleep loop
;

: SpawnApple
forever
random Height 2 - mod 2 + random Width 2 - mod 2 + 2 join set Apple
Apple in Stack while
loop
Apple at AppleSkin !
;

# Have the terminal send each character rather than lines.
# (and don't display the characters and allow signals like Ctrl+C)
"raw min 1 isig -echo" stty
# Await a keypress.
"dd bs=1 count=1 2>/dev/null" shellout set Key

-1 set Vertical
0 set Horizontal

SpawnApple
# Have the terminal send nothing if a key is not pressed.
"raw min 0 time 2 isig -echo" stty
forever
Key Up is Vertical 0 = and if -1 set Vertical 0 set Horizontal then
Key Down is Vertical 0 = and if 1 set Vertical 0 set Horizontal then
Key Left is Horizontal 0 = and if 0 set Vertical -1 set Horizontal then
Key Right is Horizontal 0 = and if 0 set Vertical 1 set Horizontal then

# Add a new head and game over if it collides with the border or body.
dup split Vertical Horizontal 2 #+ set HeadX set HeadY HeadY HeadX 2 join set Head
HeadY 2 < HeadX 2 < or HeadY Height 1 + > or HeadX Width 1 + > or if exit then
Head in Stack if exit then
Head at Skin !
Head

Apple in Stack if
SpawnApple
else
# Erase the tail.
depth depth #rot at Blank !
then

"dd bs=1 count=1 2>/dev/null" shellout set Key
loop
;