rpn
Intro
rpn User Manual - Documentation
TODO
Critical
Long Term
The Interpreter and the Stack
Patterns
Pointers
Memory
Data Types
Real Time Type Checking
Status: Not working.
Still broken after port from a long time ago.
Reverse Polish Notation - implementing an application language. Consider the HP calculators. The motivation comes from using HP calculators which are great to program on (sticking to the basic programming structure, passing information between functions through the stack). However I came across a book called "Write your own programming language using C++" by Noman E. Smith. He argued for an application having its own mini-compiler programming language and I don't know much but it sounded good.
Bringing a command line to an application is an offer I can't refuse. The aim is to implement a stack based programming language in C++.
hello world, hello\ world. String Interpretation
The simplest solution is to change the interpreter so that " " is a white-space and is not ignored.
Extend to include backspace so quote character can be included inside the string too.
Indexed pointers are not being saved across sessions.
-.5 was interpreted as a string and not a real number.
Read file into a string and write the file out.
This can replace read. ie read in a file and interp the file.
Possible renaming of load and
save to
envread and envwrite.
Implement fileread and filewrite to read and write text files.
Evaluation of commands outside the program.
{ ... } <function>
eg
size inside program from outside.
This could be done within the language but may
be sped up with an explicit command.
Other commands would be deleted, eg prev.
Implement @ to do exactly
the same as p@ except that
it is within the programs perspective.
Change the interpreter to allow lone @ .
See rawinterpreter.h 490
rpnsave::writeprogram in scopindependentfunctions.h 530 needs to write the programs evaluation
state. So program saved and recalled are in
the same state.
Implement not equal to operator !=.
Document the loop restriction of
number of iterations, with reasons why this was
done.
This number is very rough and a guide, whats inside the loop is the multiplier! However different machines will give very different results.
See rpnfunc.cpp line 2303 rpnfor
Remove process dead code.
String manipulation. Consider adding string functionality.
substr
See C++ Standard Library p.520 ...
Almost need OO string object, but no OO crap.
string_[] string_find string_rfind
No Change of verb to subject first. Reverse back stateevalis.
mvup mvdn mvupn mvdnn to pass stack
parameters between the front and next
program on the stack.
push pop can do this but are not
direct enough. The aim of the new functions is
to transfer the stack contents more directly.
Put in functionality to toggle complex number display, but write out complex numbers in Cartesian form when saving.
Implement a ploop1 function which is similar
to a for loop but more basic.
ploop1 repeatedly evaluates a
program.
The program
continues to be evaluated if an integer of
1 is returned to the stack.
Also supply ploop0 which
repeatedly evaluates a program if an integer
of 0 is returned to the stack.
This is exactly the same as ploop1
with an inverted condition. Both are provided
because as a programmer I want the programmer
to write conditions naturally, not necessarily
invert them to satisfy the language.
Conditions are often points of failure
in writing algorithms, so I want people to
be their own masters.
eg Sum the numbers 1 to 5
{ 1 i var 0 0
{ i & + i ++ i 5 <= } ploop i rm
}
Do not copy but use pointers. The condition and body are returned.
Before - 0: C 1: B
After - B eval C eval ... , 0: C 1: B
&[] =[] rcl[]
isvector I do not believe that this was
implemented, there is no vector type. Remove any such code.
And documentation.
It is whats called an indexed pointer, I can see no gain in a function which returns its type.
pstream
The second gave the correct output with { 0 1 2 3 4 5 6 }.
f1={ x var 3 x & 2 ^ * x & - 5 + 7 mod }
{ f1 & }
interp does not consume the string which
it should.
If an objects scope is read before the dictionary it is possible to overload everything, eg []. So overloading goes some way, could overload * and do arithmetic with types.
There are issues with construction, destruction and ownership. OOP functions have ownership shared between all the class members but my code uses variables which are local and there is no sharing. An addition could destroy everything if the object is consumed, a terrible scenario.
There is also the issue of where something is done. Does an object need to be transfered to the targets own stack. Is it consumed, is it first or second in order, ... This is really unclear which is not good because I have no idea what to do, and until I can dream something up nothing will get done.
User Defined Type System.
A general user defined type system being supported. I have worked out the pattern, see One Way Visitor Pattern .
Motivations:
The One Ways visitor pattern type system is fast. If RTTI was implemented correctly it would be interesting to compare it with the visitor pattern pattern matching - it could similar in magnitude.
But for small objects its challenging to write something that is fast. If small objects have to be constructed many times and their execution is much less time in magnitude than their matching then there is a problem.
In light of this the implementation of bit set is critical, can I write something that goes fast - comparable with native C++ code?
Read and write files to a string variable.
String variables not being saved correctly.
eg the wanker saved is split
to two strings when read in.
cp needs to be extended for copying
stack values. If the first argument is a string a
copy of a variable is made.
If the first argument is an integer then two obvious choices are the first n items are copied, or the nth item in the stack is copied. The first is better because the programmer already has the pointer mechanism - eg go to the destination and create a pointer with a relative path.
How the copy proceeds is an issue. Is the pointer copied
or the object fully copied? Let cp be a copy because
its so related to the name. And if someone wants efficiency
the should use mv.
eg
a b c /p1/tmp 3 cp
copies a b c on stack to /p1/tmp,
assuming it exists.
Implement string variable lookup for all the basic operators. The string names are to return pointers. Perhaps a generic strategy of realizing the names, then recalling the operator.
Motivation includes a simpler language.
i + reads better than
i & + .
Describe pointers in user manual.
Programming Language refinement:
Start writing programs and having them as an end section
to this page.
eg.
Implement bubble sort algorithm on stack.
or more excitingly erdos sieve.
Change the dictionary data structure from a map to a multi map for robustness.
Implement OOP.
See User Manual - Object Orientated Programming .
Consider the following examples to
be using the following program.
{ i & } with variables
inc={ i ++ } , init={ 0 i var }
Implement pds to evaluate
the programs data stack on the current stack.
0: { program } pds
variable: i={2}
to
1: 2
0: { program }
Implement pf to evaluate
the variable on the current stack.
0: { program } 1: inc pf
variable: i={2}
to
0: { program }
variable:i={3}
Implement pf_void and
pfi , pfi_void
which takes an integer
index instead of a string.
The _void tag indicates that
the program is not returned to the stack.
Further implement these binary functions
for both orderings of the program and
the index. The reason for this is that
these may be the most heavily used parts
of the language.
Will need scoping. An object would be a node. By moving to the node and back again bring the objects variables in the current scope. If the variables can be treated as operators(which can overload), the language can be redefined here.
pushd2 A -> B -> A pushes
the two directories onto the program stack.
It would need to be accompanied by popd2.
for needs to be altered so that
it has a maximum. When this is reached the
for loop exits.
Consider for to return a value
as to the success or failure.
lsvar to list the names of
the variables in the current directory,
returning a program containing string names.
Extend the pointer mechanism -> to include relative paths. This is a very critical mechanism. It will be used in OOP to access member variables.
eg
"/utilities/counter" ->
Returns a pointer to the counter program. The counter program is an OOP object and is subsequently used.
ls - extend to two states :
Dependent on OOP.
Implement as an object which can be configured to either state.
Implement a basic compiler for maths functions.
eg "3*x+2" and "sin(2*x)+f(x)"
and "3x>2" .
Let the input be a string and the output the program. It is non-reversible - thats another problem.
"3*x+2" compile would produce
{ 3 x & * 2 + }
The compiler could resolve the names. Then the x variable would be placed on the stack probably with a pointer. So the variables are not constant. I would probably consider two versions of compile. eg
"3*x+2" compilev would produce
{ 3 ->x * 2 + }
The differences in the two versions is significant. The binding of the variables produces faster code. However when the program is saved, pointers are realized as values so the first is the same between sessions but the fast one degenerated.
Pattern matching operations and wild cards for the string data type.
In bash there is * # ## % %%.
${cat%*.txt} where
cat=crap2.txt gives crap2 .
Note: test against bash so it is has exactly the same output as bash would produce.
In relation to wild card * the above also worked
for ${cat%.txt} giving the same
output.
0: ${cat%.txt} pattern match
The string is interpreted: variables are substituted and the pattern matching function is performed. Let the input string be modified to the new pattern if a successful match occurred. Let the result of the match be placed on the stack as 0 or 1.
Consider issues of a random number generator.
Date and Time extraction - in OO. stl014.cpp
Graphical app - link in with a gl program and run on a separate thread. May need to write functions like "wait" to pole a variable and display its contents every ten seconds. The more tricky issue of infinite looping may need to be addressed. Also the display of the stack and output should be separate. How will the interface be communicating?
Menu systems - have the directory structure represent
the menu system. Each call or menu item would be
hooked up by deriving an rpnfunction which forwards the
work to the application. For unconfigurable menu entries
the mapping is one-one. However there are examples
of menu entries that are generic.
grid<-10*10 grid<-50*50 grid<-200*200 grid<-500*500
could all use the same generic grid function and be
stored in their own directory.
{ 50 50 gltLife_grid } stored as variable "Grid".
The application's menu program looks to the current
directory to see what to display. Support for
writing and building these directory structures would be needed.
Template the rpninteger and rpnreal data types.
Add a complex number data type.
Make a calculator application. Use NTL for the maths library. If possible have a 3D function plotter using OpenGl.
Implement sin cos tan sinarc
cosarc tanarc sinh cosh
log log10 ^ sqrt as objects
using continued
fractions algorithms.
Let the number of iterations of the algorithm be configurable. So the algorithms can be written independently from the number systems and can be adjusted to them.
This is done this way with the view that rpninteger and rpnreal may be templated so the user inserts their own number system.
eg
rpnreal is given a number system that
evaluates to 200 digits. Configure tanh to
iterate 40 times - so a variable in
tanh is set to 40.
Implementing configurable functions. These are essentially objects which are also functions. But configurable functions are in C++ and hence there is a danger of having two types of objects with different perspectives.
So implement OOP first, then configurable functions.
Implement the sin function
in C++ and rpn. It will be configured
by one parameter n which
sets the number of iterations
our algorithm will perform.
Discuss using argument types to separate
processing from configuring. eg
For sin the string type could be
used to set n.
1: 5 0: "n=" sin
sets n to 5.
Where as 0: 2.3 sin triggers the
rpnreal argument and evaluates the
sin argument.
Efficiency is a very good reason to do this. For critical operators its probably best to do this.
There are differences in implementations.
The C++ will need to be initialized differently.
Considering that say sin will have to be
initialized each time rpn is loaded.
/bin/init will be created.
Programs inside this node will be evaluated
once at startup after the dictionary is
read and in the order left to right.
A dequeue will hold the variables. To determine if a string is a variable, search the dequeue from start to finish, and take the first match. This lets the variables be overloaded by putting a new variable at the start of the list. For example implementing temporary variables. Or for a more 3D data structure implement directories and the current directory, the variables are from the directories, home to the current one each added down the list.
Since its a runtime compiler any error will be ignored. Generally the stack state is preserved. Sometimes an error message is printed, but I would prefer a strategy of non-action.
The visitor pattern was used to match the functions to the data. For one dimensional functions - that is functions that only need to have one parameter type identified the standard visitor pattern does the job. On closer inspection you will have observed the visitor or data in the base class and the functions derived. This is because the functions are also data, can be put on the stack and evaluated too. So the hierarchy is a natural consequence of having everything evaluated on the stack.
For two dimensional functions such as addition the
matching of data type is more complicated. However
it is solved by reducing the problem to solving two
one-dimensional functions. For example
the function rpnadd determines the
first arguments type. It then forwards the work to
another class which assumes the first argument type.
So if its an rpninteger the work is
forwarded to addtointeger.
The visitor pattern is being used as an interpreter.
It is matching the data and function. "Write your own programming
language in C++" page 36 refers to other interpreters - of which
the switch threaded code is fast, and the subroutine threaded code
is fastest. Probably both these methods are faster than what I
am implementing. I don't know too much about compilers, but
I hope that by in-lining my virtual function calls the code
will be competitive - say about 8 times slower.
This is by no means the end of the story.
To be honest the way a rpnprogram data type
evaluates has no reuse - once a program is evaluated its
thrown away. The data structure is very versatile and
provides support for local variables, but it seems to
me to be a good candidate for a pattern.
Consider real time type checking vs pattern. Each time the pattern is used two function calls are made, and at least one class has to be implemented for each dimension. Well thats how I started thinking about the issue. However I am not a purest and implementing a class for an if statement is questionable. After some thought I believe that both can be used together. The visitor pattern still has a clear separation between function and data - and can be used for these mappings. Runtime type checking can then be used to validate the stack arguments. The solution still has structure and is versatile.
Compare rpnaddition and rpnmultiply against other binary functions. The visitor pattern uses three classes, but using real time type determination (although built into C++ I didn't use) only one class was required.
All programming constructs will be in reverse polish notation - ie stack based.
For example
consider the problem with traditional languages, the
FOR loop structure of the HP.
1 N FOR I ... NEXT
The FOR and NEXT are coupled syntax
and the whole has to be read before it can be interpreted.
This is ok but a more generic for loop has been written.
{} --- test
{} --- loop body
for
This has the advantage of using one token and can be immediately interpreted. Similarly an if-then-else construct is built.
result of test
{ "1" }
{ "0" }
ifthenelse
The commands have to be interpreted. This is based on
the state - either objects are placed on the
stack or evaluated on the stack.
The user sets the state. @+
Functions such as +,dup
derived from rpnfunction are placed in a dictionary.
When a string from the input stream is encountered, it is interpreted.
If its a primitive data type its put on the stack as such.
If its a program the interpreter has to recursively build it.
If the string token isn't a number or program data type its looked up in dictionary hierarchies. If its not found in these its assumed to be a primitive string data type.
The dictionaries are layered. If the string isn't found in one its looked for in the next. The first correct match is the one chosen, so variables can be overloaded. However more common is overloading within a dictionary itself. As it stands a dictionary is implemented as a LIFO structure. Overloading is simply being closer to the start of the stack.
The memory and evaluation were coupled so that when an object is evaluated it releases its resource. The exception to this is data types that are not programs which don't free themselves but are released by functions. The implications are far reaching - when a function or program is evaluated its destructed and can not be reused.
The reason for implementing it this way was that it was the easiest way to handle evaluation by forwarding the cleanup responsibility to the functions which know when an object has to die.
To be efficient this
compiler would have to be rewritten and memory issues
better addressed.
A fix it strategy is to implement reference counted objects,
say on programs would drastically reducing copying in something
like the for loop.
This kind of optimization
would benefit this compiler. A persistence boolean
flag in every class would be implemented. When an object
is evaluated it would delete itself based on the persistence flag.
Another strategy is to build another compiler that compiles the code much better. I will have to read up. Being independent it could plug in to this framework - where it shares the same language.
Pointers arose from the need to give the programmer flexibility when writing algorithms. eg multiply two numbers in the stack, swap two numbers in the stack. The pointer mechanism allows referencing into the stack, not just on the bottom. Thanks to Nigel Stewart for raising the issues about programming on the stack. From the shortcomings of specific functions I implemented pointers - a more generic solution.
Reference counting and pointers were simultaneously implemented. For a robust system it is critical that a null pointer can never be dereferenced (and segfault). As this is to be an application language stability takes priority over efficiency. The application language serves the program, it can never bring it down.
So reference counting lets an object stay alive as long as there exists a reference to it. So if its deleted from the stack and there still exists a pointer to it then the pointer is still pointing to a valid object.
Saving the system state is also of interest - how is a pointer to be saved. I choose to realize the pointer, so the value is written. So pointers are a unique data type in that they always represent something else and are not manipulated by the user. ie the user can create pointers but any operations on them are on what the point to.