New unit structure and mini parser

To speed up the amount of time the IDE uses to display the unit structure, we have spent some time implementing a special parser. This parser is diffent from the one already present in the compiler core – in that it does not go through the ordinary compiler steps (tokenizing, parsing, building an AST, structure verification). It is designed to chew trough the syntax according to the language rules, extract symbol names and their positions – and finish the moment it reaches the “implementation” or “end.” termination symbols.

This is also a very hands-on test for the parser framework I wrote about earlier, testing to see how well our framework works with such a daunting task. One thing is having a parser that handles miniscule tasks like an ini-file or dfm text files; a full on language parser is something else entirely.

Performance

Since this is not a full parser in the traditional sense, although it could relatively easily be turned into one, the cost of parsing is surprisingly low. Normally you can draw some statistical vectors between speed and resource consumption, where the latter is usually a penalty of caching and pre-calculation. In our parser there is none of this. It does exactly what it says, and it’s every bit as tricky to implement as you imagine.

The current parser test application. Here parsing qtx.dom.widget.pas from our RTL.

When you write a parser for a language like Object Pascal, especially a living, vibrant dialect such as QTX with it’s many syntax modernizations and enhancements – you really find out just how many combinations and variations the language truly has.

This one for example:

function jsObject: variant; external 'Object' property;

Just like Delphi or Freepascal can reference DLL functions by declaring them as external, so can QTX. The difference is that under QTX we are not mapping DLL library symbols to a local symbol so we can use it; we are telling the compiler what it should replace our mapping with. Whenever we use JsObject() in our code, the compiler will eventually just replace that with “Object”. This is a big difference.

So all we are doing here is telling the compiler that there is an external symbol called “object” that we want mapped as a local function.

The wildcard is that extra keyword at the end, namely “property”.

What does the property keyword do? Well, when the property keyword is found after the external qualifier, then the method will be codegen’ed without parenthesis. If it has parameters, brackets will be used instead of parenthesis.

Another road rarely traveled, is when you have external objects which (just as an example) only allow a write-only property. Take this for example:

type
   JExternal = class external
      function SetRW(value : String); external 'setRW';
      property FieldR : Integer read external 'hello';
      property FieldW : Integer write external 'foo';
      property FieldRW : String read external 'world' write SetRW;
   end;

Here we have a class that is marked as external, but the JS names for properties doesnt really suit us, so we use our own names in the pascal code. This is perfectly legal as long as we mark the properties as external, and provide a valid name for them.

But notice that last property, FieldRW, and how we suddenly have an external keyword right in the middle of the definition. Basically what this syntax is saying, is that whenever you read from this property — you will just read the JS property called “world”. But you can write to it all you want.

The number of times you will be using these declarations will be slim. And honestly, most people would define these wrapper classes differently. The above will be more useful if you have a JS object that has a “world” property, and then a SetWorld() method. Since these go together in the pascal world you can use the esoteric syntax above to unify things.

My point with all of this was not to scare you with esoteric, edge case syntax — but rather to underline that my parser code have to deal with all of this. I still need to add that somewhat odd use of external, but I think I have implemented the rest of it.

This is why it’s so important to test with real-life units, like those from our RTL that is already being used. Because if it can handle the RTL, then it’s ready for inclusion in the IDE.

Why a separate parser to begin with?

DWS is awesome, but building an AST means doing a full compile of the unit you are editing. If this is a form-unit, it will pull in all the units in the uses-clause (recursively), which however brief is a consuming task. In a perfect world we would just stick to the AST for everything, but reality has proven that it’s sometimes good to have a scalpel and not try to chainsaw our way through everything.

Things like Rename Unit, Add unit to uses, Remove unit from uses, Check if unit is in the uses, check if a class is in the unit, check if a member is defined in a class — stuff that you probably dont think about that the IDE does. We also want to expand things, like rename classes etc too later.

This was yesterday before i optimized things, so it was 16ms slower then 🙂

Once this is in place I will circle back and put our background compiler in a thread, now that a lot of the housekeeping jobs can be delegated to our mini-parser. And then it’s time to deal with the UI tickets and license back-end (yes we need to have the license node.js server on our website before we launch, but just the license stuff, not a full shop!) – and then (unless something else turns up) we are done!

Published by Jon Lennart Aasenden

Lead developer for Quartex Pascal

Leave a Reply