LCD (Lua Compact Debug) was developed by Terry Ellison as a patch to the Lua system to decrease the RAM usage of Lua scripts. This makes it possible to run larger Lua scripts on systems with limited RAM. Its use is most typically for eLua-type applications, and in this first version it targets the nodeMCU implementation for the ESP8266 chipsets.
This section gives a full description of LCD. If you are writing nodeMCU Lua modules, then this page will be of interest to you, as it shows how to use LCD in an easy-to-configure way.
The main issue that led me to write this patch is the relatively high Lua memory
consumption of its embedded debug information, as this typically results in a
60% memory increase for most Lua code. This information is generated when any
Lua source is compiled because the Lua parser uses this as meta information
during the compilation process. It is then retained by default for use in
generating debug information. The only standard method of removing this
information is to use the “strip” option when precompiling source using a
standard eLua luac.cross on the host, or (in the case of nodeMCU) using the
node.compile() function on the target environment.
Most application developers that are new to embedded development simply live with this overhead, because either they aren't familiar with these advanced techniques, or they want to keep the source line information in error messages for debugging.
The compiler generates fixed 4-byte instructions which are interpreted by the Lua VM during execution. The debug information consists of a map from instruction count to source line number (4 bytes per instruction) and two tables keyed by the names of each local and upvalue. These tables contain metadata on the variables used in the function. This information can be accessed to enable symbolic debugging of Lua source (which isn't supported on nodeMCU platforms anyway), and the line number information is also used to generate error messages.
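This overhead is large enough on limited-RAM systems to justify replacing this scheme by making two changes which optimize for space rather than time:
  * The line number map is held in a packed, run-length-encoded form instead of using 4 bytes per instruction.
  * The locals and upvalues information can be discarded once a function has been compiled (as the luac -s option and node.compile() do). The line number information, if available, is used in error reporting. An extra API call has therefore been added to discard this debug information on completion of the compilation.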
To minimise the impact within the C source code for the Lua system, an extra system define LUAOPTIMIZEDEBUG can be set in the app/include/user_config.h file to configure a given firmware build. This define sets the default value for all compiles and can take one of four values, 0 to 3. Building the firmware with the 0 option compiles to the pre-patch version. Options 1-3 generate the strip_debug() function, which allows this default value to be set at runtime.
Note that options 2 and 3 can also change the default behaviour of the loadstring() function, in that any functions declared within the string cannot inherit any outer locals within the parent hierarchy as upvalues if these have been stripped of the locals and upvalues information.
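The list of the four values is not reproduced above, so the following is only a sketch of what each level appears to retain, pieced together from the rest of this page (level 0 is the pre-patch build, level 2 retains packed line info only, level 3 strips everything). The enum and function names are hypothetical, and the description of level 1 is an inference rather than something the page states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: the page only refers to the levels by number. */
enum lcd_level {
  LCD_OFF = 0,         /* pre-patch behaviour: unpacked line map, nothing stripped */
  LCD_KEEP_ALL = 1,    /* inferred: packed line info, locals and upvalues kept */
  LCD_KEEP_LINES = 2,  /* packed line info only */
  LCD_STRIP_ALL = 3    /* all debug information discarded */
};

/* Can error messages still report a source line at this level? */
static bool keeps_line_info(enum lcd_level level) {
  assert(level >= LCD_OFF && level <= LCD_STRIP_ALL);
  return level != LCD_STRIP_ALL;
}

/* Are the names of locals and upvalues still available at this level? */
static bool keeps_variable_names(enum lcd_level level) {
  assert(level >= LCD_OFF && level <= LCD_STRIP_ALL);
  return level <= LCD_KEEP_ALL;
}
```

Read this way, the loadstring() caveat above applies exactly to the levels for which keeps_variable_names() returns false.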
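There are various API calls which compile and load Lua source code. During compilation each variable name is parsed, and is then resolved in the following order:
  * against the local variables declared so far in the function being compiled;
  * against the local variables of each enclosing function in turn, working outwards, in which case it becomes an upvalue;
  * otherwise it is treated as a global.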
The parser and code generator must therefore access the line mapping, upvalues, and locals information tables maintained in each function Prototype header during source compilation. This scoping scheme works because function compilation is recursive: if function A contains the definition of function B which contains the definition of function C, then the compilation of A is paused to compile B and this is in turn paused to compile C; then B completes and then A completes.
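The scoping rule can be sketched in a few lines of C. This is not the parser's code: struct func_scope is a hypothetical, much-reduced stand-in for the per-function parser state, and the only thing it models is the order in which a name is looked up (own locals, then enclosing functions, then global).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum var_kind { VAR_LOCAL, VAR_UPVALUE, VAR_GLOBAL };

/* Hypothetical stand-in for the parser's per-function state. */
struct func_scope {
  const struct func_scope *parent;  /* enclosing function, NULL at top level */
  const char *const *locals;        /* names of locals declared so far */
  size_t nlocals;
};

static int has_local(const struct func_scope *fs, const char *name) {
  for (size_t i = 0; i < fs->nlocals; i++)
    if (strcmp(fs->locals[i], name) == 0) return 1;
  return 0;
}

/* Classify a name seen while compiling the function described by fs. */
static enum var_kind resolve(const struct func_scope *fs, const char *name) {
  assert(fs != NULL && name != NULL);
  if (has_local(fs, name)) return VAR_LOCAL;
  /* Walk outwards through the paused compilations of the enclosing functions. */
  for (const struct func_scope *outer = fs->parent; outer != NULL; outer = outer->parent)
    if (has_local(outer, name)) return VAR_UPVALUE;
  return VAR_GLOBAL;
}
```

With scopes A, B and C chained as in the paragraph above, a local of A referenced inside C resolves as an upvalue. That lookup is only possible while A's locals table still exists, which is why stripping it affects later loadstring() calls.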
The variable meta information is stored in standard Lua tables which are allocated using the standard Lua doubling algorithm, and hence they can contain a lot of unused space. The parser therefore calls close_func() once compilation of a function has been completed to trim these vectors to their final sizes.
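To illustrate why the trim matters, here is a small standalone sketch of a vector that grows by doubling and is then cut back to its used size. It is not Lua's allocator: the type, the function names and the starting capacity of 4 are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct int_vec { int *data; size_t used, cap; };

/* Append value, doubling the capacity when the vector is full. Returns 0 on success. */
static int vec_push(struct int_vec *v, int value) {
  assert(v != NULL);
  if (v->used == v->cap) {
    size_t ncap = v->cap ? v->cap * 2 : 4;   /* arbitrary starting capacity */
    int *p = realloc(v->data, ncap * sizeof *p);
    if (p == NULL) return -1;
    v->data = p;
    v->cap = ncap;
  }
  v->data[v->used++] = value;
  return 0;
}

/* Shrink the allocation to the used size; returns the number of bytes released. */
static size_t vec_trim(struct int_vec *v) {
  assert(v != NULL);
  size_t released = (v->cap - v->used) * sizeof *v->data;
  if (v->used == 0) {
    free(v->data);
    v->data = NULL;
    v->cap = 0;
    return released;
  }
  int *p = realloc(v->data, v->used * sizeof *p);
  if (p == NULL) return 0;                   /* shrink failed: keep the larger block */
  v->data = p;
  v->cap = v->used;
  return released;
}
```

After five pushes the capacity is 8, so trimming releases the three unused slots. On a device with a few tens of kilobytes of heap, that slack adds up across many functions.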
The patch makes the following changes if LUAOPTIMIZEDEBUG > 0. (The existing functionality is preserved if this define is zero or undefined.)
  * A new API call, node.stripdebug([level[,function]]), is added as discussed below.
  * close_func() is changed to replace this trim action by deleting the information according to the current default debug optimization level.
  * The lineinfo vector associated with each function is replaced by a packedlineinfo string using a run length encoding scheme that uses a repeat of an optional line number delta (this is omitted if the line offset is zero) and a count of the number of instructions generated for that source line. This scheme uses roughly an M byte vector, where M is the number of non-blank source lines, as opposed to a 4N byte vector, where N is the number of VM instructions. This vector is built sequentially during code generation, so this patch conditionally replaces the current map with an algorithm to generate the packed version on the fly.
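The packed scheme can be sketched as a pair of writers and a linear lookup. This is an illustration built from the rules given in the encoding section of this page (counts of at most 126 per byte, the value map, the reserved codes 0x00 and 0x7F), not the patch's code. Several details are assumptions: the function names, the use of bit 0x80 to mark a delta byte and bit 0x40 for the sign, low-order-first continuation bytes carrying 7 bits each, a base line number of zero, and the reading that an omitted delta means "next source line" (the value map has no code for a delta of +1).

```c
#include <assert.h>
#include <stddef.h>

#define IC_MAX      126    /* largest count in one byte; 0x7F is reserved */
#define DELTA_MARK  0x80u  /* assumption: high bit marks a line-delta byte */
#define DELTA_SIGN  0x40u  /* assumption: sign bit sits just below the marker */

/* Write an explicit line delta (anything except +1); returns bytes written. */
static size_t put_delta(unsigned char *out, int delta) {
  assert(out != NULL && delta != 1);         /* the value map cannot express +1 */
  unsigned negative = delta <= 0;
  unsigned value = negative ? (unsigned)(-delta) : (unsigned)(delta - 2);
  size_t n = 0;
  out[n++] = (unsigned char)(DELTA_MARK | (negative ? DELTA_SIGN : 0u) | (value & 0x3Fu));
  for (value >>= 6; value != 0; value >>= 7)
    out[n++] = (unsigned char)(DELTA_MARK | (value & 0x7Fu));
  return n;
}

/* Write the instruction count for one line, splitting counts above 126. */
static size_t put_count(unsigned char *out, unsigned count) {
  assert(out != NULL && count > 0);
  size_t n = 0;
  while (count > IC_MAX) {
    out[n++] = IC_MAX;
    n += put_delta(out + n, 0);              /* zero delta: stay on the same line */
    count -= IC_MAX;
  }
  out[n++] = (unsigned char)count;
  return n;
}

/* Linear scan: source line of instruction pc (0-based), or -1 if out of range. */
static int line_of(const unsigned char *p, unsigned pc) {
  assert(p != NULL);
  int line = 0;                              /* assumption: base line is zero */
  unsigned seen = 0;                         /* instructions accounted for so far */
  while (*p != 0x00 && *p != 0x7F) {
    if (*p & DELTA_MARK) {
      int negative = (*p & DELTA_SIGN) != 0;
      unsigned value = *p++ & 0x3Fu;
      for (unsigned shift = 6; *p & DELTA_MARK; shift += 7)
        value |= (unsigned)(*p++ & 0x7Fu) << shift;
      line += negative ? -(int)value : (int)value + 2;
    } else {
      line += 1;                             /* assumption: omitted delta = next line */
    }
    if (*p == 0x00 || *p == 0x7F) break;     /* malformed: delta without a count */
    seen += *p++;
    if (pc < seen) return line;
  }
  return -1;
}
```

Under these assumptions put_count(buf, 150) writes three bytes, 126, a zero delta and 24, matching the 126 (0) 24 example on this page, and line_of() has to walk the string from the start, which is the O(N) lookup cost traded for the smaller vector.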
The node.stripdebug([level[,function]]) call is processed as follows:
  * If node.stripdebug(3) is included in init.lua, then all debug information will be stripped out of subsequently compiled functions.
  * The function argument is interpreted as for setfenv() (except that the integer 0 level is not permitted), and the function tree corresponding to this scope is walked to implement this debug optimization level.
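Walking the function tree amounts to a recursive visit of a function and every function defined inside it. The sketch below uses a hypothetical, reduced struct proto rather than Lua's own prototype structure, and it assumes that levels 2 and 3 discard the locals and upvalues data and that level 3 also discards the packed line info, as described elsewhere on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced model of a compiled function. */
struct proto {
  unsigned char *packedlineinfo;  /* NULL once stripped */
  char *varinfo;                  /* stands in for locals and upvalues data */
  struct proto **children;        /* functions defined inside this one */
  size_t nchildren;
};

/* Apply a debug optimization level to f and to everything nested in it.
   Returns the number of functions visited. */
static size_t strip_tree(struct proto *f, int level) {
  assert(f != NULL && level >= 1 && level <= 3);
  if (level >= 2) {               /* drop names of locals and upvalues */
    free(f->varinfo);
    f->varinfo = NULL;
  }
  if (level >= 3) {               /* drop the line number map as well */
    free(f->packedlineinfo);
    f->packedlineinfo = NULL;
  }
  size_t visited = 1;
  for (size_t i = 0; i < f->nchildren; i++)
    visited += strip_tree(f->children[i], level);
  return visited;
}
```

Calling strip_tree() at level 2 on an outer function with one nested function visits both and leaves only their line info in place. A second call at level 3 removes that too.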
The packedlineinfo encoding scheme is as follows:
  * An instruction count (IC) too large for a single byte is split, e.g. 126 (0) 24 for a line generating 150 VM instructions. The high bit is always unset, and note that this scheme reserves the code 0x7F as discussed below.
  * A line delta is stored as a sign and a value element n…n using the following map. This means that a single byte is used to encode line deltas in the range -63 … 65; two bytes are used to encode line deltas in the range -8191 … 8193, etc.
<code>
value = (sign == 1) ? -delta : delta - 2
</code>
  * This approach has no arbitrary limits, in that it can accommodate any line delta or IC count, though in practice most deltas are omitted and multi-byte sequences are rarely generated.
  * The codes 0x00 and 0x7F are reserved in this scheme. This is because Lua allocates such growing vectors on a size-doubling basis. The line info vector is always null terminated so that the standard strlen() function can be used to determine its length. Any unused bytes between the last IC and the terminating null are filled with 0x7F.

The current mapping scheme has O(1) access, but with a code-space overhead of some 140%. This alternative approach has been designed to be space optimized rather than time optimized. It requires the actual IC to line number map to be computed by linearly enumerating the string from the low instruction end during execution, resulting in an O(N) access cost, where N is the number of bytes in the encoded vector. However, code generation builds this information incrementally, and so only appends to it (or occasionally updates the last element's line number), and the patch adds a couple of fields to the parser FuncState record to enable efficient O(1) access during compilation.

=====Testing=====

Essentially, testing any eLua compiler or runtime change is a total pain, because eLua is designed to be built against a newlib-based ELF. Newlib uses a stripped-down set of headers and libraries that are intended for embedded use (rather than being run over a standard operating system). Gdb support is effectively non-existent, so I found it easier first to develop this code on a standard Lua build running under Linux (and therefore with full gdb support), and then port the patch to nodeMCU once tested and working.

I tested my patch in standard Lua built with “make generic” and against the Lua 5.1 suite. The test suite was an excellent testing tool, and it revealed a number of cases that exposed logic flaws in my approach, resulting from Lua's approach of not carrying out inline status testing and instead implementing a throw / catch strategy. In fact I realised that I had to redesign the vector generation algorithm to handle this robustly.

As with all eLua builds, the patch assumes Lua will not be executing in a multithreaded environment with OS threads running different lua_States. (This is also the case for the nodeMCU firmware.) It executes the full test suite cleanly at maximum test levels, and I also added some specific tests to cover new node.stripdebug()
use cases.

Once this testing was completed, I then ported the patch to the nodeMCU build. This was pretty straightforward, as this code is essentially independent of the nodeMCU functional changes. The only real issue was to ensure that the nodeMCU c_strlen() calls replaced the standard strlen(), etc. I then built both luac.cross and firmware images with the patch disabled to ensure binary compatibility with the non-patched version, and then with the patch enabled at optimization level 3.

In use there is little noticeable difference, other than that the code size during development is pretty much the same as when running with node.compile() stripped code. The new option 2 (retaining packed line info only) has such a minimal size impact that it's worth using this all the time. I've also added a separate patch to nodeMCU (which this assumes) so that errors now generate a full traceback.

=====How to enable LCD=====

Enabling LCD is simple: all you need to do is take a patched version, define LUAOPTIMIZEDEBUG
at the default level that you want in app/include/user_config.h and do a normal make. Without this define enabled, the unpatched version is generated.

Note that since node.compile() strips all debug information, old .lc files generated by this command will still run under the patched firmware, but binary files which retain debug information will not work across patched and non-patched versions.

Other than optionally including a node.stripdebug(2) or whatever in your init.lua, the patch is otherwise transparent at an application level.
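As a minimal sketch of the configuration step, the fragment below shows what the define might look like in app/include/user_config.h. The value 2 is only an example default (packed line info retained, locals and upvalues discarded), and the spelling of the macro simply follows this page, so check it against the patched source tree before relying on it.

```c
/* app/include/user_config.h (fragment) */

/* Default debug optimization level for all compiles.
   0 or undefined builds the unpatched code; 1-3 enable LCD.
   Macro spelling as used on this page; verify against your patched tree. */
#define LUAOPTIMIZEDEBUG 2
```

After changing the value, a normal make rebuilds the firmware with the new default.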