I am always interested to hear when a language is ported to a new platform, in this case Python (or a subset of it) on GRUB. The question would be why? Is this because you have lots of test cases already in Python and can easily deploy those test cases using this?
Originally, it was because we had a lot of one-off test cases not written in Python. Before BITS, most BIOS test programs consisted of one-off DOS programs, or later one-off EFI programs. If you wanted a new test, or you wanted a customer to quickly gather some information for you, you sent them a new test program, or a bootable image. Now, many of those one-off tests have become Python one-liners, and you can use Python to explore the system with an interactive REPL.
Python also lets us provide additional APIs to make scripting easy. For instance, we include the complete ACPICA interpreter, so you can evaluate an arbitrary ACPI method with arguments and process the result. And we have an FFI with EFI support, so you can locate and call an arbitrary EFI protocol.
Wow. I thought this was one of those "just because you can" projects, but it looks like there are some real practical uses for firmware developers/testers.
Yeah, we wrote BITS almost entirely for practical reasons. Several groups at Intel use it to test both the BIOS and the CPU itself. (You have to have enough functionality to boot an OS, but once you have that, BITS can easily test things like CPU power management or MSR configuration.)
It could be ported to ARM systems running EFI with some effort, if someone wanted to do so. It wouldn't make much sense to port to a completely different BIOS architecture such as those on PowerPC, though.
We build CPython's source into BITS, which runs as an EFI binary. (Or as a non-EFI image bootable on a BIOS, but that version doesn't include network support.) We have almost no changes to CPython itself, other than a couple of minor fixes to enable some additional calling conventions in the ctypes module.
The hardest part of porting Python in general was porting libffi to support the EFI calling convention, which required some very fiddly assembly. That libffi support allows us to call EFI protocols from Python without writing any C code.
The hardest part of implementing networking was handling the impedance mismatch between POSIX select and EFI asynchronous calls. POSIX select lets you ask "do any of these sockets have data to read", without actually reading the data; likewise for pending client connections to accept(). EFI expects you to provide a buffer to read the data and a callback for when the data gets read. So, the only way to ask for pending data is to buffer that data for the application to read later. This requires additional copies, limiting performance. (You might have heard of "zero-copy" networking; this is "several-copy" networking.)
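The buffering described above can be sketched roughly as follows. This is a minimal illustration, not the BITS implementation: `BufferedReceiver`, `on_receive_complete`, and `select_readable` are hypothetical names, and the completion callback is simulated by calling a method directly rather than by any real EFI API.

```python
from collections import deque


class BufferedReceiver:
    """Hypothetical adapter from callback-style receives to select-style polling."""

    def __init__(self):
        # Chunks delivered by completion callbacks but not yet read by the app.
        self._pending = deque()

    def on_receive_complete(self, data):
        # Stand-in for the callback an asynchronous API would invoke.
        # The data is copied into our own buffer (first extra copy).
        self._pending.append(bytes(data))

    def readable(self):
        # Answers "is there data to read?" without consuming anything.
        return bool(self._pending)

    def recv(self, bufsize):
        # Hands buffered data to the application (second extra copy).
        # A real socket would block or raise when nothing is buffered;
        # this sketch just returns an empty result.
        if not self._pending:
            return b""
        chunk = self._pending.popleft()
        if len(chunk) > bufsize:
            # Keep the unread tail at the front of the queue.
            self._pending.appendleft(chunk[bufsize:])
            chunk = chunk[:bufsize]
        return chunk


def select_readable(receivers):
    # Minimal select-like helper: returns the receivers that have buffered data.
    return [r for r in receivers if r.readable()]
```

The point of the sketch is that the readiness query can only be answered from data that has already been received and stored, which is where the extra copies come from.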
I apologize in advance for my lack of detailed knowledge about low-level IO, but,
Would a library like libuv/pyuv that provides an async interface to networking and file IO help in this regard?
I understand that your goals were to implement as much of the Python stdlib's APIs so that programs can run unmodified, but could you envision Python applications that are coded against pyuv running on a platform that bridges pyuv calls directly to EFI without the stdlib as an intermediary?
> Would a library like libuv/pyuv that provides an async interface to networking and file IO help in this regard?
No, it wouldn't. libuv depends on OS facilities like select, poll, or epoll, and then builds a high-performance event loop on top of them. We needed to implement those facilities themselves.
In theory, if we didn't care about Python's standard library modules, we could potentially port libuv to support the asynchronous callback mechanism used by EFI. However, we also wanted to write as much of this code as possible in Python, rather than C, to make it easier to maintain. That's why, for instance, we didn't implement the Berkeley sockets API in C to use the CPython socketmodule.c, but instead implemented our own _socket.py in Python directly on top of the efi module.
How big is the base image, e.g. printing "hello world" to the console? Is it modular, e.g. if you don't need networking[1] how hard is it to not include networking?
[1] I'm using networking as an example package, not knowing what other semi-optional packages could be used as an example.
> How big is the base image, e.g. printing "hello world" to the console?
46MB. However, that image includes three separate complete copies of all the compiled code (4.5-6MB each), so that it simultaneously boots on 32-bit BIOS, 32-bit EFI, and 64-bit EFI. It also includes the complete source code (24MB) needed to reproduce the image, to make it self-contained and redistributable without needing to have a separate distribution of source. And it includes .py files for all the BITS .pyc files (other than the Python standard library), to simplify debugging and hacking. A quick check suggests that you could pretty easily build an 8.4MB image, if you cared about the size.
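As a rough sanity check on those figures, the arithmetic can be written out. Only the numbers quoted above are used; treating everything left over as ".py files and other content" is an assumption made for illustration.

```python
# Figures quoted in the comment above, in MB.
total_mb = 46.0
source_mb = 24.0
copies = 3
copy_mb_low, copy_mb_high = 4.5, 6.0

# Three copies of the compiled code: between 13.5 and 18 MB in total.
compiled_low = copies * copy_mb_low
compiled_high = copies * copy_mb_high

# What is left for .py files and anything else (assumed, not itemized above).
remainder_low = total_mb - source_mb - compiled_high
remainder_high = total_mb - source_mb - compiled_low

print("compiled code: %.1f-%.1f MB" % (compiled_low, compiled_high))
print("remaining content: %.1f-%.1f MB" % (remainder_low, remainder_high))
```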
> Is it modular, e.g. if you don't need networking[1] how hard is it to not include networking?
No. We haven't needed a smaller image, so we haven't worked on that. Typically, we boot it from a USB drive or hard drive, so storage space isn't a concern.
Networking, though, consists almost exclusively of Python code, and the compiled .pyc files for that don't take up much space; not a good target to start with if you wanted to reduce size.
> Awesome stuff! What are your future plans in this effort?
On a small scale, moving as much as possible from C to Python using ctypes. Much of the C code pre-dates us implementing ctypes support, so it provides bindings to things like ACPI via the Python C API. We'd like to re-write that using ctypes, which will result in far less C code and no manual management of things like reference counts of Python objects.
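To illustrate why ctypes removes so much C, here is a small sketch that describes a binary layout in pure Python. It is not BITS code: the class and function names are hypothetical, the field layout follows the standard 36-byte ACPI table header, and the sample bytes are synthetic.

```python
import ctypes
import struct


class TableHeader(ctypes.Structure):
    # Standard ACPI table header layout. All fields are naturally
    # aligned, so no explicit packing is needed. Names are illustrative.
    _fields_ = [
        ("signature", ctypes.c_char * 4),
        ("length", ctypes.c_uint32),
        ("revision", ctypes.c_uint8),
        ("checksum", ctypes.c_uint8),
        ("oem_id", ctypes.c_char * 6),
        ("oem_table_id", ctypes.c_char * 8),
        ("oem_revision", ctypes.c_uint32),
        ("creator_id", ctypes.c_char * 4),
        ("creator_revision", ctypes.c_uint32),
    ]


def parse_header(raw):
    # Copies the leading bytes into the structure. No C wrapper and no
    # manual reference counting are involved.
    return TableHeader.from_buffer_copy(raw[:ctypes.sizeof(TableHeader)])


# Synthetic example data in native byte order (not read from real firmware).
sample = struct.pack("=4sIBB6s8sI4sI", b"DSDT", 36, 2, 0,
                     b"OEMID ", b"TABLEID ", 1, b"TEST", 1)
header = parse_header(sample)
print(header.signature, header.length, header.revision)
```

The equivalent binding through the Python C API would need a C function that builds each Python object by hand and manages its reference count.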
On a larger scale, we're outgrowing GRUB's menu system. It's been incredibly helpful to get started, but we'd like a more usable and flexible UI than GRUB menus based on dynamically generated GRUB cfg files. (Several of the menus consist of lines like "source (python)/test.cfg", where "(python)" is an in-memory GRUB filesystem implemented in Python, and Python code enumerates the test hierarchy at boot time to generate test.cfg.) In particular, we'd like to be able to have simple UI elements to select one of a set of configuration options and display the current value directly in the menu.
So yes, I can envision a UI toolkit written in Python. :)
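For readers unfamiliar with the generated-cfg approach described above, here is a minimal sketch of producing GRUB menu text from a test list in Python. The function name, the test names, and the `run_test` command are all hypothetical; only the `menuentry "title" { ... }` syntax is standard GRUB.

```python
def generate_menu_cfg(tests, command="run_test"):
    """Return GRUB cfg text with one menuentry per test.

    `tests` maps a menu title to a test identifier. `command` is a
    hypothetical GRUB command name used only for illustration.
    Titles are not escaped, so they must not contain double quotes.
    """
    lines = []
    for title, test_id in sorted(tests.items()):
        lines.append('menuentry "{}" {{'.format(title))
        lines.append('    {} "{}"'.format(command, test_id))
        lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical test hierarchy, flattened to title -> identifier.
example_cfg = generate_menu_cfg({
    "MSR consistency": "msr.consistency",
    "ACPI _PSS": "acpi.pss",
})
print(example_cfg)
```

A menu built this way is static text once generated, which is the limitation the comment describes: showing a setting's current value inside the menu would require regenerating the cfg.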
MicroPython is an independent implementation of Python with some microcontroller-specific extensions, able to run on smallish microcontrollers; this is a slightly modified "standard" CPython for x86.
Why would they not use Lua? It is pretty popular (and pretty much designed) as an embedded scripting language, not to mention much more lightweight than Python, with a much smaller footprint.
The target audience for this is BIOS developers and firmware engineers. Several existing development tools in that area already support Python, including Simics and the Intel ITP, so the target audience is used to it.
Lua also has several other impedance mismatches with the target audience, such as 1-based array indexing (remember, the target audience writes assembly and C).
While people often describe Lua as "easy to embed", personally, I don't find Lua's stack-based C API easy at all; in particular, the references to stack-based indexes that change when you push or pop the stack feel like working with stack-based parameters in assembly without a base pointer, where the offset of a given location can change at any time. I much prefer the Python C API.
Finally, Lua would require additional libraries to provide a useful environment, precisely because it can assume so little about its environment. For instance, we'd need to bundle one of the Lua socket libraries in addition to Lua itself, as well as an HTTP library. Python's "batteries included" standard library works nicely here.
Lua isn't part of upstream GRUB; it lives in a separate module, "grub-extras". As far as I can tell, it also wouldn't be easy to extend that with additional functionality or modules without forking it, due to the way GRUB builds out-of-tree modules.
Note that this port of Python is almost as old as grub's Lua; we've been working on BITS since ~2009, and it has had Python support since May 2011.
For reference, the Python port consists of less than 1000 lines of code, most of which provides compatibility implementations of standard C and POSIX functions that Python expects. (Plus another couple thousand lines implementing additional C Python modules like "_smp", "_acpi", and "_efi", but those are getting smaller as we move more code into Python using ctypes.)
Your other reasons for going with Python make sense, but this one surprises me. Isn't the Python C API's reference counting tedious? I've written bindings to C libraries for both Python and Lua. With Python, I always use something higher level than the Python C API, like Cython or ctypes. With Lua, I've cranked out lots of bindings using the Lua C API directly.
The reference counting is annoying, yes; however, it's the cost of allowing C to hold references to Python objects.
I find a single call to PyArg_ParseTuple at the top of a Python C entry point much nicer than referencing various values off the Lua stack as they move around.
That said, we're in the process of rewriting as many of those functions as possible from C into Python itself, using ctypes.
I made no mention of Lua. The context is C and assembly programmers as an audience.
But let's consider Lua. Most of the Lua programmers I know also know C. How do they handle the so-called "impedance mismatches"? (Should this be plural? What else besides 1-based arrays?)
As far as I know Lua's target audience is, at least in part, C programmers. That's why it's made to be easy to embed in a C program.
JoshTriplett's comment was a response to "why would they not use lua?" The points were all made relative to "C and assembly programmers" using Python vs. the same using Lua.
You can't simply pull a few words out of the comment and assume it means there is no mismatch between what C programmers expect and what Python offers. It doesn't mean that and wasn't meant to mean that.
You wrote "Lua's target audience is, at least in part, C programmers". That is also true of CPython. Bear in mind that these facts aren't that relevant. This code is not meant for the wider class of "C programmers". The relevant bit is "the target audience writes assembly and C". This is far different from "the target audience is assembly and C programmers", which appears to be what you think it says.
Instead, the target audience is the narrower class of people writing EFI binaries and working with the EFI protocol. As JoshTriplett further clarified, the "existing development tools in that area already support Python", but apparently not Lua.
Speculation: At least for the given Intel use case, lightweight doesn't sound too important: they can use the entire machine. And given the fairly specialist application, porting a language that experts and users are familiar with (or that maybe was used in existing tooling) might be better.
You can launch a Lua VM with an empty environment table (in other words, no standard library). This means only the functions you manually put into the environment table can be used, which makes sandboxing extremely easy. I must admit I've only used LuaJ, where interop with Java boils down to something like http://stackoverflow.com/a/19629914, and you can optionally use reflection for even easier interop, if I recall correctly.
Lua, as mentioned before, is very well suited to running in an embedded environment because of its small size (both when executing and in actual source code size) and its modules, which make it trivial to remove everything except the core language (IO support etc.).
I'm running Lua on a DLX-like processor implemented on an FPGA. In this context Lua performs the same role as BASIC did on many home computers in the '80s: a small IPL sets up the machine and loads Lua on boot.
Lua is extremely small, simpler, and has a JIT-compiled version. It's been used extensively as a scripting language in video games, so it has more mileage in constrained environments. I've never really used it; I'm just stating what other people say.
And the LuaJIT author, Mike Pall, is extremely skilled.
No. We don't have any need for that ourselves, but if someone wanted to work on that, and produced a clean patch that wasn't filled with #ifdefs in C and ifs in Python, we'd take it.
The biggest challenge would be adding equivalents of the assembly that implements SMP support (waking up other CPUs and putting them into a loop ready to run code on request).