( available now in all kernels >= 2.0.28, >= 2.1.15 )
The following gives some details on the new kernel vm86 functionality that is used for a `full feature dosemu'. We had more of those kernel changes in the older emumodule, but reduced the kernel support to an absolute minimum. As a result, this support is now in the mainstream kernels >= 2.0.28 as well as >= 2.1.15, and we do not need emumodule any more (removed since dosemu 0.64.3). To distinguish between the old vm86 functionality and the new one, we call the latter VM86PLUS.
Written on January 14, 1997 by Hans Lermen <email@example.com>.
Changes to arch/i386/kernel/vm86.c
New vm86() syscall interface
The vm86() syscall of vm86plus contains a generic interface: the old style vm86 syscall is 113, the new one is 166. At entry of vm86() the vm86_struct gets completely copied into kernel space and now remains on the kernel stack until control returns to user space. This has the advantage that performance is increased as long as emulation loops between VM86 and kernel space (which happens quite often). A second advantage is that we can now translate better between the old vm86_struct, vm86plus_struct and the changed internal pt_regs of kernel 2.1.x, hence old vm86 and new vm86plus user space binaries run on both 2.0.x and 2.1.x kernels. The entry routine of the old style vm86() translates to the new expanded vm86plus_struct before calling the common new do_sys_vm86().
It is possible to detect the existence of vm86plus support in the kernel by just calling vm86(0,(void *)0) on the syscall 166 entry. On success 0 is returned; an unpatched kernel will return -1.
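The install check described above could be sketched as follows. This is a minimal illustration, not code taken from dosemu: the helper name `vm86plus_available' is hypothetical, the raw syscall number 166 is the one given in the text, and the call is compiled only for i386 Linux because that number means something else on other architectures.

```c
#include <stddef.h>
#if defined(__linux__) && defined(__i386__)
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* makes unistd.h declare syscall() */
#endif
#include <unistd.h>
#endif

/* Hypothetical helper (not a dosemu function):
 *   1  vm86plus support was detected
 *   0  the kernel rejected the install check
 *  -1  cannot tell on this platform (not i386 Linux) */
int vm86plus_available(void)
{
#if defined(__linux__) && defined(__i386__)
    /* Function code 0 with a null pointer is the install check;
     * 166 is the new-style vm86() entry named in the text. */
    long rc = syscall(166, 0UL, (void *)NULL);
    return rc == 0 ? 1 : 0;
#else
    return -1;
#endif
}
```

A caller would typically run this once at startup and fall back to the old style vm86() on syscall 113 when the result is 0.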
Additional Data passed to vm86()
When in vm86plus mode, vm86() uses the new `struct vm86plus_struct' instead of `struct vm86_struct'. This contains some additional flags that control whether vm86() should return earlier than usual to give the timer emulation in dosemu a chance to stay in sync. Without this, updating the emulated timer chip happens too seldom and the timer may even appear to `jump back', because the granularity is too coarse and rounding happens. As we don't know what granularity the DOS application is relying on, we can't emulate the expected behaviour, hence the application locks up or crashes. This especially happens when the application is doing micro timing.
As a downside of `returning more often', we get DOS-space stack overflows when we take too much CPU. We compensate for this by detecting this possibility and decreasing the `return rate', hence giving more CPU back to DOS-space.
So we can realize a self-adapting control loop with this feature.
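One way to picture such a self-adapting loop is the sketch below. It is only an illustration under stated assumptions: the structure, its field names and the `halve on trouble, creep back up otherwise' policy are all hypothetical and are not taken from dosemu, which may use a different rule.

```c
#include <assert.h>

/* Hypothetical controller state; none of these names exist in dosemu. */
struct return_rate {
    unsigned rate;      /* current number of early returns per time slice */
    unsigned min_rate;  /* lower bound for rate */
    unsigned max_rate;  /* upper bound for rate */
};

/* Adjust the rate after one time slice and return the new value.
 * overflow_risk is nonzero when DOS-space stack pressure was detected. */
unsigned adapt_return_rate(struct return_rate *r, int overflow_risk)
{
    assert(r->min_rate <= r->max_rate);
    if (overflow_risk) {
        /* Back off quickly: give CPU back to DOS-space. */
        r->rate /= 2;
        if (r->rate < r->min_rate)
            r->rate = r->min_rate;
    } else if (r->rate < r->max_rate) {
        /* Recover slowly towards a finer timer granularity. */
        r->rate += 1;
    }
    return r->rate;
}
```

Backing off faster than recovering is a common choice for this kind of loop, because a stack overflow in DOS-space is worse than a slightly coarse timer.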
Vm86plus now also hosts the IRQ passing stuff, which was a separate syscall in the older emumodule (no syscallmgr any more). As this IRQ passing is special to dosemu, we couldn't use it for other (unix) applications anyway. So having it as part of vm86() should be the right place.
GDB is a great tool; however, we can't debug DOS and/or DPMI code with it. Dosemu has its own builtin debugger (dosdebug), which especially allows the dosemu developers to track down problems with dosemu and DOS applications for which (as usual) we have no source. ( ... and debugging DOS applications has always been the `heart' of dosemu development ).
Dosdebug uses some special flags and data in `vm86plus_struct', which are passed to vm86(); vm86() reacts to them and returns to dosemu with the dosdebug special return codes.
As of dosemu-0.64.1 you can run both debuggers simultaneously, dosdebug as well as GDB. Dosdebug will be triggered only for VM86 traps, and with GDB you may debug dosemu itself. However, GDB can't be used when DPMI is in use, because it will break on each trap that is used to simulate DPMI, and you won't like that.
Changes to arch/i386/kernel/ldt.c
New function code for `write' in modify_ldt syscall
In order to preserve backward compatibility with Wine and Wabi, the changes in the LDT stuff are only available when using function code 0x11 for `write' in the modify_ldt syscall. Hence old binaries will be served with the old LDT behavior.
`useable' bit in LDT descriptor
The `struct modify_ldt_ldt_s' got an additional bit: `useable'. This is needed for DPMI clients that make use of the `available' bit in the descriptor (bit 52). `available' means that the hardware isn't using it, but software can put information into it.
Because the kernel does not use this bit, it's safe and harmless. Windows 3.1 is such a client, but some 32-bit DPMI clients are also reported to need it. This bit is only used for 32-bit clients. DPMI function SetDescriptorAccessRights (AX=0009) passes this in bit 4 of CH (80386 extended access rights).
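The bit positions mentioned above can be made concrete with a small sketch. It assumes the usual 8-byte descriptor layout, where descriptor bit 52 is bit 20 of the upper 32-bit half; the macro and function names are hypothetical and only serve this illustration.

```c
#include <stdint.h>

#define DESC_HI_AVL_BIT  20u  /* descriptor bit 52 = bit 20 of the high dword */
#define DPMI_CH_AVL_BIT   4u  /* bit 4 of CH in DPMI function AX=0009 */

/* Value for the `useable' field, taken from the extended
 * access rights byte (CH) passed by the DPMI client. */
unsigned useable_from_ch(uint8_t ch)
{
    return (ch >> DPMI_CH_AVL_BIT) & 1u;
}

/* Copy the `available' bit from CH into the high 32 bits of a
 * descriptor, leaving all other descriptor bits untouched. */
uint32_t desc_hi_with_avl(uint32_t desc_hi, uint8_t ch)
{
    desc_hi &= ~(UINT32_C(1) << DESC_HI_AVL_BIT);
    desc_hi |= (uint32_t)useable_from_ch(ch) << DESC_HI_AVL_BIT;
    return desc_hi;
}
```

For example, a CH value of 0x10 yields `useable' = 1 and sets 0x00100000 in the high dword.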
`present' bit in LDT selector
Function 1 (write_ldt) of the modify_ldt() syscall allows creation/modification of selectors containing a `present' bit, that get updated correctly later on. These selectors are set up so that they either can't be used for access (null selector) or the `present' info goes into bit 47 (bit 7 of the type byte) of a call gate descriptor (segment present). This call gate is of course checked so that it does not give any kernel access rights. Hence, security will not be hurt by this.
Changes to arch/i386/kernel/signal.c
Because DPMI code switches via signal return, some types of selectors that the kernel normally would not allow to be loaded into a segment register have been made loadable. The registers involved are DS, ES, FS and GS. Loading of CS or SS is not changed.
The original kernel code would forbid any non-null selector that hasn't privilege level 3, and this also could be one of the LDT selectors. However, sys_sigreturn doesn't check the descriptors that belong to the selector, hence would not see that they are safe. But as we assure proper setting of all LDT selectors via `write_ldt' of modify_ldt(), we may safely allow LDT selectors to be loaded. If they are not proper, we then get an exception and have a chance to emulate access. And because old type binaries (Wabi) will not be able to create newer type selectors (see 2.2.1), this again won't hurt.
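The difference between the two rules can be sketched with plain selector arithmetic. This is an illustration of the rule as described in the text, not the kernel's actual code; it assumes the standard selector layout (bits 0-1 requested privilege level, bit 2 table indicator), and the function names are hypothetical.

```c
#include <stdint.h>

#define SEL_RPL_MASK   0x0003u  /* requested privilege level */
#define SEL_TI_LDT     0x0004u  /* table indicator: set = LDT, clear = GDT */
#define SEL_NULL_MASK  0xfffcu  /* index and TI all zero: null selector */

/* Old rule: only null selectors and privilege level 3 selectors. */
int seg_loadable_old(uint16_t sel)
{
    if ((sel & SEL_NULL_MASK) == 0)
        return 1;
    return (sel & SEL_RPL_MASK) == 3;
}

/* Relaxed rule for DS, ES, FS and GS: additionally accept any
 * LDT selector, relying on write_ldt having set it up properly. */
int seg_loadable_new(uint16_t sel)
{
    return seg_loadable_old(sel) || (sel & SEL_TI_LDT) != 0;
}
```

With these definitions the selector 0x0084 (LDT, privilege level 0) is rejected by the old rule and accepted by the new one, while 0x0008 (GDT, privilege level 0) is rejected by both.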
Changes to arch/i386/kernel/traps.c
The low-level exception entry points for INTx (x= 0, 1..5, 6) in the kernel normally send a signal to the process, that then may handle the exception. For INT1 (debug), the kernel does special treatment and checks whether it gets interrupted from VM86.
Due to limitations in how we can handle signals in dosemu without falling too far behind `real time', and because we need to handle those things on the current vm86() return stack, we need to handle the above INTx in a similar manner to INT1.
When INTx happens out of VM86 (i.e. the CPU was in virtual 8086 mode when the exception occurred), we do not send a signal, but return from the vm86() syscall with an appropriate return code.
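On the user space side, such a return code has to be taken apart again. The sketch below assumes the encoding used by the i386 kernel header <asm/vm86.h>, where the low byte holds the return type and the remaining bits hold an argument. The constants are local copies written from memory of that header and should be checked against your kernel headers; the `EX_' names, the structure and the function are hypothetical. Which type code the kernel reports for a particular exception is not stated in the text, so the sketch only decodes.

```c
/* Local copies of the return types; assumed to match <asm/vm86.h>. */
enum {
    EX_VM86_SIGNAL    = 0,  /* return due to a signal */
    EX_VM86_UNKNOWN   = 1,  /* unhandled fault */
    EX_VM86_INTx      = 2,  /* INT x instruction, argument = x */
    EX_VM86_STI       = 3,  /* virtual interrupts got enabled */
    EX_VM86_PICRETURN = 4,  /* pending PIC request (vm86plus only) */
    EX_VM86_TRAP      = 6   /* debugger trap (vm86plus only) */
};

struct vm86_ret {
    int type;  /* one of the EX_VM86_* values */
    int arg;   /* interrupt or trap number, if any */
};

/* Split a non-negative vm86() return value into type and argument. */
struct vm86_ret decode_vm86_return(int retval)
{
    struct vm86_ret r;
    r.type = retval & 0xff;
    r.arg  = retval >> 8;
    return r;
}
```

A dispatcher in the emulator's main loop would then switch on `type' instead of waiting for a signal handler to run.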
If the above INTx happens from within an old style vm86() call, the exceptions are also handled `the old way' (backward compatibility).
( If you have an application that needs it, well then it won't work, and please don't ask us to re-implement the old behaviour. We have good reasons for our decision. )
Kernel space LDT.
Some DPMI clients have really odd programming techniques that don't use the LAR instruction to get info from a descriptor but access the LDT directly to get it. Well, this is no problem with our user space LDT copy (LDT_ALIAS) as long as the DPMI client doesn't need reliable information about the `accessed bit'.
In the older emumodule we had a so called KERNEL_LDT, which (readonly) accessed the LDT directly in kernel space. This now has been abandoned and we use some workarounds which may (or may not) work for the above mentioned DPMI clients.
LDT Selectors accessing the `whole space'
DPMI clients may very well try to create selectors with a type and size that would overlap with kernel space, though the client normally only would access user space with such selectors (e.g. expand down segments).
This was a security hole in older Linux kernels that was fixed in the early 1.3.x kernel series. Due to complaints on linux-msdos, emumodule did allow those selectors if dosemu was run as root. Because only very few DOS applications need this (e.g. some oddly programmed games), we now favour security and don't allow this any more.
In order to gain speed and to be more atomic on some operations, we had so called fast syscalls, which used INT 0xe6 to quickly enter kernel space, get/set the IRQ-flags used by dosemu, and return without giving the kernel a chance to reschedule.
Today the machines perform much better, so there is no need for those ugly tricks any more. In dosemu-0.64.1 fast syscalls are no longer used.
Separate syscall interface (syscall manager)
The old emumodule used the syscallmgr interface to establish a new (temporary) system call, which was used to interface with emumodule. We have now integrated all needed stuff into the vm86 system call, hence we do not need this technique any more.
The DOSEMU team