An object memory dumper for boxed lisp data.

Alastair Bridgewater, December 2008

[STATUS: DRAFT]

A memory viewer can be an essential tool for an emulator author or other system implementor during the exploration phase of their work. It can also be helpful for other people curious about the actual layout of objects in memory in their environments. This has aspects of raw memory access, knowledge of the low-level type system of an implementation, and so on. Built correctly, it could conceivably be used to examine boxed memory for non-lisp environments.

The first step in this project is to be able to display information about a boxed data word. Typically, this would be its encoded form, the symbolic name of its tag bits, and what we can infer of its value. So, for 64-bit SBCL, we might start with the following:

    (defparameter *lowtag-names*
      (let ((names (make-array sb-vm:lowtag-limit)))
        (dolist (sym (apropos-list "-lowtag" "SB-VM"))
          (when (and (boundp sym)
                     (not (eq sym 'sb-vm:n-lowtag-bits)))
            (setf (aref names (symbol-value sym)) sym)))
        names))

    (defun dump-boxed-word (data)
      (let ((lowtag (logand data sb-vm:lowtag-mask)))
        (format t "~(~16,'0x~) ~24A" data (aref *lowtag-names* lowtag))
        (case lowtag
          ((#.sb-vm:even-fixnum-lowtag #.sb-vm:odd-fixnum-lowtag)
           (let ((value (dpb (ldb (byte 61 3) data) (byte 61 0)
                             (if (logbitp 63 data) -1 0))))
             (format t " ~D ~((#x~x)~)~%" value value)))
          (t
           (format t " ~%")))))

Testing it? Well...

    CL-USER> (dump-boxed-word 0)
    0000000000000000 EVEN-FIXNUM-LOWTAG       0 (#x0)
    NIL
    CL-USER> (dump-boxed-word 42)
    000000000000002a OTHER-IMMEDIATE-2-LOWTAG
    NIL
    CL-USER> (dump-boxed-word 40)
    0000000000000028 ODD-FIXNUM-LOWTAG        5 (#x5)
    NIL
    CL-USER> (dump-boxed-word #x8000000000000000)
    8000000000000000 EVEN-FIXNUM-LOWTAG       -1152921504606846976 (#x-1000000000000000)
    NIL

Looks good so far.

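As a cross-check on the decoding above, here is a small sketch in portable Common Lisp that avoids the SB-VM package entirely. The constant and function names are hypothetical, and the 4-bit lowtag and 3-bit fixnum shift are assumptions hard-coded to match the 64-bit layout used in this article. It decodes the fixnum by reinterpreting the word as a signed 64-bit integer and shifting, rather than with DPB, so the two approaches can be compared.

```lisp
;; Assumed layout constants (not taken from SB-VM): 4 lowtag bits,
;; fixnums stored shifted left by 3 bits in a 64-bit word.
(defconstant +assumed-lowtag-bits+ 4)
(defconstant +assumed-fixnum-shift+ 3)

(defun word-lowtag (word)
  "Return the low tag bits of WORD as a small integer."
  (ldb (byte +assumed-lowtag-bits+ 0) word))

(defun word-as-signed (word)
  "Reinterpret the unsigned 64-bit WORD as a two's-complement integer."
  (if (logbitp 63 word)
      (- word (ash 1 64))
      word))

(defun decode-fixnum (word)
  "Recover the fixnum value from WORD by an arithmetic right shift."
  (ash (word-as-signed word) (- +assumed-fixnum-shift+)))

;; (word-lowtag 42)                      => 10
;; (decode-fixnum 40)                    => 5
;; (decode-fixnum #x8000000000000000)    => -1152921504606846976
```

The last two results agree with the transcript above, which suggests the DPB expression and the shift-based version compute the same thing for these inputs.
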
Because we wish to be able to examine other type systems (either non-lisp type systems or merely other lisp type systems such as for a different lisp implementation or target platform) we need to define our key interfaces as generic functions. Because we wish to be able to examine other address spaces, we need proxy objects through which to do our memory access.

    (defgeneric read-word (address-space address))

    (defclass sbcl-address-space () ())

    (defmethod read-word ((address-space sbcl-address-space) address)
      (declare (ignore address-space))
      (sb-sys:sap-ref-64 (sb-sys:int-sap address) 0))

Does it work?

    CL-USER> (defparameter *address-space* (make-instance 'sbcl-address-space))
    *ADDRESS-SPACE*
    CL-USER> (read-word *address-space*
                        (- (sb-kernel:get-lisp-obj-address nil)
                           sb-vm:list-pointer-lowtag))
    537919511
    CL-USER> (sb-kernel:get-lisp-obj-address nil)
    537919511

Good enough. And we can change the interface later if we need to.

A quick review of the contents of *lowtag-names* shows two fixnum lowtags (even and odd), six pad lowtags (pad0 through pad5), four immediate lowtags (other-immediate 0-3), and four pointer lowtags (instance, list, fun and other), for a total of (+ 2 6 4 4) => 16, which is right.

So, on to the next step, the in-memory data associated with an object. Let's call this function dump-object-data.

    (defgeneric dump-object-data (address-space address))

I seem to have forgotten about supporting different type systems. Oh well, I can always redesign things later to add them.

    (defmethod dump-object-data (address-space address)
      (dump-boxed-word address)
      (let ((lowtag (logand address sb-vm:lowtag-mask)))
        (dump-sbcl-object address-space address lowtag)))

What's dump-sbcl-object?

    (defgeneric dump-sbcl-object (address-space address lowtag)
      (:method (address-space address lowtag)
        (declare (ignore address-space address lowtag))
        (format t "This object is immediate data.~%")))

    (defmethod dump-sbcl-object (address-space address
                                 (lowtag (eql #.sb-vm:list-pointer-lowtag)))
      (dotimes (i 2)
        (let ((slot-address (+ address
                               (ash i sb-vm:word-shift)
                               (- sb-vm:list-pointer-lowtag))))
          (format t "~(~16,'0X~): " slot-address)
          (dump-boxed-word (read-word address-space slot-address)))))

Does it work?

    CL-USER> (dump-object-data *address-space*
                               (sb-kernel:get-lisp-obj-address #'dump-object-data))
    00000010034ef0c9 FUN-POINTER-LOWTAG
    This object is immediate data.
    NIL

That's what we expected, as we didn't tell dump-boxed-word or dump-sbcl-object about fun-pointers. Hey, why don't we go back and make dump-boxed-word use a generic function for lowtag dispatch like we're doing for dump-sbcl-object? Something for later, anyway. Continuing on:

    CL-USER> (dump-object-data *address-space* (sb-kernel:get-lisp-obj-address nil))
    0000000020100017 LIST-POINTER-LOWTAG
    0000000020100010: 0000000020100017 LIST-POINTER-LOWTAG
    0000000020100018: 0000000020100017 LIST-POINTER-LOWTAG
    NIL

It would probably help if we could distinguish NIL automatically, but other than that and the empty description for lists, that's not bad.

So let's fix up a few pending items... First, a generic function for lowtag dispatch in dump-boxed-word, and adding the address-space for good measure (it'll be important when we try to read the header words for other-immediate objects).

    (defgeneric dump-boxed-sbcl-word-data (address-space data lowtag)
      (:method (address-space data lowtag)
        (declare (ignore address-space data lowtag))
        (format t " ~%")))

    (macrolet ((fixnum-dumper (lowtag)
                 `(defmethod dump-boxed-sbcl-word-data
                      (address-space data (lowtag (eql ,lowtag)))
                    (declare (ignore address-space))
                    (let ((value (dpb (ldb (byte 61 3) data) (byte 61 0)
                                      (if (logbitp 63 data) -1 0))))
                      (format t " ~D ~((#x~x)~)~%" value value)))))
      (fixnum-dumper sb-vm:even-fixnum-lowtag)
      (fixnum-dumper sb-vm:odd-fixnum-lowtag))

    (defun dump-boxed-word (address-space data)
      (let ((lowtag (logand data sb-vm:lowtag-mask)))
        (format t "~(~16,'0x~) ~24A" data (aref *lowtag-names* lowtag))
        (dump-boxed-sbcl-word-data address-space data lowtag)))

And while we're at it, we'll add list-pointers to dump-boxed-word.

    (defmethod dump-boxed-sbcl-word-data
        (address-space data (lowtag (eql sb-vm:list-pointer-lowtag)))
      (declare (ignore address-space lowtag))
      (if (= data (sb-kernel:get-lisp-obj-address nil))
          (format t " NIL~%")
          (format t " A CONS~%")))

And fix up the two callers of dump-boxed-word, dump-object-data and dump-sbcl-object (not shown).

There's an extra complication with NIL on SBCL. It's also a symbol, and games were played with its in-memory representation and lowtag to make checking for listness slightly faster.

    CL-USER> (dump-boxed-word *address-space*
                              (read-word *address-space*
                                         (- (sb-kernel:get-lisp-obj-address nil)
                                            sb-vm:other-pointer-lowtag)))
    0000000000000046 OTHER-IMMEDIATE-1-LOWTAG
    NIL
    CL-USER> (format t "~X" sb-vm:symbol-header-widetag)
    46
    NIL

The cheap way to deal with this is to check, in dump-sbcl-object for list-pointers, for NIL in the same way as we did in dump-boxed-sbcl-word-data and call dump-sbcl-object with other-pointer-lowtag.

    (defmethod dump-sbcl-object (address-space address
                                 (lowtag (eql #.sb-vm:list-pointer-lowtag)))
      (if (= address (sb-kernel:get-lisp-obj-address nil))
          (dump-sbcl-object address-space address sb-vm:other-pointer-lowtag)
          (dotimes (i 2)
            (let ((slot-address (+ address
                                   (ash i sb-vm:word-shift)
                                   (- sb-vm:list-pointer-lowtag))))
              (format t "~(~16,'0X~): " slot-address)
              (dump-boxed-word address-space
                               (read-word address-space slot-address))))))

Now, of course, NIL claims to be immediate data because we haven't defined a method for other-pointer-lowtag. Let's make a start on that, shall we?

    (defmethod dump-sbcl-object (address-space address
                                 (lowtag (eql #.sb-vm:other-pointer-lowtag)))
      (let* ((header-word (read-word address-space
                                     (- address sb-vm:other-pointer-lowtag)))
             (widetag (logand header-word #xff)))
        (dump-boxed-sbcl-object address-space (- address lowtag)
                                header-word widetag)))

    (defgeneric dump-boxed-sbcl-object (address-space base-address header-word widetag)
      (:method (address-space base-address header-word widetag)
        (declare (ignore address-space base-address header-word))
        (format t "Object header word has unknown widetag ~(#x~2,'0x~)~%" widetag)))

And, of course, the testing:

    CL-USER> (dump-object-data *address-space* (sb-kernel:get-lisp-obj-address nil))
    0000000020100017 LIST-POINTER-LOWTAG      NIL
    Object header word has unknown widetag #x46
    NIL

Which is what we expected.

So, next we want to define a method on dump-boxed-sbcl-object for symbol-header-widetag. Looking at the SBCL source, in src/compiler/generic/objdef.lisp, we find that a symbol has a header and five or six slots (the sixth slot is the TLS index on threaded systems).

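The arithmetic that such a method relies on can be tried out without touching real memory. This is a portable sketch only: the function names are hypothetical, and the 8-bit widetag field and 8-byte word size (a word shift of 3) are assumptions matching the 64-bit SBCL layout used in this article.

```lisp
(defun header-widetag (header-word)
  "Return the low 8 bits of HEADER-WORD (the assumed widetag field)."
  (ldb (byte 8 0) header-word))

(defun slot-addresses (base-address word-count &optional (word-shift 3))
  "List the addresses of WORD-COUNT consecutive words from BASE-ADDRESS.
WORD-SHIFT is log2 of the word size in bytes (assumed to be 3 here)."
  (loop for i below word-count
        collect (+ base-address (ash i word-shift))))

;; (header-widetag #x46)          => 70, that is, #x46
;; (slot-addresses #x20100008 3)  => (537919496 537919504 537919512)
```

Here #x20100008 is the address of NIL with the other-pointer lowtag subtracted, as computed earlier. With seven words for a symbol on a threaded build, the last word would sit 48 bytes past the base. The method for symbols follows the same pattern using SBCL's own constants.
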
    (defmethod dump-boxed-sbcl-object
        (address-space base-address header-word
         (widetag (eql sb-vm:symbol-header-widetag)))
      (declare (ignore header-word))
      (dotimes (i #-sb-thread 6 #+sb-thread 7)
        (let ((slot-address (+ base-address (ash i sb-vm:word-shift))))
          (format t "~(~16,'0X~): " slot-address)
          (dump-boxed-word address-space
                           (read-word address-space slot-address)))))

    CL-USER> (dump-object-data *address-space* (sb-kernel:get-lisp-obj-address nil))
    0000000020100017 LIST-POINTER-LOWTAG      NIL
    0000000020100008: 0000000000000046 OTHER-IMMEDIATE-1-LOWTAG
    0000000020100010: 0000000020100017 LIST-POINTER-LOWTAG      NIL
    0000000020100018: 0000000020100017 LIST-POINTER-LOWTAG      NIL
    0000000020100020: 0000000020100017 LIST-POINTER-LOWTAG      NIL
    0000000020100028: 000000100000200f OTHER-POINTER-LOWTAG
    0000000020100030: 0000001000001b41 INSTANCE-POINTER-LOWTAG
    0000000020100038: 0000000000000000 EVEN-FIXNUM-LOWTAG       0 (#x0)
    NIL
    CL-USER> (dump-object-data *address-space* (sb-kernel:get-lisp-obj-address t))
    000000002010004f OTHER-POINTER-LOWTAG
    0000000020100040: 0000000000000646 OTHER-IMMEDIATE-1-LOWTAG
    0000000020100048: 000000002010004f OTHER-POINTER-LOWTAG
    0000000020100050: 0000002ff357e610 EVEN-FIXNUM-LOWTAG       25743260866 (#x5fe6afcc2)
    0000000020100058: 0000000020100017 LIST-POINTER-LOWTAG      NIL
    0000000020100060: 000000100000202f OTHER-POINTER-LOWTAG
    0000000020100068: 0000001000001b41 INSTANCE-POINTER-LOWTAG
    0000000020100070: 0000000000000000 EVEN-FIXNUM-LOWTAG       0 (#x0)
    NIL

And it works.

Where do we go from here?

* We could do something reasonable with the other-immediate tags.
* We could look up widetag names.
* We could implement dumpers for other widetags.
* We could add the other two pointer tags.
* We could display something reasonable for other-pointer data, depending on the widetag (automatic symbol-name lookup, for example).
* When we dump a code-object, we could disassemble the instruction spaces.
* When we dump a function, we could redirect it to dump the enclosing code-object.
* We could decide to make dump-boxed-word a generic function, and dispatch on address-space on the assumption that an address space is also a type system.
* If we do the above with dump-boxed-word, we can define separate 32-bit and 64-bit SBCL type systems, with various options for threaded builds, changes to the type system over time, etc. and mixins for memory access style (SAP functions, reading /proc//mem if we're using ptrace on another process, reading from a core file or postmortem image on disk).
* If we do the above with dump-boxed-word, we can define type systems for other lisp environments, such as the TI Explorer, reading from a load band image, or even non-lisp systems (a JVM, a Smalltalk VM...).

Until next time!

EOF