|  | = Tracing = | 
|  |  | 
|  | == Introduction == | 
|  |  | 
|  | This document describes the tracing infrastructure in QEMU and how to use it | 
|  | for debugging, profiling, and observing execution. | 
|  |  | 
|  | == Quickstart == | 
|  |  | 
|  | 1. Build with the 'simple' trace backend: | 
|  |  | 
|  | ./configure --enable-trace-backend=simple | 
|  | make | 
|  |  | 
|  | 2. Create a file with the events you want to trace: | 
|  |  | 
|  | echo bdrv_aio_readv   > /tmp/events | 
|  | echo bdrv_aio_writev >> /tmp/events | 
|  |  | 
|  | 3. Run the virtual machine to produce a trace file: | 
|  |  | 
|  | qemu -trace events=/tmp/events ... # your normal QEMU invocation | 
|  |  | 
|  | 4. Pretty-print the binary trace file: | 
|  |  | 
|  | ./scripts/simpletrace.py trace-events trace-* | 
|  |  | 
|  | == Trace events == | 
|  |  | 
|  | There is a set of static trace events declared in the "trace-events" source | 
|  | file.  Each trace event declaration names the event, its arguments, and the | 
|  | format string which can be used for pretty-printing: | 
|  |  | 
|  | qemu_vmalloc(size_t size, void *ptr) "size %zu ptr %p" | 
|  | qemu_vfree(void *ptr) "ptr %p" | 
|  |  | 
|  | The "trace-events" file is processed by the "tracetool" script during build to | 
|  | generate code for the trace events.  Trace events are invoked directly from | 
|  | source code like this: | 
|  |  | 
|  | #include "trace.h"  /* needed for trace event prototype */ | 
|  |  | 
|  | void *qemu_vmalloc(size_t size) | 
|  | { | 
|  | void *ptr; | 
|  | size_t align = QEMU_VMALLOC_ALIGN; | 
|  |  | 
|  | if (size < align) { | 
|  | align = getpagesize(); | 
|  | } | 
|  | ptr = qemu_memalign(align, size); | 
|  | trace_qemu_vmalloc(size, ptr); | 
|  | return ptr; | 
|  | } | 
|  |  | 
|  | === Declaring trace events === | 
|  |  | 
|  | The "tracetool" script produces the trace.h header file which is included by | 
|  | every source file that uses trace events.  Since many source files include | 
|  | trace.h, it uses a minimum of types and other header files included to keep the | 
|  | namespace clean and compile times and dependencies down. | 
|  |  | 
|  | Trace events should use types as follows: | 
|  |  | 
|  | * Use stdint.h types for fixed-size types.  Most offsets and guest memory | 
|  | addresses are best represented with uint32_t or uint64_t.  Use fixed-size | 
|  | types over primitive types whose size may change depending on the host | 
|  | (32-bit versus 64-bit) so trace events don't truncate values or break | 
|  | the build. | 
|  |  | 
|  | * Use void * for pointers to structs or for arrays.  The trace.h header | 
|  | cannot include all user-defined struct declarations and it is therefore | 
|  | necessary to use void * for pointers to structs. | 
|  |  | 
|  | * For everything else, use primitive scalar types (char, int, long) with the | 
|  | appropriate signedness. | 
|  |  | 
|  | Format strings should reflect the types defined in the trace event.  Take | 
|  | special care to use PRId64 and PRIu64 for int64_t and uint64_t types, | 
|  | respectively.  This ensures portability between 32- and 64-bit platforms. | 
|  |  | 
|  | === Hints for adding new trace events === | 
|  |  | 
|  | 1. Trace state changes in the code.  Interesting points in the code usually | 
|  | involve a state change like starting, stopping, allocating, freeing.  State | 
|  | changes are good trace events because they can be used to understand the | 
|  | execution of the system. | 
|  |  | 
|  | 2. Trace guest operations.  Guest I/O accesses like reading device registers | 
|  | are good trace events because they can be used to understand guest | 
|  | interactions. | 
|  |  | 
|  | 3. Use correlator fields so the context of an individual line of trace output | 
|  | can be understood.  For example, trace the pointer returned by malloc and | 
|  | used as an argument to free.  This way mallocs and frees can be matched up. | 
|  | Trace events with no context are not very useful. | 
|  |  | 
|  | 4. Name trace events after their function.  If there are multiple trace events | 
|  | in one function, append a unique distinguisher at the end of the name. | 
|  |  | 
|  | == Generic interface and monitor commands == | 
|  |  | 
|  | You can programmatically query and control the state of trace events through a | 
|  | backend-agnostic interface provided by the header "trace/control.h". | 
|  |  | 
|  | Note that some of the backends do not provide an implementation for some parts | 
|  | of this interface, in which case QEMU will just print a warning (please refer to | 
|  | header "trace/control.h" to see which routines are backend-dependent). | 
|  |  | 
|  | The state of events can also be queried and modified through monitor commands: | 
|  |  | 
|  | * info trace-events | 
|  | View available trace events and their state.  State 1 means enabled, state 0 | 
|  | means disabled. | 
|  |  | 
|  | * trace-event NAME on|off | 
|  | Enable/disable a given trace event or a group of events (using wildcards). | 
|  |  | 
|  | The "-trace events=<file>" command line argument can be used to enable the | 
|  | events listed in <file> from the very beginning of the program. This file must | 
|  | contain one event name per line. | 
|  |  | 
|  | If a line in the "-trace events=<file>" file begins with a '-', the trace event | 
|  | will be disabled instead of enabled.  This is useful when a wildcard was used | 
|  | to enable an entire family of events but one noisy event needs to be disabled. | 
|  |  | 
|  | Wildcard matching is supported in both the monitor command "trace-event" and the | 
|  | events list file. That means you can enable/disable the events having a common | 
|  | prefix in a batch. For example, virtio-blk trace events could be enabled using | 
|  | the following monitor command: | 
|  |  | 
|  | trace-event virtio_blk_* on | 
|  |  | 
|  | == Trace backends == | 
|  |  | 
|  | The "tracetool" script automates tedious trace event code generation and also | 
|  | keeps the trace event declarations independent of the trace backend.  The trace | 
|  | events are not tightly coupled to a specific trace backend, such as LTTng or | 
|  | SystemTap.  Support for trace backends can be added by extending the "tracetool" | 
|  | script. | 
|  |  | 
|  | The trace backend is chosen at configure time and only one trace backend can | 
|  | be built into the binary: | 
|  |  | 
|  | ./configure --trace-backend=simple | 
|  |  | 
|  | For a list of supported trace backends, try ./configure --help or see below. | 
|  |  | 
|  | The following subsections describe the supported trace backends. | 
|  |  | 
|  | === Nop === | 
|  |  | 
|  | The "nop" backend generates empty trace event functions so that the compiler | 
|  | can optimize out trace events completely.  This is the default and imposes no | 
|  | performance penalty. | 
|  |  | 
|  | Note that regardless of the selected trace backend, events with the "disable" | 
|  | property will be generated with the "nop" backend. | 
|  |  | 
|  | === Stderr === | 
|  |  | 
|  | The "stderr" backend sends trace events directly to standard error.  This | 
|  | effectively turns trace events into debug printfs. | 
|  |  | 
|  | This is the simplest backend and can be used together with existing code that | 
|  | uses DPRINTF(). | 
|  |  | 
|  | === Simpletrace === | 
|  |  | 
|  | The "simple" backend supports common use cases and comes as part of the QEMU | 
|  | source tree.  It may not be as powerful as platform-specific or third-party | 
|  | trace backends but it is portable.  This is the recommended trace backend | 
|  | unless you have specific needs for more advanced backends. | 
|  |  | 
|  | The "simple" backend currently does not capture string arguments, it simply | 
|  | records the char* pointer value instead of the string that is pointed to. | 
|  |  | 
|  | ==== Monitor commands ==== | 
|  |  | 
|  | * trace-file on|off|flush|set <path> | 
|  | Enable/disable/flush the trace file or set the trace file name. | 
|  |  | 
|  | ==== Analyzing trace files ==== | 
|  |  | 
|  | The "simple" backend produces binary trace files that can be formatted with the | 
|  | simpletrace.py script.  The script takes the "trace-events" file and the binary | 
|  | trace: | 
|  |  | 
|  | ./scripts/simpletrace.py trace-events trace-12345 | 
|  |  | 
|  | You must ensure that the same "trace-events" file was used to build QEMU, | 
|  | otherwise trace event declarations may have changed and output will not be | 
|  | consistent. | 
|  |  | 
|  | === LTTng Userspace Tracer === | 
|  |  | 
|  | The "ust" backend uses the LTTng Userspace Tracer library.  There are no | 
|  | monitor commands built into QEMU, instead UST utilities should be used to list, | 
|  | enable/disable, and dump traces. | 
|  |  | 
|  | === SystemTap === | 
|  |  | 
|  | The "dtrace" backend uses DTrace sdt probes but has only been tested with | 
|  | SystemTap.  When SystemTap support is detected a .stp file with wrapper probes | 
|  | is generated to make use in scripts more convenient.  This step can also be | 
|  | performed manually after a build in order to change the binary name in the .stp | 
|  | probes: | 
|  |  | 
|  | scripts/tracetool --dtrace --stap \ | 
|  | --binary path/to/qemu-binary \ | 
|  | --target-type system \ | 
|  | --target-arch x86_64 \ | 
|  | <trace-events >qemu.stp | 
|  |  | 
|  | == Trace event properties == | 
|  |  | 
|  | Each event in the "trace-events" file can be prefixed with a space-separated | 
|  | list of zero or more of the following event properties. | 
|  |  | 
|  | === "disable" === | 
|  |  | 
|  | If a specific trace event is going to be invoked a huge number of times, this | 
|  | might have a noticeable performance impact even when the event is | 
|  | programmatically disabled. | 
|  |  | 
|  | In this case you should declare such event with the "disable" property. This | 
|  | will effectively disable the event at compile time (by using the "nop" backend), | 
|  | thus having no performance impact at all on regular builds (i.e., unless you | 
|  | edit the "trace-events" file). | 
|  |  | 
|  | In addition, there might be cases where relatively complex computations must be | 
|  | performed to generate values that are only used as arguments for a trace | 
|  | function. In these cases you can use the macro 'TRACE_${EVENT_NAME}_ENABLED' to | 
|  | guard such computations and avoid its compilation when the event is disabled: | 
|  |  | 
|  | #include "trace.h"  /* needed for trace event prototype */ | 
|  |  | 
|  | void *qemu_vmalloc(size_t size) | 
|  | { | 
|  | void *ptr; | 
|  | size_t align = QEMU_VMALLOC_ALIGN; | 
|  |  | 
|  | if (size < align) { | 
|  | align = getpagesize(); | 
|  | } | 
|  | ptr = qemu_memalign(align, size); | 
|  | if (TRACE_QEMU_VMALLOC_ENABLED) { /* preprocessor macro */ | 
|  | void *complex; | 
|  | /* some complex computations to produce the 'complex' value */ | 
|  | trace_qemu_vmalloc(size, ptr, complex); | 
|  | } | 
|  | return ptr; | 
|  | } | 
|  |  | 
|  | You can check both if the event has been disabled and is dynamically enabled at | 
|  | the same time using the 'trace_event_get_state' routine (see header | 
|  | "trace/control.h" for more information). |