What follows is a collection of [[Inline Assembly]] functions so common that they should be useful to most OS developers using GCC. Other compilers may have intrinsic alternatives (see references). These functions are implemented using GNU extensions to the [[C]] language, so particular keywords may cause you trouble if you disable GNU extensions. You can still reach a disabled keyword such as <tt>asm</tt> through its alternate spelling in the reserved namespace, such as <tt>__asm__</tt>. Take care to get inline assembly exactly right: the compiler does not understand the assembly it emits and relies entirely on the constraints and clobbers you declare, so lying to the compiler can cause rare, nasty bugs.

'''WARNING!'''

Be aware that <tt>asm("");</tt> <small>(no constraints, known as [https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html basic <tt>asm</tt>])</small> and <tt>asm("":);</tt> <small>(empty constraints, a form of [https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html extended <tt>asm</tt>])</small> are not the same! Among other differences, in basic <tt>asm</tt> you must prefix registers with a single percent sign '%', whereas in extended <tt>asm</tt> (even if the list of constraints is empty) you must use '%%' as the prefix in the assembly template string. If you get GCC error messages like "Error: bad register name '%%eax'", you have used the extended-style prefix in basic <tt>asm</tt>, and you should insert a colon (':') before the closing parenthesis. Extended <tt>asm</tt> is generally preferred, though there are [https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html#Remarks cases where basic <tt>asm</tt> is necessary]. Also note that this register prefix only applies to AT&T syntax assembly, GCC's default.
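
To make the prefix rule concrete, here is a minimal sketch of both forms side by side. It assumes an x86 target and AT&T syntax; the function names <tt>basic_form</tt> and <tt>extended_form</tt> are made up for this illustration, and the non-x86 branch exists only so that the snippet still compiles elsewhere.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)

/* Basic asm: no colon, so registers take a single '%'. */
static inline void basic_form(void)
{
    __asm__ ( "movw %ax, %ax" );   /* moves AX onto itself: no visible effect */
}

/* Extended asm: the colon makes '%' introduce operands such as %0,
 * so a literal register name needs '%%'. */
static inline uint32_t extended_form(void)
{
    uint32_t out;
    __asm__ ( "movl $42, %%eax\n\t"
              "movl %%eax, %0"
              : "=r"(out)
              :
              : "eax" );   /* tell the compiler that EAX is overwritten */
    return out;
}

#else

/* Not x86: plain C stand-ins (assumption made for portability of the sketch). */
static inline void basic_form(void) {}
static inline uint32_t extended_form(void) { return 42; }

#endif
```

The sketch uses <tt>__asm__</tt> rather than <tt>asm</tt> so that it also builds when GNU extensions are disabled.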


== Memory access ==

=== FAR_PEEKx ===

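Read an 8/16/32-bit value at a given memory location using a segment other than the default C data segment. Unfortunately, there is no constraint for manipulating segment registers directly, so issuing the <tt>mov <reg>, <segmentreg></tt> manually is required.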

<source lang="c">
static inline uint32_t farpeekl(uint16_t sel, void* off)
{
    uint32_t ret;
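    asm ( "push %%fs\n\t"
          "mov  %1, %%fs\n\t"
          "mov  %%fs:(%2), %0\n\t"
          "pop  %%fs"
          /* "r" rather than "g": mov to a segment register cannot take an
           * immediate, and a stack-relative memory operand would be off after the push */
          : "=r"(ret) : "r"(sel), "r"(off) );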
    return ret;
}
</source>

=== FAR_POKEx ===

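Write an 8/16/32-bit value to a segment:offset address. Note that, much like farpeek, this version of farpoke saves and restores the segment register used for the access.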

<source lang="c">
static inline void farpokeb(uint16_t sel, void* off, uint8_t v)
{
    asm ( "push %%fs\n\t"
          "mov  %0, %%fs\n\t"
          "movb %2, %%fs:(%1)\n\t"
          "pop %%fs"
          : : "r"(sel), "r"(off), "q"(v)
          : "memory" );
    /* "memory" tells the compiler that this asm writes memory it cannot see;
     * "q" restricts v to a register that has a byte-sized name. */
}
</source>

== I/O access ==

=== OUTx ===

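Sends an 8/16/32-bit value to an I/O location. Traditional names are <tt>outb</tt>, <tt>outw</tt> and <tt>outl</tt> respectively. The <tt>a</tt> constraint forces <tt>val</tt> to be placed in the eax register before the asm command is issued, and <tt>Nd</tt> allows one-byte constant values to be assembled as immediates, freeing the edx register for other uses.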

<source lang="c">
static inline void outb(uint16_t port, uint8_t val)
{
    asm volatile ( "outb %0, %1" : : "a"(val), "Nd"(port) );
    /* There's an outb %al, $imm8  encoding, for compile-time constant port numbers that fit in 8b.  (N constraint).
     * Wider immediate constants would be truncated at assemble-time (e.g. "i" constraint).
     * The  outb  %al, %dx  encoding is the only option for all other cases.
     * %1 expands to %dx because  port  is a uint16_t.  %w1 could be used if we had the port number a wider C type */
}
</source>

=== INx ===

Receives an 8/16/32-bit value from an I/O location. Traditional names are <tt>inb</tt>, <tt>inw</tt> and <tt>inl</tt> respectively.

<source lang="c">
static inline uint8_t inb(uint16_t port)
{
    uint8_t ret;
    asm volatile ( "inb %1, %0"
                   : "=a"(ret)
                   : "Nd"(port) );
    return ret;
}
</source>

=== IO_WAIT ===

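Wait a very small amount of time (generally 1 to 4 microseconds). Useful for implementing a small delay for PIC remapping on old hardware, or generally as a simple but imprecise wait.

You can do an I/O operation on any unused port: the Linux kernel by default uses port 0x80, which is often used during POST to log information on the motherboard's hex display but is almost always unused after boot.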
<source lang="c">
static inline void io_wait(void)
{
    outb(0x80, 0);
}
</source>

== Interrupt-related functions ==

=== Enabled? ===

Returns a <tt>true</tt> boolean value if IRQs are enabled for the CPU.

<source lang="c">
static inline bool are_interrupts_enabled(void)
{
    unsigned long flags;
    asm volatile ( "pushf\n\t"
                   "pop %0"
                   : "=g"(flags) );
    return flags & (1UL << 9);   /* bit 9 of (E/R)FLAGS is IF */
}
</source>
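
The same bit test can be applied to any saved flags word, not only to one read on the spot. The following is a minimal sketch in plain C; <tt>FLAGS_IF</tt> and <tt>flags_irqs_enabled</tt> are names invented for this example. The explicit comparison with zero matters if the result is ever stored in a narrow integer type instead of <tt>bool</tt>, since the raw mask value 0x200 does not fit in 8 bits.

```c
#include <stdbool.h>

#define FLAGS_IF (1UL << 9)   /* interrupt enable flag, bit 9 of (E/R)FLAGS */

/* Hypothetical helper: decide from a saved flags word whether
 * interrupts were enabled at the time it was captured. */
static inline bool flags_irqs_enabled(unsigned long flags)
{
    return (flags & FLAGS_IF) != 0;
}
```
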
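=== Push/pop interrupt flag ===
Sometimes it is helpful to disable interrupts and then re-enable them only if they were enabled before. While the above function can be used for that, the functions below work the same way regardless of the state of IF: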

<source lang="c">
static inline unsigned long save_irqdisable(void)
{
    unsigned long flags;
    asm volatile ("pushf\n\tcli\n\tpop %0" : "=r"(flags) : : "memory");
    return flags;
}

static inline void irqrestore(unsigned long flags)
{
    asm ("push %0\n\tpopf" : : "rm"(flags) : "memory","cc");
}

static void intended_usage(void)
{
    unsigned long f = save_irqdisable();
    do_whatever_without_irqs();
    irqrestore(f);
}
</source>

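The memory clobber forces ordering; the cc clobber is needed in the second function because <tt>popf</tt> overwrites all condition codes. There is no <tt>volatile</tt> in the second case because an <tt>asm</tt> with no outputs is always implicitly volatile.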

=== LIDT ===

Define a new interrupt table.

<source lang="c">
static inline void lidt(void* base, uint16_t size)
{   // This function works in 32 and 64bit mode
    struct {
        uint16_t length;
        void*    base;
    } __attribute__((packed)) IDTR = { size, base };

    asm ( "lidt %0" : : "m"(IDTR) );  // let the compiler choose an addressing mode
}
</source>
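
The <tt>packed</tt> attribute is what makes the operand match the layout LIDT reads: a 16-bit limit immediately followed by the base address. The sketch below checks that layout at compile time and computes the limit, which the CPU interprets as the table size in bytes minus one. The names <tt>struct idtr</tt> and <tt>idt_limit</tt> are assumptions made for this example, not part of the function above.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the IDTR operand built inside lidt() above. */
struct idtr {
    uint16_t length;
    void    *base;
} __attribute__((packed));

/* Without packed, the compiler would pad after length and LIDT
 * would read the base address from the wrong bytes. */
_Static_assert(offsetof(struct idtr, base) == 2, "base must follow the limit directly");
_Static_assert(sizeof(struct idtr) == 2 + sizeof(void *), "no padding allowed");

/* Hypothetical helper: LIDT expects the table size in bytes minus one. */
static inline uint16_t idt_limit(size_t entries, size_t entry_size)
{
    return (uint16_t)(entries * entry_size - 1);
}
```

With 256 gates this gives 2047 for 8-byte protected-mode entries and 4095 for 16-byte long-mode entries, so a caller would pass something like <tt>idt_limit(256, sizeof idt[0])</tt> as the second argument of <tt>lidt</tt>.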

== CPU-related functions ==

=== CPUID ===

Request CPU identification. See [[CPUID]] for more information.

<source lang="c">
/* GCC has a <cpuid.h> header you should use instead of this. */
static inline void cpuid(int code, uint32_t* a, uint32_t* d)
{
    asm volatile ( "cpuid" : "=a"(*a), "=d"(*d) : "0"(code) : "ebx", "ecx" );
}
</source>
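
As the comment in the code notes, GCC ships a <tt><cpuid.h></tt> header. A minimal sketch of using its <tt>__get_cpuid</tt> helper to read the vendor string from leaf 0 might look like this. The function name <tt>cpu_vendor</tt> is invented for the example, <tt>memcpy</tt> from <tt><string.h></tt> is assumed to be available (in a freestanding kernel you would supply your own), and the non-x86 branch is only a stand-in so that the snippet compiles on other targets.

```c
#include <string.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>

/* Fill out[13] with the 12-character vendor string from CPUID leaf 0.
 * Returns 1 on success, 0 if the leaf is not available. */
static inline int cpu_vendor(char out[13])
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;

    memcpy(out + 0, &ebx, 4);   /* the string is stored in EBX, EDX, ECX order */
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 1;
}

#else

/* Not x86: report failure (assumption made for portability of the sketch). */
static inline int cpu_vendor(char out[13])
{
    out[0] = '\0';
    return 0;
}

#endif
```
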

=== RDTSC ===

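Read the current value of the CPU's time-stamp counter and store it into EDX:EAX. The time-stamp counter contains the number of clock ticks that have elapsed since the last CPU reset. The value is stored in a 64-bit MSR and is incremented after each clock cycle.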

<source lang="c">
static inline uint64_t rdtsc()
{
    uint64_t ret;
    asm volatile ( "rdtsc" : "=A"(ret) );
    return ret;
}
</source>

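This can be used to find out how much time it takes to run certain functions, which is very useful for testing, benchmarking and so on. Note: this is only an approximation.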

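On x86_64, the <tt>A</tt> constraint does not mean the "edx:eax" pair for a 64-bit operand: the value fits in a single register, so GCC allocates it to either "rax" or "rdx". GCC can therefore compile the above code without ever reading "rdx", and the high half of the counter is lost. Instead, you need to combine the two halves manually with bit shifting: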

<source lang="c">
uint64_t rdtsc(void)
{
    uint32_t low, high;
    asm volatile("rdtsc":"=a"(low),"=d"(high));
    return ((uint64_t)high << 32) | low;
}
</source>
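
As a rough illustration of the benchmarking use mentioned above, the sketch below wraps a function call between two counter reads. The names <tt>rdtsc_now</tt>, <tt>ticks_for</tt> and <tt>busy_loop</tt> are invented for this example, and the non-x86 branch is only a stand-in so that the snippet compiles elsewhere. RDTSC is not a serializing instruction, so short measurements can be blurred by out-of-order execution; treat the result as an estimate.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Same two-register form as above; works in 32-bit and 64-bit mode. */
static inline uint64_t rdtsc_now(void)
{
    uint32_t low, high;
    __asm__ volatile ( "rdtsc" : "=a"(low), "=d"(high) );
    return ((uint64_t)high << 32) | low;
}
#else
/* Not x86: increasing stand-in counter (assumption, for portability only). */
static inline uint64_t rdtsc_now(void)
{
    static uint64_t fake;
    return ++fake;
}
#endif

/* Hypothetical helper: ticks spent in fn(arg), including RDTSC overhead. */
static inline uint64_t ticks_for(void (*fn)(void *), void *arg)
{
    uint64_t start = rdtsc_now();
    fn(arg);
    return rdtsc_now() - start;
}

/* Example workload: increment a counter 1000 times. */
static void busy_loop(void *arg)
{
    volatile uint32_t *counter = arg;
    for (uint32_t i = 0; i < 1000; i++)
        (*counter)++;
}
```

A call such as <tt>ticks_for(busy_loop, &counter)</tt> then returns the elapsed tick count for the workload.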

Read [https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints GCC Inline Assembly Machine Constraints] for more details.

=== READ_CRx ===

Read the value in a control register.

<source lang="c">
static inline unsigned long read_cr0(void)
{
    unsigned long val;
    asm volatile ( "mov %%cr0, %0" : "=r"(val) );
    return val;
}
</source>

=== INVLPG ===

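Invalidates the TLB (Translation Lookaside Buffer) entry for one specific virtual address. The next memory reference to the page will be forced to re-read the PDE and PTE from main memory. It must be issued every time you update one of those tables. The <tt>m</tt> pointer points to a logical address, not a physical or virtual one: an offset into your ds segment.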

<source lang="c">
static inline void invlpg(void* m)
{
    /* Clobber memory to avoid optimizer re-ordering access before invlpg, which may cause nasty bugs. */
    asm volatile ( "invlpg (%0)" : : "b"(m) : "memory" );
}
</source>

=== WRMSR ===

Write a 64-bit value to an MSR. On 32-bit x86, the <tt>A</tt> constraint stands for the concatenation of registers EDX and EAX (EDX:EAX).

<source lang="c">
static inline void wrmsr(uint32_t msr_id, uint64_t msr_value)
{
    asm volatile ( "wrmsr" : : "c" (msr_id), "A" (msr_value) );
}
</source>

On x86_64, this needs to be:

<source lang="c">
static inline void wrmsr(uint64_t msr, uint64_t value)
{
	uint32_t low = value & 0xFFFFFFFF;
	uint32_t high = value >> 32;
	asm volatile (
		"wrmsr"
		:
		: "c"(msr), "a"(low), "d"(high)
	);
}
</source>

=== RDMSR ===

Read a 64-bit value from an MSR. On 32-bit x86, the <tt>A</tt> constraint stands for the concatenation of registers EDX and EAX (EDX:EAX).

<source lang="c">
static inline uint64_t rdmsr(uint32_t msr_id)
{
    uint64_t msr_value;
    asm volatile ( "rdmsr" : "=A" (msr_value) : "c" (msr_id) );
    return msr_value;
}
</source>

On x86_64, this needs to be:
<source lang="c">
static inline uint64_t rdmsr(uint64_t msr)
{
	uint32_t low, high;
	asm volatile (
		"rdmsr"
		: "=a"(low), "=d"(high)
		: "c"(msr)
	);
	return ((uint64_t)high << 32) | low;
}
</source>
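
The split and join arithmetic used by the x86_64 versions of <tt>wrmsr</tt> and <tt>rdmsr</tt> can be checked in isolation, without touching any MSR. The helper names below are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the arithmetic in wrmsr()/rdmsr() above. */
static inline uint32_t msr_low(uint64_t value)
{
    return (uint32_t)(value & 0xFFFFFFFFu);   /* goes to EAX */
}

static inline uint32_t msr_high(uint64_t value)
{
    return (uint32_t)(value >> 32);           /* goes to EDX */
}

static inline uint64_t msr_join(uint32_t low, uint32_t high)
{
    /* Cast before shifting: shifting a 32-bit value by 32 is undefined. */
    return ((uint64_t)high << 32) | low;
}
```

Splitting a value and joining the halves again should always give back the original value, which makes this an easy property to test on the host before running the real code in the kernel.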

[[Category:Assembly]]