]> bbs.cooldavid.org Git - net-next-2.6.git/log
net-next-2.6.git
13 years agoKVM: MMU: remove unnecessary remote tlb flush
Xiao Guangrong [Tue, 8 Jun 2010 12:05:05 +0000 (20:05 +0800)]
KVM: MMU: remove unnecessary remote tlb flush

This remote tlb flush is no necessary since we have synced while
sp is zapped

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: fix rcu usage warning in init_rmode()
Xiao Guangrong [Tue, 8 Jun 2010 02:15:51 +0000 (10:15 +0800)]
KVM: VMX: fix rcu usage warning in init_rmode()

fix:

[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
include/linux/kvm_host.h:258 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
1 lock held by qemu-system-x86/3796:
 #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffa0217fd8>] vcpu_load+0x1a/0x66 [kvm]

stack backtrace:
Pid: 3796, comm: qemu-system-x86 Not tainted 2.6.34 #25
Call Trace:
 [<ffffffff81070ed1>] lockdep_rcu_dereference+0x9d/0xa5
 [<ffffffffa0214fdf>] gfn_to_memslot_unaliased+0x65/0xa0 [kvm]
 [<ffffffffa0216139>] gfn_to_hva+0x22/0x4c [kvm]
 [<ffffffffa0216217>] kvm_write_guest_page+0x2a/0x7f [kvm]
 [<ffffffffa0216286>] kvm_clear_guest_page+0x1a/0x1c [kvm]
 [<ffffffffa0278239>] init_rmode+0x3b/0x180 [kvm_intel]
 [<ffffffffa02786ce>] vmx_set_cr0+0x350/0x4d3 [kvm_intel]
 [<ffffffffa02274ff>] kvm_arch_vcpu_ioctl_set_sregs+0x122/0x31a [kvm]
 [<ffffffffa021859c>] kvm_vcpu_ioctl+0x578/0xa3d [kvm]
 [<ffffffff8106624c>] ? cpu_clock+0x2d/0x40
 [<ffffffff810f7d86>] ? fget_light+0x244/0x28e
 [<ffffffff810709b9>] ? trace_hardirqs_off_caller+0x1f/0x10e
 [<ffffffff8110501b>] vfs_ioctl+0x32/0xa6
 [<ffffffff81105597>] do_vfs_ioctl+0x47f/0x4b8
 [<ffffffff813ae654>] ? sub_preempt_count+0xa3/0xb7
 [<ffffffff810f7da8>] ? fget_light+0x266/0x28e
 [<ffffffff810f7c53>] ? fget_light+0x111/0x28e
 [<ffffffff81105617>] sys_ioctl+0x47/0x6a
 [<ffffffff81002c1b>] system_call_fastpath+0x16/0x1b

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: rename vpid_sync_vcpu_all() to vpid_sync_vcpu_single()
Gui Jianfeng [Mon, 7 Jun 2010 02:33:27 +0000 (10:33 +0800)]
KVM: VMX: rename vpid_sync_vcpu_all() to vpid_sync_vcpu_single()

The name "pid_sync_vcpu_all" isn't appropriate since it just affect
a single vpid, so rename it to vpid_sync_vcpu_single().

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Add all-context INVVPID type support
Gui Jianfeng [Mon, 7 Jun 2010 02:32:29 +0000 (10:32 +0800)]
KVM: VMX: Add all-context INVVPID type support

Add all-context INVVPID type support.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: reduce remote tlb flush in kvm_mmu_pte_write()
Xiao Guangrong [Fri, 4 Jun 2010 13:56:59 +0000 (21:56 +0800)]
KVM: MMU: reduce remote tlb flush in kvm_mmu_pte_write()

collect remote tlb flush in kvm_mmu_pte_write() path

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: traverse sp hlish safely
Xiao Guangrong [Fri, 4 Jun 2010 13:56:11 +0000 (21:56 +0800)]
KVM: MMU: traverse sp hlish safely

Now, we can safely to traverse sp hlish

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: gather remote tlb flush which occurs during page zapped
Xiao Guangrong [Fri, 4 Jun 2010 13:55:29 +0000 (21:55 +0800)]
KVM: MMU: gather remote tlb flush which occurs during page zapped

Using kvm_mmu_prepare_zap_page() and kvm_mmu_zap_page() instead of
kvm_mmu_zap_page() that can reduce remote tlb flush IPI

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: don't get free page number in the loop
Xiao Guangrong [Fri, 4 Jun 2010 13:54:38 +0000 (21:54 +0800)]
KVM: MMU: don't get free page number in the loop

In the later patch, we will modify sp's zapping way like below:

kvm_mmu_prepare_zap_page A
kvm_mmu_prepare_zap_page B
kvm_mmu_prepare_zap_page C
....
kvm_mmu_commit_zap_page

[ zaped multiple sps only need to call kvm_mmu_commit_zap_page once ]

In __kvm_mmu_free_some_pages() function, the free page number is
getted form 'vcpu->kvm->arch.n_free_mmu_pages' in loop, it will
hinders us to apply kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page()
since kvm_mmu_prepare_zap_page() not free sp.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: split the operations of kvm_mmu_zap_page()
Xiao Guangrong [Fri, 4 Jun 2010 13:53:54 +0000 (21:53 +0800)]
KVM: MMU: split the operations of kvm_mmu_zap_page()

Using kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() to
split kvm_mmu_zap_page() function, then we can:

- traverse hlist safely
- easily to gather remote tlb flush which occurs during page zapped

Those feature can be used in the later patches

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: introduce some macros to cleanup hlist traverseing
Xiao Guangrong [Fri, 4 Jun 2010 13:53:07 +0000 (21:53 +0800)]
KVM: MMU: introduce some macros to cleanup hlist traverseing

Introduce for_each_gfn_sp() and for_each_gfn_indirect_valid_sp() to
cleanup hlist traverseing

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: skip invalid sp when unprotect page
Xiao Guangrong [Fri, 4 Jun 2010 13:52:17 +0000 (21:52 +0800)]
KVM: MMU: skip invalid sp when unprotect page

In kvm_mmu_unprotect_page(), the invalid sp can be skipped

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Make sure single type invvpid is supported before issuing invvpid instruction
Gui Jianfeng [Fri, 4 Jun 2010 00:51:39 +0000 (08:51 +0800)]
KVM: VMX: Make sure single type invvpid is supported before issuing invvpid instruction

According to SDM, we need check whether single-context INVVPID type is supported
before issuing invvpid instruction.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86: use linux/uaccess.h instead of asm/uaccess.h
Lai Jiangshan [Wed, 2 Jun 2010 09:06:03 +0000 (17:06 +0800)]
KVM: x86: use linux/uaccess.h instead of asm/uaccess.h

Should use linux/uaccess.h instead of asm/uaccess.h

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: cleanup "*new.rmap" type
Lai Jiangshan [Wed, 2 Jun 2010 09:01:23 +0000 (17:01 +0800)]
KVM: cleanup "*new.rmap" type

The type of '*new.rmap' is not 'struct page *', fix it

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: Enforce EPT pagetable level checking
Sheng Yang [Wed, 2 Jun 2010 06:05:24 +0000 (14:05 +0800)]
KVM: VMX: Enforce EPT pagetable level checking

We only support 4 levels EPT pagetable now.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: Add Documentation/kvm/msr.txt
Glauber Costa [Tue, 1 Jun 2010 12:22:48 +0000 (08:22 -0400)]
KVM: Add Documentation/kvm/msr.txt

This patch adds a file that documents the usage of KVM-specific
MSRs.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: PPC: elide struct thread_struct instances from stack
Andreas Schwab [Mon, 31 May 2010 19:59:13 +0000 (21:59 +0200)]
KVM: PPC: elide struct thread_struct instances from stack

Instead of instantiating a whole thread_struct on the stack use only the
required parts of it.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: Properly return error to userspace on vmentry failure
Mohammed Gamal [Mon, 31 May 2010 19:40:54 +0000 (22:40 +0300)]
KVM: VMX: Properly return error to userspace on vmentry failure

The vmexit handler returns KVM_EXIT_UNKNOWN since there is no handler
for vmentry failures. This intercepts vmentry failures and returns
KVM_FAIL_ENTRY to userspace instead.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: Don't calculate quadrant if tdp_enabled
Gui Jianfeng [Mon, 31 May 2010 09:11:39 +0000 (17:11 +0800)]
KVM: MMU: Don't calculate quadrant if tdp_enabled

There's no need to calculate quadrant if tdp is enabled.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: Document large pages
Avi Kivity [Thu, 27 May 2010 13:44:12 +0000 (16:44 +0300)]
KVM: MMU: Document large pages

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: Document cr0.wp emulation
Avi Kivity [Thu, 27 May 2010 11:46:04 +0000 (14:46 +0300)]
KVM: MMU: Document cr0.wp emulation

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Allow spte.w=1 for gpte.w=0 and cr0.wp=0 only in shadow mode
Avi Kivity [Thu, 27 May 2010 11:22:51 +0000 (14:22 +0300)]
KVM: MMU: Allow spte.w=1 for gpte.w=0 and cr0.wp=0 only in shadow mode

When tdp is enabled, the guest's cr0.wp shouldn't have any effect on spte
permissions.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: Propagate fpu_alloc errors
Jan Kiszka [Tue, 25 May 2010 14:01:50 +0000 (16:01 +0200)]
KVM: x86: Propagate fpu_alloc errors

Memory allocation may fail. Propagate such errors.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Fix EFER.LME being stripped
Zachary Amsden [Thu, 27 May 2010 01:09:43 +0000 (15:09 -1000)]
KVM: SVM: Fix EFER.LME being stripped

Must set VCPU register to be the guest notion of EFER even if that
setting is not valid on hardware.  This was masked by the set in
set_efer until 7657fd5ace88e8092f5f3a84117e093d7b893f26 broke that.
Fix is simply to set the VCPU register before stripping bits.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: don't check PT_WRITABLE_MASK directly
Gui Jianfeng [Thu, 27 May 2010 08:09:48 +0000 (16:09 +0800)]
KVM: MMU: don't check PT_WRITABLE_MASK directly

Since we have is_writable_pte(), make use of it.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: calculate correct gfn for small host pages backing large guest pages
Lai Jiangshan [Wed, 26 May 2010 08:48:19 +0000 (16:48 +0800)]
KVM: MMU: calculate correct gfn for small host pages backing large guest pages

In Documentation/kvm/mmu.txt:
  gfn:
    Either the guest page table containing the translations shadowed by this
    page, or the base page frame for linear translations. See role.direct.

But in function FNAME(fetch)(), sp->gfn is incorrect when one of following
situations occurred:

 1) guest is 32bit paging and the guest PDE maps a 4-MByte page
    (backed by 4k host pages), FNAME(fetch)() miss handling the quadrant.

    And if guest use pse-36, "table_gfn = gpte_to_gfn(gw->ptes[level - delta]);"
    is incorrect.

 2) guest is long mode paging and the guest PDPTE maps a 1-GByte page
    (backed by 4k or 2M host pages).

So we fix it to suit to the document and suit to the code which
requires sp->gfn correct when sp->role.direct=1.

We use the goal mapping gfn(gw->gfn) to calculate the base page frame
for linear translations, it is simple and easy to be understood.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Calculate correct base gfn for direct non-DIR level
Lai Jiangshan [Wed, 26 May 2010 08:48:25 +0000 (16:48 +0800)]
KVM: MMU: Calculate correct base gfn for direct non-DIR level

In Document/kvm/mmu.txt:
  gfn:
    Either the guest page table containing the translations shadowed by this
    page, or the base page frame for linear translations. See role.direct.

But in __direct_map(), the base gfn calculation is incorrect,
it does not calculate correctly when level=3 or 4.

Fix by using PT64_LVL_ADDR_MASK() which accounts for all levels correctly.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Don't allocate gfns page for direct mmu pages
Lai Jiangshan [Wed, 26 May 2010 08:49:59 +0000 (16:49 +0800)]
KVM: MMU: Don't allocate gfns page for direct mmu pages

When sp->role.direct is set, sp->gfns does not contain any essential
information, leaf sptes reachable from this sp are for a continuous
guest physical memory range (a linear range).
So sp->gfns[i] (if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL)
Obviously, it is not essential information, we can calculate it when need.

It means we don't need sp->gfns when sp->role.direct=1,
Thus we can save one page usage for every kvm_mmu_page.

Note:
  Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn()
  or kvm_mmu_page_set_gfn().
  It is only exposed in FNAME(sync_page).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Add constant for invalid guest state exit reason
Mohammed Gamal [Sun, 23 May 2010 22:01:04 +0000 (01:01 +0300)]
KVM: VMX: Add constant for invalid guest state exit reason

For the sake of completeness, this patch adds a symbolic
constant for VMX exit reason 0x21 (invalid guest state).

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: allow more page become unsync at getting sp time
Xiao Guangrong [Mon, 24 May 2010 07:41:33 +0000 (15:41 +0800)]
KVM: MMU: allow more page become unsync at getting sp time

Allow more page become asynchronous at getting sp time, if need create new
shadow page for gfn but it not allow unsync(level > 1), we should unsync all
gfn's unsync page

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: allow more page become unsync at gfn mapping time
Xiao Guangrong [Mon, 24 May 2010 07:40:07 +0000 (15:40 +0800)]
KVM: MMU: allow more page become unsync at gfn mapping time

In current code, shadow page can become asynchronous only if one
shadow page for a gfn, this rule is too strict, in fact, we can
let all last mapping page(i.e, it's the pte page) become unsync,
and sync them at invlpg or flush tlb time.

This patch allow more page become asynchronous at gfn mapping time

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Update Red Hat copyrights
Avi Kivity [Sun, 23 May 2010 15:37:00 +0000 (18:37 +0300)]
KVM: Update Red Hat copyrights

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: correctly trace irq injection
Gleb Natapov [Sun, 23 May 2010 11:28:26 +0000 (14:28 +0300)]
KVM: SVM: correctly trace irq injection

On SVM interrupts are injected by svm_set_irq() not svm_inject_irq().
The later is used only to wait for irq window.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: only update unsync page in invlpg path
Xiao Guangrong [Sat, 15 May 2010 10:53:35 +0000 (18:53 +0800)]
KVM: MMU: only update unsync page in invlpg path

Only unsync pages need updated at invlpg time since other shadow
pages are write-protected

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: don't write-protect if have new mapping to unsync page
Xiao Guangrong [Sat, 15 May 2010 10:52:34 +0000 (18:52 +0800)]
KVM: MMU: don't write-protect if have new mapping to unsync page

Two cases maybe happen in kvm_mmu_get_page() function:

- one case is, the goal sp is already in cache, if the sp is unsync,
  we only need update it to assure this mapping is valid, but not
  mark it sync and not write-protect sp->gfn since it not broke unsync
  rule(one shadow page for a gfn)

- another case is, the goal sp not existed, we need create a new sp
  for gfn, i.e, gfn (may)has another shadow page, to keep unsync rule,
  we should sync(mark sync and write-protect) gfn's unsync shadow page.
  After enabling multiple unsync shadows, we sync those shadow pages
  only when the new sp not allow to become unsync(also for the unsyc
  rule, the new rule is: allow all pte page become unsync)

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: split kvm_sync_page() function
Xiao Guangrong [Sat, 15 May 2010 10:51:24 +0000 (18:51 +0800)]
KVM: MMU: split kvm_sync_page() function

Split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient()
to clarify the code address Avi's suggestion

kvm_sync_page_transient() function only update shadow page but not mark
it sync and not write protect sp->gfn. it will be used by later patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: Use FPU API
Sheng Yang [Mon, 17 May 2010 09:08:28 +0000 (17:08 +0800)]
KVM: x86: Use FPU API

Convert KVM to use generic FPU API.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: Use unlazy_fpu() for host FPU
Sheng Yang [Mon, 17 May 2010 09:08:27 +0000 (17:08 +0800)]
KVM: x86: Use unlazy_fpu() for host FPU

We can avoid unnecessary fpu load when userspace process
didn't use FPU frequently.

Derived from Avi's idea.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agox86: Export FPU API for KVM use
Sheng Yang [Mon, 17 May 2010 09:22:23 +0000 (17:22 +0800)]
x86: Export FPU API for KVM use

Also add some constants.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Consolidate arch specific vcpu ioctl locking
Avi Kivity [Thu, 13 May 2010 09:35:17 +0000 (12:35 +0300)]
KVM: Consolidate arch specific vcpu ioctl locking

Now that all arch specific ioctls have centralized locking, it is easy to
move it to the central dispatcher.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: PPC: Centralize locking of arch specific vcpu ioctls
Avi Kivity [Thu, 13 May 2010 09:30:43 +0000 (12:30 +0300)]
KVM: PPC: Centralize locking of arch specific vcpu ioctls

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: s390: Centrally lock arch specific vcpu ioctls
Avi Kivity [Thu, 13 May 2010 09:21:46 +0000 (12:21 +0300)]
KVM: s390: Centrally lock arch specific vcpu ioctls

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: Lock arch specific vcpu ioctls centrally
Avi Kivity [Thu, 13 May 2010 08:53:06 +0000 (11:53 +0300)]
KVM: x86: Lock arch specific vcpu ioctls centrally

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: move vcpu locking to dispatcher for generic vcpu ioctls
Avi Kivity [Thu, 13 May 2010 08:25:04 +0000 (11:25 +0300)]
KVM: move vcpu locking to dispatcher for generic vcpu ioctls

All vcpu ioctls need to be locked, so instead of locking each one specifically
we lock at the generic dispatcher.

This patch only updates generic ioctls and leaves arch specific ioctls alone.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: cleanup unused local variable
Xiao Guangrong [Thu, 13 May 2010 02:09:57 +0000 (10:09 +0800)]
KVM: x86: cleanup unused local variable

fix:
 arch/x86/kvm/x86.c: In function â€˜handle_emulation_failure’:
 arch/x86/kvm/x86.c:3844: warning: unused variable â€˜ctxt’

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page
Xiao Guangrong [Thu, 13 May 2010 02:08:08 +0000 (10:08 +0800)]
KVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page

sp->gfns[] contain unaliased gfns, but gpte might contain pointer
to aliased region.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: remove rmap before clear spte
Xiao Guangrong [Thu, 13 May 2010 02:07:00 +0000 (10:07 +0800)]
KVM: MMU: remove rmap before clear spte

Remove rmap before clear spte otherwise it will trigger BUG_ON() in
some functions such as rmap_write_protect().

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: use proper cache object freeing function
Xiao Guangrong [Thu, 13 May 2010 02:06:02 +0000 (10:06 +0800)]
KVM: MMU: use proper cache object freeing function

Use kmem_cache_free to free objects allocated by kmem_cache_alloc.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq
Alex Williamson [Wed, 12 May 2010 13:46:31 +0000 (09:46 -0400)]
KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq

Remove this check in an effort to allow kvm guests to run without
root privileges.  This capability check doesn't seem to add any
security since the device needs to have already been added via the
assign device ioctl and the io actually occurs through the pci
sysfs interface.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: Only reset MMU when necessary
Sheng Yang [Wed, 12 May 2010 08:40:42 +0000 (16:40 +0800)]
KVM: VMX: Only reset MMU when necessary

Only modifying some bits of CR0/CR4 needs paging mode switch.

Modify EFER.NXE bit would result in reserved bit updates.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86: Clean up duplicate assignment
Sheng Yang [Wed, 12 May 2010 08:40:41 +0000 (16:40 +0800)]
KVM: x86: Clean up duplicate assignment

mmu.free() already set root_hpa to INVALID_PAGE, no need to do it again in the
destory_kvm_mmu().

kvm_x86_ops->set_cr4() and set_efer() already assign cr4/efer to
vcpu->arch.cr4/efer, no need to do it again later.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: Add missing decoder flags for xor instructions
Mohammed Gamal [Tue, 11 May 2010 22:39:22 +0000 (01:39 +0300)]
KVM: x86 emulator: Add missing decoder flags for xor instructions

This adds missing decoder flags for xor instructions (opcodes 0x34 - 0x35)

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: Add missing decoder flags for sub instruction
Mohammed Gamal [Tue, 11 May 2010 22:39:21 +0000 (01:39 +0300)]
KVM: x86 emulator: Add missing decoder flags for sub instruction

This adds missing decoder flags for sub instructions (opcodes 0x2c - 0x2d)

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9)
Mohammed Gamal [Tue, 11 May 2010 19:22:40 +0000 (22:22 +0300)]
KVM: x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9)

This adds test acc, imm instruction to the x86 emulator

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: pass correct parameter to kvm_mmu_free_some_pages
Marcelo Tosatti [Thu, 13 May 2010 00:00:35 +0000 (21:00 -0300)]
KVM: pass correct parameter to kvm_mmu_free_some_pages

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: VMXON/VMXOFF usage changes
Dongxiao Xu [Tue, 11 May 2010 10:29:48 +0000 (18:29 +0800)]
KVM: VMX: VMXON/VMXOFF usage changes

SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
should be called after doing VMCLEAR.

Therefore in vmm coexistence case, we should firstly call VMXON
before any VMCS operation, and then call VMXOFF after the
operation is done.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: VMCLEAR/VMPTRLD usage changes
Dongxiao Xu [Tue, 11 May 2010 10:29:45 +0000 (18:29 +0800)]
KVM: VMX: VMCLEAR/VMPTRLD usage changes

Originally VMCLEAR/VMPTRLD is called on vcpu migration. To
support hosted VMM coexistance, VMCLEAR is executed on vcpu
schedule out, and VMPTRLD is executed on vcpu schedule in.
This could also eliminate the IPI when doing VMCLEAR.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: Some minor changes to code structure
Dongxiao Xu [Tue, 11 May 2010 10:29:42 +0000 (18:29 +0800)]
KVM: VMX: Some minor changes to code structure

Do some preparations for vmm coexistence support.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: Define new functions to wrapper direct call of asm code
Dongxiao Xu [Tue, 11 May 2010 10:29:38 +0000 (18:29 +0800)]
KVM: VMX: Define new functions to wrapper direct call of asm code

Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm
code. Also move VMXE bit operation out of kvm_cpu_vmxoff().

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: update mmu documetation for role.nxe
Gui Jianfeng [Tue, 11 May 2010 06:36:58 +0000 (14:36 +0800)]
KVM: update mmu documetation for role.nxe

There's no member "cr4_nxe" in struct kvm_mmu_page_role, it names "nxe" now.
Update mmu document.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: Fix free memory accounting race in mmu_alloc_roots()
Avi Kivity [Mon, 10 May 2010 09:09:56 +0000 (12:09 +0300)]
KVM: MMU: Fix free memory accounting race in mmu_alloc_roots()

We drop the mmu lock between freeing memory and allocating the roots; this
allows some other vcpu to sneak in and allocate memory.

While the race is benign (resulting only in temporary overallocation, not oom)
it is simple and easy to fix by moving the freeing close to the allocation.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: inject #UD if instruction emulation fails and exit to userspace
Gleb Natapov [Mon, 10 May 2010 08:16:56 +0000 (11:16 +0300)]
KVM: inject #UD if instruction emulation fails and exit to userspace

Do not kill VM when instruction emulation fails. Inject #UD and report
failure to userspace instead. Userspace may choose to reenter guest if
vcpu is in userspace (cpl == 3) in which case guest OS will kill
offending process and continue running.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: Document KVM_SET_BOOT_CPU_ID
Avi Kivity [Thu, 29 Apr 2010 09:12:57 +0000 (12:12 +0300)]
KVM: Document KVM_SET_BOOT_CPU_ID

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Document KVM_SET_IDENTITY_MAP ioctl
Avi Kivity [Thu, 29 Apr 2010 09:08:56 +0000 (12:08 +0300)]
KVM: Document KVM_SET_IDENTITY_MAP ioctl

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed
Gui Jianfeng [Wed, 5 May 2010 01:03:49 +0000 (09:03 +0800)]
KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed

Currently, kvm_mmu_zap_page() returning the number of freed children sp.
This might confuse the caller, because caller don't know the actual freed
number. Let's make kvm_mmu_zap_page() return the number of pages it actually
freed.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Fix debug output error in walk_addr()
Gui Jianfeng [Wed, 5 May 2010 01:58:33 +0000 (09:58 +0800)]
KVM: MMU: Fix debug output error in walk_addr()

Fix a debug output error in walk_addr

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: mark page table dirty when a pte is actually modified
Gui Jianfeng [Wed, 5 May 2010 01:09:21 +0000 (09:09 +0800)]
KVM: MMU: mark page table dirty when a pte is actually modified

Sometime cmpxchg_gpte doesn't modify gpte, in such case, don't mark
page table page as dirty.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Allow EFER.LMSLE to be set with nested svm
Joerg Roedel [Wed, 5 May 2010 14:04:44 +0000 (16:04 +0200)]
KVM: SVM: Allow EFER.LMSLE to be set with nested svm

This patch enables setting of efer bit 13 which is allowed
in all SVM capable processors. This is necessary for the
SLES11 version of Xen 4.0 to boot with nested svm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Dump vmcb contents on failed vmrun
Joerg Roedel [Wed, 5 May 2010 14:04:42 +0000 (16:04 +0200)]
KVM: SVM: Dump vmcb contents on failed vmrun

This patch adds a function to dump the vmcb into the kernel
log and calls it after a failed vmrun to ease debugging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Get rid of KVM_REQ_KICK
Avi Kivity [Mon, 3 May 2010 13:54:48 +0000 (16:54 +0300)]
KVM: Get rid of KVM_REQ_KICK

KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation.  This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.

Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: do not inject exception directly into vcpu
Gleb Natapov [Wed, 28 Apr 2010 16:15:44 +0000 (19:15 +0300)]
KVM: x86 emulator: do not inject exception directly into vcpu

Return exception as a result of instruction emulation and handle
injection in KVM code.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: move interruptibility state tracking out of emulator
Gleb Natapov [Wed, 28 Apr 2010 16:15:43 +0000 (19:15 +0300)]
KVM: x86 emulator: move interruptibility state tracking out of emulator

Emulator shouldn't access vcpu directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: handle shadowed registers outside emulator
Gleb Natapov [Wed, 28 Apr 2010 16:15:42 +0000 (19:15 +0300)]
KVM: x86 emulator: handle shadowed registers outside emulator

Emulator shouldn't access vcpu directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: use shadowed register in emulate_sysexit()
Gleb Natapov [Wed, 28 Apr 2010 16:15:41 +0000 (19:15 +0300)]
KVM: x86 emulator: use shadowed register in emulate_sysexit()

emulate_sysexit() should use shadowed registers copy instead of
looking into vcpu state directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: set RFLAGS outside x86 emulator code
Gleb Natapov [Wed, 28 Apr 2010 16:15:40 +0000 (19:15 +0300)]
KVM: x86 emulator: set RFLAGS outside x86 emulator code

Removes the need for set_flags() callback.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: advance RIP outside x86 emulator code
Gleb Natapov [Wed, 28 Apr 2010 16:15:39 +0000 (19:15 +0300)]
KVM: x86 emulator: advance RIP outside x86 emulator code

Return new RIP as part of instruction emulation result instead of
updating KVM's RIP from x86 emulator code.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: handle emulation failure case first
Gleb Natapov [Wed, 28 Apr 2010 16:15:38 +0000 (19:15 +0300)]
KVM: handle emulation failure case first

If emulation failed return immediately.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: do not inject #PF in (read|write)_emulated() callbacks
Gleb Natapov [Wed, 28 Apr 2010 16:15:37 +0000 (19:15 +0300)]
KVM: do not inject #PF in (read|write)_emulated() callbacks

Return error to x86 emulator instead of injection exception behind its back.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: remove export of emulator_write_emulated()
Gleb Natapov [Wed, 28 Apr 2010 16:15:36 +0000 (19:15 +0300)]
KVM: remove export of emulator_write_emulated()

It is not called directly outside of the file it's defined in anymore.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation failure
Gleb Natapov [Wed, 28 Apr 2010 16:15:35 +0000 (19:15 +0300)]
KVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation failure

Currently emulator returns -1 when emulation failed or IO is needed.
Caller tries to guess whether emulation failed by looking at other
variables. Make it easier for caller to recognise error condition by
always returning -1 in case of failure. For this new emulator
internal return value X86EMUL_IO_NEEDED is introduced. It is used to
distinguish between error condition (which returns X86EMUL_UNHANDLEABLE)
and condition that requires IO exit to userspace to continue emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: fill in run->mmio details in (read|write)_emulated function
Gleb Natapov [Wed, 28 Apr 2010 16:15:34 +0000 (19:15 +0300)]
KVM: fill in run->mmio details in (read|write)_emulated function

Fill in run->mmio details in (read|write)_emulated function just like
pio does. There is no point in filling only vcpu fields there just to
copy them into vcpu->run a little bit later.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED values
Gleb Natapov [Wed, 28 Apr 2010 16:15:33 +0000 (19:15 +0300)]
KVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED values

Currently X86EMUL_PROPAGATE_FAULT, X86EMUL_RETRY_INSTR and
X86EMUL_CMPXCHG_FAILED have the same value so caller cannot
distinguish why function such as emulator_cmpxchg_emulated()
(which can return both X86EMUL_PROPAGATE_FAULT and
X86EMUL_CMPXCHG_FAILED) failed.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: make (get|set)_dr() callback return error if it fails
Gleb Natapov [Wed, 28 Apr 2010 16:15:32 +0000 (19:15 +0300)]
KVM: x86 emulator: make (get|set)_dr() callback return error if it fails

Make (get|set)_dr() callback return error if it fails instead of
injecting exception behind emulator's back.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: make set_cr() callback return error if it fails
Gleb Natapov [Wed, 28 Apr 2010 16:15:31 +0000 (19:15 +0300)]
KVM: x86 emulator: make set_cr() callback return error if it fails

Make set_cr() callback return error if it fails instead of injecting #GP
behind emulator's back.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks
Gleb Natapov [Wed, 28 Apr 2010 16:15:30 +0000 (19:15 +0300)]
KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks

Use callbacks from x86_emulate_ops to access segments instead of calling
into kvm directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_ops
Gleb Natapov [Wed, 28 Apr 2010 16:15:29 +0000 (19:15 +0300)]
KVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_ops

On VMX it is expensive to call get_cached_descriptor() just to get segment
base since multiple vmcs_reads are done instead of only one. Introduce
new call back get_cached_segment_base() for efficiency.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops
Gleb Natapov [Wed, 28 Apr 2010 16:15:28 +0000 (19:15 +0300)]
KVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops

Add (set|get)_msr callbacks to x86_emulate_ops instead of calling
them directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops
Gleb Natapov [Wed, 28 Apr 2010 16:15:27 +0000 (19:15 +0300)]
KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops

Add (set|get)_dr callbacks to x86_emulate_ops instead of calling
them directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: handle "far address" source operand
Gleb Natapov [Wed, 28 Apr 2010 16:15:26 +0000 (19:15 +0300)]
KVM: x86 emulator: handle "far address" source operand

ljmp/lcall instruction operand contains address and segment.
It can be 10 bytes long. Currently we decode it as two different
operands. Fix it by introducing new kind of operand that can hold
entire far address.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: cleanup nop emulation
Gleb Natapov [Wed, 28 Apr 2010 16:15:25 +0000 (19:15 +0300)]
KVM: x86 emulator: cleanup nop emulation

Make it more explicit what we are checking for.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: cleanup xchg emulation
Gleb Natapov [Wed, 28 Apr 2010 16:15:24 +0000 (19:15 +0300)]
KVM: x86 emulator: cleanup xchg emulation

Dst operand is already initialized during decoding stage. No need to
reinitialize.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: fix Move r/m16 to segment register decoding
Gleb Natapov [Wed, 28 Apr 2010 16:15:23 +0000 (19:15 +0300)]
KVM: x86 emulator: fix Move r/m16 to segment register decoding

This instruction does not need generic decoding for its dst operand.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: introduce read cache
Gleb Natapov [Wed, 28 Apr 2010 16:15:22 +0000 (19:15 +0300)]
KVM: x86 emulator: introduce read cache

Introduce read cache which is needed for instruction that require more
then one exit to userspace. After returning from userspace the instruction
will be re-executed with cached read value.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Avoid writing HOST_CR0 every entry
Avi Kivity [Mon, 3 May 2010 13:05:44 +0000 (16:05 +0300)]
KVM: VMX: Avoid writing HOST_CR0 every entry

cr0.ts may change between entries, so we copy cr0 to HOST_CR0 before each
entry.  That is slow, so instead, set HOST_CR0 to have TS set unconditionally
(which is a safe value), and issue a clts() just before exiting vcpu context
if the task indeed owns the fpu.

Saves ~50 cycles/exit.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: kvm_pdptr_read() may sleep
Avi Kivity [Tue, 4 May 2010 10:00:55 +0000 (13:00 +0300)]
KVM: kvm_pdptr_read() may sleep

Annotate it thusly.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: avoid unnecessary bitmap allocation when memslot is clean
Takuya Yoshikawa [Wed, 28 Apr 2010 09:50:36 +0000 (18:50 +0900)]
KVM: x86: avoid unnecessary bitmap allocation when memslot is clean

Although we always allocate a new dirty bitmap in x86's get_dirty_log(),
it is only used as a zero-source of copy_to_user() and freed right after
that when memslot is clean. This patch uses clear_user() instead of doing
this unnecessary zero-source allocation.

Performance improvement: as we can expect easily, the time needed to
allocate a bitmap is completely reduced. In my test, the improved ioctl
was about 4 to 10 times faster than the original one for clean slots.
Furthermore, reducing memory allocations and copies will produce good
effects to caches too.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Simplify vmx_get_nmi_mask()
Avi Kivity [Tue, 4 May 2010 09:24:12 +0000 (12:24 +0300)]
KVM: VMX: Simplify vmx_get_nmi_mask()

!! is not needed due to the cast to bool.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Avoid killing userspace through guest SRAO MCE on unmapped pages
Huang Ying [Mon, 31 May 2010 06:28:19 +0000 (14:28 +0800)]
KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages

In common cases, guest SRAO MCE will cause corresponding poisoned page
be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay
the MCE to guest OS.

But it is reported that if the poisoned page is accessed in guest
after unmapping and before MCE is relayed to guest OS, userspace will
be killed.

The reason is as follows. Because poisoned page has been un-mapped,
guest access will cause guest exit and kvm_mmu_page_fault will be
called. kvm_mmu_page_fault can not get the poisoned page for fault
address, so kernel and user space MMIO processing is tried in turn. In
user MMIO processing, poisoned page is accessed again, then userspace
is killed by force_sig_info.

To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM
and do not try kernel and user space MMIO processing for poisoned
page.

[xiao: fix warning introduced by avi]

Reported-by: Max Asbock <masbock@linux.vnet.ibm.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoMerge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel...
Linus Torvalds [Thu, 29 Jul 2010 03:01:26 +0000 (20:01 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  x86,kgdb: Fix hw breakpoint regression

13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
Linus Torvalds [Thu, 29 Jul 2010 03:00:42 +0000 (20:00 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] ibmvscsi: Fix oops when an interrupt is pending during probe
  [SCSI] zfcp: Update status read mempool
  [SCSI] zfcp: Do not wait for SBALs on stopped queue
  [SCSI] zfcp: Fix check whether unchained ct_els is possible
  [SCSI] ipr: fix resource path display and formatting