Catalin Marinas [Thu, 11 Nov 2010 22:05:10 +0000 (14:05 -0800)]
include/linux/highmem.h needs hardirq.h
Commit 3e4d3af501cc ("mm: stack based kmap_atomic()") introduced the
kmap_atomic_idx_push() function which warns on in_irq() with
CONFIG_DEBUG_HIGHMEM enabled. This patch includes linux/hardirq.h for
the in_irq definition.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Thu, 11 Nov 2010 22:05:08 +0000 (14:05 -0800)]
atomic: add atomic_inc_not_zero_hint()
Followup of perf tools session in Netfilter WorkShop 2010
In the network stack we make high usage of atomic_inc_not_zero() in
contexts we know the probable value of atomic before increment (2 for udp
sockets for example)
Using a special version of atomic_inc_not_zero() giving this hint can help
processor to use less bus transactions.
On x86 (MESI protocol) for example, this avoids entering Shared state,
because "lock cmpxchg" issues an RFO (Read For Ownership)
akpm: Adds a new include/linux/atomic.h. This means that new code should
henceforth include linux/atomic.h and not asm/atomic.h. The presence of
include/linux/atomic.h will in fact cause checkpatch.pl to warn about use
of asm/atomic.h. The new include/linux/atomic.h becomes the place where
arch-neutral atomic_t code should be placed.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Thu, 11 Nov 2010 22:05:07 +0000 (14:05 -0800)]
rapidio: use resource_size()
The size calculation is done incorrectly here because it should include
both the start and end (end - start + 1). It's easiest to just use
resource_size() which does the right thing.
I was worried there was something non-standard going on because the
printk() subtracts "end - 1", but the rest of the file uses the normal
resource size calculations. This function is only called from
fsl_rio_setup() in arch/powerpc/sysdev/fsl_rio.c and the calculation
there is also:
Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: Alexandre Bounine <alexandre.bounine@idt.com> Acked-by: Li Yang <leoli@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/macintosh/adb-iop.c: flags should be unsigned long
Fix these warnings:
drivers/macintosh/adb-iop.c: In function `adb_iop_complete':
drivers/macintosh/adb-iop.c:85: warning: comparison of distinct pointer types lacks a cast
drivers/macintosh/adb-iop.c:92: warning: comparison of distinct pointer types lacks a cast
drivers/macintosh/adb-iop.c: In function ¡adb_iop_listen¢:
drivers/macintosh/adb-iop.c:111: warning: comparison of distinct pointer types lacks a cast
drivers/macintosh/adb-iop.c:151: warning: comparison of distinct pointer types lacks a cast
Both commits 0a3d763f1a68 ("ptrace: cleanup arch_ptrace() on um") and 9b05a69e0534 ("ptrace: change signature of arch_ptrace()") broke the um
build. This patch fixes the issues.
0a3d763f1a68 introduced the undeclared variable "datavp". The patch seems
completely untested. :-(
9b05a69e0534 changed arch_ptrace()'s signature but did not update
um/include/asm/ptrace-generic.h.
Signed-off-by: Richard Weinberger <richard@nod.at> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Jeff Dike <jdike@addtoit.com> Tested-by: Will Newton <will.newton@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Software generated interrupts (SGI) are used for IPIs by the kernel.
While previous revisions of the GIC hardware were specified not to
implement enable bits for SGIs, more recent hardware is now permitted
to implement these bits in a per-CPU banked register.
The priority registers for the PPI and SGIs are also per-CPU banked
registers, so ensure that these are also appropriately initialized.
Reported-by: Scott Valentine <svalentine@concentris-systems.com> Acked-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Eric Paris [Fri, 12 Nov 2010 07:26:06 +0000 (08:26 +0100)]
netfilter: NF_HOOK_COND has wrong conditional
The NF_HOOK_COND returns 0 when it shouldn't due to what I believe to be an
error in the code as the order of operations is not what was intended. C will
evalutate == before =. Which means ret is getting set to the bool result,
rather than the return value of the function call. The code says
if (ret = function() == 1)
when it meant to say:
if ((ret = function()) == 1)
Normally the compiler would warn, but it doesn't notice it because its
a actually complex conditional and so the wrong code is wrapped in an explict
set of () [exactly what the compiler wants you to do if this was intentional].
Fixing this means that errors when netfilter denies a packet get propagated
back up the stack rather than lost.
Problem introduced by commit 2249065f (netfilter: get rid of the grossness
in netfilter.h).
Signed-off-by: Eric Paris <eparis@redhat.com> Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
Sonic Zhang [Wed, 27 Oct 2010 08:16:48 +0000 (04:16 -0400)]
serial: bfin_5xx: remove redundant SSYNC to improve TX speed
We don't need to force a SSYNC here as the LSR register will already
be updated by the time we get back to reading it. This speeds up TX
throughput and lowers general system overhead (since SSYNC is system
wide, not peripheral-specific).
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sonic Zhang [Wed, 27 Oct 2010 08:16:47 +0000 (04:16 -0400)]
serial: bfin_5xx: always include DMA headers
On Blackfin systems, peripherals that have optional DMA support always
route their interrupts through the corresponding DMA channel -- even
when DMA is not being used. So in PIO mode, we still need to request
the DMA channel (so interrupts are delivered) which means we need to
always include the DMA header for the DMA defines/functions.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Nicolas Pitre [Wed, 10 Nov 2010 06:33:12 +0000 (01:33 -0500)]
vcs: make proper usage of the poll flags
Kay Sievers pointed out that usage of POLLIN is well defined by POSIX,
and the current usage here doesn't follow that definition. So let's
duplicate the same semantics as implemented by sysfs_poll() instead.
Signed-off-by: Nicolas Pitre <nicolas.pitre@canonical.com> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Lawrence Rust [Wed, 27 Oct 2010 12:41:02 +0000 (14:41 +0200)]
8250: Fix tcsetattr to avoid ioctl(TIOCMIWAIT) hang
Calling tcsetattr prevents any thread(s) currently suspended in ioctl
TIOCMIWAIT for the same device from ever resuming.
If a thread is suspended inside a call to ioctl TIOCMIWAIT, waiting for
a modem status change, then the 8250 driver enables modem status
interrupts (MSI). The device interrupt service routine resumes the
suspended thread(s) on the next MSI.
If while the thread(s) are suspended, another thread calls tcsetattr
then the 8250 driver disables MSI (unless CTS/RTS handshaking is
enabled) thus preventing the suspended thread(s) from ever being
resumed.
This patch only disables MSI in tcsetattr handling if there are no
suspended threads.
Axel Lin [Tue, 9 Nov 2010 08:41:48 +0000 (08:41 +0000)]
hwmon: (gpio-fan) Fix fan_ctrl_init error path
In current implementation, the sysfs entries is not removed before return -ENODEV.
Creating the sysfs attribute should be the last thing done by the function,
after all the rest has been successful.
Otherwise there is a small window during which user-space can access the attribute
but the driver isn't ready to deal with the requests.
Fix it by moving sysfs_create_group to be the last thing done by the function.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Simon Guinot <sguinot@lacie.com> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Axel Lin [Tue, 9 Nov 2010 01:40:34 +0000 (20:40 -0500)]
hwmon: (ad7414) Return proper error code for ad7414_probe()
Return proper error if i2c_check_functionality reports
the adapter does not support the capability we need.
Also remove unneeded initialization for err variable.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Sean MacLennan <smaclennan@pikatech.com> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Jesper Juhl [Sun, 7 Nov 2010 21:04:43 +0000 (22:04 +0100)]
UWB: Return UWB_RSV_ALLOC_NOT_FOUND rather than crashing on NULL dereference if kzalloc fails
Crashing on a null pointer deref is never a nice thing to do. It seems
to me that it's better to simply return UWB_RSV_ALLOC_NOT_FOUND if
kzalloc() fails in uwb_rsv_find_best_allocation().
Vasiliy Kulikov [Sat, 6 Nov 2010 14:41:28 +0000 (17:41 +0300)]
usb: core: fix information leak to userland
Structure usbdevfs_connectinfo is copied to userland with padding byted
after "slow" field uninitialized. It leads to leaking of contents of
kernel stack memory.
Vasiliy Kulikov [Sat, 6 Nov 2010 14:41:31 +0000 (17:41 +0300)]
usb: misc: iowarrior: fix information leak to userland
Structure iowarrior_info is copied to userland with padding byted
between "serial" and "revision" fields uninitialized. It leads to
leaking of contents of kernel stack memory.
Jim Sung [Fri, 5 Nov 2010 01:47:51 +0000 (18:47 -0700)]
usb: subtle increased memory usage in u_serial
OK, the USB gadget serial driver actually has a couple of problems. On
gs_open(), it always allocates and queues an additional QUEUE_SIZE (16)
worth of requests, so with a loop like this:
i=1 ; while echo $i > /dev/ttyGS0 ; do let i++ ; done
eventually we run into OOM (Out of Memory).
Technically, it is not a leak as everything gets freed up when the USB
connection is broken, but not on gs_close().
With a USB device/gadget controller driver that has limited resources
(e.g., Marvell has a this MAX_XDS_FOR_TR_CALLS of 64 for transmit and
receive), so even after 4
stty -F /dev/ttyGS0
we cannot transmit anymore. We can still receive (not necessarily
reliably) as now we have 16 * 4 = 64 descriptors/buffers ready, but the
device is otherwise not usable.
Signed-off-by: Jim Sung <jsung@syncadence.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ma rui [Mon, 1 Nov 2010 03:32:18 +0000 (11:32 +0800)]
USB: option: fix when the driver is loaded incorrectly for some Huawei devices.
When huawei datacard with PID 0x14AC is insterted into Linux system, the
present kernel will load the "option" driver to all the interfaces. But
actually, some interfaces run as other function and do not need "option"
driver.
In this path, we modify the id_tables, when the PID is 0x14ac ,VID is
0x12d1, Only when the interface's Class is 0xff,Subclass is 0xff, Pro is
0xff, it does need "option" driver.
Andy Whitcroft [Wed, 3 Nov 2010 18:02:38 +0000 (18:02 +0000)]
usb: gadget: goku_udc: add registered flag bit, fixing build
The commit below cleaned up error handling, in part by introducing a
registered flag bit. This however was not added to the device
structure leding to build failures:
CC drivers/usb/host/ehci-hcd.o
In file included from drivers/usb/host/ehci-hcd.c:1166:
drivers/usb/host/ehci-mxc.c: In function 'ehci_mxc_drv_probe':
drivers/usb/host/ehci-mxc.c:192: error: 'ehci' undeclared (first use in this function)
drivers/usb/host/ehci-mxc.c:192: error: (Each undeclared identifier is reported only once
drivers/usb/host/ehci-mxc.c:192: error: for each function it appears in.)
drivers/usb/host/ehci-mxc.c:117: warning: unused variable 'temp'
make[3]: *** [drivers/usb/host/ehci-hcd.o] Error 1
make[2]: *** [drivers/usb/host/ehci-hcd.o] Error 2
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2
Fix it together with the warning about the unused variable and use
msleep instead of mdelay as requested by Alan Stern.
USB: Fix FSL USB driver on non Open Firmware systems
Commit 126512e3f274802ca65ebeca8660237f0361ad48 added support for FSL's USB
controller on powerpc. In this commit the Open Firmware code was selected
and compiled unconditionally.
This breaks on ARM systems from FSL which use the same driver (.i.e. the i.MX
series), because ARM don't have OF support (yet). This patch fixes the problem
by only selecting the OF code on systems with Open Firmware support.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Compile-Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Staging: Merge 'tidspbridge-2.6.37-rc1' into staging-linus
This is a big revert of a lot of -rc1 tidspbridge patches in order to
get the driver back into a working state. It also includes a OMAP patch
that was approved by the OMAP maintainer.
Dmitry Torokhov [Tue, 9 Nov 2010 05:51:25 +0000 (21:51 -0800)]
Input: do not pass injected events back to the originating handler
Sometimes input handlers (as opposed to input devices) have a need to
inject (or re-inject) events back into input core. For example sysrq
filter may want to inject previously suppressed Alt-SysRq so that user
can take a screen print. In this case we do not want to pass such events
back to the same same handler that injected them to avoid loops.
Dan Carpenter [Thu, 11 Nov 2010 07:59:20 +0000 (23:59 -0800)]
Input: pcf8574_keypad - fix error handling in pcf8574_kp_probe
It is not allowed to call input_free_device() after calling
input_unregister_device() because input devices are refcounted and
unregister will free the device if we were holding he last referenc.
The preferred style in input/ is to make input_register_device() the
last function in the probe which can fail. That way we don't need to
call input_unregister_device().
Also do not need to call input_set_drvdata() as nothing in the driver
uses the data.
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
David S. Miller [Thu, 11 Nov 2010 05:35:37 +0000 (21:35 -0800)]
tcp: Increase TCP_MAXSEG socket option minimum.
As noted by Steve Chen, since commit f5fff5dc8a7a3f395b0525c02ba92c95d42b7390 ("tcp: advertise MSS
requested by user") we can end up with a situation where
tcp_select_initial_window() does a divide by a zero (or
even negative) mss value.
The problem is that sometimes we effectively subtract
TCPOLEN_TSTAMP_ALIGNED and/or TCPOLEN_MD5SIG_ALIGNED from the mss.
Fix this by increasing the minimum from 8 to 64.
Reported-by: Steve Chen <schen@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Thu, 28 Oct 2010 18:32:29 +0000 (11:32 -0700)]
xen: do not release any memory under 1M in domain 0
We already deliberately setup a 1-1 P2M for the region up to 1M in
order to allow code which assumes this region is already mapped to
work without having to convert everything to ioremap.
Domain 0 should not return any apparently unused memory regions
(reserved or otherwise) in this region to Xen since the e820 may not
accurately reflect what the BIOS has stashed in this region.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Ian Campbell [Mon, 1 Nov 2010 16:30:09 +0000 (16:30 +0000)]
xen: events: do not unmask event channels on resume
The IRQ core code will take care of disabling and reenabling
interrupts over suspend resume automatically, therefore we do not need
to do this in the Xen event channel code.
The only exception is those event channels marked IRQF_NO_SUSPEND
which the IRQ core ignores. We must unmask these ourselves, taking
care to obey the current IRQ_DISABLED status. Failure check for
IRQ_DISABLED leads to enabling polled only event channels, such as
that associated with the pv spinlocks, which must never be enabled:
Felipe Contreras [Tue, 19 Oct 2010 07:37:24 +0000 (10:37 +0300)]
omap: dsp: remove shm from normal memory
Also, don't be picky about the location, which incidentally fixes the
build since MEMBLOCK_REAL_LIMIT is gone on 2.6.37.
arch/arm/plat-omap/devices.c: In function 'omap_dsp_reserve_sdram_memblock':
arch/arm/plat-omap/devices.c:287: error: 'MEMBLOCK_REAL_LIMIT'
undeclared (first use in this function)
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Omar Ramirez Luna <omar.ramirez@ti.com>
Stephane Eranian [Tue, 26 Oct 2010 14:08:01 +0000 (16:08 +0200)]
perf_events: Fix time tracking in samples
This patch corrects time tracking in samples. Without this patch
both time_enabled and time_running are bogus when user asks for
PERF_SAMPLE_READ.
One uses PERF_SAMPLE_READ to sample the values of other counters
in each sample. Because of multiplexing, it is necessary to know
both time_enabled, time_running to be able to scale counts correctly.
In this second version of the patch, we maintain a shadow
copy of ctx->time which allows us to compute ctx->time without
calling update_context_time() from NMI context. We avoid the
issue that update_context_time() must always be called with
ctx->lock held.
We do not keep shadow copies of the other event timings
because if the lead event is overflowing then it is active
and thus it's been scheduled in via event_sched_in() in
which case neither tstamp_stopped, tstamp_running can be modified.
This timing logic only applies to samples when PERF_SAMPLE_READ
is used.
Note that this patch does not address timing issues related
to sampling inheritance between tasks. This will be addressed
in a future patch.
With this patch, the libpfm4 example task_smpl now reports
correct counts (shown on 2.4GHz Core 2):
In commit 20cb52ebd1b5ca6fa8a5d9b6b1392292f5ca8a45, titled
"xfs: simplify xfs_vm_writepage" I added an assert that any !mapped and
uptodate buffers are not dirty. That asserts turns out to trigger a lot
when running fsx on filesystems with small block sizes. The reason for
that is that the assert is simply incorrect. !mapped and uptodate
just mean this buffer covers a hole, and whenever we do a set_page_dirty
we mark all blocks in the page dirty, no matter if they have data or
not. So remove the assert, and update the comment above the condition
to match reality.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
set_init_cxt() allocted sizeof(struct aa_task_cxt) bytes for cxt,
if register_security() failed, it will cause memory leak.
Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
policy->name is a substring of policy->hname, if prefix is not NULL, it will
allocted strlen(prefix) + strlen(name) + 3 bytes to policy->hname in policy_init().
use kzfree(ns->base.name) will casue memory leak if alloc_namespace() failed.
Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
Eric Dumazet [Tue, 9 Nov 2010 23:24:26 +0000 (23:24 +0000)]
net: avoid limits overflow
Robin Holt tried to boot a 16TB machine and found some limits were
reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]
We can switch infrastructure to use long "instead" of "int", now
atomic_long_t primitives are available for free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Robin Holt <holt@sgi.com> Reviewed-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasiliy Kulikov [Wed, 10 Nov 2010 20:09:10 +0000 (12:09 -0800)]
net: packet: fix information leak to userland
packet_getname_spkt() doesn't initialize all members of sa_data field of
sockaddr struct if strlen(dev->name) < 13. This structure is then copied
to userland. It leads to leaking of contents of kernel stack memory.
We have to fully fill sa_data with strncpy() instead of strlcpy().
The same with packet_getname(): it doesn't initialize sll_pkttype field of
sockaddr_ll. Set it to zero.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
J. Bruce Fields [Wed, 3 Nov 2010 22:09:18 +0000 (18:09 -0400)]
locks: remove dead lease error-handling code
A minor oversight from f7347ce4ee7c65415f84be915c018473e7076f31,
"fasync: re-organize fasync entry insertion to allow it under a
spinlock": this cleanup-on-error was only needed to handle -ENOMEM. Now
that we're preallocating it's unneeded.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
David S. Miller [Wed, 10 Nov 2010 18:38:24 +0000 (10:38 -0800)]
filter: make sure filters dont read uninitialized memory
There is a possibility malicious users can get limited information about
uninitialized stack mem array. Even if sk_run_filter() result is bound
to packet length (0 .. 65535), we could imagine this can be used by
hostile user.
Initializing mem[] array, like Dan Rosenberg suggested in his patch is
expensive since most filters dont even use this array.
Its hard to make the filter validation in sk_chk_filter(), because of
the jumps. This might be done later.
In this patch, I use a bitmap (a single long var) so that only filters
using mem[] loads/stores pay the price of added security checks.
For other filters, additional cost is a single instruction.
[ Since we access fentry->k a lot now, cache it in a local variable
and mark filter entry pointer as const. -DaveM ]
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasiliy Kulikov [Wed, 10 Nov 2010 18:14:33 +0000 (10:14 -0800)]
net: ax25: fix information leak to userland
Sometimes ax25_getname() doesn't initialize all members of fsa_digipeater
field of fsa struct, also the struct has padding bytes between
sax25_call and sax25_ndigis fields. This structure is then copied to
userland. It leads to leaking of contents of kernel stack memory.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
XFS does not need it's inodes to actuall be hashed in the VFS inode
cache, but we require the inode to be marked hashed for the
writeback code to work.
Insted of using insert_inode_hash, which requires a second
inode_lock roundtrip after the partial merge of the inode
scalability patches in 2.6.37-rc simply use the new hlist_add_fake
helper to mark it hashed without requiring a lock or touching a
global cache line.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
xfs: fix a few compiler warnings with CONFIG_XFS_QUOTA=n
Andi Kleen reported that gcc-4.5 gives lots of warnings for him
inside the XFS code. It turned out most of them are due to the
quota stubs beeing macros, and gcc now complaining about macros
evaluating to 0 that are not assigned to variables.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
xfs: tell lockdep about parent iolock usage in filestreams
The filestreams code may take the iolock on the parent inode while
holding it on a child. This is the only place in XFS where we take
both the child and parent iolock, so just telling lockdep about it
is enough. The lock flag required for that was already added as
part of the ilock lockdep annotations and unused so far.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Mon, 8 Nov 2010 08:55:05 +0000 (08:55 +0000)]
xfs: move delayed write buffer trace
The delayed write buffer split trace currently issues a trace for
every buffer it scans. These buffers are not necessarily queued for
delayed write. Indeed, when buffers are pinned, there can be
thousands of traces of buffers that aren't actually queued for
delayed write and the ones that are are lost in the noise. Move the
trace point to record only buffers that are split out for IO to be
issued on.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Mon, 8 Nov 2010 08:55:04 +0000 (08:55 +0000)]
xfs: fix per-ag reference counting in inode reclaim tree walking
The walk fails to decrement the per-ag reference count when the
non-blocking walk fails to obtain the per-ag reclaim lock, leading
to an assert failure on debug kernels when unmounting a filesystem.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Kulikov Vasiliy [Sat, 30 Oct 2010 14:26:17 +0000 (14:26 +0000)]
xfs: xfs_ioctl: fix information leak to userland
al_hreq is copied from userland. If al_hreq.buflen is not properly aligned
then xfs_attr_list will ignore the last bytes of kbuf. These bytes are
unitialized. It leads to leaking of contents of kernel stack memory.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Alex Elder <aelder@sgi.com>
Will Deacon [Wed, 10 Nov 2010 14:59:11 +0000 (15:59 +0100)]
ARM: 6472/1: vexpress ct-ca9x4: only set twd_base if local timers are being used
In commit bde28b84, I made the assumption that CONFIG_SMP is always set
for the quad-core ct-ca9x4 platform. As it turns out, people who aren't
using the SMP goodness are confronted with a build failure.
This patch fixes this issue by ensure that twd_base is only set if
local timers are being used (and therefore SMP support is configured).
Reported-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tom Zanussi [Wed, 10 Nov 2010 14:16:51 +0000 (08:16 -0600)]
perf trace: live-mode command-line cleanup
This patch attempts to make the perf trace command-line for live-mode
commands more user-friendly and consistent with other perf commands.
The main change it makes is to allow <commands> to be run as part of
perf trace live-mode commands, as other perf commands do, instead of
the system-wide traces they're currently hard-coded to by the shell
scripts.
With this patch, the following live-mode trace now works as expected:
$ perf trace rw-by-pid ls -al
The previous system-wide behavior for this command would still be
available by explicitly specifying -a:
$ perf trace rw-by-pid -a ls -al
and if no <command> is specified, the output is also system-wide:
$ perf trace rw-by-pid
Because live-mode requires both record and report steps to be invoked,
it isn't always possible to know which args to send to the report and
which to send to the record steps - mainly this is the case for report
scripts with optional args - in those cases it would be necessary to
use separate 'perf trace record' and 'perf trace report' steps.
For example:
$ perf trace syscall-counts ls
Here we can't decide whether ls should be passed as a param to the
syscall-counts script or whether we should invoke ls as a <command>.
In these cases, we just say that we'll ignore optional script params
and always interpret the extra arguments as a <command>.
If the user instead wants the other interpretation, that can be
accomplished by using separate record and report commands explicitly:
$ perf trace record syscall-counts
$ perf trace report syscall-counts ls
So the rules that this patch implements, which seem to make the most
intuitive sense for live-mode commands:
- for commands with optional args and commands with no args, no args
are sent to the report script, all are sent to the record step
- for 'top' commands i.e. that end with 'top', <commands> can't be
used - all extra args are send to the report script as params
- for commands with required args, the n required args are taken to be
the first n args after the script name and sent to the report
script, and the rest are sent to the record step
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Tom Zanussi [Wed, 10 Nov 2010 14:15:43 +0000 (08:15 -0600)]
perf trace record: handle commands correctly
Because the perf-trace shell scripts hard-coded the use of the
perf-record system-wide param, a perf trace record session was always
system wide, even if it was given a command.
If given a command, perf trace record now only records the events for
the command, as users expect.
If no command is given, or if the '-a' option is used, the recorded
events are system-wide, as before.
root@tropicana:~# perf trace record syscall-counts ls -al
root@tropicana:~# perf trace
ls-23152 [000] 39984.890387: sys_enter: NR 12 (0, 0, 0, 0, 0, 0)
ls-23152 [000] 39984.890404: sys_enter: NR 9 (0, 0, 0, 0, 0, 0)
root@tropicana:~# perf trace record syscall-counts -a ls -al
root@tropicana:~# perf trace
npviewer.bin-22297 [000] 39831.102709: sys_enter: NR 168 (0, 0, 0, 0, 0, 0)
ls-23111 [000] 39831.107679: sys_enter: NR 59 (0, 0, 0, 0, 0, 0)
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Tom Zanussi [Wed, 10 Nov 2010 14:08:20 +0000 (08:08 -0600)]
perf trace scripting: remove system-wide param from shell scripts
Including -a unconditionally when recording doesn't allow for the
option of running scripts without it. Future patches will add add it
back if needed at run-time.
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
at this point is:
- various checks inside the block layer.
- sanity checks in bio based drivers.
- now unused bio_empty_barrier helper.
- Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while,
but Xen really needs to sort out it's barrier situaton.
- setting of ordered tags in uas - dead code copied from old scsi
drivers.
- scsi different retry for barriers - it's dead and should have been
removed when flushes were converted to FS requests.
- blktrace handling of barriers - removed. Someone who knows blktrace
better should add support for REQ_FLUSH and REQ_FUA, though.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Tom Zanussi [Wed, 10 Nov 2010 13:52:32 +0000 (07:52 -0600)]
perf trace scripting: fix some small memory leaks and missing error checks
Free the other two fields of script_desc which somehow got overlooked,
free malloc'ed args in case exec fails, and add missing checks for
failed mallocs.
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
With 2.6.37-rc1, I observe sys_ioprio_set not taking the RCU lock [1]
across access to the task credentials.
Inspecting the code in fs/ioprio.c, the tasklist_lock is held for read
across the __task_cred call, which is presumably sufficient to prevent
the task credentials becoming stale.
Vasiliy Kulikov [Mon, 8 Nov 2010 13:42:40 +0000 (14:42 +0100)]
block: ioctl: fix information leak to userland
Structure hd_geometry is copied to userland with 4 padding bytes
between cylinders and start fields uninitialized on 64-bit platforms.
It leads to leaking of contents of kernel stack memory.
Currently there is no memset() in real implementations of getgeo()
in drivers/block/, so it makes sense to have memset() in blkdev_ioctl().
Mike Snitzer [Mon, 8 Nov 2010 13:39:12 +0000 (14:39 +0100)]
block: read i_size with i_size_read()
Convert direct reads of an inode's i_size to using i_size_read().
i_size_{read,write} use a seqcount to protect reads from accessing
incomple writes. Concurrent i_size_write()s require mutual exclussion
to protect the seqcount that is used by i_size_{read,write}. But
i_size_read() callers do not need to use additional locking.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: NeilBrown <neilb@suse.de> Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
which will happen if you don't actually have an HP CISS adapter,
since it'll do an uncondional removal of a proc directory it
never attempted to create in that case.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Tested-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jens Axboe [Wed, 10 Nov 2010 13:36:25 +0000 (14:36 +0100)]
bio: take care not overflow page count when mapping/copying user data
If the iovec is being set up in a way that causes uaddr + PAGE_SIZE
to overflow, we could end up attempting to map a huge number of
pages. Check for this invalid input type.
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>