bbs.cooldavid.org Git - net-next-2.6.git/log

md/raid5: don't include 'spare' drives when reshaping to fewer devices.

There are few situations where it would make any sense to add a spare
when reducing the number of devices in an array, but it is
conceivable:  A 6 drive RAID6 with two missing devices could be
reshaped to a 5 drive RAID6, and a spare could become available
just in time for the reshape, but not early enough to have been
recovered first.  'freezing' recovery can make this easy to
do without any races.

However doing such a thing is a bad idea.  md will not record the
partially-recovered state of the 'spare' and when the reshape
finished it will think that the spare is still spare.
Easiest way to avoid this confusion is to simply disallow it.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: add a missing 'continue' in a loop.

As the comment says, the tail of this loop only applies to devices
that are not fully in sync, so if In_sync was set, we should avoid
the rest of the loop.

This bug will hardly ever cause an actual problem. The worst it
can do is allow an array to be assembled that is dirty and degraded,
which is not generally a good idea (without warning the sysadmin
first).

This will only happen if the array is RAID4 or a RAID5/6 in an
intermediate state during a reshape and so has one drive that is
all 'parity' - no data - while some other device has failed.

This is certainly possible, but not at all common.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: Allow recovered part of partially recovered devices to be in-sync

During a recovery of reshape the early part of some devices might be
in-sync while the later parts are not.
We we know we are looking at an early part it is good to treat that
part as in-sync for stripe calculations.

This is particularly important for a reshape which suffers device
failure. Treating the data as in-sync can mean the difference between
data-safety and data-loss.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: More careful check for "has array failed".

When we are reshaping an array, the device failure combinations
that cause us to decide that the array as failed are more subtle.

In particular, any 'spare' will be fully in-sync in the section
of the array that has already been reshaped, thus failures that
affect only that section are less critical.

So encode this subtlety in a new function and call it as appropriate.

The case that showed this problem was a 4 drive RAID5 to 8 drive RAID6
conversion where the last two devices failed.
This resulted in:

good good good good incomplete good good failed failed

while converting a 5-drive RAID6 to 8 drive RAID5
The incomplete device causes the whole array to look bad,
bad as it was actually good for the section that had been
converted to 8-drives, all the data was actually safe.

Reported-by: Terry Morris <tbmorris@tbmorris.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: Don't update ->recovery_offset when reshaping an array to fewer devices.

When an array is reshaped to have fewer devices, the reshape proceeds
from the end of the devices to the beginning.

If a device happens to be non-In_sync (which is possible but rare)
we would normally update the ->recovery_offset as the reshape
progresses. However that would be wrong as the recover_offset records
that the early part of the device is in_sync, while in fact it would
only be the later part that is in_sync, and in any case the offset
number would be measured from the wrong end of the device.

Relatedly, if after a reshape a spare is discovered to not be
recoverred all the way to the end, not allow spare_active
to incorporate it in the array.

This becomes relevant in the following sample scenario:

A 4 drive RAID5 is converted to a 6 drive RAID6 in a combined
operation.
The RAID5->RAID6 conversion will cause a 5 drive to be included as a
spare, then the 5drive -> 6drive reshape will effectively rebuild that
spare as it progresses. The 6th drive is treated as in_sync the whole
time as there is never any case that we might consider reading from
it, but must not because there is no valid data.

If we interrupt this reshape part-way through and reverse it to return
to a 5-drive RAID6 (or event a 4-drive RAID5), we don't want to update
the recovery_offset - as that would be wrong - and we don't want to
include that spare as active in the 5-drive RAID6 when the reversed
reshape completed and it will be mostly out-of-sync still.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: avoid oops when number of devices is reduced then increased.

The entries in the stripe_cache maintained by raid5 are enlarged
when we increased the number of devices in the array, but not
shrunk when we reduce the number of devices.
So if entries are added after reducing the number of devices, we
much ensure to initialise the whole entry, not just the part that
is currently relevant. Otherwise if we enlarge the array again,
we will reference uninitialised values.

As grow_buffers/shrink_buffer now want to use a count that is stored
explicity in the raid_conf, they should get it from there rather than
being passed it as a parameter.

Signed-off-by: NeilBrown <neilb@suse.de>

md: enable raid4->raid0 takeover

Only level 5 with layout=PARITY_N can be taken over to raid0 now.
Lets allow level 4 either.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: clear layout after ->raid0 takeover

After takeover from raid5/10 -> raid0 mddev->layout is not cleared.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: fix raid10 takeover: use new_layout for setup_conf

Use mddev->new_layout in setup_conf.
Also use new_chunk, and don't set ->degraded in takeover(). That
gets set in run()

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: fix handling of array level takeover that re-arranges devices.

Most array level changes leave the list of devices largely unchanged,
possibly causing one at the end to become redundant.
However conversions between RAID0 and RAID10 need to renumber
all devices (except 0).

This renumbering is currently being done in the ->run method when the
new personality takes over. However this is too late as the common
code in md.c might already have invalidated some of the devices if
they had a ->raid_disk number that appeared to high.

Moving it into the ->takeover method is too early as the array is
still active at that time and wrong ->raid_disk numbers could cause
confusion.

So add a ->new_raid_disk field to mdk_rdev_s and use it to communicate
the new raid_disk number.
Now the common code knows exactly which devices need to be renumbered,
and which can be invalidated, and can do it all at a convenient time
when the array is suspend.
It can also update some symlinks in sysfs which previously were not be
updated correctly.

Reported-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: raid10: Fix null pointer dereference in fix_read_error()

Such NULL pointer dereference can occur when the driver was fixing the
read errors/bad blocks and the disk was physically removed
causing a system crash. This patch check if the
rcu_dereference() returns valid rdev before accessing it in fix_read_error().

Cc: stable@kernel.org
Signed-off-by: Prasanna S. Panchamukhi <prasanna.panchamukhi@riverbed.com>
Signed-off-by: Rob Becker <rbecker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Restore partition detection of newly created md arrays.

Commit  b821eaa572fd737faaf6928ba046e571526c36c6 broke partition
detection for md arrays.

The logic was almost right.  However if revalidate_disk is called
when the device is not yet open, bdev->bd_disk won't be set, so the
flush_disk() Call will not set bd_invalidated.

So when md_open is called we still need to ensure that
->bd_invalidated gets set.  This is easily done with a call to
check_disk_size_change in the place where the offending commit removed
check_disk_change.  At the important times, the size will have changed
from 0 to non-zero, so check_disk_size_change will set bd_invalidated.

Tested-by: Duncan <1i5t5.duncan@cox.net>
Reported-by: Duncan <1i5t5.duncan@cox.net>
Signed-off-by: NeilBrown <neilb@suse.de>

xfs: remove block number from inode lookup code

The block number comes from bulkstat based inode lookups to shortcut
the mapping calculations. We ar enot able to trust anything from
bulkstat, so drop the block number as well so that the correct
lookups and mappings are always done.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTED

Inode numbers may come from somewhere external to the filesystem
(e.g. file handles, bulkstat information) and so are inherently
untrusted. Rename the flag we use for these lookups to make it
obvious we are doing a lookup of an untrusted inode number and need
to verify it completely before trying to read it from disk.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: validate untrusted inode numbers during lookup

When we decode a handle or do a bulkstat lookup, we are using an
inode number we cannot trust to be valid. If we are deleting inode
chunks from disk (default noikeep mode), then we cannot trust the on
disk inode buffer for any given inode number to correctly reflect
whether the inode has been unlinked as the di_mode nor the
generation number may have been updated on disk.

This is due to the fact that when we delete an inode chunk, we do
not write the clusters back to disk when they are removed - instead
we mark them stale to avoid them being written back potentially over
the top of something that has been subsequently allocated at that
location. The result is that we can have locations of disk that look
like they contain valid inodes but in reality do not. Hence we
cannot simply convert the inode number to a block number and read
the location from disk to determine if the inode is valid or not.

As a result, and XFS_IGET_BULKSTAT lookup needs to actually look the
inode up in the inode allocation btree to determine if the inode
number is valid or not.

It should be noted even on ikeep filesystems, there is the
possibility that blocks on disk may look like valid inode clusters.
e.g. if there are filesystem images hosted on the filesystem. Hence
even for ikeep filesystems we really need to validate that the inode
number is valid before issuing the inode buffer read.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

OMAP: hwmod: Fix the missing braces

As reported by Sergei, a couple of braces were missing after
the WARN removal patch.

[07/22] OMAP: hwmod: Replace WARN by pr_warning if clock lookup failed

https://patchwork.kernel.org/patch/100756/

Signed-off-by: Benoit Cousson <b-cousson@ti.com>
[paul@pwsan.com: fixed patch description per Anand's E-mail]
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: Anand Gadiyar <gadiyar@ti.com>

sched: silence PROVE_RCU in sched_fork()

Because cgroup_fork() is ran before sched_fork() [ from copy_process() ]
and the child's pid is not yet visible the child is pinned to its
cgroup. Therefore we can silence this warning.

A nicer solution would be moving cgroup_fork() to right after
dup_task_struct() and exclude PF_STARTING from task_subsys_state().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

sky2: enable rx/tx in sky2_phy_reinit()

sky2_phy_reinit is called by the ethtool helpers sky2_set_settings,
sky2_nway_reset and sky2_set_pauseparam when netif_running.

However, at the end of sky2_phy_init GM_GP_CTRL has GM_GPCR_RX_ENA and
GM_GPCR_TX_ENA cleared. So, doing these commands causes the device to
stop working:

$ ethtool -r eth0
$ ethtool -A eth0 autoneg off

Fix this issue by enabling Rx/Tx after running sky2_phy_init in
sky2_phy_reinit.

Signed-off-by: Brandon Philips <bphilips@suse.de>
Tested-by: Brandon Philips <bphilips@suse.de>
Cc: stable@kernel.org
Tested-by: Mike McCormack <mikem@ring3k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: change mailing list address for CIFS

We're moving the mailing list to linux-cifs@vger.kernel.org.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

cnic: Disable statistics initialization for eth clients that do not support statistics

Disable statistics initialization for eth clients that do not support
statistics. This prevents memory corruption on bnx2x hw.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

net: add dependency on fw class module to qlcnic and netxen_nic

netxen_nic and qlcnic driver depends on firmware_class module.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

snmp: fix SNMP_ADD_STATS()

commit aa2ea0586d9d (tcp: fix outsegs stat for TSO segments) incorrectly
assumed SNMP_ADD_STATS() was used from BH context.

Fix this using mib[!in_softirq()] instead of mib[0]

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

idr: fix RCU lockdep splat in idr_get_next()

Convert to rcu_dereference_raw() given that many callers may have many
different locking models.

Located-by: Miles Lane <miles.lane@gmail.com>
Tested-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: apply RCU protection to wake_affine()

The task_group() function returns a pointer that must be protected
by either RCU, the ->alloc_lock, or the cgroup lock (see the
rcu_dereference_check() in task_subsys_state(), which is invoked by
task_group()).  The wake_affine() function currently does none of these,
which means that a concurrent update would be within its rights to free
the structure returned by task_group().  Because wake_affine() uses this
structure only to compute load-balancing heuristics, there is no reason
to acquire either of the two locks.

Therefore, this commit introduces an RCU read-side critical section that
starts before the first call to task_group() and ends after the last use
of the "tg" pointer returned from task_group().  Thanks to Li Zefan for
pointing out the need to extend the RCU read-side critical section from
that proposed by the original patch.

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

virtio-pci: disable msi at startup

virtio-pci resets the device at startup by writing to the status
register, but this does not clear the pci config space,
specifically msi enable status which affects register
layout.

This breaks things like kdump when they try to use e.g. virtio-blk.

Fix by forcing msi off at startup. Since pci.c already has
a routine to do this, we export and use it instead of duplicating code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org

virtio: return ENOMEM on out of memory

add_buf returns ring size on out of memory,
this is not what devices expect.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # .34.x

xfs: always use iget in bulkstat

The non-coherent bulkstat versionsthat look directly at the inode
buffers causes various problems with performance optimizations that
make increased use of just logging inodes. This patch makes bulkstat
always use iget, which should be fast enough for normal use with the
radix-tree based inode cache introduced a while ago.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: prevent swapext from operating on write-only files

This patch prevents user "foo" from using the SWAPEXT ioctl to swap
a write-only file owned by user "bar" into a file owned by "foo" and
subsequently reading it. It does so by checking that the file
descriptors passed to the ioctl are also opened for reading.

Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Input: wacom - fix serial number handling on Cintiq 21UX2

Cintiq 21UX2 added 8 more bits for the tool serial number and more
buttons for the expresskey. We did not enable them properly in the
last patch.

Signed-off-by: Ping Cheng <pingc@wacom.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Input: fixup X86_MRST selects

Some of the recent X86_MRST additions make some "select"s
conditional on X86_MRST but missed some related kconfig symbols,
causing:

drivers/built-in.o: In function `ps2_end_command':
(.text+0x257ab2): undefined reference to `i8042_check_port_owner'
drivers/built-in.o: In function `ps2_end_command':
(.text+0x257ae1): undefined reference to `i8042_unlock_chip'
drivers/built-in.o: In function `ps2_begin_command':
(.text+0x257b40): undefined reference to `i8042_check_port_owner'
drivers/built-in.o: In function `ps2_begin_command':
(.text+0x257b6f): undefined reference to `i8042_lock_chip'

when SERIO_I8042=m, SERIO_LIBPS2=y, KEYBOARD_ATKBD=y.

We need to make i8042 dependant upon !X86_MRST and allow deselecting
atkbd on Moorestown even when !CONFIG_EMBEDDED.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Merge commit 'v2.6.35-rc3' into for-linus

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6

NFSv4: Fix an embarassing typo in encode_attrs()

Apparently, we have never been able to set the atime correctly from the
NFSv4 client.

Reported-by: 小倉一夫 <ka-ogura@bd6.so-net.ne.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org

NFSv4: Ensure that /proc/self/mountinfo displays the minor version number

Currently, we do not display the minor version mount parameter in the
/proc mount info.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org

NFSv4.1: Ensure that we initialise the session when following a referral

Put the code that is common to both the referral and ordinary mount cases
into a common helper routine.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Fix a re-entrancy bug in xs_tcp_read_calldir()

If the attempt to read the calldir fails, then instead of storing the read
bytes, we currently discard them. This leads to a garbage final result when
upon re-entry to the same routine, we read the remaining bytes.

Fixes the regression in bugzilla number 16213. Please see
https://bugzilla.kernel.org/show_bug.cgi?id=16213

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org

nfs4 use mandatory attribute file type in nfs4_get_root

S_ISDIR(fsinfo.fattr->mode) checks the file type rather than the mode bits,
so we should be checking for the NFS_ATTR_FATTR_TYPE fattr property.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org

hso: remove setting of low_latency flag

This patch removes the setting of the low_latency flag.
tty_flip_buffer_push() is occasionally being called in irq context, which
causes a hang if the low_latency flag is set.
Removing the low_latency flag only seems to impact the flush to ldisc,
which will now be put on a workqueue.

Signed-off-by: Filip Aben <f.aben@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

clocksource: sh_cmt: Fix up bogus shift value.

The previous CMT fixup accidentally copied in the TMU shift value, reset
this back to its original value while preserving the TMU fix.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

udp: Fix bogus UFO packet generation

It has been reported that the new UFO software fallback path
fails under certain conditions with NFS. I tracked the problem
down to the generation of UFO packets that are smaller than the
MTU. The software fallback path simply discards these packets.

This patch fixes the problem by not generating such packets on
the UFO path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lasi82596: fix netdev_mc_count conversion

Fix commit 4cd24eaf0 (net: use netdev_mc_count and netdev_mc_empty when
appropriate)

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

NET: MIPSsim: Fix modpost warning.

$ make CONFIG_DEBUG_SECTION_MISMATCH=y
[...]
WARNING: drivers/net/built-in.o(.data+0x0): Section mismatch in reference from the variable mipsnet_driver to the function .init.text:mipsnet_probe()
The variable mipsnet_driver references
the function __init mipsnet_probe()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
[...]

Fixed by making mipsnet_probe __devinit.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
drivers/net/mipsnet.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

tracing: Fix undeclared ENOSYS in include/linux/tracepoint.h

The header file include/linux/tracepoint.h may be included without
include/linux/errno.h and then the compiler will fail on building for
undelcared ENOSYS. This patch fixes this problem via including <linux/errno.h>
to include/linux/tracepoint.h.

Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
LKML-Reference: <1277118549-622-1-git-send-email-wuzhangjin@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Merge branch 'fix/misc' into for-linus

ALSA: usb/endpoint, fix dangling pointer use

Stanse found that in snd_usb_parse_audio_endpoints, there is a
dangling pointer dereference. When snd_usb_parse_audio_format fails,
fp is freed, and continue invoked. On the next loop, there is
"fp && fp->altsetting == 1 && fp->channels == 1" test, but fp is set
from the last iteration (but is bogus) and thus ilegally dereferenced.

Set fp to NULL before "continue".

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

cfq: fix recursive call in cfq_blkiocg_update_completion_stats()

e98ef89b has a typo, causing cfq_blkiocg_update_completion_stats()
to call itself instead of blkiocg_update_completion_stats().

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

arch/sh/mm: Eliminate a double lock

The function begins and ends with a read_lock. The latter is changed to a
read_unlock.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@locked@
expression E1;
position p;
@@

read_lock(E1@p,...);

@r exists@
expression x <= locked.E1;
expression locked.E1;
expression E2;
identifier lock;
position locked.p,p1,p2;
@@

*lock@p1 (E1@p,...);
... when != E1
when != $x = E2\|&x$
*lock@p2 (E1,...);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

Merge branch 'fix/misc' into for-linus

Merge branch 'fix/asoc' into for-linus

x86: Fix rebooting on Dell Precision WorkStation T7400

Dell Precision WorkStation T7400 freezes on reboot unless
reboot=b is used.

Reference: https://qa.mandriva.com/show_bug.cgi?id=58017

Signed-off-by: Thomas Backlund <tmb@mandriva.org>
LKML-Reference: <4C1CC6E9.6000701@mandriva.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

hwmon: (k8temp) Bypass core swapping on single-core processors

Commit a2e066bba2aad6583e3ff648bf28339d6c9f0898 introduced core
swapping for CPU models 64 and later. I recently had a report about
a Sempron 3200+, model 95, for which this patch broke temperature
reading. It happens that this is a single-core processor, so the
effect of the swapping was to read a temperature value for a core
that didn't exist, leading to an incorrect value (-49 degrees C.)

Disabling core swapping on singe-core processors should fix this.

Additional comment from Andreas:

The BKDG says

  Thermal Sensor Core Select (ThermSenseCoreSel)-Bit 2. This bit
  selects the CPU whose temperature is reported in the CurTemp
  field. This bit only applies to dual core processors. For
  single core processors CPU0 Thermal Sensor is always selected.

k8temp_probe() correctly detected that SEL_CORE can't be used on single
core CPU. Thus k8temp did never update the temperature values stored
in temp[1][x] and -49 degrees was reported. For single core CPUs we
must use the values read into temp[0][x].

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Rick Moritz <rhavin@gmx.net>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@kernel.org

hwmon: (i5k_amb) Fix sysfs attribute for lockdep

i5k_amb.ko uses dynamically allocated memory (by kmalloc) for
attributes passed to sysfs. So, sysfs_attr_init() should be called
for working happy with lockdep.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org [2.6.34 only]

hwmon: (k10temp) Do not blacklist known working CPU models

When detecting AM2+ or AM3 socket with DDR2, only blacklist cores
which are known to exist in AM2+ format.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@kernel.org

drm/i915: gen3 page flipping fixes

Gen3 chips have slightly different flip commands, and also contain a bit
that indicates whether a "flip pending" interrupt means the flip has
been queued or has been completed.

So implement support for the gen3 flip command, and make sure we use the
flip pending interrupt correctly depending on the value of ECOSKPD bit
0.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>

drm/i915: don't queue flips during a flip pending event

Hardware will set the flip pending ISR bit as soon as it receives the
flip instruction, and (supposedly) clear it once the flip completes
(e.g. at the next vblank). If we try to send down a flip instruction
while the ISR bit is set, the hardware can become very confused, and we
may never receive the corresponding flip pending interrupt, effectively
hanging the chip.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>

x86: Fix vsyscall on gcc 4.5 with -Os

This fixes the -Os breaks with gcc 4.5 bug. rdtsc_barrier needs to be
force inlined, otherwise user space will jump into kernel space and
kill init.

This also addresses http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44129
I believe.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20100618210859.GA10913@basil.fritz.box>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>

ath5k: initialize ah->ah_current_channel

ath5k assumes ah_current_channel is always a valid pointer in
several places, but a newly created interface may not have a
channel.  To avoid null pointer dereferences, set it up to point
to the first available channel until later reconfigured.

This fixes the following oops:
$ rmmod ath5k
$ insmod ath5k
$ iw phy0 set distance 11000

BUG: unable to handle kernel NULL pointer dereference at 00000006
IP: [<d0a1ff24>] ath5k_hw_set_coverage_class+0x74/0x1b0 [ath5k]
*pde = 00000000
Oops: 0000 [#1]
last sysfs file: /sys/devices/pci0000:00/0000:00:0e.0/ieee80211/phy0/index
Modules linked in: usbhid option usb_storage usbserial usblp evdev lm90
scx200_acb i2c_algo_bit i2c_dev i2c_core via_rhine ohci_hcd ne2k_pci
8390 leds_alix2 xt_IMQ imq nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_cc

Pid: 1597, comm: iw Not tainted (2.6.32.14 #8)
EIP: 0060:[<d0a1ff24>] EFLAGS: 00010296 CPU: 0
EIP is at ath5k_hw_set_coverage_class+0x74/0x1b0 [ath5k]
EAX: 000000c2 EBX: 00000000 ECX: ffffffff EDX: c12d2080
ESI: 00000019 EDI: cf8c0000 EBP: d0a30edc ESP: cfa09bf4
  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process iw (pid: 1597, ti=cfa09000 task=cf88a000 task.ti=cfa09000)
Stack:
  d0a34f35 d0a353f8 d0a30edc 000000fe cf8c0000 00000000 1900063d cfa8c9e0
<0> cfa8c9e8 cfa8c0c0 cfa8c000 d0a27f0c 199d84b4 cfa8c200 00000010 d09bfdc7
<0> 00000000 00000000 ffffffff d08e0d28 cf9263c0 00000001 cfa09cc4 00000000
Call Trace:
  [<d0a27f0c>] ? ath5k_hw_attach+0xc8c/0x3c10 [ath5k]
  [<d09bfdc7>] ? __ieee80211_request_smps+0x1347/0x1580 [mac80211]
  [<d08e0d28>] ? nl80211_send_scan_start+0x7b8/0x4520 [cfg80211]
  [<c10f5db9>] ? nla_parse+0x59/0xc0
  [<c11ca8d9>] ? genl_rcv_msg+0x169/0x1a0
  [<c11ca770>] ? genl_rcv_msg+0x0/0x1a0
  [<c11c7e68>] ? netlink_rcv_skb+0x38/0x90
  [<c11c9649>] ? genl_rcv+0x19/0x30
  [<c11c7c03>] ? netlink_unicast+0x1b3/0x220
  [<c11c893e>] ? netlink_sendmsg+0x26e/0x290
  [<c11a409e>] ? sock_sendmsg+0xbe/0xf0
  [<c1032780>] ? autoremove_wake_function+0x0/0x50
  [<c104d846>] ? __alloc_pages_nodemask+0x106/0x530
  [<c1074933>] ? do_lookup+0x53/0x1b0
  [<c10766f9>] ? __link_path_walk+0x9b9/0x9e0
  [<c11acab0>] ? verify_iovec+0x50/0x90
  [<c11a42b1>] ? sys_sendmsg+0x1e1/0x270
  [<c1048e50>] ? find_get_page+0x10/0x50
  [<c104a96f>] ? filemap_fault+0x5f/0x370
  [<c1059159>] ? __do_fault+0x319/0x370
  [<c11a55b4>] ? sys_socketcall+0x244/0x290
  [<c101962c>] ? do_page_fault+0x1ec/0x270
  [<c1019440>] ? do_page_fault+0x0/0x270
  [<c1002ae5>] ? syscall_call+0x7/0xb
Code: 00 b8 fe 00 00 00 b9 f8 53 a3 d0 89 5c 24 14 89 7c 24 10 89 44 24
0c 89 6c 24 08 89 4c 24 04 c7 04 24 35 4f a3 d0 e8 7c 30 60 f0 <0f> b7
43 06 ba 06 00 00 00 a8 10 75 0e 83 e0 20 83 f8 01 19 d2
EIP: [<d0a1ff24>] ath5k_hw_set_coverage_class+0x74/0x1b0 [ath5k] SS:ESP
0068:cfa09bf4
CR2: 0000000000000006
---[ end trace 54f73d6b10ceb87b ]---

Cc: stable@kernel.org
Reported-by: Steve Brown <sbrown@cortland.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

cfq-iosched: Fixed boot warning with BLK_CGROUP=y and CFQ_GROUP_IOSCHED=n

Hi Jens,

Few days back Ingo noticed a CFQ boot time warning. This patch fixes it.
The issue here is that with CFQ_GROUP_IOSCHED=n, CFQ should not really
be making blkio stat related calls.

> Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With
> some
> configs i get bad spinlock warnings during bootup:
>
> [   28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750
> usecs
> [   28.972003] calling  b44_init+0x0/0x55 @ 1
> [   28.976009] bus: 'pci': add driver b44
> [   28.976374]  sda:
> [   28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
> [   28.980000]  lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, +.owner_cpu: 0
> [   28.980000] Pid: 117, comm: async/0 Not tainted +2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
> [   28.980000] Call Trace:
> [   28.980000]  [<41ba6d55>] ? printk+0x20/0x24
> [   28.980000]  [<4134b7b7>] spin_bug+0x7c/0x87
> [   28.980000]  [<4134b853>] do_raw_spin_lock+0x1e/0x123
> [   28.980000]  [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
> [   28.980000]  [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
> [   28.980000]  [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
> [   28.980000]  [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
> [   28.980000]  [<41337bc7>] cfq_insert_request+0x8c/0x425

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

PCI/PM: Do not use native PCIe PME by default

Commit c7f486567c1d0acd2e4166c47069835b9f75e77b
(PCI PM: PCIe PME root port service driver) causes the native PCIe
PME signaling to be used by default, if the BIOS allows the kernel to
control the standard configuration registers of PCIe root ports.
However, the native PCIe PME is coupled to the native PCIe hotplug
and calling pcie_pme_acpi_setup() makes some BIOSes expect that
the native PCIe hotplug will be used as well. That, in turn, causes
problems to appear on systems where the PCIe hotplug driver is not
loaded. The usual symptom, as reported by Jaroslav Kameník and
others, is that the ACPI GPE associated with PCIe hotplug keeps
firing continuously causing kacpid to take substantial percentage
of CPU time.

To work around this issue, change the default so that the native
PCIe PME signaling is only used if directly requested with the help
of the pcie_pme= command line switch.

Fixes https://bugzilla.kernel.org/show_bug.cgi?id=15924 , which is
a listed regression from 2.6.33.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Jaroslav Kameník <jaroslav@kamenik.cz>
Tested-by: Antoni Grzymala <antekgrzymala@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

percpu: fix first chunk match in per_cpu_ptr_to_phys()

per_cpu_ptr_to_phys() determines whether the passed in @addr belongs
to the first_chunk or not by just matching the address against the
address range of the base unit (unit0, used by cpu0). When an adress
from another cpu was passed in, it will always determine that the
address doesn't belong to the first chunk even when it does. This
makes the function return a bogus physical address which may lead to
crash.

This problem was discovered by Cliff Wickman while investigating a
crash during kdump on a SGI UV system.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Cliff Wickman <cpw@sgi.com>
Tested-by: Cliff Wickman <cpw@sgi.com>
Cc: stable@kernel.org

kbuild: Clean up and speed up the localversion logic

Now that we run scripts/setlocalversion during every build, it makes
sense to move all the localversion logic there. This cleans up the
toplevel Makefile and also makes sure that the script is called only
once in 'make prepare' (previously, it would be called every time due to
a variable expansion in an ifneq statement). No user-visible change is
intended, unless one runs the setlocalversion script directly.

Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Nico Schottelius <nico-linuxsetlocalversion@schottelius.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>

sched: Fix over-scheduling bug

Commit e70971591 ("sched: Optimize unused cgroup configuration") introduced
an imbalanced scheduling bug.

If we do not use CGROUP, function update_h_load won't update h_load. When the
system has a large number of tasks far more than logical CPU number, the
incorrect cfs_rq[cpu]->h_load value will cause load_balance() to pull too
many tasks to the local CPU from the busiest CPU. So the busiest CPU keeps
going in a round robin. That will hurt performance.

The issue was found originally by a scientific calculation workload that
developed by Yanmin. With that commit, the workload performance drops
about 40%.

CPU  before    after

00   : 2       : 7
01   : 1       : 7
02   : 11      : 6
03   : 12      : 7
04   : 6       : 6
05   : 11      : 7
06   : 10      : 6
07   : 12      : 7
08   : 11      : 6
09   : 12      : 6
10   : 1       : 6
11   : 1       : 6
12   : 6       : 6
13   : 2       : 6
14   : 2       : 6
15   : 1       : 6

Reviewed-by: Yanmin zhang <yanmin.zhang@intel.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1276754893.9452.5442.camel@debian>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

bridge: fdb cleanup runs too often

It is common in end-node, non STP bridges to set forwarding
delay to zero; which causes the forwarding database cleanup
to run every clock tick. Change to run only as soon as needed
or at next ageing timer interval which ever is sooner.

Use round_jiffies_up macro rather than attempting round up
by changing value.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cfq: Don't allow queue merges for queues that have no process references

Hi,

A user reported a kernel bug when running a particular program that did
the following:

created 32 threads
- each thread took a mutex, grabbed a global offset, added a buffer size
  to that offset, released the lock
- read from the given offset in the file
- created a new thread to do the same
- exited

The result is that cfq's close cooperator logic would trigger, as the
threads were issuing I/O within the mean seek distance of one another.
This workload managed to routinely trigger a use after free bug when
walking the list of merge candidates for a particular cfqq
(cfqq->new_cfqq).  The logic used for merging queues looks like this:

static void cfq_setup_merge(struct cfq_queue *cfqq, struct cfq_queue *new_cfqq)
{
int process_refs, new_process_refs;
struct cfq_queue *__cfqq;

/* Avoid a circular list and skip interim queue merges */
while ((__cfqq = new_cfqq->new_cfqq)) {
if (__cfqq == cfqq)
return;
new_cfqq = __cfqq;
}

process_refs = cfqq_process_refs(cfqq);
/*
* If the process for the cfqq has gone away, there is no
* sense in merging the queues.
*/
if (process_refs == 0)
return;

/*
* Merge in the direction of the lesser amount of work.
*/
new_process_refs = cfqq_process_refs(new_cfqq);
if (new_process_refs >= process_refs) {
cfqq->new_cfqq = new_cfqq;
atomic_add(process_refs, &new_cfqq->ref);
} else {
new_cfqq->new_cfqq = cfqq;
atomic_add(new_process_refs, &cfqq->ref);
}
}

When a merge candidate is found, we add the process references for the
queue with less references to the queue with more.  The actual merging
of queues happens when a new request is issued for a given cfqq.  In the
case of the test program, it only does a single pread call to read in
1MB, so the actual merge never happens.

Normally, this is fine, as when the queue exits, we simply drop the
references we took on the other cfqqs in the merge chain:

/*
* If this queue was scheduled to merge with another queue, be
* sure to drop the reference taken on that queue (and others in
* the merge chain).  See cfq_setup_merge and cfq_merge_cfqqs.
*/
__cfqq = cfqq->new_cfqq;
while (__cfqq) {
if (__cfqq == cfqq) {
WARN(1, "cfqq->new_cfqq loop detected\n");
break;
}
next = __cfqq->new_cfqq;
cfq_put_queue(__cfqq);
__cfqq = next;
}

However, there is a hole in this logic.  Consider the following (and
keep in mind that each I/O keeps a reference to the cfqq):

q1->new_cfqq = q2   // q2 now has 2 process references
q3->new_cfqq = q2   // q2 now has 3 process references

// the process associated with q2 exits
// q2 now has 2 process references

// queue 1 exits, drops its reference on q2
// q2 now has 1 process reference

// q3 exits, so has 0 process references, and hence drops its references
// to q2, which leaves q2 also with 0 process references

q4 comes along and wants to merge with q3

q3->new_cfqq still points at q2!  We follow that link and end up at an
already freed cfqq.

So, the fix is to not follow a merge chain if the top-most queue does
not have a process reference, otherwise any queue in the chain could be
already freed.  I also changed the logic to disallow merging with a
queue that does not have any process references.  Previously, we did
this check for one of the merge candidates, but not the other.  That
doesn't really make sense.

Without the attached patch, my system would BUG within a couple of
seconds of running the reproducer program.  With the patch applied, my
system ran the program for over an hour without issues.

This addresses the following bugzilla:
    https://bugzilla.kernel.org/show_bug.cgi?id=16217

Thanks a ton to Phil Carns for providing the bug report and an excellent
reproducer.

[ Note for stable: this applies to 2.6.32/33/34 ].

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Phil Carns <carns@mcs.anl.gov>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

nohz: Fix nohz ratelimit

Chris Wedgwood reports that 39c0cbe (sched: Rate-limit nohz) causes a
serial console regression, unresponsiveness, and indeed it does. The
reason is that the nohz code is skipped even when the tick was already
stopped before the nohz_ratelimit(cpu) condition changed.

Move the nohz_ratelimit() check to the other conditions which prevent
long idle sleeps.

Reported-by: Chris Wedgwood <cw@f00f.org>
Tested-by: Brian Bloniarz <bmb@athenacr.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg KH <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Jef Driesen <jefdriesen@telenet.be>
LKML-Reference: <1276790557.27822.516.camel@twins>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

perf record: prevent kill(0, SIGTERM);

At exit, perf record will kill the process it was profiling by sending a
SIGTERM to child_pid (if it had been initialised), but in certain situations
child_pid may be 0 and perf would mistakenly kill more processes than intended.

child_pid is set to the return of fork() to either 0 or the pid of the child.
Ordinarily this would not present an issue as the child calls execvp to spawn
the process to be profiled and would therefore never run it's sig_atexit and
never attempt to kill pid 0.

However, if a nonexistant binary had been passed in to perf record the call to
execvp would fail and child_pid would be left set to 0. The child would then
exit and it's atexit handler, finding that child_pid was initialised to 0,
would call kill(0, SIGTERM), resulting in every process within it's process
group being killed.

In the case that perf was being run directly from the shell this typically
would not be an issue as the shell isolates the process. However, if perf was
being called from another program it could kill unexpected processes, which may
even include X.

This patch changes the logic of the test for whether child_pid was initialised
to only consider positive pids as valid, thereby never attempting to kill pid
0.

Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <1276072680-17378-1-git-send-email-imunsie@au1.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge branch 'bugzilla-15951' into release

ACPI / PM: Do not enable GPEs for system wakeup in advance

After commit 9630bdd9b15d2f489c646d8bc04b60e53eb5ec78
(ACPI: Use GPE reference counting to support shared GPEs) the wakeup
enable mask bits of GPEs are set as soon as the GPEs are enabled to
wake up the system.  Unfortunately, this leads to a regression
reported by Michal Hocko, where a system is woken up from ACPI S5 by
a device that is not supposed to do that, because the wakeup enable
mask bit of this device's GPE is always set when
acpi_enter_sleep_state() calls acpi_hw_enable_all_wakeup_gpes(),
although it should only be set if the device is supposed to wake up
the system from the target state.

To work around this issue, rework the ACPI power management code so
that GPEs are not enabled to wake up the system upfront, but only
during a system state transition when the target state of the system
is known.  [Of course, this means that the reference counting of
"wakeup" GPEs doesn't really make sense and it is sufficient to
set/unset the wakeup mask bits for them during system sleep
transitions.  This will allow us to simplify the GPE handling code
quite a bit, but that change is too intrusive for 2.6.35.]

Fixes https://bugzilla.kernel.org/show_bug.cgi?id=15951

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>

bnx2: fix dma_get_ops compilation breakage

This removes dma_get_ops() prefetch optimization in bnx2.

bnx2 uses dma_get_ops() to see if dma_sync_single_for_cpu() is
noop. bnx2 does prefetch if it's noop.

But dma_get_ops() isn't available on all the architectures (only the
architectures that uses dma_map_ops struct have it). Using
dma_get_ops() in drivers leads to compilation breakage on many
architectures.

This patch removes dma_get_ops() and changes bnx2 to do prefetch on
all the architectures. This adds useless prefetch on non-coherent
architectures but this is harmless. It is also unlikely to cause the
performance drop.

[ Remove now unused local variable 'pdev' -DaveM ]

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

perf session: Remove threads from tree on PERF_RECORD_EXIT

Move them to a session->dead_threads list just like we do with maps that
are replaced, because we may have hist_entries pointing to them.

This fixes a bug when inserting maps for a new thread that reused the
TID, mixing maps for two different threads, causing an endless loop.

The code for insering maps should be made more robust but for .35 this
is the minimalistic patch.

Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

watchdog: at32ap700x_wdt: register misc device last in probe() function

This patch reworks the probe() function in the at32ap700x_wdt driver, this to
make sure the miscdev is properly initialized and the driver is ready to be
accessed.

Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Signed-off-by: Wim Van sebroeck <wim@iguana.be>

block: fix DISCARD_BARRIER requests

Filesystems assume that DISCARD_BARRIER are full barriers, so that they
don't have to track in-progress discard operation when submitting new I/O.
But currently we only treat them as elevator barriers, which don't
actually do the nessecary queue drains.

Also remove the unlikely around both the DISCARD and BARRIER requests -
the happen far too often for a static mispredict.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

percpu: fix trivial bugs in pcpu_build_alloc_info()

Fix the following two trivial bugs in pcpu_build_alloc_info()

* we should memset group_cnt to 0 by size of group_cnt, not size of
group_map (both are of the same size, so the bug isn't dangerous)

* we can delete useless variable group_cnt_max.

Signed-off-by: Pavel V. Panteleev <pp_84@mail.ru>
Signed-off-by: Tejun Heo <tj@kernel.org>

ALSA: asihpi - Get rid of incorrect "long" types and casts.

These give incorrect results for index wrap on 64 bit.

Signed-off-by: Eliot Blennerhassett <eblennerhassett@audioscience.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ucc_geth: fix for RX skb buffers recycling

This patch implements a proper modification of RX skb buffers before
recycling. Adjusting only skb->data is not enough because after that
skb->tail and skb->len become incorrect.

Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pcnet_cs: add new id (TOSHIBA Modem/LAN Card)

pcnet_cs:
serial_cs:
add new id (TOSHIBA Modem/LAN Card)

Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Fix oversized packets handling

Issuing the following command on host:

$ ifconfig eth2 mtu 1600 ; ping 10.0.0.27 -s 1485 -c 1

Makes some boards (tested with MPC8315 rev 1.1 and MPC8313 rev 1.0)
oops like this:

  skb_over_panic: text:c0195914 len:1537 put:1537 head:c79e4800 data:c79e4880 tail:0xc79e4e81 end:0xc79e4e80 dev:eth1
  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:127!
  Oops: Exception in kernel mode, sig: 5 [#1]
  MPC831x RDB
  last sysfs file: /sys/kernel/uevent_seqnum
  Modules linked in:
  NIP: c01c1840 LR: c01c1840 CTR: c016d918
  [...]
  NIP [c01c1840] skb_over_panic+0x48/0x5c
  LR [c01c1840] skb_over_panic+0x48/0x5c
  Call Trace:
  [c0339d50] [c01c1840] skb_over_panic+0x48/0x5c (unreliable)
  [c0339d60] [c01c3020] skb_put+0x5c/0x60
  [c0339d70] [c0195914] gfar_clean_rx_ring+0x25c/0x3d0
  [c0339dc0] [c01976e8] gfar_poll+0x170/0x1bc

Dumped buffer descriptors showed that eTSEC's length/truncation
logic sometimes passes oversized packets, i.e. for the above ICMP
packet the following two buffer descriptors may become ready:

  status=1400 length=1536
  status=1800 length=1541

So, it seems that gianfar actually receives the whole big frame,
and it tries to place the packet into two BDs. This situation
confuses the driver, and so the skb_put() sanity check fails.

This patch fixes the issue by adding an appropriate check, i.e.
the driver should not try to process frames with buffer
descriptor's length over rx_buffer_size (i.e. maxfrm and mrblr).

Note that sometimes eTSEC works correctly, i.e. in the second
(last) buffer descriptor bits 'truncated' and 'crcerr' are set,
and so there's no oops. Though I couldn't find any logic when
it works correctly and when not.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Fix kernel deadlock in DLPAR-mem processing

Port reset operations and memory add/remove operations need to
be serialized to avoid a kernel deadlock. The deadlock is caused
by calling the napi_disable() function twice.
Therefore we have to employ the dlpar_mem_lock in the ehea_reset_port
function as well

Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: fix delayed packet processing

In the eHEA poll function an rmb() is required. Without that some packets
on the receive queue are not seen and thus delayed until the next interrupt
is handled for the same receive queue.

Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: add comment on SFP+ ID for Active DA

These comments were forgotten in the initial patch to add this
functionality. This patch corrects that.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Clear IFF_XMIT_DST_RELEASE for teql interfaces

https://bugzilla.kernel.org/show_bug.cgi?id=16183

The sch_teql module, which can be used to load balance over a set of
underlying interfaces, stopped working after 2.6.30 and has been
broken in all kernels since then for any underlying interface which
requires the addition of link level headers.

The problem is that the transmit routine relies on being able to
access the destination address in the skb in order to do address
resolution once it has decided which underlying interface it is going
to transmit through.

In 2.6.31 the IFF_XMIT_DST_RELEASE flag was introduced, and set by
default for all interfaces, which causes the destination address to be
released before the transmit routine for the interface is called.

The solution is to clear that flag for teql interfaces.

Signed-off-by: Tom Hughes <tom@compton.nu>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Fix setup of RX time stamping

Previously the RCTRL_TS_ENABLE bit was set unconditionally. However, if
the RCTRL_TS_ENABLE is set without TMR_CTRL[TE], the driver does not work
properly on some boards (Anton had problems with the MPC8313ERDB and
MPC8568EMDS).

With this patch the bit will only be set if requested from user space
with the SIOCSHWTSTAMP ioctl command, meaning that time stamping is
disabled during normal operation. Users who are not interested in time
stamps will not experience problems with buggy CPU revisions or
performance drops any more.

The setting of TMR_CTRL[TE] is still up to the user. This is considered
safe because users wanting HW timestamps must initialize the eTSEC clock
first anyway, e.g. with the recently submitted PTP clock driver.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
Reviewed-by: Anton Vorontsov <cbouatmailru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6

mac80211: fix warn, enum may be used uninitialized

regression introduced by b8d92c9c141ee3dc9b3537b1f0ffb4a54ea8d9b2

In function ‘ieee80211_work_rx_queued_mgmt’:
warning: ‘rma’ may be used uninitialized in this function

this re-adds default value WORK_ACT_NONE back to rma

Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

cifs: remove bogus first_time check in NTLMv2 session setup code

This bug appears to be the result of a cut-and-paste mistake from the
NTLMv1 code. The function to generate the MAC key was commented out, but
not the conditional above it. The conditional then ended up causing the
session setup key not to be copied to the buffer unless this was the
first session on the socket, and that made all but the first NTLMv2
session setup fail.

Fix this by removing the conditional and all of the commented clutter
that made it difficult to see.

Cc: Stable <stable@kernel.org>
Reported-by: Gunther Deschner <gdeschne@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

cifs: don't call cifs_new_fileinfo unless cifs_open succeeds

It's currently possible for cifs_open to fail after it has already
called cifs_new_fileinfo. In that situation, the new fileinfo will be
leaked as the caller doesn't call fput. That in turn leads to a busy
inodes after umount problem since the fileinfo holds an extra inode
reference now. Shuffle cifs_open around a bit so that it only calls
cifs_new_fileinfo if it's going to succeed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>

cifs: don't ignore cifs_posix_open_inode_helper return value

...and ensure that we propagate the error back to avoid any surprises.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>

cifs: clean up arguments to cifs_open_inode_helper

...which takes a ton of unneeded arguments and does a lot more pointer
dereferencing than is really needed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>

cifs: pass instantiated filp back after open call

The current scheme of sticking open files on a list and assuming that
cifs_open will scoop them off of it is broken and leads to "Busy
inodes after umount..." errors at unmount time.

The problem is that there is no guarantee that cifs_open will always
be called after a ->lookup or ->create operation. If there are
permissions or other problems, then it's quite likely that it *won't*
be called.

Fix this by fully instantiating the filp whenever the file is created
and pass that filp back to the VFS. If there is a problem, the VFS
can clean up the references.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>

cifs: move cifs_new_fileinfo call out of cifs_posix_open

Having cifs_posix_open call cifs_new_fileinfo is problematic and
inconsistent with how "regular" opens work. It's also buggy as
cifs_reopen_file calls this function on a reconnect, which creates a new
struct cifsFileInfo that just gets leaked.

Push it out into the callers. This also allows us to get rid of the
"mnt" arg to cifs_posix_open.

Finally, in the event that a cifsFileInfo isn't or can't be created, we
always want to close the filehandle out on the server as the client
won't have a record of the filehandle and can't actually use it. Make
sure that CIFSSMBClose is called in those cases.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>

OMAP4: clock: Fix multi-omap boot with reset un-used clocks

This patch uses "ENABLE_ON_INIT" flag on the emif clock nodes
to avoid the emif clk getting cut as part of reset un-used clock
routine which prevents boot.

Since "omap4xxx_clk_init()" calls "clk_enable_init_clocks()"
which increases the usecount on all ENABLE_ON_INIT clocks, it
prevents "omap2_clk_disable_unused()" from disabling the clock.

The real fix is to have driver for EMIF and do clock get/enable
as part of it. The EMIF driver is planned to be done HWMOD way
so till that available to keep omap3_defconfig booting on OMAP4430,
this patch is necessary.
(Will updated the auto-gen script for 44xx accordingly)

The fix was suggested by Paul Walmsley

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Nishanth Menon <nm@ti.com>
Acked-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6

bridge: Fix OOM crash in deliver_clone

The bridge multicast patches introduced an OOM crash in the forward
path, when deliver_clone fails to clone the skb.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen: fix caching window register

CRB window register is not per pci-func for NX3031,
so caching can result in incorrect values.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen: fix rcv buffer leak

Rcv producer should be read in spin-lock.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen: fix memory leaks in error path

Fixes memory leak in error path when memory allocation
for adapter data structures fails.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pcmcia: dev_node removal bugfix

Patch c7c2fa07 removed one line too much from smc91c92_cs.c.

Reported-by: Komuro <komurojun-mbn@nifty.com>
CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

powerpc/5200: fix lite5200 ethernet phy address

According to my schematics, on Lite5200 board ethernet phy uses address
0 (all ADDR lines are pulled down). With this change I can talk to
onboard phy (LXT971) and correctly use autonegotiation.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

powerpc/5200: Fix build error in sound code.

Compiling in the MPC5200 sound drivers results in the following build error:

sound/soc/fsl/mpc5200_psc_ac97.o: In function `to_psc_dma_stream':
mpc5200_psc_ac97.c:(.text+0x0): multiple definition of `to_psc_dma_stream'
sound/soc/fsl/mpc5200_dma.o:mpc5200_dma.c:(.text+0x0): first defined here
sound/soc/fsl/efika-audio-fabric.o: In function `to_psc_dma_stream':
efika-audio-fabric.c:(.text+0x0): multiple definition of `to_psc_dma_stream'
sound/soc/fsl/mpc5200_dma.o:mpc5200_dma.c:(.text+0x0): first defined here
make[3]: *** [sound/soc/fsl/built-in.o] Error 1
make[2]: *** [sound/soc/fsl] Error 2
make[1]: *** [sound/soc] Error 2
make: *** [sound] Error 2

This patch fixes it by declaring the inline function in the header file to
also be a static.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Jon Smirl <jonsmirl@gmail.com>
Tested-by: John Hilmar Linkhorst <John.Linkhorst@rwth-aachen.de>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>

powerpc/5200: fix oops during going to standby

When going to standby mode mpc code maps the whole soc5200 node
to access warious MBAR registers. However as of_iomap uses 'reg'
property of device node, only small part of MBAR is getting mapped.
Thus pm code gets oops when trying to access high parts of MBAR.
As a way to overcome this, make mpc52xx_pm_prepare() explicitly
map whole MBAR (0xc0000).

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>