net: Move "struct net" declaration inside the __KERNEL__ macro guard
This patch reduces namespace pollution by moving the "struct net" declaration
out of the userspace-facing portion of linux/netlink.h. It has no impact on
the kernel.
(This came up because we have several C++ applications which use "net" as a
namespace name.)
Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Olsa [Tue, 21 Sep 2010 21:17:34 +0000 (21:17 +0000)]
netfilter: nf_conntrack_defrag: check socket type before touching nodefrag flag
we need to check proper socket type within ipv4_conntrack_defrag
function before referencing the nodefrag flag.
For example the tun driver receive path produces skbs with
AF_UNSPEC socket type, and so current code is causing unwanted
fragmented packets going out.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 21 Sep 2010 21:17:32 +0000 (21:17 +0000)]
netfilter: fix a race in nf_ct_ext_create()
As soon as rcu_read_unlock() is called, there is no guarantee current
thread can safely derefence t pointer, rcu protected.
Fix is to copy t->alloc_size in a temporary variable.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: fix ipt_REJECT TCP RST routing for indev == outdev
ip_route_me_harder can't create the route cache when the outdev is the same
with the indev for the skbs whichout a valid protocol set.
__mkroute_input functions has this check:
1998 if (skb->protocol != htons(ETH_P_IP)) {
1999 /* Not IP (i.e. ARP). Do not create route, if it is
2000 * invalid for proxy arp. DNAT routes are always valid.
2001 *
2002 * Proxy arp feature have been extended to allow, ARP
2003 * replies back to the same interface, to support
2004 * Private VLAN switch technologies. See arp.c.
2005 */
2006 if (out_dev == in_dev &&
2007 IN_DEV_PROXY_ARP_PVLAN(in_dev) == 0) {
2008 err = -EINVAL;
2009 goto cleanup;
2010 }
2011 }
This patch gives the new skb a valid protocol to bypass this check. In order
to make ipt_REJECT work with bridges, you also need to enable ip_forward.
This patch also fixes a regression. When we used skb_copy_expand(), we
didn't have this issue stated above, as the protocol was properly set.
Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Tue, 21 Sep 2010 21:17:30 +0000 (21:17 +0000)]
netfilter: nf_ct_sip: default to NF_ACCEPT in sip_help_tcp()
I initially noticed this because of the compiler warning below, but it
does seem to be a valid concern in the case where ct_sip_get_header()
returns 0 in the first iteration of the while loop.
net/netfilter/nf_conntrack_sip.c: In function 'sip_help_tcp':
net/netfilter/nf_conntrack_sip.c:1379: warning: 'ret' may be used uninitialized in this function
Signed-off-by: Simon Horman <horms@verge.net.au>
[Patrick: changed NF_DROP to NF_ACCEPT] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 21 Sep 2010 21:17:29 +0000 (21:17 +0000)]
netfilter: tproxy: nf_tproxy_assign_sock() can handle tw sockets
transparent field of a socket is either inet_twsk(sk)->tw_transparent
for timewait sockets, or inet_sk(sk)->transparent for other sockets
(TCP/UDP).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Mon, 20 Sep 2010 20:40:24 +0000 (20:40 +0000)]
drivers: atm: use native kernel's hex_to_bin() func
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Chas Williams <chas@cmf.nrl.navy.mil> Cc: linux-atm-general@lists.sourceforge.net Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 21 Sep 2010 23:12:11 +0000 (16:12 -0700)]
ethtool: Fix build due to lack of ethtool.h include.
net/core/ethtool.c: In function 'ethtool_get_regs':
net/core/ethtool.c:818:2: error: implicit declaration of function 'vmalloc'
net/core/ethtool.c:818:9: warning: assignment makes pointer from integer without a cast
net/core/ethtool.c:833:2: error: implicit declaration of function 'vfree'
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 21 Sep 2010 23:11:06 +0000 (16:11 -0700)]
sfc: Fix build due to lack of vmalloc.h include.
drivers/net/sfc/filter.c: In function ‘efx_probe_filters’:
drivers/net/sfc/filter.c:422: error: implicit declaration of function ‘vmalloc’
drivers/net/sfc/filter.c:422: warning: assignment makes pointer from integer without a cast
drivers/net/sfc/filter.c: In function ‘efx_remove_filters’:
drivers/net/sfc/filter.c:442: error: implicit declaration of function ‘vfree’
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 21 Sep 2010 08:47:45 +0000 (08:47 +0000)]
ip: fix truesize mismatch in ip fragmentation
Special care should be taken when slow path is hit in ip_fragment() :
When walking through frags, we transfert truesize ownership from skb to
frags. Then if we hit a slow_path condition, we must undo this or risk
uncharging frags->truesize twice, and in the end, having negative socket
sk_wmem_alloc counter, or even freeing socket sooner than expected.
Many thanks to Nick Bowler, who provided a very clean bug report and
test program.
Thanks to Jarek for reviewing my first patch and providing a V2
While Nick bisection pointed to commit 2b85a34e911 (net: No more
expensive sock_hold()/sock_put() on each tx), underlying bug is older
(2.6.12-rc5)
A side effect is to extend work done in commit b2722b1c3a893e
(ip_fragment: also adjust skb->truesize for packets not owned by a
socket) to ipv6 as well.
Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Tested-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Mon, 20 Sep 2010 11:45:38 +0000 (13:45 +0200)]
ath9k: clean up / fix aggregation session flush
The tid aggregation cleanup is a bit fragile, as it discards failed
subframes in some places, and retransmits them in others. This could
block the cleanup of an existing aggregation session, if a retransmission
for a tid is issued, yet the tid is never scheduled again because of
the cleanup state.
Fix this by getting rid of as many subframes as possible, as early
as possible, and immediately transmitting pending subframes as regular
HT frames instead of waiting for the cleanup to complete.
Drop all pending subframes while keeping track of the Block ACK window
during aggregate tx completion to prevent sending out stale subframes,
which could confuse the receiver side.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
Felix Fietkau [Mon, 20 Sep 2010 17:35:28 +0000 (19:35 +0200)]
ath9k: fix an aggregation start related race condition
A new aggregation session start can be issued by mac80211, even when the
cleanup of the previous session has not completed yet. Since the data structure
for the session is not recreated, this could corrupt the block ack window
and lock up the aggregation session. Fix this by delaying the new session
until the old one has been cleaned up.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
Felix Fietkau [Mon, 20 Sep 2010 11:45:36 +0000 (13:45 +0200)]
ath9k: clean up block ack window handling
There's no reason to keep pointers to pending tx buffers around, if they're
only used to keep track of which frames are still pending. Use a bitfield
instead.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
carl9170: reinit phy after HT settings have changed
The driver has a set of different initvals for 20 MHz
vs dynamic HT2040 operation. Because we can't change
some of the registers "in-flight", the driver needs to
perform a warm reset.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Don't mark the device as completely dead just yet.
If all goes to plan and carl9170_reboot succeeds
then we can skip the expensive userspace-driven
reinitialization anyway.
And if it doesn't and carl9170_reboot fails,
then carl9170_usb_cancel_urbs will do the
necessary steps.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ever since carl9170 gained support to read the noisefloor,
the reported noisefloor level was pretty poor.
Initially I assumed that something was wrong in the PHY
setup and it would be impossible to fix without any
guidances. But this was not the case. In fact the nf
readings were correct and the thing that was broken
was the "simple" sign extension code!
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Bruno Randolf [Fri, 17 Sep 2010 02:37:12 +0000 (11:37 +0900)]
ath5k: Simplify cw_min/max and AIFS configuration
Get rid of overly complicated cw_min/max and AIFS configuration:
* Validate values in ath5k_hw_set_tx_queueprops(), so we can use them directly
without further checks or computation in ath5k_hw_reset_tx_queue().
* Simplifiy by using AR5K_TUNE_AIFS|CWMIN|CWMAX variables directly since we
don't support XR or B channels. That way we can also remove
AR5K_TXQ_USEDEFAULT and the confusing logic around it.
* Update data types: AIFS is u8, CW's are u16.
* Remove now unneeded variables in ath5k_hw.
Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Bruno Randolf [Fri, 17 Sep 2010 02:37:07 +0000 (11:37 +0900)]
ath5k: Keep last descriptor in queue
If we return a TX descriptor to the pool of available descriptors, while a
queues TXDP still points to it we could potentially run into all sorts of
troube.
It has been suggested that there is hardware which can set the descriptors
done bit before it reads ds_link and moves on to the next descriptor. While the
documentation says this is not true for newer chipsets (the descriptor contents
are copied to some internal memory), we don't know about older hardware.
To be safe, we always keep the last descriptor in the queue, and avoid dangling
TXDP pointers. Unfortunately this does not fully resolve the problem - queues
still get stuck!
This is similar to what ath9k does.
Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Bruno Randolf [Fri, 17 Sep 2010 02:36:56 +0000 (11:36 +0900)]
ath5k: Add watchdog for stuck TX queues
Since we do not know any better solution to the problem that TX queues can get
stuck, this adds a timer-based watchdog, which will check for stuck queues and
reset the hardware if necessary.
Bruno Randolf [Fri, 17 Sep 2010 02:36:35 +0000 (11:36 +0900)]
ath5k: Use four hardware queues
Prepare ath5k for WME by using four hardware queues.
The way we set up our queues matches the mac80211 queue priority 1:1, so we
don't have to do any mapping for queue numbers.
Every queue uses 50 of the total 200 available transmit buffers, so the DMA
memory usage does not increase with this patch, but it might be good to
fine-tune the number of buffers per queue later (depending on the CPU speed and
load, and the speed of the medium access, it might not be big enough).
Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Bob Copeland [Fri, 17 Sep 2010 03:45:07 +0000 (12:45 +0900)]
ath5k: reorder base.c to remove fwd decls
This change reorganizes the main ath5k file in order to re-group
related functions and remove most of the forward declarations
(from 61 down to 3). This is, unfortunately, a lot of churn, but
there should be no functional changes.
Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Remove unneeded ath_crypt_caps flags, as per "ath9k_hw: remove useless hw
capability flags" (364734fafbba0c3133e482db78149b9a823ae7a5), but set the
AESCCM flag for ath9k. common ath code still needs a flag for this because
there is ath5k hardware which can't do AES in hardware.
Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a simple mechanism to pass platform data to the
SDIO instances of wl12xx.
This way there is no confusion over who owns the 'embedded data',
typechecking is preserved, and no possibility for the wrong driver to
pick up the data.
Originally proposed by Russell King.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make it possible for the set power method to indicate a
success/failure return value. This is needed to support
more complex power on/off operations such as SDIO
power manipulations.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Acked-by: Luciano Coelho <luciano.coelho@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
The `options_received' struct is redundant, since it re-duplicates the existing
`p' and `x_recv' fields. This patch removes the sub-struct and migrates the
format conversion operations to ccid3_hc_tx_parse_options().
dccp tfrc/ccid-3: computing the loss rate from the Loss Event Rate
This adds a function to take care of the following, separate cases occurring in
the computation of the Loss Rate p:
* 1/(2^32-1) is mapped into 0% as per RFC 4342, 8.5;
* 1/0 is mapped into 100%, the maximum;
* to avoid that p = 1/x is rounded down to 0 when x is very large, since this
means accidentally re-entering slow-start indicated by p == 0, the minimum
resolution value of p is now returned instead;
* a bug in ccid3_hc_rx_getsockopt is fixed: 1/0 was mapped into ~0U.
This patch is thanks to an investigation by Leandro Sales de Melo and his
colleagues. They worked out two state diagrams which highlight the fact that
the xxx_TERM states in CCID-3/4 are in fact not necessary.
And this can be confirmed by in turn looking at the code: the xxx_TERM states
are only ever set in ccid3_hc_{rx,tx}_exit(): when CCID-3 sets the state
to xxx_TERM, it is at a time where no more processing should be going on,
hence it is not necessary to introduce a dedicated exit state - this is already
implied by unloading the CCID.
dccp: Add packet type information to CCID-specific option parsing
This
1. adds packet type information to ccid_hc_{rx,tx}_parse_options(). This is
necessary, since table 3 in RFC 4340, 5.8 leaves it to the CCIDs to state
which options may (not) appear on what packet type.
2. adds such a check for CCID-3's {Loss Event, Receive} Rate as specified in
RFC 4340 8.3 ("Receive Rate options MUST NOT be sent on DCCP-Data packets")
and 8.5 ("Loss Event Rate options MUST NOT be sent on DCCP-Data packets").
3. removes an unused argument `idx' from ccid_hc_{rx,tx}_parse_options(). This
is also no longer necessary, since the CCID-specific option-parsing routines
are passed every single parameter of the type-length-value option encoding.
Tom Marshall [Mon, 20 Sep 2010 22:42:05 +0000 (15:42 -0700)]
tcp: Fix race in tcp_poll
If a RST comes in immediately after checking sk->sk_err, tcp_poll will
return POLLIN but not POLLOUT. Fix this by checking sk->sk_err at the end
of tcp_poll. Additionally, ensure the correct order of operations on SMP
machines with memory barriers.
Signed-off-by: Tom Marshall <tdm.code@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Egerer [Mon, 20 Sep 2010 18:11:38 +0000 (11:11 -0700)]
xfrm: Allow different selector family in temporary state
The family parameter xfrm_state_find is used to find a state matching a
certain policy. This value is set to the template's family
(encap_family) right before xfrm_state_find is called.
The family parameter is however also used to construct a temporary state
in xfrm_state_find itself which is wrong for inter-family scenarios
because it produces a selector for the wrong family. Since this selector
is included in the xfrm_user_acquire structure, user space programs
misinterpret IPv6 addresses as IPv4 and vice versa.
This patch splits up the original init_tempsel function into a part that
initializes the selector respectively the props and id of the temporary
state, to allow for differing ip address families whithin the state.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When a driver doesn't fill the entire buffer, old
heap contents may remain, and if it also doesn't
update the length properly, this old heap content
will be copied back to userspace.
It is very unlikely that this happens in any of
the drivers using private ioctls since it would
show up as junk being reported by iwpriv, but it
seems better to be safe here, so use kzalloc.
Reported-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Eric Dumazet [Mon, 20 Sep 2010 00:12:11 +0000 (00:12 +0000)]
net: rx_dropped accounting
Under load, netif_rx() can drop incoming packets but administrators dont
have a chance to spot which device needs some tuning (RPS activation for
example)
This patch adds rx_dropped accounting in vlans and tunnels.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bandan Das [Sun, 19 Sep 2010 09:34:33 +0000 (09:34 +0000)]
bridge : Sanitize skb before it enters the IP stack
Related dicussion here : http://lkml.org/lkml/2010/9/3/16
Introduce a function br_parse_ip_options that will audit the
skb and possibly refill IP options before a packet enters the
IP stack. If no options are present, the function will zero out
the skb cb area so that it is not misinterpreted as options by some
unsuspecting IP layer routine. If packet consistency fails, drop it.
Signed-off-by: Bandan Das <bandan.das@stratus.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Sat, 18 Sep 2010 13:44:14 +0000 (13:44 +0000)]
rds: spin_lock_irq() is not nestable
This is basically just a cleanup. IRQs were disabled on the previous
line so we don't need to do it again here. In the current code IRQs
would get turned on one line earlier than intended.
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dca: disable dca on IOAT ver.3.0 multiple-IOH platforms
Direct Cache Access is not supported on IOAT ver.3.0 multiple-IOH platforms.
This patch blocks registering of dca providers when multiple IOH detected with IOAT ver.3.0.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 17 Sep 2010 23:55:03 +0000 (16:55 -0700)]
netpoll: Disable IRQ around RCU dereference in netpoll_rx
We cannot use rcu_dereference_bh safely in netpoll_rx as we may
be called with IRQs disabled. We could however simply disable
IRQs as that too causes BH to be disabled and is safe in either
case.
Thanks to John Linville for discovering this bug and providing
a patch.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Thu, 16 Sep 2010 11:28:07 +0000 (11:28 +0000)]
ethtool, ixgbe: Move RX n-tuple mask fixup to ethtool
The ethtool utility does not set masks for flow parameters that are
not specified, so if both value and mask are 0 then this must be
treated as equivalent to a mask with all bits set. Currently that is
done in the only driver that implements RX n-tuple filtering, ixgbe.
Move it to the ethtool core.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: Do not reset the packet during sctp_packet_config().
sctp_packet_config() is called when getting the packet ready
for appending of chunks. The function should not touch the
current state, since it's possible to ping-pong between two
transports when sending, and that can result packet corruption
followed by skb overlfow crash.
Reported-by: Thomas Dreibholz <dreibh@iem.uni-due.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Lamparter [Fri, 17 Sep 2010 03:22:19 +0000 (03:22 +0000)]
netns: keep vlan slaves on master netns move
previously, if a vlan master device was moved from one network namespace
to another, all 802.1q and macvlan slaves were deleted.
we can use dev->reg_state to figure out whether dev_change_net_namespace
is happening, since that won't set dev->reg_state NETREG_UNREGISTERING.
so, this changes 8021q and macvlan to ignore NETDEV_UNREGISTER when
reg_state is not NETREG_UNREGISTERING.
Signed-off-by: David Lamparter <equinox@diac24.net> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: prevent dma init stuck in case of failures.
Add a limit when perform the DMA reset procedure
so, in case of problems (i.e. PHY reset failed) the
Kernel won't hang on the stmmac DMA initialisation
blocking the Kernels execution.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The first version of the driver had hard-coded the logic
for handling the checksum offloading.
This was designed according to the chips included in
the STM platforms where:
o MAC10/100 supports no COE at all.
o GMAC fully supports RX/TX COE.
This is not good for other chip configurations where,
for example, the mac10/100 supports the tx csum in HW
or when the GMAC has no IPC.
Thanks to Johannes Stezenbach; he provided me a first
draft of this patch that only reviewed the IPC for the
GMAC devices.
This patch also helps on SPEAr platforms where the
MAC10/100 can perform the TX csum in HW.
Thanks to Deepak SIKRI for his support on this.
In the end, GMAC devices for STM platforms have
a bugged Jumbo frame support that needs to have
the Tx COE disabled for oversized frames (due to
limited buffer sizes). This information is also
passed through the driver's platform structure.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Johannes Stezenbach <js@sig21.net> Signed-off-by: Deepak SIKRI <deepak.sikri@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Original patch from Johannes Stezenbach fixed the CSR
in the stmmac_mdio. We agreed to provide this through
the platform instead of.
Also thanks to Johannes for having tested it on ARM.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Johannes Stezenbach <js@sig21.net> Signed-off-by: David S. Miller <davem@davemloft.net>