DLPX-91810 Merge conflict in linux-kernel-gcp #38

Closed
wants to merge 1,496 commits into from

manoj-joseph

Problem

Seb: I think we have a problem with the kernel repos this morning (all except the gcp repo, which wasn't touched). It looks like upstream was merged into develop instead of our patch sets being rebased on top of upstream (starting with the @@DELPHIX_PATCHSET_START@@ commit).

Solution

Started with upstreams/develop and cherry-picked our patches with git cherry-pick bad90fed665350913c2181f159b82550af9c9c1b^..7587a1ff58290616e30fb7abc3d212ba2bf17d79.
After that, we are left with the following.

delphix@mj-build:~/linux-kernel-gcp$ git log -20 --abbrev-commit --oneline --graph 
* 07b0f84edb1d (HEAD -> develop, origin/pr/manoj-joseph/DLPX-91810) DLPX-87970 Move Delphix annotations to linux-pkg to reduce merge conflicts (#32)
* 7daf8288e5d9 DLPX-87710 upgrade from 6.0.16.0 to 15.0.0.0 failed because disk quota error (#30)
* 9311861f9e76 DLPX-87344 Fix kernel merge conflict with upstream
* 4aa475f32f6f DLPX-86675 Disk quota exceeded when unpacking an upgrade image (#28)
* 2c62b3b10881 DLPX-86177 Azure Accelerated networking broken because Mellanox drivers absent in kernel (#27)
* dddbe058b902 DLPX-84906 Disable frame buffer drivers (#26)
* 28243bc45d09 DLPX-84995 NFSD: Never call nfsd_file_gc() in foreground paths (#25)
* 28b1b1a2ea1e DLPX-84985 target: iscsi: fix deadlock in the iSCSI login code (#23)
* 98f1ea387cde DLPX-84907 CVE-2022-3628 (#22)
* cec19fe98d4f DLPX-84469 Users unable to connect to CIFS mounts (#21)
* 46b47f00065d DLPX-83701 Make function mnt_add_count() traceable (#18)
* 5ae8b7452c2c DLPX-83697 iscsi target login should wait until tx/rx threads have properly started
* ac9e57765e60 DLPX-83442 Disable various kernel modules which we don't use (#14)
* 1f8cacb3aa00 DLPX-82827 Fix for Solaris NFSv4 client mounts (#13)
* 2c9a82a24604 DLPX-72065 Aborted iSCSI command never completes after LUN reset (#4)
* c178801203b9 DLPX-74216 nfs-server restarts fail when order-5 allocations are exhausted (#3)
* 0555dc519abf DLPX-71852 iSCSI: journal flooded with "Unable to locate Target IQN" messages (#2)
* 3263fcb31f28 @@DELPHIX_PATCHSET_START@@
* 145b221b3e20 (tag: Ubuntu-gcp-5.15-5.15.0-1065.73_20.04.1, origin/upstreams/develop, upstreams-develop) UBUNTU: Ubuntu-gcp-5.15-5.15.0-1065.73~20.04.1
* 7b93ac97f528 UBUNTU: link-to-tracker: update tracking bug
delphix@mj-build:~/linux-kernel-gcp$ 
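The `A^..B` range used in the cherry-pick command above selects the commits reachable from B but not from A's parent. On a linear history that is A through B inclusive; the `^` is what keeps the first commit of the range. A minimal C model of that selection, assuming a strictly linear, oldest-first history (the function and the commit labels are hypothetical, and git's real revision walking also handles merges and arbitrary graphs):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of `git cherry-pick FIRST^..LAST` on a strictly linear,
 * oldest-first history: "reachable from LAST but not from FIRST's parent",
 * i.e. FIRST..LAST inclusive. Returns how many commits would be replayed
 * (0 if an endpoint is missing or the range is reversed) and stores the
 * index of FIRST in *start. */
static size_t cherry_pick_range(const char *const history[], size_t n,
                                const char *first, const char *last,
                                size_t *start)
{
    size_t i_first = n, i_last = n;

    assert(start != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(history[i], first) == 0)
            i_first = i;
        if (strcmp(history[i], last) == 0)
            i_last = i;
    }
    if (i_first == n || i_last == n || i_first > i_last)
        return 0;
    *start = i_first;              /* the '^' keeps FIRST itself in the range */
    return i_last - i_first + 1;   /* inclusive count */
}
```

For a history `old-upstream`, `patch-start`, `fix-1`, `fix-2`, the range `patch-start^..fix-2` replays three commits starting at `patch-start`, whereas `patch-start..fix-2` would leave `patch-start` out.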

praveenkaligineedi and others added 30 commits July 15, 2024 10:06
BugLink: https://bugs.launchpad.net/bugs/2040522

IRQs are currently requested before the netdevice is registered
and a proper name is assigned to the device. Change the interrupt
name so that it no longer contains the unexpanded format string.

Interrupt name before change: eth%d-ntfy-block.<blk_id>
Interrupt name after change: gve-ntfy-blk<blk_id>@pci:<pci_name>
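A small sketch of the new naming scheme, assuming a fixed-size name buffer; `NTFY_NAME_LEN`, the helper name and the sample PCI address are illustrative, not the driver's actual definitions. Both inputs, the block index and the PCI address, are known before the netdevice is registered, unlike the `eth%d` interface name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NTFY_NAME_LEN 32   /* hypothetical buffer size */

/* Build "gve-ntfy-blk<blk_id>@pci:<pci_name>". Returns snprintf()'s result,
 * the untruncated length, so a value >= len signals truncation. */
static int ntfy_irq_name(char *buf, size_t len, int blk_id, const char *pci_name)
{
    assert(buf != NULL && pci_name != NULL);
    return snprintf(buf, len, "gve-ntfy-blk%d@pci:%s", blk_id, pci_name);
}
```

With block 3 on PCI device `0000:00:04.0`, this yields `gve-ntfy-blk3@pci:0000:00:04.0` (30 characters).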

Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8437114)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

This patch adds/modifies helper functions needed to add XDP
support.

Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2e80aea)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Changes to enable adding and removing TX queues without calling
gve_close() and gve_open().

Made the following changes:
1) priv->tx, priv->rx and priv->qpls arrays are allocated based on
   max tx queues and max rx queues
2) Changed gve_adminq_create_tx_queues(), gve_adminq_destroy_tx_queues(),
   gve_tx_alloc_rings() and gve_tx_free_rings() functions to add/remove a
   subset of TX queues rather than all the TX queues.
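A simplified model of the idea, assuming a single TX ring array; the structure and helper names are hypothetical stand-ins for `priv->tx` and the add/remove functions named above. The array is sized once for the maximum queue count, so adding or removing a contiguous subset never reallocates it or disturbs the other queues:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct tx_ring { bool active; };

struct priv_model {
    struct tx_ring *tx;   /* sized for max_tx, not for the current count */
    int max_tx;
};

static int priv_model_init(struct priv_model *p, int max_tx)
{
    p->tx = calloc((size_t)max_tx, sizeof(*p->tx));
    p->max_tx = max_tx;
    return p->tx ? 0 : -1;
}

/* Bring up or tear down only the queues in [start, start + num). */
static void tx_queues_set(struct priv_model *p, int start, int num, bool active)
{
    assert(start >= 0 && num >= 0 && start + num <= p->max_tx);
    for (int i = start; i < start + num; i++)
        p->tx[i].active = active;
}
```

For example, with `max_tx = 8`, queues 4 and 5 can be added and later removed while queues 0 to 3 keep running.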

Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7fc2bf7)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Add support for XDP PASS, DROP and TX actions.

This patch contains the following changes:
1) Support installing/uninstalling XDP program
2) Add dedicated XDP TX queues
3) Add support for XDP DROP action
4) Add support for XDP TX action

Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit 75eaae1)
[john-cabaj: context changes as
"gve: Secure enough bytes in the first TX desc for all TCP pkts"
already applied. Removing xdp_features as support wasn't
implemented until 6.3]
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

This patch contains the following changes:
1) Support for XDP REDIRECT action on rx
2) ndo_xdp_xmit callback support

In the GQI-QPL queue format, the driver must allocate a fixed-size
region of memory for RX/TX, with the size specified by the vNIC device,
and register this memory as a bounce buffer with the vNIC device when a
queue is created.
The number of pages in the bounce buffer is limited and the pages need to
be made available to the vNIC by copying the RX data out to prevent
head-of-line blocking. The XDP_REDIRECT packets are therefore immediately
copied to a newly allocated page.
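The copy-out step can be sketched as follows, assuming one 4 KiB bounce-buffer page per received frame and using `malloc()` as a stand-in for the kernel page allocator; the structure and function names are hypothetical. The registered page is released back to the device whether or not the copy succeeds, so a redirected frame never pins a bounce page:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096   /* assumed page size */

struct qpl_slot {
    unsigned char data[PAGE_SZ];   /* memory registered with the vNIC */
    int in_use;
};

/* Copy a frame out of a bounce-buffer page into a freshly allocated page that
 * can be redirected, then give the bounce page back to the device. Returns
 * NULL (the frame is dropped) if the allocation fails. */
static unsigned char *copy_for_redirect(struct qpl_slot *slot, size_t len)
{
    unsigned char *page;

    assert(slot != NULL && len <= PAGE_SZ);
    page = malloc(PAGE_SZ);
    if (page)
        memcpy(page, slot->data, len);
    slot->in_use = 0;   /* bounce page is available to the vNIC again */
    return page;
}
```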

Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit 39a7f4a)
[john-cabaj: Removing xdp_features as support wasn't
implemented until 6.3]
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Adding AF_XDP zero-copy support.

Note: Although these changes support AF_XDP sockets in zero-copy
mode, there is still a copy within the driver between the
XSK buffer pool and the QPL bounce buffers in the GQI-QPL format.
In the GQI-QPL queue format, the driver must allocate a fixed-size
region of memory for RX/TX, with the size specified by the vNIC device,
and register this memory as a bounce buffer with the vNIC device when a
queue is created. The number of pages in the bounce buffer is limited and the
pages need to be made available to the vNIC by copying the RX data out
to prevent head-of-line blocking. Therefore, we cannot pass the XSK
buffer pool to the vNIC.

The number of copies on RX path from the bounce buffer to XSK buffer is 2
for AF_XDP copy mode (bounce buffer -> allocated page frag -> XSK buffer)
and 1 for AF_XDP zero-copy mode (bounce buffer -> XSK buffer).

This patch contains the following changes:
1) Enable and disable XSK buffer pool
2) Copy XDP packets from QPL bounce buffers to XSK buffer on rx
3) Copy XDP packets from XSK buffer to QPL bounce buffers and
   ring the doorbell as part of XDP TX napi poll
4) ndo_xsk_wakeup callback support

Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit fd8e403)
[john-cabaj: Removing xdp_features as support wasn't
implemented until 6.3]
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

The two constants accomplish the same thing.

Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230407184830.309398-1-shailend@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit 4de00f0)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Add support for using IPv6 Big TCP on DQ which can handle large TSO/GRO
packets. See https://lwn.net/Articles/895398/. This can improve the
throughput and CPU usage.

Perf test result:
ip -d link show $DEV
gso_max_size 185000 gso_max_segs 65535 tso_max_size 262143 tso_max_segs 65535 gro_max_size 185000

For performance, tested with neper using 9k MTU on hardware that supports 200Gb/s line rate.

For single streams, where the line rate is not saturated, we expect throughput improvements.
When the network is running at line rate, we expect CPU usage improvements.

Tcp_stream (unidirectional stream test, T=thread, F=flow):
skb=180kb, T=1, F=1, no zerocopy: throughput average=64576.88 Mb/s, sender stime=8.3, receiver stime=10.68
skb=64kb,  T=1, F=1, no zerocopy: throughput average=64862.54 Mb/s, sender stime=9.96, receiver stime=12.67
skb=180kb, T=1, F=1, yes zerocopy:  throughput average=146604.97 Mb/s, sender stime=10.61, receiver stime=5.52
skb=64kb,  T=1, F=1, yes zerocopy:  throughput average=131357.78 Mb/s, sender stime=12.11, receiver stime=12.25

skb=180kb, T=20, F=100, no zerocopy:  throughput average=182411.37 Mb/s, sender stime=41.62, receiver stime=79.4
skb=64kb,  T=20, F=100, no zerocopy:  throughput average=182892.02 Mb/s, sender stime=57.39, receiver stime=72.69
skb=180kb, T=20, F=100, yes zerocopy: throughput average=182337.65 Mb/s, sender stime=27.94, receiver stime=39.7
skb=64kb,  T=20, F=100, yes zerocopy: throughput average=182144.20 Mb/s, sender stime=47.06, receiver stime=39.01

Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522201552.3585421-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit a695641)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Use vmalloc_array and vcalloc to protect against
multiplication overflows.

The changes were done using the following Coccinelle
semantic patch:

// <smpl>
@initialize:ocaml@
@@

let rename alloc =
  match alloc with
    "vmalloc" -> "vmalloc_array"
  | "vzalloc" -> "vcalloc"
  | _ -> failwith "unknown"

@@
    size_t e1,e2;
    constant C1, C2;
    expression E1, E2, COUNT, x1, x2, x3;
    typedef u8;
    typedef __u8;
    type t = {u8,__u8,char,unsigned char};
    identifier alloc = {vmalloc,vzalloc};
    fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@

(
      alloc(x1*x2*x3)
|
      alloc(C1 * C2)
|
      alloc((sizeof(t)) * (COUNT), ...)
|
-     alloc((e1) * (e2))
+     realloc(e1, e2)
|
-     alloc((e1) * (COUNT))
+     realloc(COUNT, e1)
|
-     alloc((E1) * (E2))
+     realloc(E1, E2)
)
// </smpl>
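The hazard the semantic patch removes is that `vmalloc(a * b)` silently wraps when the product overflows `size_t`, yielding a much smaller buffer than intended. A userspace analogue of the checked helpers, written with standard C only; `alloc_array()` is a hypothetical name, not the kernel's `vmalloc_array()`/`vcalloc()` implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n elements of `size` bytes, failing instead of wrapping around
 * when n * size does not fit in size_t; a non-zero `zero` requests zeroed
 * memory (the vcalloc() flavour). */
static void *alloc_array(size_t n, size_t size, int zero)
{
    assert(size != 0);
    if (n > SIZE_MAX / size)
        return NULL;   /* n * size would overflow */
    return zero ? calloc(n, size) : malloc(n * size);
}
```

With unsigned wrap-around, an unchecked `(SIZE_MAX / 4 + 2) * 4` evaluates to just 4 bytes, while `alloc_array(SIZE_MAX / 4 + 2, 4, 0)` returns NULL.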

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-5-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit a13de90)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

The current codebase uses two different names for this driver
(i.e., `gvnic` and `gve`), which is confusing for users, especially
when binding or unbinding the driver manually. The corresponding
kernel module is registered as `gve`, so it makes more sense to align
the driver name with the module name.

Fixes: 893ce44 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Cc: csully@google.com
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9d0aba9)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Spotted this trivial spelling mistake while casually reading
the Google GVE driver code.

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 68af900)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

A handful of drivers currently expect to get xdp.h by virtue
of including netdevice.h. This will soon no longer be the case,
so add explicit includes.

Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
(backported from commit 92272ec)
[john-cabaj: context changes for headers since included upstream
and not in 5.15]
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

GVE supports QPL ("queue-page-list") mode where
all data is communicated through a set of pre-registered
pages. Add this mode to the DQO descriptor format.

Add checks, abi-changes and device options to support
QPL mode for DQO in addition to GQI. Also, use
pages-per-qpl supplied by device-option to control the
size of the "queue-page-list".

Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 66ce8e6)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Each QPL page is divided into GVE_TX_BUFS_PER_PAGE_DQO buffers.
When a packet needs to be transmitted, we break it into chunks of at most
GVE_TX_BUF_SIZE_DQO bytes and transmit each chunk using a TX
descriptor.
We allocate the TX buffers from the free list in dqo_tx.
We store these TX buffer indices in an array in the pending_packet
structure.

The TX buffers are returned to the free list in dqo_compl after
receiving packet completion or when removing packets from miss
completions list.
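A compact model of that bookkeeping, assuming a small fixed pool and ignoring how buffers map onto QPL pages; `BUF_SIZE`, `NUM_BUFS`, `MAX_BUFS_PER_PKT` and the helper names are hypothetical stand-ins (for example for `GVE_TX_BUF_SIZE_DQO`), not the driver's values. The free list is threaded through an index array, a packet takes ceil(len / BUF_SIZE) buffers and records their indices, and completion pushes them back:

```c
#include <assert.h>

#define BUF_SIZE         2048  /* stand-in for GVE_TX_BUF_SIZE_DQO */
#define NUM_BUFS         8     /* total QPL-backed TX buffers */
#define MAX_BUFS_PER_PKT 4     /* capacity of pending_packet.buf_ids */

struct tx_bufs {
    int next_free[NUM_BUFS];   /* free list threaded through indices */
    int free_head;             /* -1 when the list is empty */
};

struct pending_packet {
    int buf_ids[MAX_BUFS_PER_PKT];
    int num_bufs;
};

static void tx_bufs_init(struct tx_bufs *b)
{
    for (int i = 0; i < NUM_BUFS; i++)
        b->next_free[i] = (i + 1 < NUM_BUFS) ? i + 1 : -1;
    b->free_head = 0;
}

/* Reserve ceil(len / BUF_SIZE) buffers for one packet. Returns 0 on success,
 * -1 if the packet needs too many buffers or not enough are free (in which
 * case nothing is taken). */
static int tx_map_packet(struct tx_bufs *b, struct pending_packet *pkt, int len)
{
    int need, avail = 0;

    assert(len > 0);
    need = (len + BUF_SIZE - 1) / BUF_SIZE;
    if (need > MAX_BUFS_PER_PKT)
        return -1;
    for (int i = b->free_head; i != -1 && avail < need; i = b->next_free[i])
        avail++;
    if (avail < need)
        return -1;
    for (pkt->num_bufs = 0; pkt->num_bufs < need; pkt->num_bufs++) {
        int id = b->free_head;

        b->free_head = b->next_free[id];
        pkt->buf_ids[pkt->num_bufs] = id;
    }
    return 0;
}

/* On completion (or miss-completion cleanup), push the buffers back. */
static void tx_complete_packet(struct tx_bufs *b, struct pending_packet *pkt)
{
    for (int i = 0; i < pkt->num_bufs; i++) {
        int id = pkt->buf_ids[i];

        b->next_free[id] = b->free_head;
        b->free_head = id;
    }
    pkt->num_bufs = 0;
}
```

Checking availability before taking anything keeps a failed mapping from leaking buffers out of the free list.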

Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a6fb8d5)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

The RX path allocates the QPL page pool at queue creation, and
tries to reuse these pages through page recycling. This patch
ensures that on refill no non-QPL pages are posted to the device.

When the driver is running low on free buffers, an on-demand
allocation step kicks in that allocates a non-QPL page for the
SKB, freeing up the QPL page in use.

gve_try_recycle_buf was moved to gve_rx_append_frags so that the driver
does not attempt to mark a buffer as used if a non-QPL page was
allocated on demand.

Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit e7075ab)
[john-cabaj: context changes]
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

gve_rx_append_frags() is able to build skbs chained with frag_list,
like GRO engine.

The problem is that shinfo->frag_list should only be used
for the head of the chain.

All other links should use skb->next pointer.

Otherwise, built skbs are not valid and can cause crashes.

Equivalent code in GRO (skb_gro_receive()) is:

    if (NAPI_GRO_CB(p)->last == p)
        skb_shinfo(p)->frag_list = skb;
    else
        NAPI_GRO_CB(p)->last->next = skb;
    NAPI_GRO_CB(p)->last = skb;
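The same rule in a self-contained form, using a hypothetical three-field stand-in for `struct sk_buff` (the real structure and `NAPI_GRO_CB()` are much richer). Only the head's `frag_list` is ever written; every later link goes through the previous tail's `next`:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct sk_buff with only the fields the chaining rule uses. */
struct skb {
    struct skb *next;       /* link between non-head members of the chain */
    struct skb *frag_list;  /* meaningful only on the head skb */
    struct skb *last;       /* tail tracker, like NAPI_GRO_CB(p)->last; starts as the head */
};

/* Append skb following the rule quoted above: the first appended skb hangs
 * off head->frag_list, every later one off the previous tail's next. */
static void chain_append(struct skb *head, struct skb *skb)
{
    assert(head != NULL && skb != NULL && head->last != NULL);
    skb->next = NULL;
    if (head->last == head)
        head->frag_list = skb;
    else
        head->last->next = skb;
    head->last = skb;
}
```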

Fixes: 9b8dd5e ("gve: DQO: Add RX path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Bailey Forrest <bcf@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Catherine Sullivan <csully@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 817c7cd)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Add a note about QPL and RDA mode

Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5a3f8d1)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Up until commit 46e6b99 ("rtnetlink: allow GSO maximums to
be set on device creation") the gso_max_segs and gso_max_size
of a device were not controlled from user space.

The quoted commit added the ability to control them because of
the following setup:

 netns A  |  netns B
     veth<->veth   eth0

If eth0 has TSO limitations and user wants to efficiently forward
traffic between eth0 and the veths they should copy the TSO
limitations of eth0 onto the veths. This would happen automatically
for macvlans or ipvlan but veth users are not so lucky (given the
loose coupling).

Unfortunately the commit in question allowed users to also override
the limits on real HW devices.

It may be useful to control the max GSO size and someone may be using
that ability (not that I know of any user), so create a separate set
of knobs to reliably record the TSO limitations. Validate the user
requests.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit 14d7b81)
[john-cabaj: context changes]
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

dev->gso_max_segs is written under RTNL protection, or when the device is
not yet visible, but is read locklessly.

Add netif_set_gso_max_segs() helper.

Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_segs()
where we can to better document what is going on.
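In userspace C11, the closest analogue to a READ_ONCE()/WRITE_ONCE() pair is a relaxed atomic load/store: the access is never torn and is not a data race, but no extra ordering is implied. A sketch under that assumption, with a hypothetical struct and helpers; the writer is assumed to be serialized by an outer lock standing in for RTNL:

```c
#include <assert.h>
#include <stdatomic.h>

struct netdev_model {
    _Atomic unsigned int gso_max_segs;
};

/* Writer side: the caller is assumed to hold the (hypothetical) config lock. */
static void model_set_gso_max_segs(struct netdev_model *dev, unsigned int segs)
{
    assert(segs > 0);
    atomic_store_explicit(&dev->gso_max_segs, segs, memory_order_relaxed);
}

/* Lockless reader side, e.g. on a transmit path. */
static unsigned int model_get_gso_max_segs(struct netdev_model *dev)
{
    return atomic_load_explicit(&dev->gso_max_segs, memory_order_relaxed);
}
```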

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit 6d872df)
[john-cabaj: context changes]
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Following patches will need to add and remove local IPv6 jumbogram
options to enable BIG TCP.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7c96d8e)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

IPv6 TCP and GRO stacks will soon be able to build big TCP packets,
with an added temporary Hop By Hop header.

If GSO is involved for these large packets, we need to remove
the temporary HBH header before segmentation happens.

v2: perform HBH removal from ipv6_gso_segment() instead of
    skb_segment() (Alexander feedback)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 09f3d1a)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

IPv6/TCP and GRO stacks can build big TCP packets with an added
temporary Hop By Hop header.

If GSO is not involved, then the temporary header needs to be removed in
the driver. This patch provides a generic helper for drivers that need
to modify their headers in place.
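A userspace sketch of the in-place removal, assuming the buffer starts at the IPv6 header (no link-layer header) and that the Hop-by-Hop header carries nothing but the Jumbo Payload option; the constants and function name are written out here for illustration rather than taken from kernel headers. A driver-side helper would also have to account for the link-layer header and skb offsets, which the sketch leaves out:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IPV6_HDR_LEN  40
#define HBH_JUMBO_LEN 8      /* next hdr, hdr ext len, TLV type, TLV len, 32-bit length */
#define NEXTHDR_HOP   0      /* IPv6 next-header value for Hop-by-Hop options */
#define TLV_JUMBO     0xC2   /* Jumbo Payload option type (RFC 2675) */

/* Strip an 8-byte Hop-by-Hop header that carries only a Jumbo Payload option.
 * `pkt` starts at the IPv6 header. The IPv6 header is slid forward over the
 * HBH header and inherits the HBH's next-header value. Returns the new start
 * offset (HBH_JUMBO_LEN), or 0 if no such header is present. */
static size_t hbh_jumbo_remove(unsigned char *pkt, size_t len)
{
    assert(pkt != NULL);
    if (len < IPV6_HDR_LEN + HBH_JUMBO_LEN || pkt[6] != NEXTHDR_HOP)
        return 0;

    const unsigned char *hbh = pkt + IPV6_HDR_LEN;
    if (hbh[1] != 0 || hbh[2] != TLV_JUMBO || hbh[3] != 4)
        return 0;

    unsigned char inner = hbh[0];                    /* e.g. 6 for TCP */
    memmove(pkt + HBH_JUMBO_LEN, pkt, IPV6_HDR_LEN); /* regions overlap */
    pkt[HBH_JUMBO_LEN + 6] = inner;                  /* fix up next header */
    return HBH_JUMBO_LEN;
}
```

After the call, the IPv6 header sits directly in front of the unchanged L4 header, and the caller treats `pkt + 8` as the new packet start.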

Tested:
Compiled and ran with ethtool -K eth1 tso off
Could send Big TCP packets

Signed-off-by: Coco Li <lixiaoyan@google.com>
Link: https://lore.kernel.org/r/20221210041646.3587757-1-lixiaoyan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 8930046)
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2040522

Eric Dumazet suggested allowing users to modify the maximum GRO packet size.

We have seen users of appliances (such as wifi access points) disable
GRO because of claimed bufferbloat issues, and workarounds in
sch_cake that split GRO/GSO packets.

Instead of disabling GRO completely, one can choose to limit
the maximum packet size of GRO packets, depending on their
latency constraints.

This patch adds a per device gro_max_size attribute
that can be changed with ip link command.

ip link set dev eth0 gro_max_size 16000
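A back-of-the-envelope model of what the knob does, assuming a GRO packet may grow only while its length stays below gro_max_size and ignoring header overhead; the helper names are hypothetical. With 1448-byte segments, `gro_max_size 16000` caps a GRO packet at 11 segments, versus 45 at 65536:

```c
#include <assert.h>
#include <stdbool.h>

/* Can a seg_len-byte segment join a GRO packet already holding cur_len bytes
 * while the total stays below gro_max_size? Written so that cur_len + seg_len
 * is never computed (no unsigned overflow). */
static bool gro_can_merge(unsigned int cur_len, unsigned int seg_len,
                          unsigned int gro_max_size)
{
    return cur_len < gro_max_size && seg_len < gro_max_size - cur_len;
}

/* How many equal-sized segments end up in one GRO packet (headers ignored). */
static unsigned int segs_per_gro_packet(unsigned int seg_len, unsigned int gro_max_size)
{
    unsigned int len = 0, n = 0;

    assert(seg_len > 0);
    while (gro_can_merge(len, seg_len, gro_max_size)) {
        len += seg_len;
        n++;
    }
    return n;
}
```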

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit eac1b93)
[john-cabaj: context changes]
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Portia Stephens <portia.stephens@canonical.com>
Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

Ignore: yes
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2051131
Properties: no-test-build
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

…enc_status_changed()

BugLink: https://bugs.launchpad.net/bugs/2052451

27ba8c7 modified notify_page_enc_status_changed(), when it should not have.
Revert this portion of the change to restore the correct arguments.

Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Acked-by: Thibault Ferrante <thibault.ferrante@canonical.com>
Acked-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

Ignore: yes
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2052234
Properties: no-test-build
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>

CacheUseOnly and others added 24 commits July 15, 2024 12:38
BugLink: https://bugs.launchpad.net/bugs/2068360
Properties: no-test-build
Signed-off-by: Yuxuan Luo <yuxuan.luo@canonical.com>

…el-versions (main/2024.06.10)

BugLink: https://bugs.launchpad.net/bugs/1786013
Signed-off-by: Yuxuan Luo <yuxuan.luo@canonical.com>
Signed-off-by: Yuxuan Luo <yuxuan.luo@canonical.com>

Ignore: yes
Signed-off-by: Bethany Jamison <bethany.jamison@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2072024
Properties: no-test-build
Signed-off-by: Bethany Jamison <bethany.jamison@canonical.com>
Signed-off-by: Bethany Jamison <bethany.jamison@canonical.com>

This is a placeholder commit to separate the Ubuntu kernel source and
our patches. Used by kernel_merge_with_upstream() in the linux-pkg repo.

The checks in nfsd_file_acquire() and nfsd_file_put() that directly
invoke filecache garbage collection are intended to keep cache
occupancy between a low- and high-watermark. The reason to limit the
capacity of the filecache is to keep filecache lookups reasonably
fast.

However, invoking garbage collection at those points has some
undesirable negative impacts. Files that are held open by NFSv4
clients often push the occupancy of the filecache over these
watermarks. At that point:

- Every call to nfsd_file_acquire() and nfsd_file_put() results in
  an LRU walk. This has the same effect on lookup latency as long
  chains in the hash table.
- Garbage collection will then run on every nfsd thread, causing a
  lot of unnecessary lock contention.
- Limiting cache capacity pushes out files used only by NFSv3
  clients, which are the type of files the filecache is supposed to
  help.

To address those negative impacts, remove the direct calls to the
garbage collector.

@manoj-joseph
Author

#39

delphix-devops-bot pushed a commit that referenced this pull request Jun 27, 2025
BugLink: https://bugs.launchpad.net/bugs/2110173

[ Upstream commit efdde3d73ab25cef4ff2d06783b0aad8b093c0e4 ]

There is case as below could trigger kernel dump:
Use U-Boot to start remote processor(rproc) with resource table
published to a fixed address by rproc. After Kernel boots up,
stop the rproc, load a new firmware which doesn't have resource table
,and start rproc.

When starting the rproc with firmware that has no resource table,
`memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
trigger a dump, because rproc->cached_table is set to NULL during the last
stop operation, but rproc->table_sz still holds the old size.

This issue was found on i.MX8MP and i.MX9.

The dump is as follows:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
[0000000000000000] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
Hardware name: NXP i.MX8MPlus EVK board (DT)
pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __pi_memcpy_generic+0x110/0x22c
lr : rproc_start+0x88/0x1e0
Call trace:
 __pi_memcpy_generic+0x110/0x22c (P)
 rproc_boot+0x198/0x57c
 state_store+0x40/0x104
 dev_attr_store+0x18/0x2c
 sysfs_kf_write+0x7c/0x94
 kernfs_fop_write_iter+0x120/0x1cc
 vfs_write+0x240/0x378
 ksys_write+0x70/0x108
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x10c
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x30/0xcc
 el0t_64_sync_handler+0x10c/0x138
 el0t_64_sync+0x198/0x19c

Clear rproc->table_sz to address the issue.
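A userspace model of the invariant the fix restores, with hypothetical names: the cached table pointer and its size are cleared together on stop, so the start path never copies `table_sz` bytes from a NULL pointer. Skipping the copy when the size is zero is a choice made for the sketch, not a claim about `rproc_start()`:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the two struct rproc fields involved in the bug. */
struct rproc_model {
    void *cached_table;
    size_t table_sz;
};

/* Stop path: drop the cached table and, as the fix does, its size as well,
 * so a later start cannot see a NULL table with a non-zero length. */
static void model_stop(struct rproc_model *r)
{
    free(r->cached_table);
    r->cached_table = NULL;
    r->table_sz = 0;
}

/* Start path: copy the cached table over the loaded one. Returns the number
 * of bytes copied. */
static size_t model_start(struct rproc_model *r, void *loaded_table)
{
    /* Invariant restored by the fix; a stale table_sz would violate it. */
    assert(r->table_sz == 0 || r->cached_table != NULL);
    if (!loaded_table || r->table_sz == 0)
        return 0;
    memcpy(loaded_table, r->cached_table, r->table_sz);
    return r->table_sz;
}
```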

Fixes: 9dc9507 ("remoteproc: Properly deal with the resource table when detaching")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20250319100106.3622619-1-peng.fan@oss.nxp.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>