commit 160f4124ea8b4cd6c86867e111fa55e266345a16
Author: Greg Kroah-Hartman
Date:   Tue Jul 11 06:31:05 2023 +0200

    Linux 6.4.3

    Link: https://lore.kernel.org/r/20230709111345.297026264@linuxfoundation.org
    Tested-by: Ronald Warsow
    Tested-by: Bagas Sanjaya
    Tested-by: Linux Kernel Functional Testing
    Tested-by: Chris Paterson (CIP)
    Tested-by: Salvatore Bonaccorso
    Tested-by: Guenter Roeck
    Tested-by: Takeshi Ogasawara
    Tested-by: Ron Economos
    Signed-off-by: Greg Kroah-Hartman

commit 036666b4163d320282a627075934f1ab0de12f8b
Author: Suren Baghdasaryan
Date:   Sat Jul 8 12:12:12 2023 -0700

    fork: lock VMAs of the parent process when forking

    commit fb49c455323ff8319a123dd312be9082c49a23a5 upstream.

    When forking a child process, the parent write-protects anonymous pages
    and COW-shares them with the child being forked using copy_present_pte().

    We must not take any concurrent page faults on the source VMAs while
    they are being processed, as we expect both the vma and the ptes behind
    it to be stable.  For example, anon_vma_fork() expects the parent's
    vma->anon_vma not to change during the vma copy.

    A concurrent page fault on a page newly marked read-only by the page
    copy might trigger wp_page_copy() and an anon_vma_prepare(vma) on the
    source vma, defeating the anon_vma_clone() that wasn't done because the
    parent vma originally didn't have an anon_vma, but we now might end up
    copying a pte entry for a page that has one.

    Before the per-vma lock based changes, the mmap_lock guaranteed
    exclusion with concurrent page faults.  But now we need to do a
    vma_start_write() to make sure no concurrent faults happen on this vma
    while it is being processed.

    This fix can potentially regress some fork-heavy workloads.  Kernel
    build time did not show a noticeable regression on a 56-core machine,
    while a stress test mapping 10000 VMAs and forking 5000 times in a
    tight loop shows ~5% regression.  If such a fork-time regression is
    unacceptable, disabling CONFIG_PER_VMA_LOCK should restore its
    performance.
    Further optimizations are possible if this regression proves to be
    problematic.

    Suggested-by: David Hildenbrand
    Reported-by: Jiri Slaby
    Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
    Reported-by: Holger Hoffstätte
    Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
    Reported-by: Jacob Young
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
    Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
    Cc: stable@vger.kernel.org
    Signed-off-by: Suren Baghdasaryan
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 890ba5c464c2a9aeb26d0e873962e5b7d401df6b
Author: Liu Shixin
Date:   Tue Jul 4 18:19:42 2023 +0800

    bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page

    commit 028725e73375a1ff080bbdf9fb503306d0116f28 upstream.

    commit dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak
    in put_page_bootmem") fixed an overlapping-object problem reported by
    kmemleak.  But the problem still exists when HAVE_BOOTMEM_INFO_NODE is
    disabled, because in that case free_bootmem_page() calls
    free_reserved_page() directly.

    Fix the problem by adding kmemleak_free_part() in free_bootmem_page()
    when HAVE_BOOTMEM_INFO_NODE is disabled.

    Link: https://lkml.kernel.org/r/20230704101942.2819426-1-liushixin2@huawei.com
    Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
    Signed-off-by: Liu Shixin
    Acked-by: Muchun Song
    Cc: Matthew Wilcox
    Cc: Mike Kravetz
    Cc: Oscar Salvador
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

commit e83e62fb1f386ee0e3e0da327660d4a4bcc2af2e
Author: Peter Collingbourne
Date:   Mon May 22 17:43:08 2023 -0700

    mm: call arch_swap_restore() from do_swap_page()

    commit 6dca4ac6fc91fd41ea4d6c4511838d37f4e0eab2 upstream.
    Commit c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
    moved the call to swap_free() before the call to set_pte_at(), which
    meant that the MTE tags could end up being freed before set_pte_at()
    had a chance to restore them.  Fix it by adding a call to the
    arch_swap_restore() hook before the call to swap_free().

    Link: https://lkml.kernel.org/r/20230523004312.1807357-2-pcc@google.com
    Link: https://linux-review.googlesource.com/id/I6470efa669e8bd2f841049b8c61020c510678965
    Fixes: c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
    Signed-off-by: Peter Collingbourne
    Reported-by: Qun-wei Lin
    Closes: https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/
    Acked-by: David Hildenbrand
    Acked-by: "Huang, Ying"
    Reviewed-by: Steven Price
    Acked-by: Catalin Marinas
    Cc: [6.1+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

commit 18822d84fd0931825e9640d757a47a93df1c97bb
Author: Hugh Dickins
Date:   Sat Jul 8 16:04:00 2023 -0700

    mm: lock newly mapped VMA with corrected ordering

    commit 1c7873e3364570ec89343ff4877e0f27a7b21a61 upstream.

    Lockdep is certainly right to complain about

        (&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
    but task is already holding lock:
        (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db

    Invert those to the usual ordering.

    Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
    Cc: stable@vger.kernel.org
    Signed-off-by: Hugh Dickins
    Tested-by: Suren Baghdasaryan
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 406815be903b5d7edeffff9594eabf0db2035fe1
Author: Suren Baghdasaryan
Date:   Sat Jul 8 12:12:11 2023 -0700

    mm: lock newly mapped VMA which can be modified after it becomes visible

    commit 33313a747e81af9f31d0d45de78c9397fa3655eb upstream.

    mmap_region() adds a newly created VMA into the VMA tree and might
    modify it afterwards, before dropping the mmap_lock.
    This poses a problem for page faults handled under per-VMA locks
    because they don't take the mmap_lock and can stumble on this VMA while
    it is still being modified.  Currently this is not a problem, since
    post-addition modifications are done only for file-backed VMAs, which
    are not handled under per-VMA lock.  However, once support for handling
    file-backed page faults with per-VMA locks is added, this will become a
    race.

    Fix this by write-locking the VMA before inserting it into the VMA
    tree.  Other places where a new VMA is added into the VMA tree do not
    modify it after the insertion, so they do not need the same locking.

    Cc: stable@vger.kernel.org
    Signed-off-by: Suren Baghdasaryan
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 10bef9542ad3f38d6fb0919a44c413ddd3222814
Author: Suren Baghdasaryan
Date:   Sat Jul 8 12:12:10 2023 -0700

    mm: lock a vma before stack expansion

    commit c137381f71aec755fbf47cd4e9bd4dce752c054c upstream.

    With recent changes necessitating mmap_lock to be held for write while
    expanding a stack, per-VMA locks should follow the same rules and be
    write-locked to prevent page faults into the VMA being expanded.  Add
    the necessary locking.

    Cc: stable@vger.kernel.org
    Signed-off-by: Suren Baghdasaryan
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman