In-Context Multiple Instance Learning
Authors: Alexander Möllers, Marvin Sextro, Julius Hense, Gabriel Dernbach, Klaus-Robert Müller
Published: 2026-06-04 (arXiv: 2606.06458)
Code: https://github.com/injurise/ICMIL
Abstract
Pretraining a Perceiver-style architecture on synthetic bag-structured data enables efficient, task-adaptive classification from few labeled examples in multiple instance learning scenarios.
Multiple Instance Learning (MIL) addresses problems where supervision is available at the level of bags of instances and has been successfully applied in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characterizes many real-world applications. Flexible models overfit and rigid ones fail to adapt to the task at hand. We show that pretraining an in-context learner with a Perceiver-style architecture on synthetic data yields a model that can solve new tasks from a handful of labeled bags. At inference time, classification happens in a single forward pass and requires no gradient updates. We propose and investigate different synthetic data generators for bag-structured data and find that they capture complementary inductive biases. A model pretrained on a mixture of these generators inherits their per-task strengths and achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.
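The abstract does not specify how the synthetic generators work, so the following is only a minimal sketch of what bag-structured pretraining data could look like. It assumes the standard MIL assumption (a bag is positive if it contains at least one positive "witness" instance) and Gaussian instance features. The names `sample_bag` and `sample_episode`, along with all parameters, are hypothetical and are not taken from the paper or its repository.

```python
import random
from typing import List, Tuple

Bag = List[List[float]]  # a bag is a list of instance feature vectors


def sample_bag(rng: random.Random, dim: int = 4, min_size: int = 3,
               max_size: int = 8, witness_rate: float = 0.2,
               shift: float = 3.0) -> Tuple[Bag, int]:
    """Draw one synthetic bag and its bag-level label.

    Assumption (not from the paper): background instances are standard
    Gaussian noise, witness instances are shifted by `shift` in every
    dimension, and the bag label is 1 if at least one witness is present.
    """
    size = rng.randint(min_size, max_size)  # inclusive on both ends
    bag: Bag = []
    has_witness = False
    for _ in range(size):
        is_witness = rng.random() < witness_rate
        offset = shift if is_witness else 0.0
        bag.append([rng.gauss(offset, 1.0) for _ in range(dim)])
        has_witness = has_witness or is_witness
    return bag, int(has_witness)


def sample_episode(rng: random.Random, n_context: int = 6,
                   n_query: int = 2) -> Tuple[List[Tuple[Bag, int]], List[Bag], List[int]]:
    """Build one in-context task.

    Returns labeled context bags, unlabeled query bags, and the held-out
    query labels that a pretraining loss would compare predictions against.
    """
    bags = [sample_bag(rng) for _ in range(n_context + n_query)]
    context = bags[:n_context]
    query_bags = [bag for bag, _ in bags[n_context:]]
    query_labels = [label for _, label in bags[n_context:]]
    return context, query_bags, query_labels


if __name__ == "__main__":
    rng = random.Random(0)
    context, query_bags, query_labels = sample_episode(rng)
    print(len(context), "context bags,", len(query_bags), "query bags")
```

In a setup like this, an in-context learner would receive the labeled context bags and the query bags together and predict the query labels in one forward pass, which matches the inference behavior the abstract describes. A mixture of generators could be approximated by picking a different sampling function per episode.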