Mmproj support?

https://github.com/ggml-org/llama.cpp/tree/master/tools/server#using-multiple-models

```
models_directory
 │
 │ # single file
 ├─ llama-3.2-1b-Q4_K_M.gguf
 ├─ Qwen3-8B-Q4_K_M.gguf
 │
 │ # multimodal
 ├─ gemma-3-4b-it-Q8_0
 │ ├─ gemma-3-4b-it-Q8_0.gguf
 │ └─ mmproj-F16.gguf # file name must start with "mmproj"
```

mmproj is also automatically selected for cached models downloaded via the `-hf user/model` option.
Supported via presets.ini, where you can specify the mmproj (and other long and short arguments) per model.
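As a sketch, a per-model section in presets.ini might look like the following. The section name and the exact keys here are illustrative assumptions (the keys mirror llama-server's long option names, as in the `[DEFAULT]` example elsewhere in this thread), not confirmed syntax:

```ini
; hypothetical presets.ini entry -- key names assumed for illustration
[gemma-3-4b-it-Q8_0]
mmproj = /models/gemma-3-4b-it-Q8_0/mmproj-F16.gguf
ctx-size = 8192
```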
Awesome new feature! Can model selection be done on something other than the requested model name? For example, specify a ranking in presets.ini, and then the highest-ranked model that can satisfy the request becomes the default. Maybe one model is best for short context, another (or the same one with other settings) for when the context gets too long, and another when image input is required.
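For reference, with the current router the client chooses a model explicitly via the standard OpenAI-compatible `model` field in the request body; a minimal sketch (the model name, host, and port are assumptions and must match what your server reports via `GET /v1/models`):

```python
import json


def build_chat_payload(model: str, prompt: str) -> dict:
    """Body for POST /v1/chat/completions; the router uses the
    "model" field to decide which model serves the request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


# Example: target one of the models from the directory listing above.
# (Model name is an assumption; list available names via GET /v1/models.)
body = json.dumps(build_chat_payload("Qwen3-8B-Q4_K_M", "Hello"))
```

The same payload works with any OpenAI-compatible client library by setting its `model` parameter.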
This is a good addition, thank you.
What is the best way to get <think> </think> and the tokens in between? The OpenAI library removes them. I want to run llama-server in a console and talk to it using a Python library that does not strip the thinking tokens.

I checked llama-cpp-python, but it does not have that.
By default, in most setups llama-server keeps the reasoning content in a reasoning_content field of the response message; you can get it from there. Otherwise, use the --reasoning-format flag and pass the deepseek value to get pure <think> tokens.
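A small sketch of pulling the reasoning text out of an already-parsed chat-completion response; the field layout assumed here follows llama-server's OpenAI-style JSON output, where the thinking tokens land in `message["reasoning_content"]`:

```python
def split_reasoning(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat-completion response dict.

    Assumes llama-server's OpenAI-style shape, with the <think>...</think>
    text extracted into message["reasoning_content"]; returns "" for the
    reasoning when the model emitted none.
    """
    message = response["choices"][0]["message"]
    return message.get("reasoning_content", ""), message.get("content", "")


# Example with a response shaped like llama-server's output
resp = {"choices": [{"message": {
    "reasoning_content": "The user greets; respond politely.",
    "content": "Hello!"}}]}
reasoning, answer = split_reasoning(resp)
```

Any HTTP client works for fetching the response itself; the point is only that the reasoning is a sibling field of `content`, so nothing is lost if your client library ignores it.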
Now I can use llama.cpp all the time. A big thank you to the devs.
Is there currently a way to have a "default" model if the request doesn't specify one? It could be the currently loaded model or a specific model. (I just noticed one of my apps broke because it's used to llama-server not requiring a model name.)
This seems to work:

```ini
[DEFAULT]
port = 8080
n-gpu-layers = -1
device = 0
flash-attn = on
chat-template = jinja
models-max = 4
```
Does it unload the current model if VRAM is full, to allow swapping to a new model?
Fun ideas: add a personal avatar and a p2p social network, and also eMule-style p2p model storage.
Hey there! Just wanted to drop a quick note saying I'm really digging the new router mode in llama.cpp server. It's a game-changer for me, especially when I need to switch between different models. The auto-discovery of models and LRU eviction is pretty neat: no more manual updates or restarts needed. It's like having a dynamic model manager on the fly. And the request routing part? Brilliant! It makes my workflow with dmenu smoother. Check out my dmenu launcher script on the project's Gitea: https://gitea.com/gnusupport/LLM-Helpers/src/branch/main/bin/rcd-llm-dmenu-launcher.sh
It's a win for sure.
Thanks for the update! Does it now behave like Ollama?
Thank you so much for this, it's great!
I want to pin models to a specific GPU (I have multiple); is that possible?