
Tags

Tags mark specific points in history as important.

b5353
95e18884 · CUDA: fix misaligned synchronization in FA (#13469) · May 12, 2025

b5352
df849192 · ggml : add mrope kernel for metal (#13457) · May 12, 2025

b5351
14492144 · enable dpcpp nightly builds with libraries (#13406) · May 12, 2025

b5350
c1040239 · mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459) · May 12, 2025

b5349
9a390c48 · tools : fix uninitialized llama_batch in server (#13436) · May 11, 2025

b5347
7474e00b · CUDA: fix crash with partial offloading of MoE (#13439) · May 11, 2025

b5346
7f323a58 · Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386) · May 11, 2025

b5345
3eac2093 · mtmd : support InternVL 3 38B and 78B mmproj (#13443) · May 11, 2025

b5344
a634d75d · mtmd : move helpers to dedicated file (#13442) · May 11, 2025

b5342
0208355f · CUDA: fix race conditions FlashAttention kernels (#13438) · May 10, 2025

b5341
d2a4ef05 · vocab : add ByteDance-Seed/Seed-Coder (#13423) · May 10, 2025

b5340
15e6125a · mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434) · May 10, 2025

b5338
43dfd741 · llguidance : set tokenizer slices to default (#13424) · May 10, 2025

b5336
053367d1 · mtmd : support InternVL 2.5 and 3 (#13422) · May 10, 2025

b5335
d8919424 · CUDA: fix FlashAttention on Turing (#13415) · May 10, 2025

b5334
7fef1176 · arg : add env var to control mmproj (#13416) · May 10, 2025

b5333
dc1d2adf · vulkan: scalar flash attention implementation (#13324) · May 10, 2025

b5332
7c28a74e · chore(llguidance): use tagged version that does not break the build (#13413) · May 09, 2025

b5331
33eff402 · server : vision support via libmtmd (#12898) · May 09, 2025

b5330
17512a94 · sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858) · May 09, 2025
