Tags
Tags give the ability to mark specific points in history as being important
b5353 · 95e18884 · CUDA: fix misaligned synchronization in FA (#13469) · May 12, 2025
b5352 · df849192 · ggml : add mrope kernel for metal (#13457) · May 12, 2025
b5351 · 14492144 · enable dpcpp nightly builds with libraries (#13406) · May 12, 2025
b5350 · c1040239 · mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459) · May 12, 2025
b5349 · 9a390c48 · tools : fix uninitialized llama_batch in server (#13436) · May 11, 2025
b5347 · 7474e00b · CUDA: fix crash with partial offloading of MoE (#13439) · May 11, 2025
b5346 · 7f323a58 · Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386) · May 11, 2025
b5345 · 3eac2093 · mtmd : support InternVL 3 38B and 78B mmproj (#13443) · May 11, 2025
b5344 · a634d75d · mtmd : move helpers to dedicated file (#13442) · May 11, 2025
b5342 · 0208355f · CUDA: fix race conditions FlashAttention kernels (#13438) · May 10, 2025
b5341 · d2a4ef05 · vocab : add ByteDance-Seed/Seed-Coder (#13423) · May 10, 2025
b5340 · 15e6125a · mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434) · May 10, 2025
b5338 · 43dfd741 · llguidance : set tokenizer slices to default (#13424) · May 10, 2025
b5336 · 053367d1 · mtmd : support InternVL 2.5 and 3 (#13422) · May 10, 2025
b5335 · d8919424 · CUDA: fix FlashAttention on Turing (#13415) · May 10, 2025
b5334 · 7fef1176 · arg : add env var to control mmproj (#13416) · May 10, 2025
b5333 · dc1d2adf · vulkan: scalar flash attention implementation (#13324) · May 10, 2025
b5332 · 7c28a74e · chore(llguidance): use tagged version that does not break the build (#13413) · May 09, 2025
b5331 · 33eff402 · server : vision support via libmtmd (#12898) · May 09, 2025
b5330 · 17512a94 · sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858) · May 09, 2025