fix: bundle CUDA DLL into the release #62

Open · wants to merge 28 commits into main
52a48da
fix: bundle CUDA DLL into the release
louisgv Jul 2, 2023
43330d4
Merge branch 'main' into 61-bug-cuda-dlls
louisgv Jul 3, 2023
4d94a47
Merge branch 'main' into 61-bug-cuda-dlls
louisgv Jul 14, 2023
953af83
Merge branch 'main' into 61-bug-cuda-dlls
LLukas22 Jul 18, 2023
4b71716
Update `rustformers` + check gpu
LLukas22 Jul 18, 2023
4b8fe59
Set `n_batch` correctly
LLukas22 Jul 18, 2023
187b135
Copy cuda libraries
LLukas22 Jul 20, 2023
9343897
reduce feeding delay if gpu is enabled
LLukas22 Jul 21, 2023
a2a3dbf
Copy `opencl` dlls
LLukas22 Jul 21, 2023
a8b3bbf
create linux ci
LLukas22 Jul 21, 2023
21ae9e1
defaults for release infos
LLukas22 Jul 21, 2023
286574d
Fail if files aren't found
LLukas22 Jul 21, 2023
86cc051
Add windows build
LLukas22 Jul 21, 2023
47f9dfc
Macos build
LLukas22 Jul 21, 2023
7c1f25a
ci bugfixes
LLukas22 Jul 22, 2023
36e050b
More bugfixes and absolute paths
LLukas22 Jul 22, 2023
0b26205
Paths .... again
LLukas22 Jul 22, 2023
cc786f0
Make mac artifacts unique
LLukas22 Jul 22, 2023
89eb1fa
renable build for windows-cublas
LLukas22 Jul 22, 2023
0761d79
update character
louisgv Jul 30, 2023
7481edf
Slight refactor
louisgv Aug 1, 2023
9d23cfd
update character
louisgv Aug 2, 2023
5b51725
update llm
louisgv Aug 2, 2023
006cd5a
Merge branch 'main' into 61-bug-cuda-dlls
louisgv Sep 16, 2023
9b8d16d
fix build script
louisgv Sep 16, 2023
bc5edf6
use self-hosted runner for metal
louisgv Sep 16, 2023
18f04ed
remove build on push (consume too much compute atm)
louisgv Sep 16, 2023
1211cc2
Add todo
louisgv Sep 16, 2023
Set n_batch correctly
LLukas22 committed Jul 18, 2023
commit 4b8fe59aaaae16de201df121c4995bfc969e6a04
21 changes: 17 additions & 4 deletions apps/desktop/src-tauri/src/inference/process.rs
@@ -65,9 +65,6 @@ impl InferenceThreadRequest {
 fn get_inference_params(
   completion_request: &CompletionRequest,
 ) -> InferenceParameters {
-  let n_threads = model::pool::get_n_threads();
-  let n_batch = if get_use_gpu() { 240 } else { n_threads };
-
   InferenceParameters {
     sampler: Arc::new(completion_request.to_top_p_top_k()),
   }
@@ -92,7 +89,23 @@ pub fn start(req: InferenceThreadRequest) -> JoinHandle<()> {
   }
 };
 
-  let mut session = model.start_session(Default::default());
+  let n_threads = model::pool::get_n_threads();
+
+  // set the batch size according to the accelerator
+  let backend = llm::ggml_get_accelerator();
+  let n_batch = match backend {
+    // 1 is the only supported batch size for Metal
+    llm::GgmlAccelerator::Metal => if get_use_gpu() { 1 } else { n_threads },
+    llm::GgmlAccelerator::None => n_threads,
+    _ => if get_use_gpu() { 512 } else { n_threads },
+  };
+
+  let session_config = llm::InferenceSessionConfig {
+    n_batch: n_batch,
+    n_threads: n_threads,
+    ..Default::default()
+  };
+
+  let mut session = model.start_session(session_config);
 
 let mut output_request = OutputRequest::default();
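The batch-size selection in this commit can be isolated for clarity. The sketch below is a minimal, self-contained rendering of that `match`, using a stand-in `Accelerator` enum; the real code dispatches on `llm::GgmlAccelerator` from the rustformers `llm` crate, and the helper name `pick_n_batch` is hypothetical:

```rust
// Stand-in for llm::GgmlAccelerator (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Accelerator {
    Metal,
    Cuda,
    None,
}

/// Pick an inference batch size for the detected backend.
/// Metal only supports a batch size of 1 when the GPU is used;
/// CPU-only runs fall back to one batch entry per thread.
fn pick_n_batch(backend: Accelerator, use_gpu: bool, n_batch_threads: usize) -> usize {
    match backend {
        Accelerator::Metal => if use_gpu { 1 } else { n_batch_threads },
        Accelerator::None => n_batch_threads,
        // CUDA, OpenCL, etc. can take large batches when the GPU is on.
        _ => if use_gpu { 512 } else { n_batch_threads },
    }
}

fn main() {
    assert_eq!(pick_n_batch(Accelerator::Metal, true, 8), 1);
    assert_eq!(pick_n_batch(Accelerator::Cuda, true, 8), 512);
    assert_eq!(pick_n_batch(Accelerator::None, true, 8), 8);
    assert_eq!(pick_n_batch(Accelerator::Cuda, false, 8), 8);
}
```

Note the design choice the commit makes: rather than hard-coding `n_batch = 240` inside `get_inference_params`, the value is computed next to session creation and passed through `InferenceSessionConfig`, so each accelerator's constraint lives in one place.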