How to diagnose crashes in FB 5 running in container #8398
-
Nick, I can't give exact advice about getting core dumps in a container, but please begin by starting any trivial process in that container, then kill it with SIGABRT and make sure you get a core dump from it. After that you will most likely get Firebird's dump too. If not, please return here with your problem.
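As a rough illustration of that sanity check, the sketch below aborts a throwaway `sleep` process and reports the exit status the shell sees. The helper name `abort_trivial_process` is hypothetical, and whether a core file is actually written still depends on the core size limit and on the kernel's `core_pattern` setting, which the script only prints.

```shell
#!/bin/sh
# Sketch only: checks that a trivial process can be aborted and shows the
# settings that decide whether a core file is written.

# Start a throwaway process, send it SIGABRT and print the exit status.
# 134 = 128 + 6 (SIGABRT) means the signal terminated the process.
abort_trivial_process() {
    sleep 300 &
    pid=$!
    kill -ABRT "$pid"
    status=0
    wait "$pid" || status=$?
    echo "$status"
}

echo "core size limit: $(ulimit -c)"    # 0 means no core file will be written
if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core pattern:    $(cat /proc/sys/kernel/core_pattern)"
fi
echo "exit status:     $(abort_trivial_process)"
```

If the status is 134 but no core file shows up where the core pattern points, the problem is in the container setup rather than in Firebird.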
-
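Thanks @AlexPeshkoff. Your help has enabled me to make some good progress =) I am now able to generate core dumps of /opt/firebird/bin/firebird. I have also got gdb kind of working. I am getting a few warnings though. I'm not sure if they are important:
warning: Can't open file /tmp/firebird/fb50_trace during file-backed mapping note processing
warning: Build-id of /lib/x86_64-linux-gnu/libgcc_s.so.1 does not match core file.
warning: .dynamic section for "/lib/x86_64-linux-gnu/libgcc_s.so.1" is not at the expected address (wrong library or version mismatch?)
warning: Build-id of /lib/x86_64-linux-gnu/libstdc++.so.6 does not match core file.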
Anyway, using the bt command I get the following:
It might be time for me to open an issue now. I'll dig around a bit more before I do though. For anyone else in a similar position, here's how I went about getting core dumps working in my container:
but this would not generate core dump files. Thanks to this SO answer I found that adding
I then had to copy /tmp/firebird and the core dump from the container to WSL.
-
On 1/21/25 07:36, Nick Barrett wrote:
Thanks @AlexPeshkoff <https://github.com/AlexPeshkoff>. Your help has
enabled me to make some good progress =)
I am now able to generate core dumps of |/opt/firebird/bin/firebird|.
I have also got gdb kind of working.
I am getting a few warnings though. I'm not sure if they are important:
|warning: Can't open file /tmp/firebird/fb50_trace during file-backed
mapping note processing|
This one is ok.
|warning: Build-id of /lib/x86_64-linux-gnu/libgcc_s.so.1 does not match core file.
warning: .dynamic section for "/lib/x86_64-linux-gnu/libgcc_s.so.1" is not at the expected address (wrong library or version mismatch?)
warning: Build-id of /lib/x86_64-linux-gnu/libstdc++.so.6 does not match core file.|
These three are not good. Such mismatches very often cause invalid stack traces.
I then had to copy /tmp/firebird and the core dump from the container
to WSL.
IMO the container and WSL have different versions of the system libs. That
causes the warnings mentioned above.
Now I can run gdb as follows
|# cd /tmp
# sudo gdb /opt/firebird/bin/firebird core-firebird.7.8dca622e3e3e.1737412835|
You should copy the problematic system libraries from the container into some
safe place in WSL. Do NOT overwrite existing libraries!
After that, do approximately this:
set sysroot /that/place
in gdb, and it should load the correct copy of the libraries.
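A rough sketch of that sysroot approach follows, assuming the libraries are pulled out of the container with `docker cp -L`. The container name `fb5`, the directory `~/fb-sysroot` and the helper name `write_gdb_script` are hypothetical; the core file name is the one quoted above. The helper only prints a gdb command file that sets the sysroot before the executable and core file are loaded.

```shell
#!/bin/sh
# Sketch only: the container name, sysroot directory and helper name are assumptions.

# Print a gdb command file that points gdb at a private copy of the
# container's libraries before it loads the executable and the core file.
# $1 = sysroot directory, $2 = executable, $3 = core file
write_gdb_script() {
    printf 'set sysroot %s\n' "$1"
    printf 'file %s\n' "$2"
    printf 'core-file %s\n' "$3"
    printf 'bt\n'
}

# Typical use on the WSL side (commented out because it needs Docker and gdb):
#   mkdir -p ~/fb-sysroot/lib
#   docker cp -L fb5:/lib/x86_64-linux-gnu ~/fb-sysroot/lib/
#   write_gdb_script ~/fb-sysroot /opt/firebird/bin/firebird \
#       /tmp/core-firebird.7.8dca622e3e3e.1737412835 > /tmp/fb.gdb
#   gdb -batch -x /tmp/fb.gdb
```

The directory layout under the sysroot has to mirror the container's paths. Libraries that gdb cannot find there may end up without symbols, so further directories (for example /opt/firebird) may need to be copied the same way.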
-
Thank you again for your guidance. I've made some more progress. I tried copying the file from x86_64-linux-gnu in the container to a separate directory in WSL but I just couldn't get it working. In the end I came up with a simpler solution: installing gdb in the container. Don't know why I didn't try it earlier 🤦 Anyway, now when I run gdb in the container it gives me the following output
No warnings 🥳
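Pretty much the same as before. With your help though, I think we can now trust the stack trace 👍 I have internal functions loaded from a UDR I made using Free Pascal. Is that maybe why I'm getting the 0x00007f41fd39bee0 in ?? ()? The UDR was not built with debugger support. I'll try and add this now and report back if it made any difference.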
-
I "think" I've managed to build a debug version of my udr. The normal version is 660kB and the debug version is 4.6MB. It doesn't include .so.debug though, just the .so file, so maybe I'm still missing something. I'll keep investigating.
-
The result of using
I think that means my debug build should be able to give me info in gdb if it was causing the seg fault. If that's true, then it might not be my udr that's causing my problem. I'm not sure what to look at next. 😔
-
On 1/22/25 08:01, Nick Barrett wrote:
Thank you again for your guidance.
I've made some more progress.
I tried copying the file from x86_64-linux-gnu in the container to a
separate directory in WSL but I just couldn't get it working. In the
end I came up with a simpler solution: installing gdb in the
container. Don't know why I didn't try it earlier 🤦
Anyway, now when I run gdb in the container it gives me the following
output
|# sudo gdb /opt/firebird/bin/firebird core-firebird.7.3a0bc0a582ee.1737520419
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/firebird/bin/firebird...
Reading symbols from /opt/firebird/bin/.debug/firebird.debug...
[New LWP 530] |
.......
|[New LWP 495]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/firebird/bin/firebird'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f41fd39bee0 in ?? ()
[Current thread is 1 (Thread 0x7f41e6bbc6c0 (LWP 530))] |
No warnings 🥳
When I run the bt command I get
|(gdb) bt
#0 0x00007f41fd39bee0 in ?? ()
#1 0x00007f42297d9330 in __GI___nptl_deallocate_tsd () at ./nptl/nptl_deallocate_tsd.c:73
#2 __GI___nptl_deallocate_tsd () at ./nptl/nptl_deallocate_tsd.c:22
#3 0x00007f42297dc880 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:455
#4 0x00007f4229869c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 |
Pretty much the same as before. With your help though, I think we can
now trust the stack trace 👍
Yes, trace appears correct.
I have internal functions loaded from a UDR I made using free pascal.
Is that maybe why I'm getting the |0x00007f41fd39bee0 in ?? ()|?
The UDR was not built with debugger support. I'll try and add this now
and report back if it made any difference.
A UDR may have any effect, but there is no code from it in this stack,
i.e. using the debug version hardly helps. BTW, it looks like your debug
version was built correctly.
In this particular case I doubt that it's due to the UDR. Maybe if you use TLS
in it... ?
Reading this https://bugzilla.redhat.com/show_bug.cgi?id=1065695 it looks
like such errors took place in some old glibc versions. Maybe your
container is using one?
Also try to do
thr apply all bt full
in gdb. It will produce a lo-o-ong output, but maybe it gives some idea of what
happens.
-
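I ran ldd --version and the output was
# ldd --version
ldd (Ubuntu GLIBC 2.39-0ubuntu8.3) 2.39
This would seem to be okay? I tried thr apply all bt full in gdb and you're correct. That is a very long output 😅 Here's what came out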
-
I haven't really commented much on when the crash is occurring. Can I discuss that here, or should I open an issue?
-
On 1/23/25 00:10, Nick Barrett wrote:
I ran |ldd --version| and the output was
|# ldd --version
ldd (Ubuntu GLIBC 2.39-0ubuntu8.3) 2.39|
This would seem to be okay?
Yes, that is more than fresh enough. Just to make sure: is it in the container or
in WSL?
I tried |thr apply all bt full| in gdb and you're correct. That is a
very long output 😅
Here's what came out
I see one very interesting thread:
|Thread 13 (Thread 0x7f8bb37fe6c0 (LWP 368)):
#0 0x00007f8c151642a4 in SYSTEM_$$_FPSYSCALL$INT64$INT64$INT64$$INT64 () from /opt/firebird/plugins/udr/libops_udr.so
No symbol table info available.
#1 0x00007f8c15165395 in SYSTEM_$$_FPMUNMAP$POINTER$QWORD$$LONGINT () from /opt/firebird/plugins/udr/libops_udr.so
No symbol table info available.
#2 0x00007f8bb37fea18 in ?? ()
No symbol table info available.
#3 0x00007f8c1518a8bf in CTHREADS_$$_CRELEASETHREADVARS () from /opt/firebird/plugins/udr/libops_udr.so
No symbol table info available.
#4 0x00007f8bb37fea18 in ?? ()
No symbol table info available.
#5 0x00007f8c1517ce7b in SYSTEM_$$_DONETHREAD () from /opt/firebird/plugins/udr/libops_udr.so
No symbol table info available.
#6 0x00007f8bb37fde40 in ?? ()
No symbol table info available.
#7 0x00007f8c1518a7e9 in CTHREADS_$$_CTHREADCLEANUP$POINTER () from /opt/firebird/plugins/udr/libops_udr.so
No symbol table info available.
#8 0x00007f8bb37fde40 in ?? ()
No symbol table info available.
#9 0x00007f8c1ab8f330 in __GI___nptl_deallocate_tsd () at ./nptl/nptl_deallocate_tsd.c:73|
|==============================================================================================|
|data = <optimized out>
inner = <optimized out>
level2 = 0x7f8bb37fe9d0
idx = <optimized out>
round = <optimized out>
cnt = 0
self = <optimized out>
just_free = <optimized out>
__value = <optimized out>
#10 __GI___nptl_deallocate_tsd () at ./nptl/nptl_deallocate_tsd.c:22
self = <optimized out>
just_free = <optimized out>
round = <optimized out>
cnt = <optimized out>
idx = <optimized out>
level2 = <optimized out>
__value = <optimized out>
inner = <optimized out>
data = <optimized out>
__value = <optimized out>
level2 = <optimized out>
__value = <optimized out>
#11 0x00007f8c1ab92880 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:455
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140237988685504, 5452558714362015249, 140237988685504, -256, 0, 140735915134752, 5452558714382986769, 5450537313475505681}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#12 0x00007f8c1ac1fc3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals. |
Pay attention to the underlined frame in thread 13 (above) and in the failed
thread (below):
|Thread 1 (Thread 0x7f8be99f66c0 (LWP 352)):
#0 0x00007f8be9b8f7d0 in ?? ()
No symbol table info available.
#1 0x00007f8c1ab8f330 in __GI___nptl_deallocate_tsd () at ./nptl/nptl_deallocate_tsd.c:73|
|==============================================================================================|
|data = <optimized out>
inner = <optimized out>
level2 = 0x7f8be99f69d0
idx = <optimized out>
round = <optimized out>
cnt = 0
self = <optimized out>
just_free = <optimized out>
__value = <optimized out>
#2 __GI___nptl_deallocate_tsd () at ./nptl/nptl_deallocate_tsd.c:22
self = <optimized out>
just_free = <optimized out>
round = <optimized out>
cnt = <optimized out>
idx = <optimized out>
level2 = <optimized out>
__value = <optimized out>
inner = <optimized out>
data = <optimized out>
__value = <optimized out>
level2 = <optimized out>
__value = <optimized out>
#3 0x00007f8c1ab92880 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:455
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140238896719552, 5452685428782149137, 140238896719552, -256, 0, 140735915134752, 5452685428803120657, 5450537313475505681}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#4 0x00007f8c1ac1fc3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals. |
Looks like I have to change my mind re libops_udr.so. I'm afraid it's
somehow related to what happens. Maybe you have not provided the
appropriate switches for MT support when building it?
I haven't really commented much on when the crash is occurring.
Can I discuss that here, or should I open an issue?
To tell the truth, I'm far from sure that this is a Firebird issue. I would like
to take a look at one (better two) more full traces.
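When going through several full traces, one way to spot threads like thread 13 quickly is to filter the saved gdb output for frames that come from a given library. The sketch below is only an illustration: the helper name `threads_touching` and the file name `all-threads.txt` are hypothetical, and it assumes gdb's usual layout of one `Thread N (...)` header line followed by one `#N ...` line per frame.

```shell
#!/bin/sh
# Sketch only: helper name and file names are assumptions.

# Read "thread apply all bt full" output on stdin and print the header of
# every thread that has at least one frame coming from the library in $1.
threads_touching() {
    awk -v lib="$1" '
        /^Thread [0-9]+ / { header = $0; printed = 0 }   # start of a new thread
        /^#[0-9]+ / && index($0, "from " lib) && !printed {
            print header
            printed = 1
        }
    '
}

# Typical use (commented out because it needs gdb and a core file):
#   gdb -batch -ex "thread apply all bt full" /opt/firebird/bin/firebird CORE_FILE > all-threads.txt
#   threads_touching /opt/firebird/plugins/udr/libops_udr.so < all-threads.txt
```

Frames shown as `?? ()` carry no library name, so the crashed thread itself will not be listed; the filter only narrows down which other threads were running library code at the time.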
-
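I'm doing all my testing and investigation inside the container. Yes I saw the reference to /opt/firebird/plugins/udr/libops_udr.so and wondered if it was the true cause. Not sure if you know much about fpc but this is the command I'm using to build the udr.
fpc ops_udr.lpr -obin/ -Flfirebird -FUlib -O3 -CX -XX -Tlinux -Px86_64 -gl
The -gl is only for the debug build. ops_udr.lpr contains the following code
library ops_udr;
{$IFDEF FPC}
  {$MODE DELPHI}
  {$H+}
{$ENDIF}
uses
  {$IFDEF unix}
  cthreads, // the c memory manager is on some systems much faster for multi-threading
  cmem,
  {$ENDIF}
  ops_udr_init in 'ops_udr_init.pas';
exports firebird_udr_plugin;
begin
  IsMultiThread := True;
end.
I believe this is what is required for MT support. I'll post a couple more full traces next.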
-
-
Not sure why some of the
-
In all cases, all I get in
In all cases, Firebird crashes after my API has terminated abnormally. The API usually has a few pooled connections open at the time. If my API crashes, Firebird crashes a minute or so later. Not always though, so it's quite confusing.
-
To me it looks like a massive buffer overflow that destroyed some internal structures of the system memory allocator. Perhaps memory sanitizers or Valgrind can help to investigate that.
-
On 1/24/25 02:10, Nick Barrett wrote:
I'm doing all my testing and investigation inside the container.
Yes I saw the reference to |/opt/firebird/plugins/udr/libops_udr.so|
and wondered if it was the true cause. Not sure if you know much about
fpc but this is the command I'm using to build the udr.
|fpc ops_udr.lpr -obin/ -Flfirebird -FUlib -O3 -CX -XX -Tlinux -Px86_64 -gl|
Seems so :-(
gcc needs the -pthread option to build MT code correctly; it looks like fpc
does not have anything similar.
The |-gl| is only for the debug build.
|ops_udr.lpr| contains the following code
library ops_udr;
{$IFDEF FPC}
  {$MODE DELPHI}
  {$H+}
{$ENDIF}
uses
  {$IFDEF unix}
  cthreads, // the c memory manager is on some systems much faster for multi-threading
  cmem,
  {$ENDIF}
  ops_udr_init in 'ops_udr_init.pas';
exports firebird_udr_plugin;
begin
  IsMultiThread := True;
end.
I believe this is what is required for MT support.
I'll post a couple more full traces next.
Quite random. I have to agree with DS:
… For me it looks like a massive buffer overflow that destroyed some
internal structures of system memory allocator.
-
Some progress to report 👍 I happened to notice this one time that restarting the docker container created a core dump. The crash is now repeatable 🥳🥳🥳 I'm now working on the udr and its build process to try and figure out if I can get it working. A very big thank you to everyone for your help. I'll report back the outcome of our udr. FYI, This is what the core dump looks like now
-
On 1/25/25 05:02, Nick Barrett wrote:
#1 0x00007f532c42e330 in __GI___nptl_deallocate_tsd () at
./nptl/nptl_deallocate_tsd.c:73
#2 __GI___nptl_deallocate_tsd () at ./nptl/nptl_deallocate_tsd.c:22
Familiar places.
-
Thank you @AlexPeshkoff, @hvlad and @aafemt for your help. Especially @AlexPeshkoff for the help getting core dumps worked out. I am happy to report that we have finally fixed our issue. To create the UDR we followed the guidance in the docs Writing Firebird UDRs in Pascal. In the docs it states the .lpr project file should look something like this
library MyUdr;
{$IFDEF FPC}
  {$MODE DELPHI}{$H+}
{$ENDIF}
uses
  {$IFDEF unix}
  cthreads,
  // the c memory manager is on some systems much faster for multi-threading
  cmem,
  {$ENDIF}
  UdrInit in 'UdrInit.pas',
  SumArgsFunc in 'SumArgsFunc.pas';
exports firebird_udr_plugin;
end.
In the docs for Free Pascal it says the following about referencing the
Referencing
It seemed to me that both sets of docs were telling me to include
I got the idea from the udr-lkJSON project. There was no reference to
Thanks again for all the help =)
-
Hi folks,
I'm looking for some guidance on how I can get core dumps in my container to assist diagnosing an issue we are having.
We have FB v5.0.1 running in a container. We are using ubuntu:24.04 as the base image. You can see the details at firebird-ubuntu-chiselled
When running locally for dev, we don't use a chiselled version of Ubuntu, so not quite exactly what's in the repo, but close.
We set
BugcheckAbort = 1
in firebird.conf and we run the container with docker run -it --ulimit core=-1 ...
As for the clients, we are using .net with NETProvider
The issue we are having is that "sometimes" when the client application terminates (usually with about 10 connections to the DB), Firebird will terminate with the message
/opt/firebird/bin/fbguard: /opt/firebird/bin/firebird terminated abnormally (-1)
We don't seem to get any core dumps though, so I'm not sure how to work out what's going wrong.
Any help will be much appreciated
Nick