-
-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack Overflow loading html document #499
Comments
Hello @johnkarbonhq , Thank you for reporting, My developer will look at it. Best Regards, Jon |
Hello @johnkarbonhq , Which version are you using? When my developer tried it, he successfully got the error message raised on this line: internal void CloseNode(HtmlNode endnode, int level = 0)
{
if (level > HtmlDocument.MaxDepthLevel)
{
throw new ArgumentException(HtmlNode.DepthLevelExceptionMessage);
} Perharp is there something we are missing to reproduce it? Best Regards, Jon |
I suspect the real code of @johnkarbonhq already being quite deep in the call stack when calling HtmlDocument.LoadHtml, thus resulting in the call stack becoming exhausted. With a simple console application that more or less directly calls HtmlDocument.LoadHtml this is probably not observable simply because the call stack is pretty much mostly empty and doesn't overflow. However, it's still just my speculation i am unable to either prove or disprove because i can't analyse the call stack at the time of the observed Stackoverflow exception (only @johnkarbonhq can do that). If my suspicion is correct, then reducing the value HtmlDocument.MaxDepthLevel further (perhaps like Why? Because if and at which internal HtmlAgilityPack method call a stack overflow happens is dependent on how full the call stack already is. Quite possibly, right now with the suggested patch @johnkarbonhq's application code will avoid the stack overflow using Ideal solution -- and also most expensive, sadly -- would be for the library to avoid deep recursion and converting potentially deep recursions into iterations or loops wherever practically feasible. For the SetChanged() method, this could for example look like this without losing much readability and maintainability (Warning! Untested code!): internal void SetChanged()
{
var currentNode = this;
do
{
currentNode._changed = true;
currentNode = currentNode.ParentNode;
}
while (currentNode is not null);
} And a HtmlDocument.MaxDepthLevel check should then apply only to the remaining potentially deep recursions related to document traversal and to other functions which have to enforce HtmlDocument.MaxDepthLevel. And in case the application code itself keeps eating up the call stack like there is no tomorrow where even the best and most thorough recursion optimization in HtmlAgilityPack wouldn't completely prevent call stack exhaustion, then the solution lies with the application: Either reduce its call stack usage or (as a workaround) execute call stack-intensive code in individual background thread (as each thread has its own dedicated call stack). |
Here is what to include in your request to make sure we implement a solution as quickly as possible.
1. Description
Parsing html emails (which are often not well formed), I got a stack overflow. Setting HtmlDocument.MaxDepthLevel did not prevent the stack overflow.
2. Exception
If you are seeing an exception, include the full exception details (message and stack trace).
Html.SetChanged... called recursively 5,000 times
3. Fiddle or Project
Repro is here: https://dotnetfiddle.net/mKe6jk
The text was updated successfully, but these errors were encountered: