You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
HtmlAttributeCollection: Hashitems and therefore the this[string name] indexer getting out of sync when removing one of multiple attributes with same name
#464
Open
elgonzo opened this issue
Jan 15, 2022
· 3 comments
The this[string name] indexer of the HtmlAttributeCollection does not operate on the internal items list, but operates on a separate internal dictionary Hashitems instead. For this[string name] to behave correctly, it is therefore crucial to keep Hashitems accurately in sync whenever and however the items collection is being modified.
Unfortunately, the Hashitems dictionary is not maintained properly when removing items from the HtmlAttributeCollection. I already filed an issue about one such problem scenario yesterday (#463). Later i found another problem scenario involving other methods of HtmlAttributeCollection that lead to Hashitems getting out of sync with items and therefore the this[string name] indexer operating incorrectly, and which i am going to describe and demonstrate below.
This problem scenario relies on an element/node having multiple attributes with the same name, like for example:
Removing just one of these class attributes will cause the this[string name] indexer to act as if there were no other class attributes existing in the <div> node, incorrectly returning null despite the HtmlAttributeCollection still possessing class attributes.
This problem scenario can be reproduced using either the Remove(HtmlAttribute) or the RemoveAt(int) method.
usingHtmlAgilityPack;usingSystem;namespaceHtmlAttributeCollection_HashItemsGettingOutOfSyncWithRemove{classProgram{staticvoidMain(string[]args){varhtml=@"<div class=""One"" class=""Two"" class=""Three"">Hello World</div>";varhtmlDoc=newHtmlDocument();htmlDoc.LoadHtml(html);vardivNode=htmlDoc.DocumentNode.SelectSingleNode("//div");vara1=divNode.Attributes["class"];varstr1=(a1isnull)?"<NULL>":$"{a1.Name}={a1.Value}";Console.WriteLine($"divNode.Attributes[\"class\"] before Remove(HtmlAttribute): {str1}");varsecondClassAttribute=divNode.Attributes[1];Console.WriteLine($"Second \"class\" attribute in collection: {secondClassAttribute.Name}={secondClassAttribute.Value}");Console.WriteLine();Console.WriteLine("Executing divNode.Attributes.Remove(secondClassAttribute)");Console.WriteLine();divNode.Attributes.Remove(secondClassAttribute);vara2=divNode.Attributes["class"];varstr2=(a2isnull)?"<NULL>":$"{a2.Name}={a2.Value}";Console.WriteLine($"divNode.Attributes[\"class\"] after Remove(HtmlAttribute): {str2}");Console.WriteLine();Console.WriteLine("All attributes in divNode.Attributes after removal of the second \"class\" attribute:");foreach(varattrindivNode.Attributes)Console.WriteLine($"\t{attr.Name}={attr.Value}");}}}
usingHtmlAgilityPack;usingSystem;namespaceHtmlAttributeCollection_HashItemsGettingOutOfSyncWithRemoveAt{classProgram{staticvoidMain(string[]args){varhtml=@"<div class=""One"" class=""Two"" class=""Three"">Hello World</div>";varhtmlDoc=newHtmlDocument();htmlDoc.LoadHtml(html);vardivNode=htmlDoc.DocumentNode.SelectSingleNode("//div");vara1=divNode.Attributes["class"];varstr1=(a1isnull)?"<NULL>":$"{a1.Name}={a1.Value}";Console.WriteLine($"divNode.Attributes[\"class\"] before Remove(HtmlAttribute): {str1}");varsecondClassAttribute=divNode.Attributes[1];Console.WriteLine($"Second \"class\" attribute in collection: {secondClassAttribute.Name}={secondClassAttribute.Value}");Console.WriteLine();Console.WriteLine("Executing divNode.Attributes.RemoveAt(1)");Console.WriteLine();divNode.Attributes.RemoveAt(1);vara2=divNode.Attributes["class"];varstr2=(a2isnull)?"<NULL>":$"{a2.Name}={a2.Value}";Console.WriteLine($"divNode.Attributes[\"class\"] after Remove(HtmlAttribute): {str2}");Console.WriteLine();Console.WriteLine("All attributes in divNode.Attributes after removal of the second \"class\" attribute:");foreach(varattrindivNode.Attributes)Console.WriteLine($"\t{attr.Name}={attr.Value}");}}}
In both demos, divNode.Attributes["class"] yields the class="Three" attribute before the removal operation is attempted.
Then both demos proceed to remove the second class attribute class="Two".
After the removal of the second class attribute, divNode.Attributes["class"] incorrectly yields null, despite the <div> node possessing two class attributes at that point (class="One" and class="Three").
3. Any further technical details
HAP version: 1.11.40
NET 5.0
Side note: It seems to me the behavior of the this[string name] indexer is not completely and unambiguously specified when dealing with multiple attributes having the same name. What is its intended behavior here? Should it always return the last in the order of the attributes with the same name? The first? Should it return the last accessed attribute of the same name (but what if the last such access was removal of such an attribute, and what would be the initial state when there have been no attributes accessed yet)? Should it remain random or undefined, with you basically knowing only that you are getting some attribute with the specified name, but you are rolling the dice on which exactly you get, and hope and cross fingers that it is the one you actually wanted? Some other behavior my weekend-addled brain can't think of right now....? ;-)
The text was updated successfully, but these errors were encountered:
Having multiple attributes with the same name is not really legal in HTML.
Even major browsers will only read the first attribute and don't show other ones.
So even if we currently have weird behavior, unless I miss something, I do not think that we really need to support this case as this is not a valid case. So getting weird behavior is expected here.
I strongly disagree with keeping the current behavior. Here is why:
HtmlAttributeCollection is able and does maintain multiple attributes of the same name.
Iterating over the HtmlAttributeCollection will yield all of the attributes of the same name remaining in the collection, even after removing one of those attributes.
Accessing each of the attributes through the numeric indexer works just fine, even after removing one of the attributes having the same name.
The point is not whether multiple attributes of the same name are illegal in HTML, and thus weird behavior should be accepted. (If that were the point, why was #461 being addressed and fixed, even though the very same argument could have been made there?)
No, the point is that the behavior of the string indexer this[string name] is inconsistent with the other functions of HtmlAttributeCollection itself. That's the real issue here. Instead of "weird behavior" i would prefer to either see the this[string name] indexer fixed or disallow HtmlAttributeCollection from having multiple attributes with the same name. Keeping HtmlAttributeCollection APIs with unexpected, surprising, and most importantly inconsistent behavior is the worst choice to make in my opinion.
Even major browsers will only read the first attribute and don't show other ones.
If that's the case, i would suggest the this[string name] indexer always provide the left-most attribute with that name (i.e., the one with the lowest index in the internal items collection.)
1. Description
The
this[string name]
indexer of the HtmlAttributeCollection does not operate on the internal items list, but operates on a separate internal dictionary Hashitems instead. Forthis[string name]
to behave correctly, it is therefore crucial to keep Hashitems accurately in sync whenever and however the items collection is being modified.Unfortunately, the Hashitems dictionary is not maintained properly when removing items from the HtmlAttributeCollection. I already filed an issue about one such problem scenario yesterday (#463). Later i found another problem scenario involving other methods of HtmlAttributeCollection that lead to Hashitems getting out of sync with items and therefore the
this[string name]
indexer operating incorrectly, and which i am going to describe and demonstrate below.This problem scenario relies on an element/node having multiple attributes with the same name, like for example:
Removing just one of these class attributes will cause the
this[string name]
indexer to act as if there were no other class attributes existing in the <div> node, incorrectly returning null despite the HtmlAttributeCollection still possessing class attributes.This problem scenario can be reproduced using either the
Remove(HtmlAttribute)
or theRemoveAt(int)
method.2. Fiddle or Project
Demo using
Remove(HtmlAttribute)
: https://dotnetfiddle.net/5y7cH7Click to expand the source code
Demo using
RemoveAt(int)
: https://dotnetfiddle.net/BJQQkTClick to expand the source code
In both demos,
divNode.Attributes["class"]
yields theclass="Three"
attribute before the removal operation is attempted.Then both demos proceed to remove the second class attribute
class="Two"
.After the removal of the second class attribute,
divNode.Attributes["class"]
incorrectly yields null, despite the <div> node possessing two class attributes at that point (class="One"
andclass="Three"
).3. Any further technical details
Side note: It seems to me the behavior of the
this[string name]
indexer is not completely and unambiguously specified when dealing with multiple attributes having the same name. What is its intended behavior here? Should it always return the last in the order of the attributes with the same name? The first? Should it return the last accessed attribute of the same name (but what if the last such access was removal of such an attribute, and what would be the initial state when there have been no attributes accessed yet)? Should it remain random or undefined, with you basically knowing only that you are getting some attribute with the specified name, but you are rolling the dice on which exactly you get, and hope and cross fingers that it is the one you actually wanted? Some other behavior my weekend-addled brain can't think of right now....? ;-)The text was updated successfully, but these errors were encountered: