Add option to insert list in ets:insert, ets:lookup refactor #1405

Open
wants to merge 2 commits into
base: main
from

Conversation

TheSobkiewicz
Contributor

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

Changes:

  • Enabled ets:insert/2 to accept lists for bulk insertion.
  • Extracted helper functions for ets:lookup/2 and ets:insert/2 that do not apply table locks.

Use Cases for the Helper Functions:

The new helper functions can be utilized in the following ETS operations to reduce code duplication:

  • ets:update_element/3
  • ets:insert_new/2
  • ets:update_counter/3
  • ets:update_counter/4
  • ets:take/2
  • ets:delete_object/2

Each of these functions will be implemented after this PR is merged.

}
EtsErrorCode result = ets_table_insert(ets_table, tuple, ctx);
if (result != EtsOk) {
AVM_ABORT(); // Abort because operation might not be atomic.
Collaborator

We usually don't abort the VM: calling AVM_ABORT() means that something unrecoverable happened, such as memory corruption, a serious internal bug, or any other situation that requires a full VM crash and reboot.
Are we in that situation here?

Contributor Author

@TheSobkiewicz TheSobkiewicz Jan 7, 2025

Currently we don't have any other tool to ensure atomicity here. If the insert fails at the Nth element, elements 0 through N-1 will already have been inserted, which could result in hard-to-debug behavior. This is unlikely to happen.

Contributor

@jakub-gonet jakub-gonet Jan 12, 2025

To expand: without this abort, if we're short on memory, we'd leave the list partially inserted. If someone tries to persist inserts someday, we'd leave the system in an inconsistent state.

To avoid that, we need to either abort or allocate a list of the previous values and roll back in case of error (ensuring that nothing allocates in the rollback path, since we're most likely dealing with OOM). Aborting is easier to do here.

This check should be wrapped in UNLIKELY.
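
For readers unfamiliar with the idiom, here is a minimal sketch of a branch hint on an error check. The macro definition is a stand-in (the project is assumed to provide its own UNLIKELY), and `try_insert`, `insert_checked`, and `InsertResult` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Stand-in definition. Assumption: the project ships its own UNLIKELY macro.
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

// Hypothetical result codes, not the real EtsErrorCode values.
enum InsertResult
{
    InsertOk,
    InsertFailed
};

// Hypothetical insert that fails only when simulate_oom is set.
static enum InsertResult try_insert(int *slot, int value, bool simulate_oom)
{
    assert(slot != NULL);
    if (simulate_oom) {
        return InsertFailed;
    }
    *slot = value;
    return InsertOk;
}

// The hint tells the compiler that the failure branch is the cold path.
// It does not change behavior, only code layout.
static bool insert_checked(int *slot, int value, bool simulate_oom)
{
    enum InsertResult result = try_insert(slot, value, simulate_oom);
    if (UNLIKELY(result != InsertOk)) {
        return false;
    }
    return true;
}
```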

Collaborator

OK, fair point. There is another feasible approach:
table nodes can be pre-allocated before making any change to the list, so if an allocation fails, the nodes allocated so far can simply be freed before any actual change is made.

ets_hashtable_insert will need an additional node parameter, and a dedicated allocation function can be created (e.g. ets_hashtable_new_node). Furthermore, the key and entry parameters can be moved to ets_hashtable_new_node if that helps.
This change will have a very small impact, since ets_hashtable_insert is used in just one or two places.
I suggest doing this as an additional commit inside this PR, so we can make the review easier and split this work into two tasks.

This change will remove any implicit allocation and make the abort unnecessary.
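
The pre-allocation idea could look roughly like the sketch below. It is a toy model, not the AtomVM code: keys and values are plain ints instead of terms, and `struct Table`, `table_new_node`, `table_insert_node`, `table_insert_many`, and `table_lookup` are hypothetical names. The point is only the two phases: every allocation happens first, and the commit phase cannot fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_BUCKETS 8

// Hypothetical, simplified node: a real table would store terms, not ints.
struct Node
{
    struct Node *next;
    int key;
    int value;
};

struct Table
{
    struct Node *buckets[NUM_BUCKETS];
};

// Allocation is the only step that can fail, so it is kept separate.
static struct Node *table_new_node(int key, int value)
{
    struct Node *node = malloc(sizeof(struct Node));
    if (node == NULL) {
        return NULL;
    }
    node->next = NULL;
    node->key = key;
    node->value = value;
    return node;
}

// Links an already allocated node into the table. Cannot fail.
// An existing node with the same key is replaced and freed.
static void table_insert_node(struct Table *table, struct Node *node)
{
    size_t index = (size_t) ((unsigned int) node->key % NUM_BUCKETS);
    for (struct Node **slot = &table->buckets[index]; *slot != NULL; slot = &(*slot)->next) {
        if ((*slot)->key == node->key) {
            struct Node *old = *slot;
            node->next = old->next;
            *slot = node;
            free(old);
            return;
        }
    }
    node->next = table->buckets[index];
    table->buckets[index] = node;
}

// Phase 1 allocates every node, phase 2 commits them.
// Returns false and leaves the table untouched if any allocation fails.
static bool table_insert_many(struct Table *table, const int *keys, const int *values, size_t count)
{
    assert(table != NULL);
    if (count == 0) {
        return true;
    }
    struct Node **nodes = calloc(count, sizeof(struct Node *));
    if (nodes == NULL) {
        return false;
    }
    for (size_t i = 0; i < count; ++i) {
        nodes[i] = table_new_node(keys[i], values[i]);
        if (nodes[i] == NULL) {
            for (size_t j = 0; j < i; ++j) {
                free(nodes[j]);
            }
            free(nodes);
            return false;
        }
    }
    for (size_t i = 0; i < count; ++i) {
        table_insert_node(table, nodes[i]);
    }
    free(nodes);
    return true;
}

static bool table_lookup(const struct Table *table, int key, int *value_out)
{
    size_t index = (size_t) ((unsigned int) key % NUM_BUCKETS);
    for (const struct Node *node = table->buckets[index]; node != NULL; node = node->next) {
        if (node->key == key) {
            *value_out = node->value;
            return true;
        }
    }
    return false;
}
```

With this split, the failure path only frees nodes that were never linked into the table, so no rollback of table state is needed.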

@TheSobkiewicz TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch 4 times, most recently from 76774f0 to 6ac7831 Compare January 9, 2025 15:44
return EtsTableNotFound;
}

EtsErrorCode result = ets_table_lookup(ets_table, key, ret, ctx);
Contributor

While working on this code recently, I noticed that the hashtable lookup takes a keypos argument that isn't needed (we have node->key, and keypos can't change after table creation). It may be worth removing it in this PR or in a follow-up.
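
To illustrate the point, here is a small sketch in which the key is cached in the node at insert time, so lookup never needs the key position. Entries are modeled as fixed-size int arrays rather than tuples, and `node_init` and `bucket_lookup` are hypothetical names, not the project's functions.

```c
#include <assert.h>
#include <stddef.h>

#define TUPLE_SIZE 3

// Hypothetical node: key is a cached copy of entry[keypos].
struct Node
{
    struct Node *next;
    int key;
    int entry[TUPLE_SIZE];
};

// keypos is only needed here, when the node is filled in.
static void node_init(struct Node *node, const int *entry, size_t keypos)
{
    assert(keypos < TUPLE_SIZE);
    for (size_t i = 0; i < TUPLE_SIZE; ++i) {
        node->entry[i] = entry[i];
    }
    node->key = entry[keypos];
    node->next = NULL;
}

// Lookup compares against the cached key, so it takes no keypos argument.
static const struct Node *bucket_lookup(const struct Node *head, int key)
{
    for (const struct Node *node = head; node != NULL; node = node->next) {
        if (node->key == key) {
            return node;
        }
    }
    return NULL;
}
```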

@TheSobkiewicz TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch from 6ac7831 to c2bc9d2 Compare January 12, 2025 03:18
@TheSobkiewicz TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch from c2bc9d2 to 01456a3 Compare January 12, 2025 03:29
@TheSobkiewicz TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch from f54ef62 to 45eccdc Compare January 15, 2025 16:13
}
if (!term_is_nil(iter)) {
return EtsBadEntry;
}

struct HNode **hnode_list = malloc(size * sizeof(struct HNode *));
Contributor

Is this the correct HNode? I don't see it exposed via a header file, and we have a few structures named HNode in the project. Small nit: maybe hnodes or nodes? This is an array, not a list.

return EtsAllocationFailure;
}

int cur = 0;
Contributor

Why not i?

list = term_get_list_tail(list);
}

for (size_t i = 0; i < size; i++) {
Contributor

Nit: ++i

return EtsAllocationFailure;
}

int cur = 0;
while (term_is_nonempty_list(list)) {
Contributor

Can't we do it in the previous loop instead of iterating twice?

return NULL;
}
size_t size = (size_t) memory_estimate_usage(entry);
if (memory_init_heap(heap, size) != MEMORY_GC_OK) {
Contributor

I was wondering: why do we create a new heap instead of piggybacking on the owner process's heap?


void free_hashtable_node_array(struct HNode **allocated, size_t size, GlobalContext *global)
{
for (size_t j = 0; j < size; j++) {
Contributor

Why j instead of i? Also change post-increment to pre-increment please.

void free_hashtable_node_array(struct HNode **allocated, size_t size, GlobalContext *global)
{
for (size_t j = 0; j < size; j++) {
if (allocated[j]) {
Contributor

Do we need to check it?
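
As background for this question: in standard C, free(NULL) is a no-op, but reading a member through a NULL pointer is undefined behavior. The sketch below assumes the array may be only partially filled (for example after a failed allocation) and that each node owns a separately allocated heap. `struct Node`, `node_new`, and `free_node_array` are hypothetical stand-ins, not the functions under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-in for a table node that owns a separate heap.
struct Node
{
    int *heap;
};

static struct Node *node_new(void)
{
    struct Node *node = malloc(sizeof(struct Node));
    if (node == NULL) {
        return NULL;
    }
    node->heap = malloc(sizeof(int));
    if (node->heap == NULL) {
        free(node);
        return NULL;
    }
    *node->heap = 0;
    return node;
}

// Frees every non-NULL slot and returns how many nodes were released.
// The array itself stays owned by the caller.
static size_t free_node_array(struct Node **nodes, size_t size)
{
    assert(nodes != NULL);
    size_t freed = 0;
    for (size_t i = 0; i < size; ++i) {
        if (nodes[i] != NULL) {
            // Without the check, nodes[i]->heap would dereference NULL
            // for slots that were never filled.
            free(nodes[i]->heap);
            free(nodes[i]);
            nodes[i] = NULL;
            ++freed;
        }
    }
    return freed;
}
```

If the caller can guarantee that every slot is populated before this function runs, the check becomes redundant; whether that holds depends on how the array is built.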

return new_node;
}

void free_hashtable_node_array(struct HNode **allocated, size_t size, GlobalContext *global)
Contributor

I'm not sure if this should live here or in ets.c instead.

Comment on lines +147 to +148
memory_destroy_heap(new_node->heap, global);
free(new_node);
Contributor

I don't like this; we shouldn't take ownership of the node here.

memory_destroy_heap(node->heap, global);
node->heap = heap;
node->heap = new_node->heap;
free(new_node);
Contributor

I think we should either swap the node entirely or pass its contents instead of using it as an impromptu container, especially since we do swap it on a hash collision.

3 participants