Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SapModel caches the Hessian factorization #21421

Merged
merged 1 commit into from
May 13, 2024

Conversation

amcastro-tri
Copy link
Contributor

@amcastro-tri amcastro-tri commented May 9, 2024

Towards the support of gradients through SAP, see #21370.

We need the Hessian to propagate gradients. The best way to ensure it's up-to-date, is by caching it in the model's context as we do with every other quantity that depends on v.

As a quick reminder, everything SAP computes is a function of generalized velocities v. Thefore the "state" of the SAP model is generalized velocities v stored in the context. Every other quantity, say impulses, cost, gradients and even Hessian, are functions of v that we cache in the context.


This change is Reviewable

@amcastro-tri amcastro-tri added priority: high release notes: none This pull request should not be mentioned in the release notes labels May 9, 2024
Copy link
Contributor Author

@amcastro-tri amcastro-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves standalone unit tests for SapHessianFactorization, but I figured I could assing a reviewer to get a first pass before commiting to those tests.

+@sherm1 for feature review please.

Reviewable status: LGTM missing from assignee sherm1(platform), needs at least two assigned reviewers

Copy link
Contributor Author

@amcastro-tri amcastro-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that it matters, but now SapHessianFactorization was replace by DenseSuperNodalSolver.

Reviewable status: LGTM missing from assignee sherm1(platform), needs at least two assigned reviewers

Copy link
Member

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feature review checkpoint.

Reviewed 3 of 6 files at r1, all commit messages.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee sherm1(platform), needs at least two assigned reviewers


multibody/contact_solvers/sap/dense_supernodal_solver.h line 38 at r1 (raw file):

  bool DoSetWeightMatrix(
      const std::vector<Eigen::MatrixXd>& block_diagonal_G) final;
  Eigen::MatrixXd DoMakeFullMatrix() const final { return H_; }

BTW is there any guarantee H_ has been calculated when this is called? Is there a way to check with a DRAKE_DEMAND or whatever? Also I'm wondering why this returns a copy rather than a reference?


multibody/contact_solvers/sap/dense_supernodal_solver.h line 52 at r1 (raw file):

  MatrixX<double> H_;
  // N.B. This instantiates an in-place LDLT solver that uses the memory in H_.
  Eigen::LDLT<MatrixX<double>> Hldlt_{H_};

BTW Eigen's documentation suggests it won't do this in place unless the MatrixType template argument is an EigenRef. Have you checked that this is actually doing what you want it to?
image.png

Copy link
Member

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feature :lgtm: with a few minor comments.

Reviewed 3 of 6 files at r1.
Reviewable status: 13 unresolved discussions, needs at least two assigned reviewers


multibody/contact_solvers/sap/dense_supernodal_solver.h line 24 at r1 (raw file):

  // outlive this object.
  // @throws std::runtime_error if A or J is nullptr.
  // @throws std::runtime_error if a block in A is not square.

nit: should just promise std::exception here


multibody/contact_solvers/sap/dense_supernodal_solver.cc line 42 at r1 (raw file):

  const MatrixX<double> Jdense = J_->MakeDenseMatrix();

  // Make dense Hessian matrix G.

minor: Hessian -> Weight


multibody/contact_solvers/sap/dense_supernodal_solver.cc line 54 at r1 (raw file):

    Gdense.block(offset, offset, ni, ni) = Gi;
    offset += ni;
  }

BTW if G doesn't have enough rows it will be left with zeroes to fill in the rest. Is that OK, or is that also a kind of incompatibility with the Jacobian?


multibody/contact_solvers/sap/sap_model.h line 123 at r1 (raw file):

// SuperNodalSolver, with the underlying type specified by
// SapHessianFactorizationType.
class HessianFactorizationCache {

BTW this is a "cache entry" rather than a "cache". Probably should be renamed HessianFactorizationCacheEntry to make that clear.


multibody/contact_solvers/sap/sap_model.h line 132 at r1 (raw file):

  HessianFactorizationCache& operator=(HessianFactorizationCache&&) = default;

  // Creates an empty cache, with no heap allocation. This is used to delay the

nit: cache -> cache entry


multibody/contact_solvers/sap/sap_model.h line 145 at r1 (raw file):

  //
  // @warning This is a potentially expensive constructor, performing the
  // necessary symbolic analysys for the case of sparse factorizations.

typo: analysys -> analysis


multibody/contact_solvers/sap/sap_model.h line 179 at r1 (raw file):

  // avoided at all costs. Therefore this implementation always throws. We can
  // do this because in the well controlled lifespan of a context generated by a
  // SapModel, we know that a HessianFactorizationCache is default construted

typo: construted -> constructed

BTW great comment, thanks!


multibody/contact_solvers/sap/sap_model.h line 410 at r1 (raw file):

  //
  // @note This is an expensive computation and therefore only T = double is
  // supported.

BTW not clear why being expensive means only T=double (we have other expensive things that can be Autodiffed). Can you elaborate?


multibody/contact_solvers/sap/sap_model.h line 412 at r1 (raw file):

  // supported.
  //
  // @trows std::runtime_error when attempting to perform the computation on T

typo: trows -> throws


multibody/contact_solvers/sap/sap_model.cc line 192 at r1 (raw file):

  // N.B. The default constructible SapHessianFactorizationCache is empty. Since
  // the computation of the factorization is costly, we dealy its construction

typo: dealy -> delay


multibody/contact_solvers/sap/sap_model.cc line 193 at r1 (raw file):

  // N.B. The default constructible SapHessianFactorizationCache is empty. Since
  // the computation of the factorization is costly, we dealy its construction
  // to the very first time is computed.

typo: time is -> time it is

Copy link
Member

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@SeanCurtis-TRI for platform review per rotation, please

Reviewable status: 13 unresolved discussions, LGTM missing from assignee SeanCurtis-TRI(platform)

Copy link
Contributor

@SeanCurtis-TRI SeanCurtis-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:LGTM: I agree with @sherm1's comments and have added a few of my own.

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: 18 unresolved discussions


multibody/contact_solvers/sap/dense_supernodal_solver.h line 38 at r1 (raw file):

Previously, sherm1 (Michael Sherman) wrote…

BTW is there any guarantee H_ has been calculated when this is called? Is there a way to check with a DRAKE_DEMAND or whatever? Also I'm wondering why this returns a copy rather than a reference?

The documentation in supernodal_solver.h on MakeFullMatrix() says that it throws if SetWeightMatrix() hasn't been called. So, the proposed DRAKE_DEMAND is rolled into the parent class.


multibody/contact_solvers/sap/dense_supernodal_solver.h line 43 at r1 (raw file):

  int DoGetSize() const final { return size(); }

  const std::vector<MatrixX<double>>* A_{nullptr};

BTW Given that this isn't movable or copyable, A_ and J_ could be const references instead of pointers to const. It eliminates the need to default construct values and guarantees that they are non-null (spelled as is, the line A_ = nullptr; is still valid.


multibody/contact_solvers/sap/sap_model.h line 134 at r1 (raw file):

  // Creates an empty cache, with no heap allocation. This is used to delay the
  // creation of the expensive Hessian structure to when it is strictly needed.
  // After instantantiations with this constructor is_empty() is `true`.

nit: typo

Suggestion:

 instantiations

multibody/contact_solvers/sap/sap_model.h line 147 at r1 (raw file):

  // necessary symbolic analysys for the case of sparse factorizations.
  //
  // @pre A and J must not be nullptr.

nit: Slight grammar tweak.

Suggestion:

are not

multibody/contact_solvers/sap/sap_model.cc line 45 at r1 (raw file):

  throw std::runtime_error(
      "Attempting to clone an expensive Hessian factorization.");
  return nullptr;

nit: Presence of throw negates the need for return.


multibody/contact_solvers/sap/sap_model.cc line 63 at r1 (raw file):

  for (int i = 0; i < num_rhs; ++i) {
    auto rhs_i = rhs->col(i);
    rhs_i = factorization()->Solve(rhs_i);

BTW It looks ever so slightly odd that this doesn't call factorization()->SolveInPlace().

Copy link
Contributor Author

@amcastro-tri amcastro-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick review guys!
I added the promissed unit tests on DenseSuperNodalSolver. I believe that also helps mitiage some of your questionns in the first round of reviews.

Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees SeanCurtis-TRI(platform),sherm1(platform)


multibody/contact_solvers/sap/dense_supernodal_solver.h line 38 at r1 (raw file):

Previously, sherm1 (Michael Sherman) wrote…

BTW is there any guarantee H_ has been calculated when this is called? Is there a way to check with a DRAKE_DEMAND or whatever? Also I'm wondering why this returns a copy rather than a reference?

added comment.


multibody/contact_solvers/sap/dense_supernodal_solver.h line 43 at r1 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

BTW Given that this isn't movable or copyable, A_ and J_ could be const references instead of pointers to const. It eliminates the need to default construct values and guarantees that they are non-null (spelled as is, the line A_ = nullptr; is still valid.

Thanks, I like that.


multibody/contact_solvers/sap/dense_supernodal_solver.h line 52 at r1 (raw file):

Previously, sherm1 (Michael Sherman) wrote…

BTW Eigen's documentation suggests it won't do this in place unless the MatrixType template argument is an EigenRef. Have you checked that this is actually doing what you want it to?
image.png

Great catch Sherm! It only works if you explictly spell the Eigen::Ref in the template argument.


multibody/contact_solvers/sap/dense_supernodal_solver.cc line 54 at r1 (raw file):

Previously, sherm1 (Michael Sherman) wrote…

BTW if G doesn't have enough rows it will be left with zeroes to fill in the rest. Is that OK, or is that also a kind of incompatibility with the Jacobian?

nice, I added a check for this.


multibody/contact_solvers/sap/sap_model.h line 123 at r1 (raw file):

Previously, sherm1 (Michael Sherman) wrote…

BTW this is a "cache entry" rather than a "cache". Probably should be renamed HessianFactorizationCacheEntry to make that clear.

right now I am naming it consistently with the rest of the cache structs in this file.


multibody/contact_solvers/sap/sap_model.h line 410 at r1 (raw file):

Previously, sherm1 (Michael Sherman) wrote…

BTW not clear why being expensive means only T=double (we have other expensive things that can be Autodiffed). Can you elaborate?

added comment.


multibody/contact_solvers/sap/sap_model.cc line 45 at r1 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

nit: Presence of throw negates the need for return.

I sometimes forget how smart the compiler is.


multibody/contact_solvers/sap/sap_model.cc line 63 at r1 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

BTW It looks ever so slightly odd that this doesn't call factorization()->SolveInPlace().

I agree. Added note.

@amcastro-tri amcastro-tri force-pushed the sap_caches_hessian branch 2 times, most recently from 07fcb79 to d303c75 Compare May 13, 2024 19:34
Copy link
Contributor

@SeanCurtis-TRI SeanCurtis-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 6 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: 7 unresolved discussions


multibody/contact_solvers/sap/dense_supernodal_solver.h line 32 at r1 (raw file):

  // @returns the size of the system.
  int size() const { return H_.rows(); }

nit: Why this method when the parent class already has GetSize()? It really just feels like the implementation of DoGetSize(). But adding a redundant public API when the inherited API already includes something doesn't seem like a good idea.


multibody/contact_solvers/sap/dense_supernodal_solver.h line 48 at r3 (raw file):

  int DoGetSize() const final { return size(); }

  static int CalcSize(const std::vector<MatrixX<double>>& A) {

nit: this function does not appear to be invoked.


multibody/contact_solvers/sap/sap_model.cc line 63 at r3 (raw file):

    auto rhs_i = rhs->col(i);
    // N.B. Unfortunately SolveInPlace() only accepts VectorXd*, while here the
    // return of col() is a block object. Therefore, we are    forced

nit: Looks like your spacebar ran wild.

Suggestion:

we are forced

multibody/contact_solvers/sap/test/dense_supernodal_solver_test.cc line 81 at r3 (raw file):

  DRAKE_DEMAND(block_positions.size() == dense_positions.size());
  DRAKE_DEMAND(block_positions.size() == block_sizes.size());
  const int num_blocks = block_positions.size();

nit: I'm surprised you're not getting a signed-unsigned mismatch.

Suggestion:

const int num_blocks = ssize(block_positions);

multibody/contact_solvers/sap/test/dense_supernodal_solver_test.cc line 121 at r3 (raw file):

  BlockSparseMatrixBuilder<double> builder(num_row_blocks_of_J,
                                           num_col_blocks_of_J, 3);
  for (auto& block : Jtriplets) {

nit: You're not actually using its mutable nature, so better not to imply such.

Suggestion:

for (const auto& block 

multibody/contact_solvers/sap/test/dense_supernodal_solver_test.cc line 129 at r3 (raw file):

  DenseSuperNodalSolver solver(&blocks_of_A, &J);
  EXPECT_EQ(solver.GetSize(), 6);

BTW As to my earlier point about the two size APIs, this part of the test makes it look particularly silly.


multibody/contact_solvers/sap/test/dense_supernodal_solver_test.cc line 223 at r3 (raw file):

  }

  // Calling Solve() before Factor() throws.

nit: This doesn't feel like a test of DenseSuperNodalSolver. The dut doesn't take responsibility for this throw, that's part of the parent class's responsibilities.

Same applies to order of operations on MakeFullMatrix() and Factor() below.

Copy link
Contributor Author

@amcastro-tri amcastro-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees SeanCurtis-TRI(platform),sherm1(platform)


multibody/contact_solvers/sap/dense_supernodal_solver.h line 48 at r3 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

nit: this function does not appear to be invoked.

good catch, thanks


multibody/contact_solvers/sap/test/dense_supernodal_solver_test.cc line 81 at r3 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

nit: I'm surprised you're not getting a signed-unsigned mismatch.

I've sen these mismatch errors when comparing an int to the result of size().
I belive the assignment always works.


multibody/contact_solvers/sap/test/dense_supernodal_solver_test.cc line 223 at r3 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

nit: This doesn't feel like a test of DenseSuperNodalSolver. The dut doesn't take responsibility for this throw, that's part of the parent class's responsibilities.

Same applies to order of operations on MakeFullMatrix() and Factor() below.

I agree. But it seemed like with very little you get a nice coverage of all cases making this very self contained.
Do you think it's a defect and I should remove them?

Copy link
Contributor

@SeanCurtis-TRI SeanCurtis-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees SeanCurtis-TRI(platform),sherm1(platform)


multibody/contact_solvers/sap/test/dense_supernodal_solver_test.cc line 81 at r3 (raw file):

Previously, amcastro-tri (Alejandro Castro) wrote…

I've sen these mismatch errors when comparing an int to the result of size().
I belive the assignment always works.

Nice to have things explicit. :) Thanks.


multibody/contact_solvers/sap/test/dense_supernodal_solver_test.cc line 223 at r3 (raw file):

Previously, amcastro-tri (Alejandro Castro) wrote…

I agree. But it seemed like with very little you get a nice coverage of all cases making this very self contained.
Do you think it's a defect and I should remove them?

I can see the appeal behind the argument.

The derived class relies on behavior documented by the parent class to be correct. To make sure the derived class doesn't accidentally get broken if someone changes the parent class, we can put in a couple of regression tests so we know if such a thing happens.

However, I consider that to be death by a thousand cuts. I'd put my reliance on the review process -- if a documented invariant gets changed on a virtual interface in a PR, the reviewers should notice if there is no downstream accommodation or not. The likelihood and frequency of such a change is so small/low that it doesn't seem to be worth bulking up the unit tests for it, even by such a small amount.

And, in a unit test, it misleadingly implies responsibilities where they don't lie.

So, I'd argue that there's no real upside to keep these tests on the parent's implementation.

Copy link
Member

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Reviewed 2 of 6 files at r2, 4 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees SeanCurtis-TRI(platform),sherm1(platform)

@sherm1 sherm1 merged commit f0a46ba into RobotLocomotion:master May 13, 2024
10 checks passed
RussTedrake pushed a commit to RussTedrake/drake that referenced this pull request Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: high release notes: none This pull request should not be mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants