Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

least-squares: matrix-solve contract violation #14

Open
bennn opened this issue Mar 24, 2023 · 6 comments
Open

least-squares: matrix-solve contract violation #14

bennn opened this issue Mar 24, 2023 · 6 comments

Comments

@bennn
Copy link
Contributor

bennn commented Mar 24, 2023

What program did you run?

#lang racket

(require data-frame)

(define df
  (let* ((df (make-data-frame))
         (xs (make-series "x" #:data '#(0 1)))
         (ys (make-series "y" #:data '#(2 3))))
    (df-add-series! df xs)
    (df-add-series! df ys)
    df))

(df-least-squares-fit df "x" "y" #:mode 'polynomial)

What happened?

matrix-solve: contract violation
  expected: matrix-invertible?
  given: (array #[#[2 1 1] #[1 1 1] #[1 1 1]])
  argument position: 1st
  other arguments...:
   (array #[#[5] #[3] #[3]])

What did you expect to happen?

No error.

I'm not sure what the problem is. Too few points? I would rather have a poor fit (straight line) than an error. But if an error is the right thing, can we clear up the message?

@alex-hhh
Copy link
Owner

The problem is that you are trying to fit a second-degree polynomial with only 2 points. Depending on how you look at it, this either has no solution or has an infinite number of solutions :-)

You can either use a linear fit, or specify the degree of the polynomial (which defaults to 2)

(df-least-squares-fit df "x" "y" #:mode 'linear)
(df-least-squares-fit df "x" "y" #:mode 'polynomial #:polynomial-degree 1)

I agree, however that the error message could be improved.

@bennn
Copy link
Contributor Author

bennn commented Mar 26, 2023

Is the general rule that the degree can't be smaller than the number of points?

@alex-hhh
Copy link
Owner

Is the general rule that the degree can't be smaller than the number of points?

You need at least N + 1 data points to fit a polynomial of degree N. This is because for a N degree polynomial, there are N + 1 coefficients, so you need N + 1 equations, in the form y = P(x) to find these coefficients. This is easiest to understand for a first-degree polynomial which is a line:

  • for a single data point, there is an infinite number of lines that go through that point
  • for exactly 2 data points, there is a single line that goes through both points
  • for 3 or more data points, there is one line, such that the sum of the squared distance between each of these points to the line is minimized (this is the meaning of "least squares fit")

Unfortunately, we cannot simply add a contract to check that the polynomial degree must be less than the number of points: we need N + 1 distinct data points. For example, you cannot find a "least squares fit" line for a data set that contains the 3 points (0, 0), (0, 0), (0, 0).

A contract for distinct points would not work either: Depending on how matrix solve algorithms are implemented in the racket/math library, it might be possible to have points which are different (i.e., not equal in the = or fl= sense), yet close to each other so that the matrix cannot be inverted, and a solution cannot be found.

@bennn
Copy link
Contributor Author

bennn commented Mar 26, 2023

Hmm, ok. What about catching exceptions from math and re-raising them with a brief message like "unable to solve, may be too few distinct points in the dataset" ?

@alex-hhh
Copy link
Owner

a brief message like "unable to solve, may be too few distinct points in the dataset" ?

Such a message might confuse the user, and I had too many bad experiences with such messages. If an exception is to be thrown, I would rather have the original exception, rather than a higher level routine's interpretation of it -- for example the exception might not be related to "solving" at all.

Here is a dataset with two clearly distinct data points that will fail to find a linear fit:

#lang racket

(require data-frame)

(define df
  (let* ((df (make-data-frame))
         (xs (make-series "x" #:data '#(0 "1")))
         (ys (make-series "y" #:data '#(2 "3"))))
    (df-add-series! df xs)
    (df-add-series! df ys)
    df))

(df-least-squares-fit df "x" "y" #:mode 'polynomial #:polynomial-degree 1)

While the above example is somewhat contrived, this situation can easily happen in real life: some programs export data as CSV files and sometimes they quote numbers inside these files. For such CSV files, numbers will he read as strings, and the user needs to pass #t to #:quoted-numbers? in df-read/csv to read them as numbers.

@bennn
Copy link
Contributor Author

bennn commented Apr 1, 2023

Oh, by re-raise I meant to keep the original exception message. In other words, throw the actual exception and add a commentary on top.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants