Global and local spatial autocorrelation chapters - standardization? #294

Open
iamwfx opened this issue Mar 4, 2023 · 3 comments


iamwfx commented Mar 4, 2023

Hi, both chapters mention standardizing Pct_Leave and w_Pct_Leave, but they describe it differently. The global spatial autocorrelation chapter describes standardization as only subtracting the mean (which is what the code does), while the local spatial autocorrelation chapter describes it as subtracting the mean and dividing by the standard deviation, even though the code there only subtracts the means.
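To make the two descriptions concrete, here is a minimal sketch of centering versus full standardization. The values are made up and stand in for a column such as db["Pct_Leave"]; plain Python is used only to keep the example dependency-free, and it is not the book's code.

```python
from statistics import mean, pstdev

# Hypothetical values standing in for a column such as db["Pct_Leave"].
pct_leave = [40.0, 55.0, 60.0, 45.0, 50.0]

x_bar = mean(pct_leave)
x_sd = pstdev(pct_leave)  # population standard deviation

# Centering: subtract the mean only. Units stay in percentage points.
centered = [x - x_bar for x in pct_leave]

# Standardizing (z-scores): subtract the mean, then divide by the std. dev.
z_scores = [(x - x_bar) / x_sd for x in pct_leave]

print(centered)
print([round(z, 3) for z in z_scores])
```

Both versions have mean zero; only the z-scores also have unit standard deviation.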

Author

iamwfx commented Mar 4, 2023

Also, there's a typo in the original text:
db["w_Pct_Leave_std"] = db["w_Pct_Leave"] - db["Pct_Leave"].mean() should subtract db["w_Pct_Leave"].mean() instead.

@iamwfx iamwfx changed the title Global and local spatial autocorrelation chapters. Global and local spatial autocorrelation chapters - standardization? Mar 4, 2023
Member

ljwolf commented Mar 6, 2023

The top one we will definitely fix, thanks!

The bottom, though, is indeed correct but unclear in the "Local" chapter, and technically wrong in the global chapter. We've tried to edit this before & clearly failed. I'll change both to be consistent and compute the "spatial lag of centered % leave," as this is what we use and also intend to discuss.
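A small sketch of why the "spatial lag of centered % leave" matches the existing pattern when the weights are row-standardized (an assumption made here): since each row of W then sums to one, W(x - mean(x)) equals Wx - mean(x). The data and the neighbor structure below are hypothetical, and `spatial_lag` is an illustrative helper, not a library function.

```python
from statistics import mean

# Hypothetical data: four areas on a line (0-1-2-3), each neighboring the adjacent ones.
x = [40.0, 55.0, 60.0, 45.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def spatial_lag(values):
    """Row-standardized spatial lag: the average value among each area's neighbors."""
    return [mean([values[j] for j in neighbors[i]]) for i in range(len(values))]

x_bar = mean(x)

# Option A: center first, then take the spatial lag.
lag_of_centered = spatial_lag([v - x_bar for v in x])

# Option B: lag the raw values, then subtract the mean of x
# (the pattern in db["w_Pct_Leave"] - db["Pct_Leave"].mean()).
wx = spatial_lag(x)
lag_minus_mean_x = [v - x_bar for v in wx]

# Option C: subtract the mean of the lag itself, which is a different quantity.
lag_minus_mean_wx = [v - mean(wx) for v in wx]

print(lag_of_centered)
print(lag_minus_mean_x)
print(lag_minus_mean_wx)
```

Options A and B coincide under row standardization, while option C shifts every value by mean(Wx) - mean(x).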

To explain (for @darribas and @sjsrey in future edits...), the original statistic is stated only in terms of z and w. There's no mean(Wz) in the statistic, just in the scatterplot. In the original LISA paper, the means of W.TOTCON and TOTCON are drawn as dashed lines. But the dashed line for W.TOTCON is above zero on the y-axis:

[Screenshot, 2023-03-06: scatterplot from the original LISA paper, with dashed lines at the means of W.TOTCON and TOTCON]

So, we definitely plot centered x vs. W(centered x)... but what about the "axis" we're plotting onto it?

Well... where we may differ from GeoDa (and from Anselin (1995)) is that we also classify based on the original mean of x. spdep also does this, afaict. I think this is the correct approach, too: in this scheme, "high-high" classifications reflect observations with higher-than-average x and also higher-than-average x nearby. Classifying with respect to mean(Wx) shifts this latter part to "higher-than-average spatial lag," which is less intuitive...
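The difference between the two classification schemes can be sketched on toy data. Everything below is hypothetical (values, neighbor structure, and the `quadrants` helper), it assumes row-standardized weights, and it only assigns scatterplot quadrants without any significance testing.

```python
from statistics import mean

# Hypothetical data: five areas on a line, each neighboring the adjacent ones.
x = [40.0, 58.0, 62.0, 44.0, 46.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# Row-standardized spatial lag: average of each area's neighbors.
wx = [mean([x[j] for j in neighbors[i]]) for i in range(len(x))]

def quadrants(values, lags, lag_reference):
    """Label each observation H/L on x (vs. mean of x) and on its lag (vs. lag_reference)."""
    x_ref = mean(values)
    labels = []
    for value, lag in zip(values, lags):
        own = "H" if value > x_ref else "L"
        nearby = "H" if lag > lag_reference else "L"
        labels.append(own + nearby)
    return labels

# Scheme 1: compare the lag against the mean of x.
by_mean_x = quadrants(x, wx, mean(x))
# Scheme 2: compare the lag against the mean of the lag, mean(Wx).
by_mean_wx = quadrants(x, wx, mean(wx))

print(by_mean_x)   # ['LH', 'HH', 'HH', 'LH', 'LL']
print(by_mean_wx)  # ['LH', 'HL', 'HL', 'LH', 'LL']
```

Here mean(x) is 50 and mean(Wx) is 51.6, so the two middle areas, whose neighbors average 51, are "high-high" under the first scheme and "high-low" under the second.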

None of it affects the actual slope of the line in the plot, just the intercept. Our version will give the intercept of the line as the average of the spatial lag when x is at its mean, while the other version would force a regression through the origin.
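The slope and intercept claim can be checked with a small least-squares sketch. The data, the neighbor structure, and the `ols_line` helper are all hypothetical, and row-standardized weights are assumed.

```python
from statistics import mean

# Hypothetical data: five areas on a line, each neighboring the adjacent ones.
x = [40.0, 58.0, 62.0, 44.0, 46.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# Centered x and its row-standardized spatial lag.
z = [v - mean(x) for v in x]
wz = [mean([z[j] for j in neighbors[i]]) for i in range(len(z))]

def ols_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Version A: lag of centered x, left as is.
slope_a, intercept_a = ols_line(z, wz)

# Version B: the lag additionally centered on its own mean.
wz_centered = [v - mean(wz) for v in wz]
slope_b, intercept_b = ols_line(z, wz_centered)

print(slope_a, intercept_a)
print(slope_b, intercept_b)
```

Both fits share the same slope. Version A's intercept equals mean(Wz), while version B's line passes through the origin.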

Member

darribas commented Mar 7, 2023

Paging this thread which also discussed the matter:

#32 (comment)
