Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.x] Local dependency invalidation improvement #1528

Open
wants to merge 1 commit into
base: 1.10.x
Choose a base branch
from

Conversation

rochala
Copy link
Contributor

@rochala rochala commented Jan 3, 2025

Hello,
This is an attempt to fix undercompilation which can happen with local dependencies.

This PR is basically trying to solve 2 separate issues:

  1. The invariant that Relations.memberRef is a superset of Relations.inheritance is no longer valid.
    * This relation properly accounts for that so the invariant that `memberRef` is a superset
    * of `inheritance` is preserved.

    val memberRef = processDependency(DependencyByMemberRef, false) _
    val inheritance = processDependency(DependencyByInheritance, true) _
    val localInheritance = processDependency(LocalDependencyByInheritance, true) _
    val scala2MacroExpansion = processDependency(DependencyByMacroExpansion, false) _
    @deprecated("Use processDependency that takes allowLocal.", "1.1.0")
    def processDependency(context: DependencyContext)(dep: ClassDependency): Unit =
    processDependency(context, true)(dep)
    /*
    * Handles dependency on given symbol by trying to figure out if represents a term
    * that is coming from either source code (not necessarily compiled in this compilation
    * run) or from class file and calls respective callback method.
    */
    def processDependency(context: DependencyContext, allowLocal: Boolean)(
    dep: ClassDependency

    Basically, since we don't include local dependencies in member references any more, we won't include local inheritance either, so the assumption is no longer true.
    This leads to both local inheritance and local member reference to not invalidate correctly, which can be seen in both added test cases.
    My attempt at solving this is to actually include all recompiled classes coming from the compilation unit at https://github.com/rochala/zinc/blob/8e6861fc77750b8acf484234ff07082a9785f940/internal/zinc-core/src/main/scala/sbt/internal/inc/IncrementalCommon.scala#L170-L176
    This allows us to include API changes for local dependencies and solves the first issue.
  2. Because we don't include local dependencies in Relations.memberRef our logic for
    val firstClassTransitiveInvalidation = includeTransitiveInitialInvalidations(
    initial,
    IncrementalCommon.transitiveDeps(initial, log)(dependsOnClass),
    dependsOnClass
    )
    log.debug("Invalidate by brute force:\n\t" + firstClassTransitiveInvalidation)
    firstClassTransitiveInvalidation ++ secondClassInvalidation ++ thirdClassInvalidation ++ recompiledClasses
    is not invalidating whole dependency tree. This can be improved a little by computing transitive dependencies based on firstClassInvalidations instead of initialInvalidations. This will be a bit more heavy to compute, but the result will be something that was in my intended in the first place.
    Side note: We've actually identified infinite compilation happening, because it turns out that firstClassTransitiveInvalidations was not a subset of firstClassInvalidations, the reason why is the same as in point 1. The latter includes local inheritance, whereas the former does not.

After some internal testing, we've gone from 12 cycles to 4 in one problematic module. The reason of such high number of cycles was that we did not include local dependencies (and we had a lot of classes defined in a single source) thus they were not considered recompiled, while in fact they were.

Now the question time:
Q1. I see also another solution to those problems, we could try to meet the membersRef requirement and include local dependencies. This would also solve the issue with transitive invalidations. Do you know the reason why we ignored these kinds of dependencies ?
Q2. Right now, transitive invalidation may actually result in a lot more compilation happening (it actually recompiles all sources once again). Maybe a whole different approach should be considered, e.g. invalidate everything after 4th cycle or something.

…re aggresive first class transitive invalidaton
log.debug("Invalidate by brute force:\n\t" + firstClassTransitiveInvalidation)
firstClassTransitiveInvalidation ++ secondClassInvalidation ++ thirdClassInvalidation ++ recompiledClasses
firstClassInvalidation ++ firstClassTransitiveInvalidation ++ secondClassInvalidation ++
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think of it, it may actually be enough to have
firstClassTransitiveInvalidation ++ secondClassInvalidation ++ thirdClassInvalidation ++ recompiledClasses as firstClassTransitiveInvalidations is now computed from firstClassInvalidation

@eed3si9n
Copy link
Member

eed3si9n commented Jan 3, 2025

@rochala Thanks for the contribution!
Would you mind opening a bug retroactively, for the future record while you're memory is fresh plz?

I also want to ping @Friendseeker for his views since he's been contributing to Zinc lately.

@eed3si9n
Copy link
Member

eed3si9n commented Jan 3, 2025

Q2. Right now, transitive invalidation may actually result in a lot more compilation happening (it actually recompiles all sources once again). Maybe a whole different approach should be considered, e.g. invalidate everything after 4th cycle or something.

Could you elaborate? Zinc already invalidates everything after hitting certain threshold, namely 50% of invalidated sources, or 3 cycle count

public static int defaultTransitiveStep() {
return 3;
}
public static double defaultRecompileAllFraction() {
return 0.5;
}

Are you suggesting that we should increase the threshold to 4 instead?

Q1. I see also another solution to those problems, we could try to meet the membersRef requirement and include local dependencies. This would also solve the issue with transitive invalidations. Do you know the reason why we ignored these kinds of dependencies ?

It seems like we had similar issue reported in 2017 ('Inherited trait definitions do not seem to be handled gracefully by Zinc' #417), and I sent #424 to start tracking previously untracked (?) inheritance within same source. The PR introduced allowLocal:

def processDependency(context: DependencyContext, allowLocal: Boolean)(
dep: ClassDependency

I guess my thinking back then was that transitive invalidation of inheritance needs to be strict, but transitive invalidation of member ref doesn't matter, which is the basis of name hashing theory. So if the same-source object B and object C are recompiled together, I didn't see the need to track that member-ref dependencies?

val recompiledClasses = classesToRecompile ++ getClasses(previous) ++ getClasses(analysis)
val recompiledClasses = classesToRecompile ++
getClasses(previous) ++ getClasses(analysis) ++
invalidatedSources.flatMap(previous.relations.classNames)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense.

@@ -184,13 +184,14 @@ trait Relations {
private[inc] def externalDependencies: ExternalDependencies

/**
* The class dependency relation between classes introduced by member reference.
* The class dependency relation between classes introduced by member reference excluding local references
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Zinc, the term local often means within the same subproject, i.e. local-to-Analysis, so "excluding same-source references" might be less ambiguous.

*
* NOTE: All inheritance dependencies are included in this relation because in order to
* inherit from a member you have to refer to it. If you check documentation of `inheritance`
* you'll see that there's small oddity related to traits being the first parent of a
* class/trait that results in additional parents being introduced due to normalization.
* This relation properly accounts for that so the invariant that `memberRef` is a superset
*
* Because `inheritance` includes local relations, `memberRef` is not a superset of `inheritance`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@eed3si9n
Copy link
Member

eed3si9n commented Jan 3, 2025

Q1. I see also another solution to those problems, we could try to meet the membersRef requirement and include local dependencies.

Also regarding this point, one complication might be the fact that both Scala 3.x and 2.13.x have decided to split out the compiler bridge to a binary artifact published on Maven Central, as opposed to on-the-fly compiled-from-source. This means some range of the Scala version would always embed this characteristics (or issue, depending on how we look at it). So if possible, fixing at at the Zinc level like in IncrementalCommon would be preferable, all else being equal.

@@ -149,11 +148,11 @@ private[inc] abstract class IncrementalCommon(
invalidatedSources: Set[VirtualFile],
classFileManager: XClassFileManager,
pruned: Analysis,
override val previousAnalysis: Analysis,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change relevant to the fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, my thinking was to clean up as this parameter that was not necessary and introduced the confusion by duplication of parameters previous in CycleState vs previousAnalysis here.

@Friendseeker
Copy link
Member

Friendseeker commented Jan 4, 2025

@rochala Thanks for the contribution! Would you mind opening a bug retroactively, for the future record while you're memory is fresh plz?

I also want to ping @Friendseeker for his views since he's been contributing to Zinc lately.

Shall review later today. My new school semester is starting in Jan 6 so things got a little busy on my end.

@eed3si9n
Copy link
Member

eed3si9n commented Jan 5, 2025

@Friendseeker No worries, if you're busy. I think the TLDR of this PR is that in an attempt to fix same-file inheritance invalidation, I managed to create a logical loop hole in Zinc in 2017 via #424.

@rochala
Copy link
Contributor Author

rochala commented Jan 7, 2025

Could you elaborate? Zinc already invalidates everything after hitting certain threshold, namely 50% of invalidated sources, or 3 cycle count

It seems like entering a transitive invalidation path may actually extend compilation instead of making it faster. Let me explain:

  1. If we can't correctly find transitive dependencies for given class and I think that we can't achieve it without same-source relations stored in analysis.
  2. This can go into the following state (with my current changes):
// source source1
class A
class B:
  val a: A = ???
// source source2
class C:
  val b: B = ???
class D()
  val c: C = ???
// source source3
class E:
  val d: D = ???
class F()
  val e :e = ???
// source sourceN...
class G:
  val f: F = ???
class H()
  val g: G = ???

And now if you want to make change in lets say A, we will need to recompile every source n (transitive invalidation also includes all recompiled classes)
If we consider for the sake of this example that we're using only transitive invalidations, this would basically do the following:

// source1 invalidates source2       // cycle 1 | 2 classes compiled
// source2 invalidates source3       // cycle 2 | 3 clasess compiled (2 recompiled , 1 new class) 
// sourceX invalidates source(X + 1) // cycle X | X + 1 clasess compiled (X recompiled , 1 new class) 
  1. Basically we have now recompiled same classes over and over again, thus I find it that there are (maybe not that rare) cases that transitive invalidation may have negative impact on your compilation time. Especially that it triggers after 3rd cycle which usually is not a few sources but a dozen.
  2. It seems to me like the initial idea of transitive invalidations was to do it all in a single cycle, but that may not be possible right now

I'm not proposing to remove it, I'm rather suggesting that at the current state, we may be better to change it a bit as it is not working as intended. My only worry is that this change will not be possible without including local references.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants