Performance Overview

From a customer's perspective, Portal performance is the sum of every experience throughout the product. As an extension author, you have a duty to keep your experience at or above the performance bar.

All our performance metrics are measured using real user monitoring (RUM) from production traffic only.

| Area | Goal | Telemetry Action | How is it measured? |
| --- | --- | --- | --- |
| Extension | < 2 secs @ 95th percentile | ExtensionLoad | The time it takes for your extension's home page and initial scripts to load, and for the initialize call within your Extension definition file to complete |
| Blade - MsPortalFx | < 4 secs @ 95th percentile | BladeFullReady | The time it takes for the blade's onInitialize or onInputsSet to resolve and all the parts on the blade to become ready |
| Blade - ReactView | > 90 @ 5th percentile | BladeLighthouse | Using standard performance insights, see below for the full details. |
| Network requests | < 1 secs @ 95th percentile | ClientAjax (table) | The time it takes for the client to complete the request. This is only measured for interactive requests. |
| Part | < 4 secs @ 95th percentile | PartReady | The time it takes for the part to be rendered and then for the part's onInputsSet to resolve |

Extension-loading performance

Extension-loading performance affects both Blade and Part performance, because your extension is loaded and unloaded as and when it is required. When a user visits your resource blade for the first time, the Fx loads your extension and then requests the view model, so your Blade/Part performance is affected. If the user browses away from your experience and returns before your extension is unloaded, the second visit will be faster because they don't pay the cost of loading the extension again.

Blade performance

Depending on the authoring model, blade performance is measured by either the Lighthouse metrics or BladeFullReady.

Lighthouse

The Lighthouse performance score is a weighted average of the metric scores, ranging from 0 to 100. A score above 90 is considered good.

The Portal team does not set goals for each discrete metric. Rather, the goal is to achieve a Lighthouse score > 90 @ 5th percentile.

Lighthouse is a de facto industry standard, composed of multiple, user-centric metrics:

| Aspect | Metric | Units |
| --- | --- | --- |
| How fast to render initial content (static and shimmers)? | First Contentful Paint (FCP) | Seconds from t0 |
| How fast to render main content (loaded data)? | Largest Contentful Paint (LCP) | Seconds from t0 |
| Does UI shift during rendering? | Cumulative Layout Shift (CLS) | Custom units |
| When is the UI responsive to user interaction? | Time to Interactive (TTI) | Seconds from t0 |
| Is the UI ever unresponsive to user interaction? | Total Blocking Time (TBT) | Seconds (absolute) |
| <Not implemented, lab-only measurement> | Speed Index (SI) | Custom units |

Note: Speed Index is not included in the Azure portal calculations because it is intrinsically a lab-measured metric.

The metrics are weighted slightly differently from the standard Lighthouse calculations due to the lack of the 6th (SI) metric.

| Audit | Weight |
| --- | --- |
| FCP | 11% |
| LCP | 28% |
| TTI | 11% |
| TBT | 33% |
| CLS | 17% |

You can see the function used to calculate the Lighthouse score here: https://aka.ms/portalfx/kusto/lighthousefunction

The per-page (view or blade) Lighthouse score is a 5th percentile calculation over all individual loads of that page. You can see all Lighthouse scores for the last 7 days here: https://aka.ms/portalfx/kusto/lighthouse
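
As a rough illustration of the weighting, the sketch below combines per-metric scores into an overall score and then takes the 5th percentile across loads. It is not the Kusto function linked above: it assumes each metric has already been converted to a 0-100 score, and the nearest-rank `percentile` helper and all function names are hypothetical choices made for this example.

```typescript
// Weights from the audit table (Speed Index excluded); they sum to 1.0.
const WEIGHTS = { FCP: 0.11, LCP: 0.28, TTI: 0.11, TBT: 0.33, CLS: 0.17 } as const;

type Metric = keyof typeof WEIGHTS;
// Assumption: every metric value has already been mapped to a 0-100 score.
type MetricScores = Record<Metric, number>;

// Weighted average of the per-metric scores for a single load.
function overallScore(scores: MetricScores): number {
    let total = 0;
    for (const metric of Object.keys(WEIGHTS) as Metric[]) {
        total += WEIGHTS[metric] * scores[metric];
    }
    return total;
}

// Nearest-rank percentile. Assumption: the real query may use a different
// (approximate or interpolating) percentile definition.
function percentile(values: number[], p: number): number {
    if (values.length === 0) {
        throw new Error("percentile requires at least one value");
    }
    const sorted = [...values].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[rank - 1];
}

// Page-level score: 5th percentile of the overall scores of all loads.
function pageScore(loads: MetricScores[]): number {
    return percentile(loads.map(overallScore), 5);
}
```

Because TBT and LCP carry 61% of the weight between them, a poor score on either of those dominates the overall result.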

Why Lighthouse?

The Lighthouse approach is an evolution of the Portal's previous performance measures. Page (view or blade) performance can be broken down into various aspects: the page's total loading time, the customer's perceived loading time, UI stability, and more. No single metric captures everything, and BladeFullReady only captures the initial page's loading time as deemed by the extension author. Moving towards a weighted approach based on several metrics allows us to better represent the different ways of delivering a 'fast' experience to our customers.

BladeFullReady

BladeFullReady can be broken down into 4 stages:

  1. If the extension isn't loaded, load the extension
  2. Download and parse the required dependencies for the blade
  3. Execute and wait for the blade's onInitialize() promise to resolve
  4. Process promise resolution from the main thread and complete the initial rendering of the Blade.

All of these perf costs are represented under the one BladeFullReady action and the full end to end duration is tracked under the duration column.

For an additional breakdown of the time spent you can inspect a native performance profile or the data column of the BladeFullReady telemetry event to find the following properties:

| Stage | Native marker identifier | Data property name | Description |
| --- | --- | --- | --- |
| 0 | ExtLoadBladeBundles | bundleLoadingTime | The async time spent requiring the BladeDefinition (which today is co-bundled with the Blade class’ module). This covers the time downloading and processing your Blade’s bundles. |
| 1 | ExtInstantiateBladeClass | Not Tracked | The async time spent diContainer.getAsync’ing the Blade class. This and the following ‘ExtBladeOnInitializeSync’ show up as insignificantly small, which itself can help refocus on larger time-slices. |
| 2 | ExtBladeOnInitializeSync | Not Tracked | The sync time spent in the Blade’s ‘onInitialize’ method. |
| 3 | ExtBladeOnInitializeAsync | onInitializeAsyncTime | The async time from the point ‘onInitialized’ is called to the point where the Promise returned from ‘onInitialize’ is resolved. All these are measured in the extension web worker. |
| * | ExtBladePrepareFirstAjax | prepareFirstAjaxTime | The time spent from the point ‘onInitialized’ is called to the point where the first ajax call is sent from the extension web worker. This is fuzzy because the FX ajax client isn’t explicitly bound to a Blade, but inaccuracies should be outlier cases and should be easy to exclude based on knowledge of the Blade. |
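
To decide where to focus, it can help to turn those properties into shares of the full duration. The sketch below is one way to do that. The property names come from the table above, but the shape of the parsed data column, the `untracked` bucket, and the function name are assumptions made for illustration.

```typescript
// Assumed shape of the parsed "data" column of a BladeFullReady event.
interface BladeFullReadyData {
    bundleLoadingTime?: number;
    onInitializeAsyncTime?: number;
    // Overlaps with onInitializeAsyncTime, so it is not summed below.
    prepareFirstAjaxTime?: number;
}

interface Slice {
    name: string;
    ms: number;
    share: number; // fraction of the full BladeFullReady duration
}

// Splits the full duration into the tracked slices plus a remainder,
// ordered from largest to smallest.
function breakdown(durationMs: number, data: BladeFullReadyData): Slice[] {
    const bundle = data.bundleLoadingTime ?? 0;
    const initAsync = data.onInitializeAsyncTime ?? 0;
    const slices = [
        { name: "bundleLoadingTime", ms: bundle },
        { name: "onInitializeAsyncTime", ms: initAsync },
        // Hypothetical bucket: extension load, rendering, and anything else not tracked.
        { name: "untracked", ms: Math.max(0, durationMs - bundle - initAsync) },
    ];
    return slices
        .map((s) => ({ ...s, share: durationMs > 0 ? s.ms / durationMs : 0 }))
        .sort((a, b) => b.ms - a.ms);
}
```

If `onInitializeAsyncTime` dominates, comparing it with `prepareFirstAjaxTime` indicates whether the time is spent before the first request is sent or while waiting for responses.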

If your blade is a FrameBlade or AppBlade, there is an additional initialization message from your iframe to your viewmodel, which is also tracked. See framepage.js in the samples extension for an example of the messages that are required.

Network requests

This KPI is only held for interactive network requests. Interactive is defined as after initial load.

The goal of the network request tracking is to proxy customer task or interaction performance.

This time is strictly measuring from the start to the end of the request as processed by the client. This is to best represent what the customer is experiencing.

Part performance

Similar to Blade performance, Part performance is spread across a couple of areas:

  1. Part's constructor
  2. Part's 'onInitialize' or 'onInputsSet'

If your part is a FramePart, there is an additional initialization message from your iframe to your viewmodel, which is also tracked. See framepage.js in the samples extension for an example of the messages that are required.

All of these perf costs are represented under the one PartReady action.

How to assess your performance

There are two methods to assess your performance:

  1. Visit the IbizaFx provided PowerBi report Extension performance/reliability report
  2. Run Kusto queries locally to determine your numbers, see below for the individual queries

If you have permission issues with either the PowerBi dashboard or the Kusto cluster, follow the telemetry onboarding guide.

Extension-loading

database('Framework').ExtensionPerformance(ago(1h), now())

ExtensionPerformance will return a table with the following columns:

  • Extension
    • The name of the extension
  • Loads
    • How many times the extension was loaded within the given date range
  • 50th, 80th, 95th
    • The time it takes for your extension to initialize. This is captured under the ExtensionLoad action in telemetry
  • HostingServiceloads
    • The number of loads from the hosting service
  • UsingTheHostingService
    • If the extension is predominantly using the hosting service in production

Lighthouse query

database('Framework').LighthousePerformance(ago(1h), now(), "")

You can filter the Lighthouse performance results by passing the blade/extension identifier as the third parameter.

LighthousePerformance will return a table with the following columns:

  • name
    • The view identifier, includes the extension name
  • OverallScore
    • This is the metric that the view is measured against
  • Lighthouse_Loads
    • How many loads were recorded for that view

Then there is a section of investigation metrics, which can be used to prioritise areas of investment. These are gathered by assessing any load which was worse than the 5th percentile OverallScore, then taking the 50th percentile of that sample.

  • FirstContentfulPaint
  • LargestContentfulPaint
  • TimeToInteractive
  • TotalBlockingTime
  • CumulativeLayoutShift
  • Lighthouse_Details
    • This provides a breakdown of insights for each metric:
      • Value
      • Score
      • Potential gain
      • Utilised %
      • Max potential score
    • Using the Potential gain you can prioritise which metric to invest in to improve your overall score

BladeFullReady query

database('Framework').BladePerformance(ago(1h), now())

The subtle difference from the standard BladeFullReady marker is that, if the blade is opened within a resource menu blade, the time it takes to resolve the getMenuConfig promise while the resource menu blade loads is added to the 95th percentile of the 'BladeFullReady' duration. The added time is proportional to the fraction of the blade's loads that happen inside the menu blade.

For example, a blade takes 2000ms to complete its BladeFullReady and 2000ms to return its getMenuConfig. It is only loaded once (1) in the menu blade out of its 10 loads. Its overall reported FullDuration would be 2200ms.
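
The arithmetic in that example can be written out as a small function. This is only a sketch of the proportional attribution described above. The function and parameter names are hypothetical, and it takes raw load counts rather than a precomputed percentage.

```typescript
// FullDuration = BladeFullReady + (share of in-menu loads * getMenuConfig duration)
function fullDuration(
    bladeFullReadyMs: number,
    menuConfigMs: number,
    inMenuLoads: number,
    bladeCount: number,
): number {
    if (bladeCount <= 0) {
        return bladeFullReadyMs; // no loads recorded, nothing to attribute
    }
    const attributedMenuMs = (menuConfigMs * inMenuLoads) / bladeCount;
    return bladeFullReadyMs + attributedMenuMs;
}

// 2000ms BladeFullReady, 2000ms getMenuConfig, 1 of 10 loads inside the menu blade.
console.log(fullDuration(2000, 2000, 1, 10)); // 2200
```

A blade that is always opened inside the menu blade would have the full getMenuConfig duration added, which is why a static menu configuration matters most for those blades.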

BladePerformance will return a table with the following columns:

  • FullBladeName, Extension, BladeName
    • Blade/Extension identifiers
  • BladeCount
    • The number of blade loads within the given date range
  • InMenuLoads
    • The number of in menu blade loads within the given date range
  • PctOfMenuLoads
    • The percentage of in menu blade loads within the given date range
  • Samples
    • The number of loads which were tracking the number of XHR requests
  • StaticMenu
    • If the getMenuConfig call returns within < 10ms, only applicable to ResourceMenu cases
  • MenuConfigDuration95
    • The 95th percentile of the getMenuConfig call
  • LockedBlade
    • If the blade is locked, ideally blades are now template blades or no-pdl
    • All no-pdl and template blades are locked, pdl blades can be made locked by setting the locked property to true
  • FullDuration50, 80, 95, 99
    • The time it takes for the BladeFullReady + (PctOfMenuLoads * the getMenuConfig to resolve)

Network requests query

database('Framework').InteractiveNetworkPerformance(ago(1d), now(), "Extension/YOUR_EXTENSION_NAME/Blade/YOUR_BLADE_NAME")

Update or remove the BladeName filter to match your needs: pass only your extension name, or scope the query to a single blade.

InteractiveNetworkPerformance will return a table with the following columns:

  • Date
    • End date for the rolling 7 days calculation
  • Extension, BladeName
    • Extension/Blade identifier
  • Name
    • Identifier for the network request
  • Occurrences
    • Number of times the request was issued (Note: Batch requests are expanded from 1 to N, increasing the occurrences count by N vs 1)
  • Requests
    • Number of unique requests (Note: Batched requests count as 1)
  • BladeInstances
    • Total number of unique blades
  • UniqueCustomers
    • Total number of unique customers
  • 50th, 80th, 95th, 99th
    • The percentile duration time recorded for the given request
  • KPI Classification
    • The KPI is measured against the 95th percentile duration - Green: <= 1s, Yellow: <= 2s, and Red: > 2s
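
The classification thresholds can be expressed as a small helper, for example when post-processing exported query results. This is a sketch, and the function and type names are hypothetical.

```typescript
type KpiClassification = "Green" | "Yellow" | "Red";

// Classifies a request by its 95th percentile duration in milliseconds:
// Green <= 1s, Yellow <= 2s, Red > 2s.
function classifyNetworkKpi(p95Ms: number): KpiClassification {
    if (p95Ms <= 1000) {
        return "Green";
    }
    if (p95Ms <= 2000) {
        return "Yellow";
    }
    return "Red";
}
```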

Part

database('Framework').PartPerformance(ago(1h), now())

PartPerformance will return a table with the following columns:

  • FullPartName, Extension, PartName
    • Part/Extension identifiers
  • PartCount
    • How many times the part was loaded within the given date range
  • 50th, 80th, 95th, 99th
    • The time it takes for your part to resolve its onInputsSet or onInitialize promise. This is captured under the PartReady action in telemetry
  • RedScore
    • Number of violations for tracked bars

Performance Frequently Asked Questions (FAQ)

My Extension 'load' is above the bar. What should I do?

  1. Profile what is happening in your extension load. Profile your scenario
  2. Are you using the Portal's ARM token? If not, verify whether you can use it and, if you can, follow: Using the Portal's ARM token
  3. Are you on the hosting service? If no, migrate to the hosting service: Hosting service documentation
  4. Are you using obsolete bundles?
    • If yes, remove your dependency on them and then remove the obsolete bitmask. This is a blocking download before your extension load. See below for further details.
  5. See our best practices

My Lighthouse score is below the bar. What should I do?

See Improving Lighthouse scores.

My network request is shown as "[UNNAMED]". What should I do?

See Naming network requests.

My Blade 'FullReady' is above the bar. What should I do?

  1. Assess what is happening in your Blade's onInitialize (no-PDL) or constructor and onInputsSet (PDL). Profile your scenario
    1. Can that be optimized?
  2. If there are any AJAX calls;
    1. Can they use batch? If so, migrate over to use the batch api.
    2. Wrap them with custom telemetry and ensure that you aren't spending a large amount of time waiting on the result. If you do this, please log only one event per blade load. This helps correlate issues and also reduces unnecessary load on the telemetry servers.
  3. Are you using an old PDL "Blade containing Parts"? How many parts are on the blade?
    • If there is only a single part and you're not already using a no-pdl blade or <TemplateBlade>, migrate your current blade to a no-pdl blade.
    • If there are multiple parts, migrate over to use a no-pdl blade.
    • Ensure to support any old pinned parts when you migrate.
  4. Does your blade open within a resource menu blade?
    • If it does, ensure the getMenuConfig call returns statically/synchronously (< 10ms). If you need to determine asynchronously whether to enable a menu item, you can use the enabled/disabled observable property on menu items.
  5. See our best practices
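
One way to follow the "one event per blade load" guidance for AJAX timings is to collect the durations in memory and emit them together. The sketch below shows that pattern only. The `BladeLoadTimings` class, its methods, and the event shape are hypothetical, and it does not use the Portal's telemetry API, so the final logging call is left to your own logger.

```typescript
type Clock = () => number;

// Collects per-request durations for one blade load so they can be
// logged as a single telemetry event.
class BladeLoadTimings {
    private readonly bladeName: string;
    private readonly now: Clock;
    private readonly timings: Record<string, number> = {};

    constructor(bladeName: string, now: Clock = () => Date.now()) {
        this.bladeName = bladeName;
        this.now = now;
    }

    // Starts timing a named request. Call the returned function when the
    // request completes, for example from the promise's finally handler.
    start(requestName: string): () => void {
        const startedAt = this.now();
        return () => {
            this.timings[requestName] = this.now() - startedAt;
        };
    }

    // Builds the single event to hand to your telemetry logger.
    toEvent(): { blade: string; timings: Record<string, number> } {
        return { blade: this.bladeName, timings: { ...this.timings } };
    }
}
```

In a blade, you would create one instance at the start of onInitialize, call `start` before each request, invoke the returned function when the request settles, and log `toEvent()` once after the returned Promise resolves.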

My Part 'Ready' is above the bar. What should I do?

  1. Assess what is happening in your Part's onInitialize (no-PDL) or constructor and onInputsSet (PDL), including time taken in any async operations associated with the returned Promise. Profile your scenario
    1. Can that be optimized?
  2. If there are any AJAX calls;
    1. Can they use batch? If so, migrate over to use the batch api.
    2. Wrap them with custom telemetry and ensure that you aren't spending a large amount of time waiting on the result. If you do this, please log only one event per part load. This helps correlate issues and also reduces unnecessary load on the telemetry servers.
  3. See our best practices

Performance office hours

Need more help? Book some time in the Azure performance office hours.

Don't forget to include context for the meeting, such as which blade or view you want to optimise.

  • When? Wednesdays from 13:00 to 16:00
  • Where? Teams meeting
  • Contacts: Azure Portal Performance Office Hours (apperfofficehours)
  • Goals
    • Help extensions to meet the performance bar
    • Help extensions to measure performance
    • Help extensions to understand their current performance status
  • How to book time: Send a meeting request with the following
    • TO: apperfofficehours;
    • Subject: YOUR_EXTENSION_NAME: Azure performance office hours
    • Location: Teams meeting