This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

Latency_nice proposal for use in TT #8

Open
smarkusg opened this issue Nov 20, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@smarkusg

smarkusg commented Nov 20, 2021

Good morning
First of all, I want to apologize for my English

Additionally, I want to thank you for your hard work on the Linux scheduler and more,
especially for Baby-CPU-Scheduler, thanks to which I am constantly learning
various algorithms.
The COVID-19 pandemic forced me into this: I had to set up my 10-year-old
son's laptop for remote work.
I am not a programmer, and I want to say so at the outset.

Back to the point.
I am currently testing my stock 5.13 kernel with your Cachy Scheduler
v5.9-Idle plus some additional patches, including Parth Shah's latency_nice patch series:

https://lkml.org/lkml/2020/5/7/575

Unfortunately I don't have benchmarks, but I implemented it alongside
MLFQ to classify latency_nice tasks. After a few modifications, the kernel
classifies only user processes with no children, leaving the rest of the
system alone. For me, the experience in normal desktop use is very
promising.
If you are able and have time to experiment, you could take a look at
Parth Shah's patch series. Maybe the idea will be useful for the development of TT?

For now I'm not going to switch to a kernel newer than 5.13. I don't
know why, but for me newer kernels behave strangely on the desktop (a subjective
impression).

I ran a general test of my kernels with your solutions backported to 5.13, but
as you know, what feels good on the desktop doesn't always mean good benchmark performance.
Mine is cachyb2. I know that compiling with Clang 13 and LTO is always faster than GCC,
even with GCC's LTO, but the overall picture shows that 5.14 and 5.15 perform
strangely poorly after the SCHED_CORE changes.

https://openbenchmarking.org/result/2111136-IB-SCHEDULER42&export=pdf

Same kernel settings as for BABY-CPU: CONFIG_HZ_803 and https://github.com/hamadmarri/cacule-cpu-scheduler/blob/master/scripts/apply_suggested_configs.sh

I can share a binary build of my kernel.
I'm ashamed of my code, so I don't want to publish it at the moment.

Thank you for your interest

@raykzhao

raykzhao commented Nov 21, 2021

Hi @smarkusg @hamadmarri,

If my understanding of the patch is correct, what it seems to do is:

  1. Introduce a latency_nice value for each task that can be set from userspace. This value is also propagated to the task's children on fork.
  2. For each CPU, if there are tasks with latency_nice==-20, then it prevents that CPU from entering the IDLE state.

I remember the still-open issues in CacULE regarding IDLE/wakeup under NO_HZ settings. Since tt-scheduler can detect the type of each task, we probably do not need userspace to set the latency_nice value manually for realtime/interactive tasks. Couldn't we try something like:

  1. For REALTIME or INTERACTIVE tasks, set latency_sensitive=1. This latency_sensitive flag replaces the latency_nice value from that patch, since for that patch's use case the [-20, 19] scale is never used: it only checks whether latency_nice is -20. This value is also propagated to the task's children on fork.
  2. For each CPU, if there is a task with latency_sensitive=1, then prevent that CPU from entering the IDLE state.
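
The two steps above can be sketched in plain userspace C. This is only an illustration of the idea, not code from TT or from any patch in this thread: the `enum task_type` values, `struct task`, `struct cpu_rq`, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task classes, loosely modelled on the types TT is said to detect. */
enum task_type { TT_NO_TYPE, TT_REALTIME, TT_INTERACTIVE, TT_CPU_BOUND, TT_BATCH };

struct task {
	enum task_type type;
	bool latency_sensitive;	/* replaces the [-20, 19] latency_nice scale */
};

/* Hypothetical per-CPU runqueue holding only the counter of interest. */
struct cpu_rq {
	int nr_lat_sensitive;
};

/* Step 1: derive the flag from the detected task type. */
static void classify(struct task *p)
{
	p->latency_sensitive = (p->type == TT_REALTIME ||
				p->type == TT_INTERACTIVE);
}

/* Step 1 (fork): the child starts as a copy and so inherits the flag. */
static void fork_task(const struct task *parent, struct task *child)
{
	*child = *parent;
}

/* Count latency-sensitive tasks as they are placed on a CPU. */
static void enqueue(struct cpu_rq *rq, const struct task *p)
{
	if (p->latency_sensitive)
		rq->nr_lat_sensitive++;
}

/* Undo the accounting when the task leaves the CPU. */
static void dequeue(struct cpu_rq *rq, const struct task *p)
{
	if (p->latency_sensitive && rq->nr_lat_sensitive > 0)
		rq->nr_lat_sensitive--;
}

/* Step 2: a CPU may enter idle only if it hosts no latency-sensitive task. */
static bool may_enter_idle(const struct cpu_rq *rq)
{
	return rq->nr_lat_sensitive == 0;
}
```

In this sketch an interactive task enqueued on a CPU makes `may_enter_idle()` return false until it is dequeued again, while a batch task never affects the result.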

@hamadmarri, What do you think?

@hamadmarri
Owner

Hi @smarkusg

Thank you so much for your experiments. I am reading Parth Shah's patch. I will check whether we can integrate it with TT as @raykzhao suggested.

Regarding the benchmarks in (https://openbenchmarking.org/result/2111136-IB-SCHEDULER42&export=pdf), I see that TT performs poorly in those tests. Are there any differences in kernel configs or patches?

Thank you so much for the proposal.

@hamadmarri
Owner

hamadmarri commented Nov 21, 2021

@smarkusg @raykzhao

I am having difficulty finding the full patch at https://lkml.org/lkml/2020/5/7/575
Could anyone send me a link to the whole patch, or teach me how the lkml.org patch navigation works? :/

Sorry about that

@hamadmarri hamadmarri added the enhancement New feature or request label Nov 21, 2021
@hamadmarri
Owner

hamadmarri commented Nov 21, 2021

@raykzhao @smarkusg

Please check this patch
latsens.patch.zip

I am running it right now. It behaves much like HZ_PERIODIC: even though I have nohz_full set, I get very similar tick counts on every CPU:

cat /proc/interrupts | grep -i local
LOC:     761632     700412     701693     695537   Local timer interrupts

And the fan is crying with 1666Hz
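
For comparing such readings between patch revisions, the per-CPU counts can be pulled out of the `LOC:` line programmatically. The helper below is a small sketch; `parse_loc_line` is a made-up name and it only handles the line format shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a "LOC:" line from /proc/interrupts into per-CPU counts.
 * Returns the number of counts stored (at most max), or -1 if the
 * line does not start with "LOC:".
 */
static int parse_loc_line(const char *line, unsigned long long *counts, int max)
{
	const char *p = line;
	int n = 0;

	while (*p == ' ')
		p++;
	if (strncmp(p, "LOC:", 4) != 0)
		return -1;
	p += 4;

	while (n < max) {
		char *end;
		unsigned long long v = strtoull(p, &end, 10);

		if (end == p)	/* no digits left: reached the description text */
			break;
		counts[n++] = v;
		p = end;
	}
	return n;
}
```

Feeding it the line above yields four counts, one per CPU, which can then be compared against a later reading.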

Please let me know if you see any performance gain in your tests. I will run some tests soon.

Thank you

@hamadmarri
Owner

Note that realtime tasks may have been assigned to all CPUs since boot. There must be a good way to decay nr_lat_sensitive for old, sleeping realtime tasks. For now it just behaves like periodic ticks.

@hamadmarri
Owner

hamadmarri commented Nov 21, 2021

R2:

latsens-r2.patch.zip

Every 19ms, the nr_lat_sens gets decremented by 1. This at least relaxes the ticks for idle cpus:

cat /proc/interrupts | grep -i local
LOC:     656412     121611      99728      85649   Local timer interrupts
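
The decay rule described above (one decrement per 19 ms) might look roughly like the following userspace sketch. It is not taken from latsens-r2.patch: `struct lat_rq`, its fields, and `decay_lat_sens` are hypothetical names, and the counter is assumed to be non-negative.

```c
#include <assert.h>

#define LAT_SENS_DECAY_MS 19	/* interval quoted in the comment above */

/* Hypothetical per-CPU state; field names are illustrative only. */
struct lat_rq {
	int nr_lat_sens;
	unsigned long long last_decay_ms;
};

/*
 * Decrement nr_lat_sens by 1 for every full 19 ms that has elapsed since
 * the last decay, never going below zero. Returns the new counter value.
 */
static int decay_lat_sens(struct lat_rq *rq, unsigned long long now_ms)
{
	unsigned long long steps =
		(now_ms - rq->last_decay_ms) / LAT_SENS_DECAY_MS;

	if (steps == 0)
		return rq->nr_lat_sens;

	/* Advance by whole intervals so leftover time is not lost. */
	rq->last_decay_ms += steps * LAT_SENS_DECAY_MS;
	if (steps >= (unsigned long long)rq->nr_lat_sens)
		rq->nr_lat_sens = 0;
	else
		rq->nr_lat_sens -= (int)steps;
	return rq->nr_lat_sens;
}
```

Once the counter reaches zero, a CPU whose latency-sensitive tasks have been sleeping for a while is free to stop ticking again, which would be consistent with the lower counts on the last three CPUs.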

@ptr1337
Contributor

ptr1337 commented Nov 21, 2021

Here is the original latency patch:

From d6fa9e1bc40d6c563ad56718ae1813fec361c943 Mon Sep 17 00:00:00 2001
From: "P. Jung" <[email protected]>
Date: Sun, 21 Nov 2021 11:15:01 +0000
Subject: [PATCH] latency-test

Signed-off-by: P. Jung <[email protected]>
---
 include/linux/sched.h            |  1 +
 include/uapi/linux/sched.h       |  4 +++-
 include/uapi/linux/sched/types.h | 19 +++++++++++++++++++
 init/init_task.c                 |  1 +
 kernel/sched/core.c              | 26 ++++++++++++++++++++++++++
 kernel/sched/debug.c             |  1 +
 kernel/sched/sched.h             | 18 ++++++++++++++++++
 tools/include/uapi/linux/sched.h |  4 +++-
 8 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c1a927ddec64..2acfec4589e2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -774,6 +774,7 @@ struct task_struct {
 	int				static_prio;
 	int				normal_prio;
 	unsigned int			rt_priority;
+	int				latency_nice;
 
 	const struct sched_class	*sched_class;
 	struct sched_entity		se;
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 3bac0a8ceab2..b2e932c25be6 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -132,6 +132,7 @@ struct clone_args {
 #define SCHED_FLAG_KEEP_PARAMS		0x10
 #define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
 #define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
+#define SCHED_FLAG_LATENCY_NICE		0x80
 
 #define SCHED_FLAG_KEEP_ALL	(SCHED_FLAG_KEEP_POLICY | \
 				 SCHED_FLAG_KEEP_PARAMS)
@@ -143,6 +144,7 @@ struct clone_args {
 			 SCHED_FLAG_RECLAIM		| \
 			 SCHED_FLAG_DL_OVERRUN		| \
 			 SCHED_FLAG_KEEP_ALL		| \
-			 SCHED_FLAG_UTIL_CLAMP)
+			 SCHED_FLAG_UTIL_CLAMP		| \
+			 SCHED_FLAG_LATENCY_NICE)
 
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
index f2c4589d4dbf..0aa4e3b6ed59 100644
--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -10,6 +10,7 @@ struct sched_param {
 
 #define SCHED_ATTR_SIZE_VER0	48	/* sizeof first published struct */
 #define SCHED_ATTR_SIZE_VER1	56	/* add: util_{min,max} */
+#define SCHED_ATTR_SIZE_VER2	60	/* add: latency_nice */
 
 /*
  * Extended scheduling parameters data structure.
@@ -98,6 +99,22 @@ struct sched_param {
  * scheduled on a CPU with no more capacity than the specified value.
  *
  * A task utilization boundary can be reset by setting the attribute to -1.
+ *
+ * Latency Tolerance Attributes
+ * ===========================
+ *
+ * A subset of sched_attr attributes allows to specify the relative latency
+ * requirements of a task with respect to the other tasks running/queued in the
+ * system.
+ *
+ * @ sched_latency_nice	task's latency_nice value
+ *
+ * The latency_nice of a task can have any value in a range of
+ * [LATENCY_NICE_MIN..LATENCY_NICE_MAX].
+ *
+ * A task with latency_nice with the value of LATENCY_NICE_MIN can be
+ * taken for a task with lower latency requirements as opposed to the task with
+ * higher latency_nice.
  */
 struct sched_attr {
 	__u32 size;
@@ -120,6 +137,8 @@ struct sched_attr {
 	__u32 sched_util_min;
 	__u32 sched_util_max;
 
+	/* latency requirement hints */
+	__s32 sched_latency_nice;
 };
 
 #endif /* _UAPI_LINUX_SCHED_TYPES_H */
diff --git a/init/init_task.c b/init/init_task.c
index 2d024066e27b..048d3a932e81 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -78,6 +78,7 @@ struct task_struct init_task
 	.prio		= MAX_PRIO - 20,
 	.static_prio	= MAX_PRIO - 20,
 	.normal_prio	= MAX_PRIO - 20,
+	.latency_nice	= 0,
 	.policy		= SCHED_NORMAL,
 	.cpus_ptr	= &init_task.cpus_mask,
 	.user_cpus_ptr	= NULL,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aea60eae21a7..fe7d49c12176 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4341,6 +4341,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	 */
 	p->prio = current->normal_prio;
 
+	/* Propagate the parent's latency requirements to the child as well */
+	p->latency_nice = current->latency_nice;
+
 	uclamp_fork(p);
 
 	/*
@@ -4357,6 +4360,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 		p->prio = p->normal_prio = p->static_prio;
 		set_load_weight(p, false);
 
+		p->latency_nice = DEFAULT_LATENCY_NICE;
 		/*
 		 * We don't need the reset flag anymore after the fork. It has
 		 * fulfilled its duty:
@@ -7191,6 +7195,9 @@ static void __setscheduler_params(struct task_struct *p,
 	p->rt_priority = attr->sched_priority;
 	p->normal_prio = normal_prio(p);
 	set_load_weight(p, true);
+
+	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE)
+		p->latency_nice = attr->sched_latency_nice;
 }
 
 /*
@@ -7317,6 +7324,17 @@ static int __sched_setscheduler(struct task_struct *p,
 			return retval;
 	}
 
+	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
+		if (attr->sched_latency_nice > MAX_LATENCY_NICE)
+			return -EINVAL;
+		if (attr->sched_latency_nice < MIN_LATENCY_NICE)
+			return -EINVAL;
+		/* Use the same security checks as NICE */
+		if (attr->sched_latency_nice < p->latency_nice &&
+		    !capable(CAP_SYS_NICE))
+			return -EPERM;
+	}
+
 	if (pi)
 		cpuset_read_lock();
 
@@ -7351,6 +7369,9 @@ static int __sched_setscheduler(struct task_struct *p,
 			goto change;
 		if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
 			goto change;
+		if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE &&
+		    attr->sched_latency_nice != p->latency_nice)
+			goto change;
 
 		p->sched_reset_on_fork = reset_on_fork;
 		retval = 0;
@@ -7649,6 +7670,9 @@ static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *a
 	    size < SCHED_ATTR_SIZE_VER1)
 		return -EINVAL;
 
+	if ((attr->sched_flags & SCHED_FLAG_LATENCY_NICE) &&
+	    size < SCHED_ATTR_SIZE_VER2)
+		return -EINVAL;
 	/*
 	 * XXX: Do we want to be lenient like existing syscalls; or do we want
 	 * to be strict and return an error on out-of-bounds values?
@@ -7886,6 +7910,8 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 	get_params(p, &kattr);
 	kattr.sched_flags &= SCHED_FLAG_ALL;
 
+	kattr.sched_latency_nice = p->latency_nice;
+
 #ifdef CONFIG_UCLAMP_TASK
 	/*
 	 * This could race with another potential updater, but this is fine
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 17a653b67006..b11a32f21164 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1038,6 +1038,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 #endif
 	P(policy);
 	P(prio);
+	P(latency_nice);
 	if (task_has_dl_policy(p)) {
 		P(dl.runtime);
 		P(dl.deadline);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3d3e5793e117..ea478879e67d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -106,6 +106,24 @@ extern void call_trace_sched_update_nr_running(struct rq *rq, int count);
  */
 #define NS_TO_JIFFIES(TIME)	((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
 
+/*
+ * Latency nice is meant to provide scheduler hints about the relative
+ * latency requirements of a task with respect to other tasks.
+ * Thus a task with latency_nice == 19 can be hinted as the task with no
+ * latency requirements, in contrast to the task with latency_nice == -20
+ * which should be given priority in terms of lower latency.
+ */
+#define MAX_LATENCY_NICE	19
+#define MIN_LATENCY_NICE	-20
+
+#define LATENCY_NICE_WIDTH	\
+	(MAX_LATENCY_NICE - MIN_LATENCY_NICE + 1)
+
+/*
+ * Default tasks should be treated as a task with latency_nice = 0.
+ */
+#define DEFAULT_LATENCY_NICE	0
+
 /*
  * Increase resolution of nice-level calculations for 64-bit architectures.
  * The extra resolution improves shares distribution and load balancing of
diff --git a/tools/include/uapi/linux/sched.h b/tools/include/uapi/linux/sched.h
index 3bac0a8ceab2..ecc4884bfe4b 100644
--- a/tools/include/uapi/linux/sched.h
+++ b/tools/include/uapi/linux/sched.h
@@ -132,6 +132,7 @@ struct clone_args {
 #define SCHED_FLAG_KEEP_PARAMS		0x10
 #define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
 #define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
+#define SCHED_FLAG_LATENCY_NICE		0X80
 
 #define SCHED_FLAG_KEEP_ALL	(SCHED_FLAG_KEEP_POLICY | \
 				 SCHED_FLAG_KEEP_PARAMS)
@@ -143,6 +144,7 @@ struct clone_args {
 			 SCHED_FLAG_RECLAIM		| \
 			 SCHED_FLAG_DL_OVERRUN		| \
 			 SCHED_FLAG_KEEP_ALL		| \
-			 SCHED_FLAG_UTIL_CLAMP)
+			 SCHED_FLAG_UTIL_CLAMP		| \
+			 SCHED_FLAG_LATENCY_NICE)
 
 #endif /* _UAPI_LINUX_SCHED_H */
-- 
2.34.0
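
On a kernel carrying the patch above, userspace could set the new attribute through `sched_setattr(2)`. The following is a minimal sketch under that assumption; the struct name `sched_attr_ln` and the helpers `latency_nice_valid` and `set_latency_nice` are made up for illustration, while the constants are copied from the patch and from the mainline uapi header (`SCHED_FLAG_KEEP_POLICY` is 0x08 there). On a kernel without the patch the call is expected to fail.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Values copied from the patch above and from include/uapi/linux/sched.h. */
#define SCHED_FLAG_KEEP_POLICY	0x08
#define SCHED_FLAG_KEEP_PARAMS	0x10
#define SCHED_FLAG_LATENCY_NICE	0x80
#define SCHED_ATTR_SIZE_VER2	60
#define MAX_LATENCY_NICE	19
#define MIN_LATENCY_NICE	(-20)

/* Local mirror of the patched struct sched_attr (hypothetical name). */
struct sched_attr_ln {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
	int32_t  sched_latency_nice;	/* ends at byte 60 = SCHED_ATTR_SIZE_VER2 */
};

/* Same range check the patched __sched_setscheduler() performs. */
static int latency_nice_valid(int value)
{
	return value >= MIN_LATENCY_NICE && value <= MAX_LATENCY_NICE;
}

/*
 * Hypothetical helper: set latency_nice for pid (0 = calling thread)
 * while keeping the current policy and parameters.
 * Returns 0 on success, -1 with errno set otherwise.
 */
static int set_latency_nice(pid_t pid, int value)
{
	struct sched_attr_ln attr;

	if (!latency_nice_valid(value)) {
		errno = EINVAL;
		return -1;
	}

	memset(&attr, 0, sizeof(attr));
	attr.size = SCHED_ATTR_SIZE_VER2;
	attr.sched_flags = SCHED_FLAG_KEEP_POLICY | SCHED_FLAG_KEEP_PARAMS |
			   SCHED_FLAG_LATENCY_NICE;
	attr.sched_latency_nice = value;

	return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
}
```

Per the patch, lowering the value below the task's current latency_nice requires CAP_SYS_NICE, so an unprivileged `set_latency_nice(0, -20)` should be expected to fail with EPERM.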

@smarkusg
Author

smarkusg commented Nov 22, 2021

Hi @hamadmarri

The TT benchmark marked as git20211 for kernel 5.15 used a kernel from the Xanmod repository.
The difference in configuration was as in the current Xanmod release:

edge -> tt

 CONFIG_IRQ_WORK=y
 CONFIG_BUILDTIME_TABLE_SORT=y
 CONFIG_THREAD_INFO_IN_TASK=y
+CONFIG_TT_SCHED=y
+CONFIG_TT_ACCOUNTING_STATS=y
 
 #
 # General setup
@@ -122,7 +124,6 @@ CONFIG_USERMODE_DRIVER=y
 # CONFIG_PREEMPT_NONE is not set
 CONFIG_PREEMPT_VOLUNTARY=y
 # CONFIG_PREEMPT is not set
-CONFIG_SCHED_CORE=y
 
 #
 # CPU/Task time and stats accounting
@@ -167,8 +168,7 @@ CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
 #
 # Scheduler features
 #
-CONFIG_UCLAMP_TASK=y
-CONFIG_UCLAMP_BUCKETS_COUNT=5
+# CONFIG_UCLAMP_TASK is not set
 # end of Scheduler features
 
 CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
@@ -184,11 +184,6 @@ CONFIG_MEMCG_SWAP=y
 CONFIG_MEMCG_KMEM=y
 CONFIG_BLK_CGROUP=y
 CONFIG_CGROUP_WRITEBACK=y
-CONFIG_CGROUP_SCHED=y
-CONFIG_FAIR_GROUP_SCHED=y
-CONFIG_CFS_BANDWIDTH=y
-# CONFIG_RT_GROUP_SCHED is not set
-CONFIG_UCLAMP_TASK_GROUP=y
 CONFIG_CGROUP_PIDS=y
 CONFIG_CGROUP_RDMA=y
 CONFIG_CGROUP_FREEZER=y
@@ -210,8 +205,6 @@ CONFIG_USER_NS=y
 CONFIG_PID_NS=y
 CONFIG_NET_NS=y
 CONFIG_CHECKPOINT_RESTORE=y
-CONFIG_SCHED_AUTOGROUP=y
-CONFIG_SCHED_AUTOGROUP_DEFAULT_ENABLED=y
 # CONFIG_SYSFS_DEPRECATED is not set
 CONFIG_RELAY=y
 CONFIG_BLK_DEV_INITRD=y
@@ -513,9 +506,9 @@ CONFIG_EFI_MIXED=y
 # CONFIG_HZ_100 is not set
 # CONFIG_HZ_250 is not set
 # CONFIG_HZ_300 is not set
-CONFIG_HZ_500=y
-# CONFIG_HZ_1000 is not set
-CONFIG_HZ=500
+# CONFIG_HZ_500 is not set
+CONFIG_HZ_1000=y
+CONFIG_HZ=1000
 CONFIG_SCHED_HRTICK=y
 CONFIG_KEXEC=y
 CONFIG_KEXEC_FILE=y

The kernel marked as 5.13.19*tt was compiled like the rest of my 5.13 kernels: the same LTO=full Clang configuration with "CONFIG_HZ_803=y" and the same simple settings as for Baby-CPU-Scheduler.
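
Since the kernels compared here run at different tick rates (HZ=500, HZ=1000, and HZ=803), the same wall-clock interval maps to a different number of ticks on each. The sketch below reuses the arithmetic of the `NS_TO_JIFFIES` macro from kernel/sched/sched.h with HZ passed as a parameter. Whether latsens-r2.patch counts its 19 ms in jiffies is not known from this thread, so the 19 ms figure serves only as a sample interval; the function names are illustrative.

```c
#include <assert.h>

#define NSEC_PER_SEC	1000000000ULL
#define NSEC_PER_MSEC	1000000ULL

/* Length of one tick in nanoseconds (integer division, truncated). */
static unsigned long long tick_ns(unsigned int hz)
{
	return NSEC_PER_SEC / hz;
}

/* Same arithmetic as NS_TO_JIFFIES(TIME), with HZ as an argument. */
static unsigned long ns_to_jiffies(unsigned long long ns, unsigned int hz)
{
	return (unsigned long)(ns / (NSEC_PER_SEC / hz));
}
```

With these helpers, a 19 ms interval corresponds to 9 whole ticks at HZ=500, 15 at HZ=803, and 19 at HZ=1000.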

When I find a moment, I will reinstall the current TT 5.15 release for Xanmod from the site and compare it in a simple benchmark.

I will also check the patch for nr_lat_sensitive and let you know too.

Thanks again for your work.

@hamadmarri
Owner

R2:

latsens-r2.patch.zip

Every 19ms, the nr_lat_sens gets decremented by 1. This at least relaxes the ticks for idle cpus:

cat /proc/interrupts | grep -i local
LOC:     656412     121611      99728      85649   Local timer interrupts

Hi @everyone,

Any testing updates/findings about latsens-r2.patch?

Thank you
