Wrong temperature reading #25

warped-rudi · 2014-12-20T09:33:11Z

On one of my CuBox-i4p (early model), the CPU temperature exposed via hwmon is bogus (i.e. lower than the ambient). I'd guess that it's off by about 20°C. This does not happen on a second CuBox-i4p (newer model), nor on an HB-i2ex or a C1-solo.

linux4kix · 2014-12-20T10:12:24Z

What are the temperatures it is reporting? Does it move up and down
properly when in use? Please use something like cpuburn to increase the
temp.

warped-rudi · 2014-12-20T11:58:21Z

Yes, it does move up and down. I simulated the same load situation on both CuBoxes and the difference is pretty much exactly 20 K.

warped-rudi · 2014-12-21T13:30:56Z

This seems to be a problem of this particular box. I reverted the kernel to a version before the thermal changes and the behavior is the same. Probably I didn't pay attention to this in the past. I even got negative temperature readings right after a cold boot. So either the sensor calibration data is bad, the sensor hardware is broken or the clocks are wrong...

susisstrolch · 2014-12-22T11:02:30Z

Same problem here - with an OE5RC3...
Old CBi4pro (Feb '14) - approx 33° when viewing HD content
New CBi4pro (Dec '14) - approx 64° - same content.
Seems we should chase Rabeeh about this...

linux4kix · 2014-12-22T11:56:34Z

Are the Older CBi's dev units or purchased units?

susisstrolch · 2014-12-22T14:17:27Z

Both CBis are purchased ones...

Just did some testing:
Idling OE5RC5, HDMI, 1920x1080, 50Hz
old: core: 44°, case: 34°, clock 396000
new: core: 64°, case: 42°, clock 396000

case measured with PT100 probe.

dtb's are identical, kernel is 3.14.25

warped-rudi · 2014-12-22T14:50:16Z

@linux4kix: Mine is probably a dev unit. At least it was shipped as such in Jan'14.

@susisstrolch: Is there really such a difference in the case temperature or is that a typo?

@susisstrolch: Can you do a 'devmem 0x21bc4e0 32'? That should show the sensor calibration data. I don't have mine handy right now, but I'd like to compare. If they are the same, there is a chance that they are wrong...
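
For comparing calibration words, it can help to split them into fields. A minimal sketch, assuming the layout described in the upstream imx_thermal driver (bits [31:20] sensor count at 25°C, bits [19:8] sensor count at the hot point, bits [7:0] hot point temperature in °C). The class name, function name and example word are made up for illustration:

```python
from typing import NamedTuple


class Calibration(NamedTuple):
    room_count: int   # sensor count at 25 degC, bits [31:20]
    hot_count: int    # sensor count at the hot point, bits [19:8]
    hot_temp_c: int   # hot point temperature in degC, bits [7:0]


def decode_calibration(word: int) -> Calibration:
    """Split a 32-bit calibration word into its three fields (assumed layout)."""
    return Calibration(
        room_count=(word >> 20) & 0xFFF,
        hot_count=(word >> 8) & 0xFFF,
        hot_temp_c=word & 0xFF,
    )


# 0x12345678 is a made-up word, not a real fuse value.
cal = decode_calibration(0x12345678)
print(cal)
```

Decoding the words from two boxes this way should make it easier to see which field differs, if any.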

linux4kix · 2014-12-22T15:15:55Z

My guess is that the 25c calibration fuse is wrong.

susisstrolch · 2014-12-23T08:23:23Z

Yes, that's the real temp diff. Checked twice...

Calibration data is identical on both boxes:

CuBox-i4pro:~ # devmem 0x21bc4e0 32
0x5624D869

Cubox-I4:~ # ./devmem 0x21bc4e0 32
0x5694D969

warped-rudi · 2014-12-23T12:00:16Z

@linux4kix I think the problem is with 2b1601b . To me it looks like the new 'universal formula' should only be applied to parts that were calibrated according to this. Given the time of the commit, I suspect this applies to chips manufactured after Feb'14. I did a manual calculation with the calibration data of my 'bad' unit and got exactly the difference I observed. Now, the question is how to auto-detect that. The commit message says: 'there will be no hot point calibration data in fuse map from now on'. However, the 'good' units do have hot point data as well! Have not (yet) checked if they are valid and if the old formula will still work.

@susisstrolch 0x5624D869 != 0x5694D969. I'm still puzzled about the temperature difference. Also, 64°C looks a bit high for an idle device.

linux4kix · 2014-12-23T12:08:42Z

Rudi, if you revert that patch does temperature reading work properly for
both units? Perhaps we need to test if there is hot point calibration data
and then only use the new formula if there isn't. We should also report
this to fsl as this is a patch pushed to upstream as well and could damage
older chips if pushed hard enough.

64C looks hot for idle, but it may be XBMC running idle which is far from
idle. In those circumstances 64C looks just about right to me.

warped-rudi · 2014-12-23T12:15:18Z

Will test when I'm at home this evening.

linux4kix · 2014-12-23T12:17:02Z

great thanks.

susisstrolch · 2014-12-23T13:14:49Z

Ooops - sloppy pattern matching w/o glasses...

With KODI stopped I get the following values:

CBi Old: 25,00°C @396MHz, i.MX6Q, silicon rev 1.2
CBi New: 40,68°C @396MHz, i.MX6Q, silicon rev 1.5

linux4kix · 2014-12-23T13:27:32Z

Okay those numbers look better. If reverting the patch fixes the older
silicon rev then we may be able to use that as the identifier for the
algorithm used.

susisstrolch · 2014-12-23T14:22:12Z

Uuups...
strolch@strolch:~/Development/OpenELEC/tools/mkpg/linux-imx_3.14.x.git> git revert 2b1601b
error: could not revert 2b1601b... thermal: imx: update formula for thermal sensor
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'

Are you talking about the linux-linaro-lsk-v3.14-mx6 branch?
How can I read the raw data (devmem xxx) of sensor values?

linux4kix · 2014-12-23T14:38:50Z

It probably doesn't revert cleanly. I can write a quick patch when I get
home for you to test with.

susisstrolch · 2014-12-23T14:45:45Z

Not necessary - fixed - only comments are affected...

susisstrolch · 2014-12-23T15:37:43Z

Same as before with slightly different values...
Old: 25,0° / 34,6°C (no KODI / KODI sitting in OSD)
New: 40,6° / 50,8°C
So it would be really interesting to see the raw value from the thermal sensor...

warped-rudi · 2014-12-23T22:59:52Z

Obviously my manual calculation was flawed. My test showed the same result as @susisstrolch experienced. There is only a small difference between the two formulas. Also both of my CuBoxes are of 'silicon revision 1.2'. So the question remains why the temperature readout of the older one is so low.
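
To make the comparison concrete, here is a rough sketch of both calculations. It assumes the two-point linear interpolation used before 2b1601b and the slope-based 'universal formula' with the constants from the upstream imx_thermal driver (slope = 0.4297157 - 0.0015976 * count at 25°C). The function names are made up. The calibration fields are taken from the word 0x5624D869 posted above under an assumed layout ([31:20] count at 25°C, [19:8] hot count, [7:0] hot temperature), and the raw count of 1345 is only an illustrative input.

```python
ROOM_TEMP_C = 25.0  # first calibration point is assumed to be taken at 25 degC


def temp_two_point(count, room_count, hot_count, hot_temp_c):
    """Old approach: interpolate between the 25 degC and hot calibration points."""
    slope = (ROOM_TEMP_C - hot_temp_c) / (room_count - hot_count)  # degC per count
    return hot_temp_c + (count - hot_count) * slope


def temp_universal(count, room_count):
    """New approach: slope derived from the 25 degC count only."""
    slope = 0.4297157 - 0.0015976 * room_count  # counts per degC
    return ROOM_TEMP_C + (count - room_count) / slope


# Fields of 0x5624D869 under the assumed layout: 0x562, 0x4D8, 0x69.
room_count, hot_count, hot_temp_c = 0x562, 0x4D8, 0x69
count = 1345  # illustrative raw sensor count

print(f"two-point: {temp_two_point(count, room_count, hot_count, hot_temp_c):.1f} degC")
print(f"universal: {temp_universal(count, room_count):.1f} degC")
```

With these inputs the two results come out within about half a degree of each other, which is consistent with the formulas differing only slightly.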

@susisstrolch: raw data are at 0x20c8180, bits [19:8], i.e. ((val >> 8) & 0xfff)
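
As a small illustration of that expression, a helper that pulls bits [19:8] out of a 32-bit register value could look like this. The function name and the sample value are hypothetical; only the shift and the mask come from the line above.

```python
def raw_temp_count(reg_value: int) -> int:
    """Return bits [19:8] of the register value, i.e. a 12-bit field."""
    return (reg_value >> 8) & 0xFFF


# 0x000ABC00 is a made-up value with 0xABC in bits [19:8].
print(hex(raw_temp_count(0x000ABC00)))
```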

susisstrolch · 2014-12-24T07:31:25Z

Here are the raw ones:S
Old: 44°C - 0x4e954106 -> 65
New: 60°C - 0x4ef52906 -> 41

HBi2ex: 33° - 0x51657806 -> 120

Sure about the 0x20c8180?

warped-rudi · 2014-12-24T12:56:31Z

The data field is 12 bits wide, not only 8.
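
The effect of the mask width can be checked against the three register values posted above. This is only a sketch; the labels are taken from that comment and the variable names are made up. Masking with 0xff reproduces the 65, 41 and 120 from the earlier post, while the 12-bit mask keeps the upper four bits of the field.

```python
# Register values as posted earlier in this thread.
readings = {
    "Old": 0x4E954106,
    "New": 0x4EF52906,
    "HBi2ex": 0x51657806,
}

for name, value in readings.items():
    field = value >> 8          # drop bits [7:0]
    low8 = field & 0xFF         # only 8 bits: what was posted
    full12 = field & 0xFFF      # bits [19:8]: the whole 12-bit field
    print(f"{name}: 8-bit={low8} 12-bit={full12}")
```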
