
A Record of an Epic (as in Epic-Scale) Crash

Published at 10:28 AM

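Background

The story goes like this. An interactive game project had already been deployed at one site. It was later moved to a different site (with some changes), but the parts that crashed had barely been touched. It crashed and froze many times a day, and none of the programmers could find the cause. So, once again, I was dragged in to track the problem down.

The first logs they sent me showed Oodle decompression errors, so I initially asked them to verify asset integrity, suspecting a bad copy or even a failing disk.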
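An asset integrity check of this kind can be sketched as a hash comparison between the original build and the copy on the target machine. This is a minimal sketch, not the procedure actually used on site: the function names `sha256_of`, `build_manifest`, and `diff_manifests` are hypothetical, and it assumes both directory trees are reachable from one machine.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large package files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(source: dict[str, str], copy: dict[str, str]) -> list[str]:
    """Return files that are missing from the copy or whose digests differ."""
    return sorted(name for name, digest in source.items() if copy.get(name) != digest)
```

If `diff_manifests(build_manifest(source_dir), build_manifest(copy_dir))` comes back empty, a bad copy becomes an unlikely explanation. That points away from the assets themselves, which is where this investigation ended up.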

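Hardware Problem

The crashes were caused by hardware, chiefly the combination of CPU power settings and the clock multiplier.

The problem shows up very clearly during Oodle Data decompression. Unlike most game, simulation, audio, or rendering code, decompression performs extra integrity checks to cope with accidentally or maliciously corrupted data, so it is likely to notice when something has become inconsistent. A failed decode usually produces an error message.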
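The idea that a decompressor acts as an early detector can be illustrated with a small round-trip test. This is only a sketch under stated assumptions: it uses Python's standard `zlib` as a stand-in, not Oodle, and the function names are hypothetical. The 65536-byte block size merely mirrors the block size seen in the logs of this incident.

```python
import random
import zlib


def decodes_cleanly(blob: bytes) -> bool:
    """Return True if zlib accepts the stream, including its trailing checksum."""
    try:
        zlib.decompress(blob)
        return True
    except zlib.error:
        return False


def roundtrip_failures(iterations: int, block_size: int = 65536, seed: int = 0) -> int:
    """Compress and decompress compressible blocks; count any that do not survive."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(iterations):
        # A four-symbol alphabet gives the compressor real matching work to do.
        block = bytes(rng.choices(b"ACGT", k=block_size))
        try:
            if zlib.decompress(zlib.compress(block)) != block:
                failures += 1
        except zlib.error:
            failures += 1
    return failures
```

On stable hardware `roundtrip_failures` should return 0 no matter how many iterations are run, since the computation is deterministic. A non-zero count on otherwise intact data would suggest that the machine, not the data, is at fault.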

(Possible) causes:

Error Logs

(On-site logs that are clearly related to this problem)

[2024.07.11-10.46.34:666][325]OodleDataCompression: Display: Oodle: OODLE ERROR : LZ corruption : DecodeOneQuantum fail!

[2024.07.11-10.46.34:666][325]OodleDataCompression: Display: Oodle: OODLE ERROR : LZ corruption : OodleLZ_Decompress failed (0 != 65536)

[2024.07.11-10.46.34:666][325]LogCompression: Error: FCompression::UncompressMemory - Failed to uncompress memory (39086/65536) from address 0000012DF8BD1EF0 using format Oodle, this may indicate the asset is corrupt!
[2024.07.11-10.46.34:666][325]LogIoDispatcher: Warning: Failed decompressing block
[2024.07.11-10.46.34:666][325]LogStreaming: Warning: StartBundleIoRequests: FailedRead: None (0xFA858880C4C1A011) None (0xFA858880C4C1A011) - Failed reading chunk for package:  (Read Error)
[2024.07.11-10.46.34:773][325]LogWindows: Error: appError called: Assertion failed: (Index >= 0) & (Index < ArrayNum) [File:D:\build\++UE5\Sync\Engine\Source\Runtime\Core\Public\Containers\ArrayView.h] [Line: 293]
Array index out of bounds: 587206400 from an array of size 4

[2024.07.11-10.46.34:773][325]LogWindows: Windows GetLastError: 操作成功完成。 (0)
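When many crash logs have to be triaged, the signatures above can be counted automatically. The following is a rough sketch: the patterns are copied from the log lines shown here, while the signature names and the `classify_log` helper are hypothetical and not part of any Unreal Engine tooling.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Substrings taken verbatim from the log excerpt; the dictionary keys are made up.
SIGNATURES = {
    "oodle_lz_corruption": re.compile(r"OODLE ERROR : LZ corruption"),
    "uncompress_failed": re.compile(
        r"FCompression::UncompressMemory - Failed to uncompress memory"
    ),
    "io_block_failed": re.compile(r"LogIoDispatcher: Warning: Failed decompressing block"),
    "assertion_failed": re.compile(r"Assertion failed:"),
}


def classify_log(lines: Iterable[str]) -> Counter:
    """Count how many lines match each known crash signature."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Running `classify_log(open(path, encoding="utf-8", errors="replace"))` over each log gives a quick view of whether decompression failures come before the assertion, as they do in the excerpt above.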

There were also shader errors:

Missing global shader FDiaphragmDOFReduceCS's permutation 1,

The CPU problem also surfaced as spurious out-of-video-memory errors:

These were in fact caused by the CPU.

Solution

Follow-up

Timeline

Mid-2023

2024

Intel Confirms the Issue

Based on extensive analysis of Intel Core 13th/14th Gen desktop processors returned to us due to instability issues, we have determined that elevated operating voltage is causing instability issues in some 13th/14th Gen desktop processors. Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor. Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages. We are continuing validation to ensure that scenarios of instability reported to Intel regarding its Core 13th/14th Gen desktop processors are addressed. Intel is currently targeting mid-August for patch release to partners following full validation. Intel is committed to making this right with our customers, and we continue asking any customers currently experiencing instability issues on their Intel Core 13th/14th Gen desktop processors reach out to Intel Customer Support for further assistance.


References

  1. [PC] How to Resolve Data Corruption Errors
  2. Intel Processor Instability Causing Oodle Decompression Failures
  3. Timeline of the Intel 13th/14th Gen CPU Crash Incidents
  4. July 2024 Update on Instability Reports on Intel Core 13th and 14th Gen Desktop Processors

Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit 小谷的随笔 when reposting.
