Error Correction System Based on COTS Microcontrollers Working in Redundancy

R. M. França, Érico L. Marques, Tiago S. Da Silva, V. Parro
DOI: 10.1109/AERO53065.2022.9843682 (https://doi.org/10.1109/AERO53065.2022.9843682)
Published in: 2022 IEEE Aerospace Conference (AERO)
Publication date: 2022-03-05
Citations: 1

Abstract

Error Correction System Based on COTS Microcontrollers Working in Redundancy
This paper presents the implementation, testing and analysis of an error detection and correction architecture for critical systems, based on majority voting among several commercial off-the-shelf (COTS) microcontroller units (MCUs). The architecture uses at least three MCUs operating simultaneously and exchanging information over a controller area network (CAN) bus, with no master/slave roles. All MCUs must run the same software and maintain a Health Table with relevant information about each other. They differ only in a unique identifier set by hardware or by software at compile time. At any point during code execution, a vote involving all MCUs can be initiated to verify a result. Each vote uses only three of the system's MCUs to try to reach a majority, which gives high flexibility in choosing which units take part. An error tracking system isolates MCUs that inject too many errors, so the system always uses the most reliable units for the final vote. Once a majority is reached, all units can compare their own results with it and correct themselves if an error is detected. A real implementation of this architecture was built using four ARM-M4 MCUs for testing and for use in an academic CubeSat. The software implementation is flexible: an error verification can be executed as many times as needed and on any number of variables. For votes on different variables to be completely independent of each other, each votable variable needs its own Health Table. A flexible Health Table structure in software makes it easy to add or remove votable variables and to use the exact same voting function for any of them. The messages exchanged over the CAN bus contain information about the sender, destination, message type and variable to vote on, allowing the same message structure to be reused.
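The voting scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `HealthTable` layout, the `ERROR_LIMIT` isolation threshold, and the rule of picking the three units with the fewest recorded errors are all assumptions made for illustration, and the CAN exchange is replaced by a plain dictionary of results keyed by MCU identifier.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

ERROR_LIMIT = 3  # hypothetical threshold after which a unit is isolated


@dataclass
class HealthTable:
    """Error counters for one votable variable (assumed layout)."""
    errors: Dict[int, int] = field(default_factory=dict)

    def voters(self, unit_ids: Iterable[int]) -> List[int]:
        """Pick the three non-isolated units with the fewest recorded errors."""
        healthy = [u for u in unit_ids if self.errors.get(u, 0) < ERROR_LIMIT]
        return sorted(healthy, key=lambda u: (self.errors.get(u, 0), u))[:3]


def vote(table: HealthTable, results: Dict[int, int]) -> Optional[int]:
    """Return the value agreed on by at least two of three voters, else None."""
    chosen = table.voters(results.keys())
    if len(chosen) < 3:
        return None
    value, count = Counter(results[u] for u in chosen).most_common(1)[0]
    if count < 2:
        return None
    # Every unit, voter or not, compares its own result with the majority.
    for unit, result in results.items():
        if result != value:
            table.errors[unit] = table.errors.get(unit, 0) + 1
    return value


# One table per votable variable keeps the votes independent of each other.
prime_table = HealthTable()
print(vote(prime_table, {0: 541, 1: 541, 2: 7, 3: 541}))
print(prime_table.voters([0, 1, 2, 3]))
```

In this sketch, unit 2 disagrees with the majority once, so the next vote prefers units 0, 1 and 3. Because the same `vote` function takes the table as an argument, adding a votable variable only requires creating another `HealthTable`.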
Combined with several internal security mechanisms, the system can detect faulty messages and keep operating even if the CAN bus itself corrupts the messages. To test the architecture and its implementation, an error generator was created using a pseudorandom number generator (PRNG). Each MCU can generate corrupted results for the voting process according to an individual, user-defined probability. This allowed several test cases to be prepared in which the error rate was increased individually for each MCU in increments of 5%. The tests started with only one MCU generating errors, then added the others one by one until all four were generating errors. The task chosen for the MCUs was to calculate the first one hundred prime numbers. Each test case was repeated a hundred times to reduce the influence of the PRNG on the test result. The different test cases and results are presented and analyzed. The proposed architecture proved fully functional, allowing the system to detect and correct most of the errors injected by the MCUs.
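The test procedure can be approximated in simulation. The sketch below is only an assumption-laden illustration: the single-bit-flip fault model, the function names, the fixed seed, and the choice to vote on the 100th prime using the first three units are hypothetical, and the abstract does not state how the authors measured success.

```python
import random
from typing import List, Optional


def first_primes(n: int) -> List[int]:
    """Compute the first n primes by trial division (the MCUs' test task)."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def maybe_corrupt(value: int, error_rate: float, rng: random.Random) -> int:
    """With probability error_rate, flip one low bit (assumed fault model)."""
    if rng.random() < error_rate:
        return value ^ (1 << rng.randrange(16))
    return value


def majority(a: int, b: int, c: int) -> Optional[int]:
    """Two-out-of-three vote; None when all three values differ."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None


def run_case(error_rates: List[float], runs: int = 100, seed: int = 0) -> float:
    """Fraction of runs in which the vote still yields the correct value."""
    rng = random.Random(seed)
    expected = first_primes(100)[-1]
    correct = 0
    for _ in range(runs):
        results = [maybe_corrupt(expected, p, rng) for p in error_rates]
        if majority(*results[:3]) == expected:
            correct += 1
    return correct / runs


# Raise the error rate of one unit in 5% steps, the others staying fault-free.
for step in range(0, 21, 5):
    rate = step / 100
    print(rate, run_case([rate, 0.0, 0.0, 0.0]))
```

With a single faulty unit the other two voters always outvote it, so this loop reports full success at every rate. Passing non-zero rates for several units shows how the success fraction drops once two voters can fail in the same run.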