

Low overhead system level approaches to deal with multiple and long duration transient faults in future technologies

PhD Student: Carlos Arthur Lang Lisbôa
Advisor: Luigi Carro

VLSI-SoC 2007 - PhD Forum

Universidade Federal do Rio Grande do Sul - UFRGS
Instituto de Informática, Pós-Graduação em Ciência da Computação
Grupo de Microeletrônica (GME) - Laboratório de Sistemas Embarcados (LSE)
http://www.inf.ufrgs.br/gme, http://www.inf.ufrgs.br/~lse

Porto Alegre - RS, BRAZIL

Phone +55 51 33086165

[email protected]

[email protected]

CMOS technologies beyond the 45 nm node will produce devices subject to radiation-induced transients that last longer than the predicted clock cycle of the circuits. In this scenario, techniques based on temporal redundancy will no longer succeed, while those based on spatial redundancy will still imply high overheads. Therefore, innovative low-cost techniques, working at the system or algorithm level, will be required to cope with this type of fault.

Prediction of Long Duration Transients (LDTs) [1]

• Vertical bars show predicted transient widths for an LET of 20 MeV-cm²/mg

• Lines show predicted cycle times for different inverter chains

• Duration of transients extracted from [2] and [3]

• Even low energy particles may cause long duration transients

Why will temporal redundancy schemes, such as [4, 5], no longer succeed?

• they check the outputs twice

• the two samples are separated by a delay

• the delay must be longer than the expected transient width

• long transients therefore imply heavy penalties
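
The double-sampling argument above can be illustrated with a small sketch. This is not the circuit from [4, 5]; it is a simplified model, assuming a single-bit output that is flipped for the whole duration of the transient. The function names and all time values are hypothetical and use arbitrary time units.

```python
def sample(t, fault_start, fault_width, correct=0):
    """Value seen at time t; the output is flipped while the transient is active."""
    faulty = fault_start <= t < fault_start + fault_width
    return correct ^ 1 if faulty else correct


def detects_error(t_first, delay, fault_start, fault_width):
    """Temporal redundancy: sample twice, `delay` apart, and flag a mismatch."""
    first = sample(t_first, fault_start, fault_width)
    second = sample(t_first + delay, fault_start, fault_width)
    return first != second


# Hypothetical numbers, chosen only for illustration.
# Transient shorter than the delay: the two samples disagree, so it is caught.
short_transient = detects_error(t_first=10, delay=5, fault_start=8, fault_width=4)
# Transient longer than the delay: both samples see the same wrong value.
long_transient = detects_error(t_first=10, delay=5, fault_start=8, fault_width=20)
```

In this model the long transient goes undetected unless the delay is stretched beyond the transient width, which is the penalty the last bullet refers to.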

[1] Lisboa, C. A., and Carro, L., “System Level Approaches for Mitigation of Long Duration Transient Faults in Future Technologies”, Proc. of 12th European Test Symposium (ETS 2007).

[2] Dodd, P. E., et al., “Production and Propagation of Single-Event Transients in High-Speed Digital Logic ICs”, IEEE Tr. on Nuc. Sci., Vol. 51, No. 6, Part 2, IEEE Comp. Soc., Los Alamitos, CA, Dec. 2004.

[3] Ferlet-Cavrois et al., “Statistical Analysis of the Charge Collected in SOI and Bulk Devices Under Heavy Ion and Proton Irradiation - Implications for Digital SETs”, IEEE Tr. on Nuc. Sci., Vol. 53, No. 6, Nov. 2006.

[4] Anghel, L., and Nicolaidis, M., “Cost Reduction and Evaluation of a Temporary Faults Detection Technique”, in Proc. of Design, Automation and Test in Europe Conference (DATE 2000), ACM Press, New York, NY, USA, March 2000.

[5] Mitra, S., Seifert, N., Zhang, M., Shi, Q., and Kim, K. S., “Robust System Design with Built-In Soft-Error Resilience”, Computer, Vol. 38, No. 2, 2005.

A case study: low overhead error detection in matrix multiplication [1]

(a) Hardware implementation (sequential circuit, 1120 lines of VHDL code, parameterized by n)

(b) Software implementation

Calculate:

Vector $Cr$, where $Cr_i = C_{i1} + C_{i2} + \dots + C_{in}$ (1)

Vector $Br$, where $Br_i = B_{i1} + B_{i2} + \dots + B_{in}$ (2)

Vector $ABr$, where $ABr_i = \sum_{k=1}^{n} A_{ik} \cdot Br_k$ (3)

If $ABr \neq Cr$, there was an error
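
A minimal software sketch of checks (1) to (3) follows, assuming square matrices stored as lists of lists with integer entries so that the comparison is exact (floating-point data would need a tolerance). The function names are hypothetical and this is not the authors' implementation.

```python
def row_sums(M):
    """Sum the elements of each row of M."""
    return [sum(row) for row in M]


def matmul(A, B):
    """Plain n x n matrix product, used here only to produce C = A * B."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def product_is_consistent(A, B, C):
    """Return True when the row-sum check finds no error in C."""
    Cr = row_sums(C)                                          # equation (1)
    Br = row_sums(B)                                          # equation (2)
    ABr = [sum(a * b for a, b in zip(row, Br)) for row in A]  # equation (3)
    return ABr == Cr


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)
ok = product_is_consistent(A, B, C)

# Inject a single wrong element, as a transient fault might.
C_bad = [row[:] for row in C]
C_bad[0][1] += 1
flagged = not product_is_consistent(A, B, C_bad)
```

The check costs one matrix-vector product plus two sets of row sums instead of a second full multiplication. It flags any fault that changes a row sum of C; errors that cancel out within the same row would not be seen by this sketch.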

Future work

• Low cost recomputation techniques for matrix multiplication upon error detection

• Use of similar approaches to harden other frequently used algorithms for embedded systems

• Validation of the proposed techniques through application to harden a complete SoC

Time comparison - software (number of * and + operations)

Matrix width (n)      3     8      16       32
Multiplication       45   960   7,936   64,512
Verification         27   232     976    4,000
Overhead (%)        60%   24%     12%       6%
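
The operation counts in the software comparison can be reproduced with a short calculation. The breakdown below is an assumption, not taken from the poster: the multiplication is counted as n³ products plus n²(n-1) additions, and the verification as n(n-1) additions for each of Cr and Br, plus n² products and n(n-1) additions for ABr. The function names are hypothetical.

```python
def multiplication_ops(n):
    """Assumed count for C = A * B: n^3 products and n^2 * (n - 1) additions."""
    return n**3 + n**2 * (n - 1)


def verification_ops(n):
    """Assumed count for the check: row sums of C and B, then A times Br."""
    row_sum_adds = 2 * n * (n - 1)       # Cr and Br
    abr_ops = n * n + n * (n - 1)        # products and additions for ABr
    return row_sum_adds + abr_ops


counts = {n: (multiplication_ops(n), verification_ops(n)) for n in (3, 8, 16, 32)}
overhead = {n: round(100 * v / m) for n, (m, v) in counts.items()}

for n, (m, v) in counts.items():
    print(f"n={n}: multiplication={m}, verification={v}, overhead={overhead[n]}%")
```

Under these assumptions the multiplication grows as 2n³ - n² while the verification grows as 4n² - 3n, which is why the relative overhead shrinks as n increases.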

Time comparison in s (*)

Matrix width (n)       3       8       16      32
Multiplication      1.94   25.94   184.98   1,350
Verification        0.50    2.90    10.90      40
Overhead (%)       25.8%   11.2%     5.9%    3.0%