UCR



CS 223 Reconfigurable Computing (Spring 2014)


CS 223 Reconfigurable Computing

Spring 2014, Basic Information

Lecture:             Tues./Thurs.      2:10 pm - 3:30 pm     WCH 139

Instructor:         Philip Brisk (first_name@cs.ucr.edu)
Office Hours:    Mon.                  2:30 pm - 4:30 pm     WCH 339 (or 464)            

Final Exam:      In-class, during lecture (Week 10)

The course will use iLearn extensively!

Course Catalog Information

Catalog Entry:    CS 223 Computer Architecture and Embedded Systems (4 units)
Lecture:              3 hours
Written Work:     3 hours
Prerequisites:    CS 202 or CS 203A; consent of instructor

Covers reconfigurable computing, a novel computational model that is fast becoming part of the mainstream in high-performance computing. Addresses architectures, software tools and compilers, programming models, and applications. May be taken Satisfactory (S) or No Credit (NC) with consent of instructor and graduate advisor.

Academic Integrity

Important note: If any member of your group commits an act of academic dishonesty, all members of the group will receive a failing grade! It is your responsibility to know UCR's policy on academic dishonesty.

Student Expectations

Although it is your responsibility to know the details of the above, the instructor explicitly states here that using a single sentence from another source, without acknowledging that source, is plagiarism and will get you a failing grade. 

You are responsible for any announcements or changes the instructor makes in classes, even if you are not there; most announcements or changes will be made on iLearn.  

The instructor does not like to be distracted by cell phones ringing in class. If your cell phone or beeper goes off in class, you will be fined one letter grade (first offense) or given an F (second offense). You should also turn off your cell phone before visiting the instructor in his office. (Note: The instructor will make an exception to the above rules if you explain in advance why this stance creates a hardship for you.)

Course Requirements and Grading

Students are expected to work in groups of 2-4 students, depending on the scope of the project. Remember, you only have 10 weeks (plus the final exam week)!

10%:     Midterm Exam
15%:     Final Exam
75%:     Course Project

Required Textbooks

None; there will be weekly reading assignments, typically papers published by the IEEE, ACM, or AAAS.

Optional Textbook

Architecture and CAD for Deep Submicron FPGAs
Vaughn Betz, Jonathan Rose, Sandy Marquardt
Springer (1999)

Equipment

Intel has generously provided UCR with an equipment donation and financial support for this offering of CS 223. The equipment includes one Terasic DE2i-150 development kit per student. The DE2i-150 includes an Intel Atom N2600 embedded processor with an Altera Cyclone IV GX FPGA, as well as all standard I/O and communication peripherals (USB, Ethernet, WiFi, VGA, etc.). The board comes with Yocto Linux preinstalled, and includes all necessary software from Altera to program the FPGA.

General Project Information

Descriptions of possible student projects can be found here.

●  UCR dictates roughly 3-4 hrs/week per unit. 4 units = 12-16 hrs per week. Expect to spend time on this graduate-level project. The project work begins right away. 

●  Each group will do its own project.

●  The best group projects are often structured so that each student in the group works on an individual component; the components can then be assembled into a larger, fully functional system toward the end of the quarter.

●  Group work is a requirement for the course. Depending on the nature of the project, the interaction within the group may be tightly or loosely coupled.      

●  All students are expected to participate in the design, implementation, and documentation of the project. As examples, the following divisions of labor are absolutely unacceptable!        

             ●  one student wrote code, the other debugged the code         
             ●  one student did the implementation work; the other the documentation 

●  Appropriate assistance of team members (as well as other classmates) is allowed, including providing advice/tips, helping debug, testing another student's design, and cooperatively learning to use new parts/tools.

●  All code in a group’s project must be written by members of that group. Any code used that is not written by the student must be approved by the instructor or TA and clearly cited in the code. 

●  Students may choose from predefined projects or define their own similar project with teacher approval.  

●  Students will submit successively more complete project versions during the quarter. Each version must function (within reason). Monolithic projects that produce no observable output until the very end of the course period are discouraged.  

●  Each version will include documentation, presentation, test information, engineering logs, source code/schematics, demonstration, and report. The final report must include all intermediate reports as distinct chapters to demonstrate progression of the project throughout the quarter. 

●  The final version of the project includes an oral presentation using PowerPoint (or equivalent) slides. All students in each project team are required to participate.      

●  Students have different levels of comfort with public speaking; students will not be graded on this factor. That being said, professional, well-rehearsed presentations with high-quality, meaningful slides are expected. The talk should finish within 1 minute of the allotted time.

●  Under most circumstances, all students in a group will receive the same grade for the project; however, the instructor reserves the right to give different grades to students in the same group if he feels that it is warranted.

Technical Project Information

Introductory Files
DE2i-150 Overview (Slides)
DE2i-150 Manual
PCI Express Basics
VHDL Introduction (State Machines)

Altera Quartus II and Qsys Documentation
Quartus II Introduction (VHDL)
Quartus II Introduction (Verilog)
Qsys Introduction
Making Qsys Components

External Content
A very useful webpage from Virginia Tech. includes a PCI Express communication link example and an AES co-processor. Make sure to check out the Virginia Tech. ECE 4530 Hardware Software Codesign Challenge Webpage.

If you are interested in more details about the Hardware Software Codesign Challenge, and how it fits into embedded systems architecture education, take a look at the following WESE 2008 paper.

If you want to implement your own DMA drivers (not for the faint of heart), the following webpage and source code may be useful.

The following webpage shows how to get the SD card slot up and running with an Altera Nios II soft processor.

Advice from a student with DE2i-150 experience:
"... And if somebody gets annoyed at the buzzer, the resistor can be changed out to something that doesn't sound like a [redacted].  However, the resistor number listed on the schematic is incorrect and the target resistor can easily be found by following the board traces."

Instructions to Install Quartus II on Ubuntu 12.04
1. Download the latest version of Quartus II Web Edition (free) from the following site:
http://dl.altera.com/13.1/?edition=subscription&platform=linux&download_manager=direct
2. Extract the files into a temporary folder.
3. Look for setup.sh in the temporary folder.
4. Type: chmod +x setup.sh
5. If you are running Ubuntu on a 64-bit system (and most likely you are), you must install the 32-bit compatibility libraries before installing Quartus II.
6. Then type ./setup.sh and follow the directions.
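
As a rough sketch, the steps above might be scripted as follows. The archive name quartus-download.tar, the working folder, and the ia32-libs package name are assumptions rather than part of the instructions above; substitute the file you actually downloaded and whatever 32-bit compatibility package your Ubuntu release provides.

```shell
#!/bin/sh
# Sketch of the Quartus II install steps for Ubuntu 12.04.
# ARCHIVE and WORKDIR are hypothetical names; override them in the environment.
ARCHIVE="${ARCHIVE:-quartus-download.tar}"
WORKDIR="${WORKDIR:-$HOME/quartus_tmp}"

# Succeeds if the machine reports a 64-bit x86 architecture.
is_64bit() {
    [ "$(uname -m)" = "x86_64" ]
}

install_quartus() {
    # Step 2: extract the download into a temporary folder.
    mkdir -p "$WORKDIR" || return 1
    tar -xf "$ARCHIVE" -C "$WORKDIR" || return 1

    # Steps 3-4: make the installer script executable.
    chmod +x "$WORKDIR/setup.sh" || return 1

    # Step 5: on 64-bit systems, install 32-bit compatibility libraries first.
    # The package name ia32-libs is an assumption; check your release.
    if is_64bit; then
        sudo apt-get install ia32-libs || return 1
    fi

    # Step 6: run the installer and follow its prompts.
    (cd "$WORKDIR" && ./setup.sh)
}
```

To use it, set ARCHIVE to the path of the downloaded file and call install_quartus. Nothing runs until the function is called, so the script can be read and adjusted first.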

Specific Information for Rev. C Boards
The Virginia Tech. demos apparently do not work as-is with the Rev. C boards (presumably they were designed for Rev. A boards). The CD that came with the Rev. C boards includes a demo program that does essentially what the hellopci program does, but in a manner that is compatible with the Rev. C boards. The program is located in /Demonstrations/FPGA/PCIE_Fundamental/ and includes a .sof file for the FPGA, along with source code and a makefile (which can be compiled on the board). It also includes all of the FPGA project files, so it can be used as a base for other programs. There is also an FPGA driver in /Demonstrations/PCIe_SW_KIT/linux/PCIe_DriverInstall that I loaded on the board before running the program.

Course Schedule

UC Riverside is based on a 10-week quarter system (+1 week of finals). This course has been designed for a 10-week implementation. Instructors who teach a 14-16 week semester are encouraged to contact Dr. Brisk for suggestions regarding additional course material that goes beyond what is scheduled here.

Students are expected to read all of the papers listed here in order to participate in an informed discussion in class. As this is a senior design project, there will be no quizzes or exams; however, students must demonstrate active participation in the course in addition to doing their projects.

 

Week 1: Course Introduction

Objective: Introduce FPGAs to the students. Provide information about course projects and requirements and the DE2i-150 development boards. Light coverage of VHDL programming. Introduce the Altera Stratix I-V FPGA family.

Lecture Slides
01-Introduction to FPGAs
02-Altera Stratix Family

Papers
J. Serrano: Introduction to FPGAs. CERN-2008-003: 17 pages

David M. Lewis, Vaughn Betz, David Jefferson, Andy Lee, Christopher Lane, Paul Leventis, Sandy Marquardt, Cameron McClintock, Bruce Pedersen, Giles Powell, Srinivas Reddy, Chris Wysocki, Richard Cliff, Jonathan Rose: The Stratix™ routing and logic architecture. FPGA 2003: 12-20

Michael Hutton, Jay Schleicher, David M. Lewis, Bruce Pedersen, Richard Yuan, Sinan Kaptanoglu, Gregg Baeckler, Boris Ratchev, Ketan Padalia, Mark Bourgeault, Andy Lee, Henry Kim, Rahul Saini: Improving FPGA Performance and Area Using an Adaptive Logic Module. FPL 2004: 135-144

David M. Lewis, Elias Ahmed, Gregg Baeckler, Vaughn Betz, Mark Bourgeault, David Cashman, David R. Galloway, Mike Hutton, Christopher Lane, Andy Lee, Paul Leventis, Sandy Marquardt, Cameron McClintock, Ketan Padalia, Bruce Pedersen, Giles Powell, Boris Ratchev, Srinivas Reddy, Jay Schleicher, Kevin Stevens, Richard Yuan, Richard Cliff, Jonathan Rose: The Stratix II logic and routing architecture. FPGA 2005: 14-20

David M. Lewis, Elias Ahmed, David Cashman, Tim Vanderhoek, Christopher Lane, Andy Lee, Philip Pan: Architectural enhancements in Stratix-III™ and Stratix-IV™. FPGA 2009: 33-42

David M. Lewis, David Cashman, Mark Chan, Jeffrey Chromczak, Gary Lai, Andy Lee, Tim Vanderhoek, Haiming Yu: Architectural enhancements in Stratix V™. FPGA 2013: 147-156

 
Week 2: FPGA Routing Architecture

Objective: Introduce the core concepts of FPGA global routing architectures. Cover different switch box topologies and discuss their tradeoffs

Lecture Slides
03-Global Routing Architecture
04-Switch Box Design

Papers
Vaughn Betz, Jonathan Rose: Effect of the prefabricated routing track distribution on FPGA area-efficiency. IEEE Trans. VLSI Syst. 6(3): 445-456 (1998)

Vaughn Betz, Jonathan Rose: FPGA Routing Architecture: Segmentation and Buffering to Optimize Speed and Density. FPGA 1999: 59-68

Charles Chiasson, Vaughn Betz: Should FPGAs abandon the pass-gate? FPL 2013: 1-8

Guy Lemieux, Edmund Lee, Marvin Tom, Anthony J. Yu: Directional and single-driver wires in FPGA interconnect. FPT 2004: 41-48

Vaughn Betz, Jonathan Rose: Automatic generation of FPGA routing architectures from high-level descriptions. FPGA 2000: 175-184

Jonathan Rose, Stephen Brown: Flexibility of interconnection structures for field programmable gate arrays. IEEE Journal of Solid-State Circuits 26(3): 277-282 (1991)

Yao-Wen Chang, D. F. Wong, C. K. Wong: Universal switch modules for FPGA design. ACM Trans. Design Autom. Electr. Syst. 1(1): 80-101 (1996)

Steven Wilton: Architectures and Algorithms for Field-Programmable Gate Arrays with Embedded Memory. Ph.D. Thesis, University of Toronto (1997) (Section 6.1.2 only)

M. Imran Masud: FPGA Routing Structures: A Novel Switch Block and Depopulated Interconnect Matrix Architectures. M.S. Thesis, University of British Columbia (1998)

 
Week 3: FPGA Logic Block Architecture

Objective: Introduce and evaluate tradeoffs in FPGA logic block design, with special emphasis on the intra-cluster routing crossbar

Lecture Slides
05-Logic Cluster Design
06-Intra-cluster Routing Crossbar Design

Papers
Vaughn Betz, Jonathan Rose: How Much Logic Should Go in an FPGA Logic Block? IEEE Design & Test of Computers 15(1): 10-15 (1998)

Elias Ahmed, Jonathan Rose: The effect of LUT and cluster size on deep-submicron FPGA performance and density. IEEE Trans. VLSI Syst. 12(3): 288-298 (2004)

Charles Clos: A study of non-blocking switching networks. Bell System Technical Journal 32(2): 406-424 (1953)

Guy G. Lemieux, Paul Leventis, David M. Lewis: Generating highly-routable sparse crossbars for PLDs. FPGA 2000: 155-164

Guy G. Lemieux, David M. Lewis: Using sparse crossbars within LUT clusters. FPGA 2001: 59-68

Wenyi Feng, Sinan Kaptanoglu: Designing Efficient Input Interconnect Blocks for LUT Clusters Using Counting and Entropy. TRETS 1(1) (2008)

 

Week 4: FPGA CAD I: Technology Mapping, Packing, and Placement

Objective: Introduce the technology mapping, packing, and placement stages of an FPGA CAD flow.

Lecture Slides
07-FPGA Technology Mapping
08-Packing and Placement

Papers
Jason Cong, Yuzheng Ding: FlowMap: an optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs. IEEE Trans. on CAD of Integrated Circuits and Systems 13(1): 1-12 (1994)

Stephen Jang, Billy Chan, Kevin Chung, Alan Mishchenko: WireMap: FPGA Technology Mapping for Improved Routability and Enhanced LUT Merging. TRETS 2(2) (2009)

Jason Luu, Jason Helge Anderson, Jonathan Rose: Architecture description and packing for logic blocks with hierarchy, modes and complex interconnect. FPGA 2011: 227-236

Jason Luu, Jonathan Rose, Jason Helge Anderson: Towards interconnect-adaptive packing for FPGAs. FPGA 2014: 21-30

Alexander Marquardt, Vaughn Betz, Jonathan Rose: Timing-driven placement for FPGAs. FPGA 2000: 203-213

Kristofer Vorwerk, Andrew A. Kennings, Jonathan W. Greene: Improving Simulated Annealing-Based FPGA Placement With Directed Moves. IEEE Trans. on CAD of Integrated Circuits and Systems 28(2): 179-192 (2009)

Mingjie Lin, John Wawrzynek: Improving FPGA Placement With Dynamically Adaptive Stochastic Tunneling. IEEE Trans. on CAD of Integrated Circuits and Systems 29(12): 1858-1869 (2010)

 

Week 5: FPGA CAD II: Routing

Objective: Introduce FPGA routing algorithms and discuss their implementation

Lecture Slides
09-FPGA Routing

Papers
Larry McMurchie, Carl Ebeling: PathFinder: A Negotiation-based Performance-driven Router for FPGAs. FPGA 1995: 111-117

Russell Tessier: Negotiated A* Routing for FPGAs. FPD 1998

Jordan S. Swartz, Vaughn Betz, Jonathan Rose: A Fast Routability-Driven Router for FPGAs. FPGA 1998: 140-149

Xun Chen, Jianwen Zhu, Minxuan Zhang: Timing-Driven Routing of High Fanout Nets. FPL 2011: 423-428

Yehdhih Ould Mohammed Moctar, Guy G. F. Lemieux, Philip Brisk: Routing algorithms for FPGAs with sparse intra-cluster routing crossbars. FPL 2012: 91-98

Optional
Vaughn Betz, Alexander Marquardt, and Jonathan Rose: Architecture and CAD for Deep Submicron FPGAs. Springer (1999)
♦ Implementation details on VPR's routability-driven and timing-driven routers, which have not been published elsewhere

 
Week 6: Midterm Review and Midterm

Objective: Review and assess core lecture material from the first half of the course

 
Week 7: FPGA Applications

Objective: Introduce and discuss novel and innovative applications that use FPGAs for acceleration

Lecture Slides
10-FPGA Applications

Papers
Gordon J. Brebner, Weirong Jiang: High-Speed Packet Processing using Reconfigurable Computing. IEEE Micro 34(1): 8-18 (2014)

Bharat Sukhwani, Hong Min, Mathew Thoennes, Parijat Dube, Bernard Brezzo, Sameh W. Asaad, Donna Dillenberger: Database Analytics: A Reconfigurable-Computing Approach. IEEE Micro 34(1): 19-29 (2014)

Haohuan Fu, Lin Gan, Robert G. Clapp, Huabin Ruan, Oliver Pell, Oskar Mencer, Michael J. Flynn, Xiaomeng Huang, Guangwen Yang: Scaling Reverse Time Migration Performance through Reconfigurable Dataflow Engines. IEEE Micro 34(1): 30-40 (2014)

James Coole, Greg Stitt: Fast, Flexible High-Level Synthesis from OpenCL using Reconfiguration Contexts. IEEE Micro 34(1): 42-53 (2014)

Andreas Agne, Markus Happe, Ariane Keller, Enno Lübbers, Bernhard Plattner, Marco Platzner, Christian Plessl: ReconOS: An Operating System Approach for Reconfigurable Computing. IEEE Micro 34(1): 60-71 (2014)

 

Week 8: FPGA Memories and I/O

Objective: Introduce optimizations to support multi-ported memories on FPGAs, on-chip memory models, I/O interfaces, and on-chip communication

Lecture Slides
(See below; links to author webpages)

Papers
Andrew Putnam, Susan J. Eggers, Dave Bennett, Eric Dellinger, Jeff Mason, Henry Styles, Prasanna Sundararajan, Ralph Wittig: Performance and power of cache-based reconfigurable computing. ISCA 2009: 395-405

Charles Eric LaForest, J. Gregory Steffan: Efficient multi-ported memories for FPGAs. FPGA 2010: 41-50 (Slides)

Michael Adler, Kermin Fleming, Angshuman Parashar, Michael Pellauer, Joel S. Emer: Leap scratchpads: automatic memory and cache management for reconfigurable logic. FPGA 2011: 25-28 (Project webpage)

Eric S. Chung, James C. Hoe, Ken Mai: CoRAM: an in-fabric memory architecture for FPGA-based computing. FPGA 2011: 97-106 (Project webpage; Slides)

Michael Papamichael, James C. Hoe: CONNECT: re-examining conventional wisdom for designing NoCs in the context of FPGAs. FPGA 2012: 37-46 (Slides)

Charles Eric LaForest, Ming G. Liu, Emma Rae Rapati, J. Gregory Steffan: Multi-ported memories for FPGAs via XOR. FPGA 2012: 209-218 (Slides)

Udit Dhawan, André DeHon: Area-efficient near-associative memories on FPGAs. FPGA 2013: 191-200

Ameer Abdelhadi, Guy G. F. Lemieux: Modular multi-ported SRAM-based memories. FPGA 2014: 35-44 (Slides)

 

Week 9: Soft Vector Processors and Reconfigurable Processors on FPGAs

Objective: Introduce the concepts of high-performance processor design for FPGAs, focusing on novel use of on-chip resources (for vector processing) and dynamic reconfiguration

Lecture Slides
(See below; links to author webpages)

Papers
Peter Yiannacouras, J. Gregory Steffan, Jonathan Rose: Portable, Flexible, and Scalable Soft Vector Processors. IEEE Trans. VLSI Syst. 20(8): 1429-1442 (2012) (Slides; more Slides; and more Slides)

Christopher Han-Yu Chou, Aaron Severance, Alex D. Brant, Zhiduo Liu, Saurabh Sant, Guy G. Lemieux: VEGAS: soft vector processor with scratchpad memory. FPGA 2011: 15-24 (Slides)

Aaron Severance, Guy Lemieux: VENICE: A compact vector processor for FPGA applications. FPT 2012: 261-268 (Slides)

Zhiduo Liu, Aaron Severance, Satnam Singh, Guy G. F. Lemieux: Accelerator compiler for the VENICE vector processor. FPGA 2012: 229-232 (Slides)

Aaron Severance, Guy G. F. Lemieux: Embedded supercomputing in FPGAs with the VectorBlox MXP Matrix Processor. CODES+ISSS 2013: 1-10 (Slides)

Aaron Severance, Joe Edwards, Hossein Omidian, Guy Lemieux: Soft vector processors with streaming pipelines. FPGA 2014: 117-126 (Slides)

Lars Bauer, Muhammad Shafique, Simon Kramer, Jörg Henkel: RISPP: Rotating Instruction Set Processing Platform. DAC 2007: 791-796 (Slides)

Lars Bauer, Muhammad Shafique, Dirk Teufel, Jörg Henkel: A Self-Adaptive Extensible Embedded Processor. SASO 2007: 344-350 (Slides)

 

Week 10: Reconfigurable Computing Alternatives / Course Review

Objective: Introduce reconfigurable computing architectures and programming models that go beyond FPGAs. Cover the history of reconfigurable computing. Review course material from the second half of the course.

Lecture Slides
(See below; links to author webpages)

Papers
Gerald Estrin: Reconfigurable Computer Origins: The UCLA Fixed-Plus-Variable (F+V) Structure Computer. IEEE Annals of the History of Computing 24(4): 3-9 (2002)

André DeHon and Ethan Mirsky: MATRIX: A Reconfigurable Computing Device with Configurable Instruction Distribution and Deployable Resources. HotChips 1997. (Slides; Extended Slides)

Doug Burger, Stephen W. Keckler, Kathryn S. McKinley, Michael Dahlin, Lizy Kurian John, Calvin Lin, Charles R. Moore, James H. Burrill, Robert G. McDonald, William Yoder: Scaling to the End of Silicon with EDGE Architectures. IEEE Computer 37(7): 44-55 (2004) (TRIPS Project Webpage; Slides)

David Grant, Chris Wang, Guy G. Lemieux: A CAD framework for Malibu: an FPGA with time-multiplexed coarse-grained elements. FPGA 2011: 123-132 (Slides)

 

 

 

