Exploring Programmable Real-Time Unit Sub-System for connectivity solutions

30 Mar 2012  | Sachidananda Karanth

The number of peripherals that can be embedded in an SoC is limited by space and cost. Including every type of peripheral controller increases both cost and complexity, which makes the choice of peripheral controllers a real challenge in SoC design. A programmable core combined with general-purpose I/O pins allows customised peripheral controllers to be implemented instead. Such programmable cores not only give system integrators the option of implementing peripheral controllers, but also let them implement proprietary business logic to improve system performance.

This write-up illustrates one such implementation using the Programmable Real-Time Unit Sub-System (PRUSS) in the AM18xx series of SoCs from Texas Instruments.

The AM18xx series of SoCs from Texas Instruments includes a programmable real-time core called the PRUSS. This sub-system consists of two completely independent PRUs, each with its own instruction and data RAM. Both PRUs can reach the SoC peripherals (including system memory) through the internal switched central resource. The PRUSS has its own interrupt controller, capable of routing up to 64 different events to the PRUs and updating the internal registers (the PRU does not support an interrupt vector table, though). Together, the shared memory access and the interrupt controller form an efficient inter-processor communication mechanism.

Each PRU has the following characteristics:

- 32-bit load/store RISC architecture

- 4 KB of instruction RAM per core

- 512 bytes of data RAM per core

- The PRUSS can be disabled via software to save power

- Register R30 of each PRU is exported from the sub-system, in addition to the normal R31 output of the PRU cores

- The PRU's intended operation is little-endian, similar to the ARM and DSP processors

This core, along with the synchronous serial interface (the McASP in the AM18xx), is used to implement multiple UART controllers. The remainder of this write-up describes the implementation, the challenges faced along the way and the lessons learned.

Architectural Design
The software implementation allows the following modes of operation:

1. Single PRU mode

2. Both PRU mode

In single PRU mode, PRU[0] alone executes the Soft-UART emulation firmware and PRU[1] is unused. In both PRU mode, PRU[0] handles the receive operations for all soft-UARTs and PRU[1] handles the transmit operations for all soft-UARTs.

The following functional block diagram depicts the single PRU mode of operation.

Figure 1: Functional block diagram of Single PRU mode

The following functional block diagram depicts the both PRU mode of operation of the Soft-UART implementation.

Figure 2: Functional block diagram of both PRU mode

The McASP has 16 serialisers, each equipped with a buffer and a shift register for transmission and reception. Each serialiser can be configured as either a transmitter or a receiver. A soft-UART that uses both a transmit and a receive serialiser is a FULL UART; one that uses only a transmit or only a receive serialiser is a HALF UART. The supported configurations are:

- Up to 4 FULL UARTs on a single PRU (TX and RX together constitute a FULL UART)

- Up to 8 HALF UARTs on a single PRU (either RX or TX alone constitutes a HALF UART)

- Up to 8 FULL soft-UARTs using both PRUs

In the software implementation of the UART using the McASP, data transfers are configured in Time Division Multiplexing (TDM) mode. UART-format data is transmitted or received during the time slots of the TDM frame. The timing requirements of the UART protocol are met by dividing the internal clock of the McASP, which eliminates the need for an on-chip timer or a software timer. One McASP serialiser is configured to perform the transmit function while another acts as the receiver section of the UART. The PRU acts as the controlling device, formatting the data and handling events.
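As a rough illustration of how the PRU might format data for a transmit serialiser, the sketch below builds an 8N1 frame and repeats each bit a fixed number of times, so that a shifter clocked at a multiple of the baud rate holds each bit for one bit period. It is written in C for readability (the actual firmware is PRU assembly). The function names, the 8N1 framing and the repeat factor are assumptions made for illustration, not details taken from the implementation, and no McASP register programming is shown.

```c
#include <stdint.h>
#include <stddef.h>

/* Build a 10-bit 8N1 frame. Bit 0 goes on the wire first:
 * start bit (0), eight data bits LSB first, stop bit (1). */
static uint16_t uart_frame_8n1(uint8_t ch)
{
    return (uint16_t)(((uint16_t)ch << 1) | (1u << 9));
}

/* Expand each frame bit into `prescale` identical line samples, so a
 * shifter clocked at (prescale x baud rate) holds every bit for one bit
 * time. Returns the number of samples written, or 0 if `out` is too
 * small or `prescale` is 0. The repeat factor is a hypothetical choice. */
static size_t uart_prescale(uint16_t frame, unsigned prescale,
                            uint8_t *out, size_t out_len)
{
    size_t n = 0;

    if (prescale == 0 || out_len < (size_t)10 * prescale)
        return 0;

    for (unsigned bit = 0; bit < 10; bit++) {
        uint8_t level = (uint8_t)((frame >> bit) & 1u);
        for (unsigned k = 0; k < prescale; k++)
            out[n++] = level;
    }
    return n;
}
```

With a repeat factor of 4, one character expands to 40 line samples; the baud rate then follows from whatever clock divider is chosen for the serialiser.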

Transmission Control flow diagram
This section illustrates the control flow for the transmit operation in the single PRU and both PRU modes.

Figure 3: Initialisation of Transmit & IDLE State

Figure 4: Transmission control flow in single PRU mode

Figure 5: Transmission control flow in both PRU mode

Reception Control flow diagram
This section illustrates the control flow for the receive operation in the single PRU and both PRU modes.

Figure 6: Receive operation initialisation control flow

Figure 7: Receive operation control flow in single PRU and both PRU modes

Each PRU runs firmware that implements the soft-UART using the McASP. In single PRU mode, PRU[0] runs the firmware; in both PRU mode, PRU[0] and PRU[1] execute the same firmware code.

The PRU firmware consists of a CORE_LOOP, two system (ARM/DSP) event handlers and two McASP event handlers. The PRU spins in the CORE_LOOP until a system event is generated by the McASP or the ARM/DSP. The four handlers are:

TxServiceReqHndlr: Handles the event generated by the ARM/DSP to the PRU. On receiving this event, the PRU copies the data from shared RAM into PRU RAM and processes it.

TxIntrHndlr: Handles the event generated by the McASP to the PRU. On receiving this event, the PRU pre-scales the data and copies it into the McASP transmit buffers. When transmission completes, the PRU interrupts the ARM/DSP.

RxServiceReqHndlr: Handles the event generated by the ARM/DSP to the PRU. On receiving this event, the PRU registers the ARM/DSP request to receive data (maximum FIFO size of 16 characters).

RxIntrHndlr: Handles the event generated by the McASP to the PRU. On receiving this event, the PRU processes the received data and copies it into shared RAM. When reception completes, the PRU interrupts the ARM/DSP.
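The structure described above, a loop that polls for events and dispatches to one of four handlers, can be sketched in C as follows. This is only an outline under stated assumptions: the event bit assignments are hypothetical, the handlers are reduced to counters, and the order in which pending events are serviced is a guess rather than the firmware's actual priority.

```c
#include <stdint.h>

/* Hypothetical event bit assignments, for illustration only. */
enum {
    EVT_TX_SERVICE_REQ = 1u << 0,  /* ARM/DSP -> PRU: data ready to send */
    EVT_TX_INTR        = 1u << 1,  /* McASP   -> PRU: transmit side      */
    EVT_RX_SERVICE_REQ = 1u << 2,  /* ARM/DSP -> PRU: receive request    */
    EVT_RX_INTR        = 1u << 3   /* McASP   -> PRU: receive side       */
};

/* Stand-ins for the four handlers: each one just counts its calls. */
struct handler_counts {
    unsigned tx_req, tx_intr, rx_req, rx_intr;
};

/* One pass of a CORE_LOOP-style dispatcher. `pending` is a snapshot of
 * the event flags; the return value is the mask of events serviced.
 * Servicing the McASP events first is an assumption of this sketch. */
static uint32_t core_loop_pass(uint32_t pending, struct handler_counts *c)
{
    uint32_t handled = 0;

    if (pending & EVT_RX_INTR)        { c->rx_intr++; handled |= EVT_RX_INTR; }
    if (pending & EVT_TX_INTR)        { c->tx_intr++; handled |= EVT_TX_INTR; }
    if (pending & EVT_RX_SERVICE_REQ) { c->rx_req++;  handled |= EVT_RX_SERVICE_REQ; }
    if (pending & EVT_TX_SERVICE_REQ) { c->tx_req++;  handled |= EVT_TX_SERVICE_REQ; }

    return handled;
}
```

In real firmware the loop would read the event flags from the PRUSS interrupt controller and clear each event after servicing it; both steps are omitted here.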

Figure 8: Main loop of PRU[0] in single PRU mode of operation

Figure 9: Flow chart of PRU[0] in Both PRU mode

Challenges faced and evolution
As with any fresh development, this one took place in stages. Starting from a minimal transmit and receive implementation on a single PRU, the development team faced multiple challenges that shaped the current design. Some of the critical ones are described below.

Development began with firmware emulating a single UART, with features just sufficient to make transmission and reception functional (it did not even support programming of the baud rate). Since the transmit path is controlled entirely by the PRU, which has all the data locally, transmission posed no challenge. The receive path was where the challenge lay, because strict timing must be maintained or data may be lost. Identifying the falling edge on the data line and sampling at the mid-point of each bit was critical to the receive functionality.
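The receive idea described here, finding the falling edge of the start bit and then sampling each bit at its mid-point, can be sketched as below. This is a simplified C model that works on an array of oversampled line levels; the oversampling factor, the 8N1 framing and the function name are assumptions for illustration, and the real firmware works on data arriving from the McASP rather than a prepared array.

```c
#include <stdint.h>
#include <stddef.h>

/* Decode one 8N1 character from line samples taken `os` times per bit.
 * s[i] is 1 for a high line and 0 for a low line.
 * Returns the number of samples consumed (up to and including the
 * stop-bit sample point), or 0 if no valid frame was found. */
static size_t uart_rx_decode(const uint8_t *s, size_t n, unsigned os,
                             uint8_t *ch)
{
    size_t i;

    if (os < 2)
        return 0;

    /* Falling edge: an idle-high sample followed by a low sample. */
    for (i = 1; i < n; i++)
        if (s[i - 1] == 1 && s[i] == 0)
            break;
    if (i >= n)
        return 0;

    /* i is the first sample of the start bit; aim for its middle. */
    size_t mid = i + os / 2;
    if (mid + 9 * (size_t)os >= n)
        return 0;                       /* frame not fully captured */
    if (s[mid] != 0)
        return 0;                       /* glitch, not a start bit  */

    /* Sample the eight data bits one bit time apart, LSB first. */
    uint8_t v = 0;
    for (unsigned b = 0; b < 8; b++)
        if (s[mid + (b + 1) * (size_t)os])
            v |= (uint8_t)(1u << b);

    if (s[mid + 9 * (size_t)os] != 1)
        return 0;                       /* framing error: no stop bit */

    *ch = v;
    return mid + 9 * (size_t)os + 1;
}
```

Because every sample point is measured from the detected edge, the decoder re-aligns itself on each start bit, which is what limits how much clock mismatch accumulates within one character.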

Subsequently, support for other features was added, including baud rate programmability, transmit and receive buffers, interrupts and receive time-out. This involved maintaining various counters, larger buffers for both transmit and receive, and an inter-processor communication mechanism (using shared memory and mailbox interrupts).

Supporting multiple UART instances on a PRU was a significant step forward. Optimising the counters and event flags kept per UART context was critical, given the limited RAM available in the PRU. Even with separate context-specific counters and event flags, context variables were sometimes corrupted unintentionally, and with the code written in assembly such corruption was not trivial to identify. A simple and efficient remedy was to group all context variables together so that they could be cached in CPU registers. When a context is loaded, its variables are copied from RAM into the registers, and the PRU code then modifies only the registers. When the context is unloaded, the cached variables are written back to RAM. This gave a two-fold benefit: (a) a reduced probability of context-specific variables being unintentionally corrupted, and (b) a marginal speed improvement from fewer RAM accesses when updating context variables.
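The load, modify and write-back pattern can be pictured with the C sketch below, where a local struct copy plays the role of the CPU registers. The field names, the number of contexts and the work done inside the function are hypothetical; only the pattern itself comes from the description above.

```c
#include <stdint.h>

#define NUM_UARTS 4u   /* hypothetical number of soft-UART contexts */

/* All per-UART variables grouped in one block so that they can be
 * loaded and stored together. Field names are illustrative. */
struct uart_ctx {
    uint16_t tx_count;
    uint16_t rx_count;
    uint8_t  bit_index;
    uint8_t  flags;
};

/* Stands in for the context area in PRU data RAM. */
static struct uart_ctx ctx_ram[NUM_UARTS];

static void service_uart(unsigned id)
{
    if (id >= NUM_UARTS)
        return;

    /* Load: one bulk copy from RAM into the working copy
     * (registers, in the real firmware). */
    struct uart_ctx c = ctx_ram[id];

    /* Modify: all updates touch only the working copy. */
    c.rx_count++;
    c.bit_index = (uint8_t)((c.bit_index + 1u) & 7u);
    c.flags |= 1u;

    /* Unload: one bulk write back to the owning context. */
    ctx_ram[id] = c;
}
```

Since RAM is written in exactly one place, and only for the context that was loaded, a stray update can no longer land in another UART's variables.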

Once multiple UARTs could be emulated on a single PRU, the logical next step was to enable the second PRU. Doing so to emulate additional UARTs was not as simple as running the same code with additional buffers, because the code accessed shared system resources. With the second PRU running, both PRUs contended for the same system resources (event flags), leading to race conditions and hence performance issues. Dividing the tasks between the two PRUs was the key to eliminating these dependencies. Each PRU is designated to carry out transfers in one direction: PRU[0] for receive and PRU[1] for transmit. In this model, all system events are raised to PRU[0], which passes on to PRU[1] those events that correspond to the transmit functionality. Partitioning the tasks in this manner allows the complete bandwidth of both PRUs to be used with no interdependencies, maximising efficiency.

With the PRU implementation stable, interrupt latency in the OS (Linux) became the next issue. The ARM was slow to service the FIFO, so the FIFO overflowed on receive or ran empty on transmit. This was overcome by doubling the FIFO size, raising the interrupt to the ARM in the half-full or half-empty state, and enabling ping-pong buffering in the PRU.
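A minimal sketch of the receive side of such a ping-pong scheme is shown below, assuming two halves of 16 characters each (the original FIFO depth, doubled). The structure and function names are hypothetical, the interrupt to the ARM is modelled as a counter, and overrun handling for a host that never drains a half is left out.

```c
#include <stdint.h>

#define HALF_SIZE 16u  /* one half; the total FIFO is twice this */

struct pingpong {
    uint8_t  buf[2][HALF_SIZE];
    unsigned active;     /* half currently being filled by the PRU   */
    unsigned fill;       /* characters stored in the active half     */
    unsigned irq_count;  /* stands in for raising an ARM interrupt   */
    int      ready_half; /* half the host may drain, -1 if none yet  */
};

static void pp_init(struct pingpong *p)
{
    p->active = 0;
    p->fill = 0;
    p->irq_count = 0;
    p->ready_half = -1;
}

/* Store one received character. When the active half becomes full,
 * hand it to the host and continue filling the other half, so the
 * host has a whole half-FIFO of time to respond. */
static void pp_put(struct pingpong *p, uint8_t ch)
{
    p->buf[p->active][p->fill++] = ch;

    if (p->fill == HALF_SIZE) {
        p->ready_half = (int)p->active;
        p->irq_count++;          /* "half full" interrupt to the ARM */
        p->active ^= 1u;         /* switch to the other half         */
        p->fill = 0;
    }
}
```

The point of the design is that the interrupt latency budget grows from one character time to the time it takes to fill the other half.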

Troubleshooting and profiling
The only development tools available for this work were:

- Assembler for generating the binary

- Remote debugger with:

  - Step-by-step instruction execution with breakpoint support

  - PRU register view/modify capability

  - PRU instruction cycle counters for active and stall cycles

The nature of the problems encountered prompted some inventive uses of the remote debugger for troubleshooting. Some of them are discussed below.

During single UART emulation, the receive path had trouble receiving characters back to back (with 1 stop bit). The cycle counters shown in the debugger were used to troubleshoot this problem. On completing reception of a character (including the stop bit), many counters and event flags were being updated, in addition to transferring the received character to shared RAM. The cycles spent accessing shared RAM were a significant, and varying, portion of the total time. The trick was to move the received character to shared RAM on receiving the last data bit, without waiting for the stop bit. The counters and flags reflecting successful reception were still updated only once a proper stop bit had been received.
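The reordering described above can be sketched as a small per-bit state machine: the character is copied out as soon as the eighth data bit arrives, and the bookkeeping that declares it valid waits for the stop bit. The structure, field names and the single `shared_slot` standing in for shared RAM are assumptions made for this illustration.

```c
#include <stdint.h>

struct rx_state {
    uint8_t  shift;        /* bits assembled so far                    */
    unsigned bit;          /* data bits received so far (0..8)         */
    uint8_t  shared_slot;  /* stands in for the slot in shared RAM     */
    unsigned rx_count;     /* characters confirmed by a good stop bit  */
    unsigned frame_errors; /* characters rejected for a bad stop bit   */
};

/* Feed one mid-bit sample taken after the start bit was detected.
 * Returns 1 when a character has completed (valid or not), else 0. */
static int rx_feed_bit(struct rx_state *r, unsigned level)
{
    if (r->bit < 8) {
        if (level)
            r->shift |= (uint8_t)(1u << r->bit);
        r->bit++;

        /* Last data bit: do the slow shared-RAM copy now, so it
         * overlaps the stop-bit time instead of following it. */
        if (r->bit == 8)
            r->shared_slot = r->shift;
        return 0;
    }

    /* Stop bit: only the cheap counter and flag updates remain. */
    if (level)
        r->rx_count++;
    else
        r->frame_errors++;

    r->shift = 0;
    r->bit = 0;
    return 1;
}
```

If the stop bit turns out to be invalid, the copied character is simply never counted, so the host does not consume it.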

During transmit operation, characters were sometimes transmitted repeatedly. This was attributed to two causes: (a) a race condition between the ARM and the PRU, and (b) counters being corrupted in the transmit context. The single-stepping feature of the remote debugger made it possible to find the counter corruption. Identifying the race condition, however, required carefully crafting the code to copy the counters at various points and setting breakpoints to inspect the copied values.

Timing differences between the transmitter and receiver clocks: however minor, any difference between the clocks results in skew that accumulates over a longer duration. While receiving large files, the received characters started getting corrupted. The problem became worse when interfacing with different kinds of transmitters (COM ports on PCs, UART ports of target boards, USB-to-serial converters of different makes, etc.). This was the most challenging problem to address, as it manifested only after a large amount of data had been transferred. Multiple schemes were employed to identify the root cause: (a) The code was modified to execute a specific path whenever the expected character was not received. This was supplemented by sending a file filled with 'U' (0x55), a pattern that makes the data line toggle on every bit (a perfect square wave with 50% duty cycle). Code was added to drive external GPIO lines (to trigger the oscilloscope) and to enable a software breakpoint through the debugger. (b) Additionally, special code was added to capture all data received by the PRU (including sub-samples) into a separate shared RAM area, to analyse the behaviour of the receive algorithm. Together, these two mechanisms identified the timing corrections required to re-synchronise on the start condition.
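Why 'U' makes a good test pattern can be checked with a few lines of C. The sketch below assumes 8N1 framing (one start bit, eight data bits LSB first, one stop bit) and uses hypothetical helper names; it tests whether the line level changes at every bit boundary within a frame.

```c
#include <stdint.h>

/* 8N1 frame with bit 0 first on the wire: start (0), data, stop (1). */
static uint16_t pattern_frame_8n1(uint8_t ch)
{
    return (uint16_t)(((uint16_t)ch << 1) | (1u << 9));
}

/* Returns 1 if the line changes level at every bit boundary inside
 * the 10-bit frame, i.e. the frame is a square wave at half the
 * baud rate. Returns 0 otherwise. */
static int toggles_every_bit(uint8_t ch)
{
    uint16_t f = pattern_frame_8n1(ch);

    for (unsigned b = 0; b < 9; b++) {
        unsigned cur  = (f >> b) & 1u;
        unsigned next = (f >> (b + 1)) & 1u;
        if (cur == next)
            return 0;
    }
    return 1;
}
```

For 0x55 the frame on the wire is 0101010101, and because the stop bit (1) is followed by the next start bit (0), back-to-back characters continue the same square wave. Any sampling error therefore shows up as a wrong character almost immediately.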

Food for Thought
The article above illustrates some of the experience and lessons gained in designing and developing a UART controller (used merely as an example here) on such programmable cores. The key, however, is the concept of programmable peripherals. Programmable cores, together with development tools, give the system integrator the flexibility of custom interfaces, while SoC designers focus on optimising the SoC's BOM cost. A similar PRU can be used to implement:

1. Full UART controller with support for modem signals and other features

2. Inter IC (I2C) bus controller

3. Serial Peripheral Interface (SPI) bus controller

4. PROFIBUS

5. EtherCAT

6. CAN Bus

About the Author
Sachidananda Karanth is a Lead Architect at Mistral Solutions Pvt. Ltd. He received a Bachelor of Engineering in Instrumentation & Electronics from Bangalore University in 1999. He has over 12 years of experience at Mistral in firmware/software development for embedded systems, including defence systems, hand-held mobile handsets and industrial controllers. Sachi joined Mistral in 2000 and currently contributes to the architecture of embedded system designs.

Mistral is a technology design and integration company providing end-to-end services for product development and deployment. Mistral's Product Engineering Services are delivered through a proven development process, designed for embedded product development. Mistral's hardware and software team works together in a seamless manner providing expert product designs covering board and FPGA Designs, BSP and Firmware developments, Embedded Application developments, integration of third party solutions, verification/validation, product prototyping, production coordination and product sustenance services.
