The Cell Broadband Engine Architecture (CBEA) defines a multiprocessor chip with nine processor elements operating on a shared, coherent memory. The objective of this article is to provide an introduction to the Cell BE architecture and the development tools provided by IBM. The article also describes how to set up a fully functional test environment using the Sony PS3 hardware.
The main difference from a conventional multicore CPU is that Cell is not a homogeneous system with nine identical processors, but a heterogeneous system consisting of one general-purpose “PowerPC Processor Element” (PPE) and eight “Synergistic Processor Elements” (SPEs) specialized in vector operations. All these units are connected via the EIB (Element Interconnect Bus) and communicate with devices or other CPUs via the FlexIO interface.

The PPE is a conventional 64-bit, dual-threaded microprocessor that generally runs the operating system and manages the SPEs.
Every SPE is a self-contained vector processor that acts as an independent processor. Each one contains 128 registers of 128 bits. There are also four single-precision floating-point units capable of 32 GFLOPS and four integer units capable of 32 GOPS (billions of integer operations per second), both figures at 4 GHz. Each SPE also includes a small 256-kilobyte local store instead of a cache.
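The figures above can be checked with a little arithmetic. The sketch below is only an illustration: the function and constant names are made up for this article, and the factor of two operations per unit per cycle is an assumption (each unit completing one fused multiply-add per cycle, counted as two floating-point operations), which is how the 32 GFLOPS figure is usually derived.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the text. */
#define SPE_REGISTER_COUNT   128u            /* registers per SPE */
#define SPE_REGISTER_BITS    128u            /* width of each register */
#define SPE_LOCAL_STORE_SIZE (256u * 1024u)  /* local store size in bytes */

/* Size of the whole register file in bytes. */
size_t spe_register_file_bytes(void)
{
    return (size_t)SPE_REGISTER_COUNT * (SPE_REGISTER_BITS / 8u);
}

/* Number of 128-bit vectors that fit in the local store. */
size_t spe_local_store_vectors(void)
{
    return (size_t)SPE_LOCAL_STORE_SIZE / (SPE_REGISTER_BITS / 8u);
}

/*
 * Peak rate in billions of operations per second.
 * Assumption: ops_per_unit_per_cycle is 2 when a fused multiply-add
 * is counted as two operations.
 */
double spe_peak_gflops(unsigned units, unsigned ops_per_unit_per_cycle,
                       double clock_ghz)
{
    assert(units > 0 && clock_ghz > 0.0);
    return (double)units * (double)ops_per_unit_per_cycle * clock_ghz;
}
```

Under these assumptions spe_peak_gflops(4, 2, 4.0) gives 32, the register file works out to 2 kB, and the local store holds 16384 vectors, code included.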
To start writing your first application for the PS3, you need to download and install the "IBM SDK for Multicore Acceleration 3.0", a complete package of development tools for the Cell processor. The IBM SDK is composed of runtime tools, development tools, software libraries and frameworks, performance tools, and a hardware simulator, the IBM Full-System Simulator. Unfortunately, the SDK is available only for Fedora and Red Hat, so you have to get one of those two distributions before starting the installation. I strongly advise choosing Fedora because not all the tools are provided for Red Hat; in particular, the simulator is available only for Fedora. Although IBM guarantees proper functioning only up to Fedora 7, I installed the SDK without problems on Fedora 8.
You can download the SDK from the official Cell web site by clicking on "Developer package". The "Extras package" provides add-ons such as the "SPU timer library", the "SPU Timing Tool" and the simulator, which can be added to the Developer package at a later time.
To install the SDK you can simply follow the IBM installation guide. If the installation is successful, you should find the SDK in the /opt/cell and /opt/ibm/systemsim-cell directories. To start the simulator in graphical mode, simply issue:
PATH=/opt/ibm/systemsim-cell/bin:$PATH systemsim -g
When the simulator starts, it creates a simulated machine containing a Cell Broadband Engine processor and displays a GUI for interacting with that machine. To boot Linux on the simulated machine, click the mode-lassy button (to speed up boot time) and then the go button.

To load and execute your application on the simulated machine, simply issue:
callthru source path_of_executable_file > name_of_executable_file
chmod +x name_of_executable_file
./name_of_executable_file
The PS3 is without doubt the easiest and cheapest way to get your hands on a Cell BE processor. Currently there are many distributions optimized for the PS3, including Yellow Dog, Debian, Ubuntu, Fedora and Red Hat. All of them can use only 6 of the 8 Synergistic Processor Elements, so keep this heavy limitation in mind when you write Cell applications for the PS3. I tried installing two different distributions on the PS3: Yellow Dog 5.0 and Ubuntu 7.10.
The only hardware you need for the installation is a keyboard, a mouse and a USB key.
The first time you install Linux on a PS3, you have to perform some preliminary operations:
Yellow Dog 5.0 was the first distribution developed for the PS3, but it has many problems: it runs slowly, the documentation is very poor, and installing the latest IBM library (libspe 2.0) is very difficult. After many attempts to install libspe 2.0, I decided to try the latest version of Ubuntu.
Installing Ubuntu 7.10 is very simple, but when you select the language, remember not to choose Italian: the DVD does not contain the necessary packages and the installation cannot reach 100%. I tried, without success, to enable the internet connection to fetch the missing packages from the online repository, so I decided to change language.
First of all, since the SPEs are accessed through a virtual file system called spufs, add the following line to /etc/fstab:
none /spu spufs defaults 0 0
Afterwards do:
mkdir /spu
mount /spu
Now download and install cellsdk (using the apt-get utilities) to be sure you have all the SDK libraries, and enjoy yourself with your programs!
Searching on Google, you can find a lot of documents about Cell programming, but you have to be careful to pick out the information that applies to the latest version of the library, libspe 2.0. Many documents refer to libspe 1.x, whose use is now deprecated by IBM.
The IBM Cell web site provides useful instructions for starting to develop applications for Cell/B.E. using the IBM SDK for Multicore Acceleration.
A Cell application is composed of at least two programs:
Applications running on the Cell BE typically execute a main thread on the PPU, which creates subthreads that run on the SPEs. The PPU program manages these subthreads using an object defined in libspe 2.0: the context.
An SPE context is a logical representation of an SPE and holds all persistent information about a logical SPE.
The basic structure of a PPE program is:
The latest version of the library allows you to create more contexts than there are physical SPEs, but you have to consider the overhead of swapping SPE contexts in and out.
Now you should ask yourself: how can I write a program if I don't know the number of physical SPEs? Can I be sure that the next version of the Linux kernel will not allow the use of 7 or 8 physical SPEs?
You can make your code adapt by calling spe_cpu_info_get(SPE_COUNT_USABLE_SPES, -1) to determine at run time the number of physical SPEs that can be used.
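Once the number of usable SPEs is known at run time, the work can be divided accordingly instead of hard-coding six. A minimal sketch of such a split follows. The names work_range and spe_work_range are hypothetical, and num_spes merely stands in for the value returned by spe_cpu_info_get, which is not called here so that the snippet compiles on any host.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of work items assigned to one SPE. */
typedef struct {
    size_t begin;
    size_t end;
} work_range;

/*
 * Split total_items as evenly as possible across num_spes workers.
 * The first (total_items % num_spes) workers receive one extra item.
 * num_spes is assumed to come from a run-time query, not a constant.
 */
work_range spe_work_range(size_t total_items, unsigned num_spes,
                          unsigned spe_index)
{
    assert(num_spes > 0 && spe_index < num_spes);

    size_t base  = total_items / num_spes;
    size_t extra = total_items % num_spes;
    work_range r;

    /* Workers before this one hold base items each, plus one extra
       item for each of the first 'extra' workers. */
    r.begin = (size_t)spe_index * base
            + (spe_index < extra ? (size_t)spe_index : extra);
    r.end   = r.begin + base + (spe_index < extra ? 1u : 0u);
    return r;
}
```

With 100 items and 6 SPEs the first worker gets items 0 to 16 and the last one ends at item 99. The same call with 8 SPEs still covers all 100 items, so nothing has to change if a later kernel exposes more SPEs.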
Communication between the PPE and the SPEs is not as simple as it could be. Since the SPEs cannot access main memory directly, we cannot just use load/store instructions to exchange data.
There are three communication mechanisms between the PPE and the SPEs:
Mailboxes are queues for exchanging 32-bit messages. Every SPU has 1 outbound mailbox and an inbound mailbox with 4 entries, so the PPU can send 4 messages before the SPU reads any. If the mailbox is full and the PPU writes to it anyway, the behaviour is undefined.
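The queue behaviour described above can be modelled on any host with a small ring buffer. This is only a model, meant to show why the sender should check for free space before writing. The type and function names are hypothetical, and the code does not touch the real hardware or libspe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INBOUND_MBOX_DEPTH 4u  /* messages the PPU can queue before the SPU reads */

typedef struct {
    uint32_t entries[INBOUND_MBOX_DEPTH];
    unsigned head;   /* index of the oldest unread message */
    unsigned count;  /* number of unread messages */
} mbox_model;

void mbox_init(mbox_model *mb)
{
    assert(mb != NULL);
    mb->head = 0;
    mb->count = 0;
}

/* Free entries left; a careful sender checks this before writing. */
unsigned mbox_free_slots(const mbox_model *mb)
{
    return INBOUND_MBOX_DEPTH - mb->count;
}

/* Non-blocking write: returns 1 on success, 0 if the queue is full
   (the message is then not stored). */
int mbox_try_write(mbox_model *mb, uint32_t msg)
{
    if (mb->count == INBOUND_MBOX_DEPTH)
        return 0;
    mb->entries[(mb->head + mb->count) % INBOUND_MBOX_DEPTH] = msg;
    mb->count++;
    return 1;
}

/* Non-blocking read: returns 1 and stores the oldest message in *msg,
   or returns 0 if the queue is empty. */
int mbox_try_read(mbox_model *mb, uint32_t *msg)
{
    if (mb->count == 0)
        return 0;
    *msg = mb->entries[mb->head];
    mb->head = (mb->head + 1u) % INBOUND_MBOX_DEPTH;
    mb->count--;
    return 1;
}
```

In this model a fifth write is simply refused. On the real device the sender should poll the mailbox status in the same spirit, rather than rely on what happens when the queue overflows.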
Signaling is an alternative method for transferring 32-bit messages between the PPE and the SPEs. There are only two 32-bit signal buffers, so if you need three SPEs to send your message, you are out of luck!
DMA commands allow you to transfer data between the PPE and the SPEs. DMA transfers must be a multiple of 16 bytes (up to a maximum of 16 kB) and aligned on a 16-byte boundary; small transfers of 1, 2, 4 or 8 bytes are also accepted, with natural alignment.
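The size and alignment rules can be checked on the host before a transfer is queued. The sketch below covers only the case stated above (a multiple of 16 bytes, at most 16 kB, with 16-byte aligned addresses) and deliberately leaves out the small 1, 2, 4 and 8 byte transfers that the hardware also accepts. The helpers dma_transfer_ok and dma_chunk_count are hypothetical, not part of libspe or the MFC headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_ALIGNMENT 16u            /* required alignment and size granularity */
#define DMA_MAX_SIZE  (16u * 1024u)  /* largest single transfer, in bytes */

/* Returns 1 if a single transfer satisfies the rules in the text. */
int dma_transfer_ok(uint32_t local_store_addr, uint64_t effective_addr,
                    size_t size)
{
    if (size == 0 || size > DMA_MAX_SIZE)
        return 0;                          /* empty or too large */
    if (size % DMA_ALIGNMENT != 0)
        return 0;                          /* not a multiple of 16 bytes */
    if (local_store_addr % DMA_ALIGNMENT != 0)
        return 0;                          /* local store side misaligned */
    if (effective_addr % DMA_ALIGNMENT != 0)
        return 0;                          /* main memory side misaligned */
    return 1;
}

/* Number of transfers needed to move total_size bytes when each
   transfer carries at most DMA_MAX_SIZE bytes. */
size_t dma_chunk_count(size_t total_size)
{
    assert(total_size % DMA_ALIGNMENT == 0);
    return (total_size + DMA_MAX_SIZE - 1u) / DMA_MAX_SIZE;
}
```

A 40 kB buffer, for instance, needs three transfers: two of 16 kB and one of 8 kB.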
SPE programs treat main memory pointers as 32-bit (or 64-bit) integers, so you have to take great care to handle them correctly. Consider the following example, taken from a simple application of mine:
The mfc_put function transfers n (192) bytes from the local store (temp[(smem.valore*8)+7]) to main storage (smem.p_num + (192*8*smem.valore)+(192*7)). smem.p_num is declared as long long int so that it can hold 64-bit addresses correctly, but it actually corresponds to a pointer to a two-dimensional matrix. We therefore have to compute the address explicitly using pointer arithmetic.
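The address computation described above can be written out as a small helper. The sketch assumes, reading the expression literally, rows of 192 bytes and 8 rows per SPE. The function row_effective_address and the constant names are invented here, while base plays the role of smem.p_num and spe_index the role of smem.valore.

```c
#include <assert.h>
#include <stdint.h>

#define ROW_BYTES    192u  /* bytes per matrix row (assumed from the expression) */
#define ROWS_PER_SPE 8u    /* rows handled by each SPE (assumed from the expression) */

/*
 * Main-storage address of one row:
 *   base + 192*8*spe_index + 192*row_in_block
 * All arithmetic is done in 64 bits so that addresses above 4 GB
 * are not truncated.
 */
uint64_t row_effective_address(uint64_t base, uint32_t spe_index,
                               uint32_t row_in_block)
{
    assert(row_in_block < ROWS_PER_SPE);

    uint64_t block_offset = (uint64_t)ROW_BYTES * ROWS_PER_SPE * spe_index;
    uint64_t row_offset   = (uint64_t)ROW_BYTES * row_in_block;
    return base + block_offset + row_offset;
}
```

Because 192 is a multiple of 16, every address produced this way stays 16-byte aligned as long as base is, which is what the DMA engine requires.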
If you decide to compile the PPE code for 32-bit execution, you have to use “int”, “long int” or “uint32_t” variables to hold the pointers in SPE programs. The IBM examples use “uint64_t” and “uint32_t” variables without including any header, but on my host, if I don't include “stdint.h”, I get “error: expected specifier-qualifier-list before 'uint64_t'”.
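One way to keep the same data layout for both 32-bit and 64-bit PPE builds is to always carry the address in a uint64_t and convert explicitly on the PPE side. This is a sketch of that idea using only stdint.h; the helper names are hypothetical and the approach is a suggestion, not what the IBM examples do.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a host pointer to a 64-bit effective address.
   Works whether the host pointer is 32 or 64 bits wide. */
uint64_t ea_from_pointer(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Convert back to a host pointer; only valid if the address
   fits in a pointer on this build. */
void *pointer_from_ea(uint64_t ea)
{
    assert(ea <= (uint64_t)UINTPTR_MAX);
    return (void *)(uintptr_t)ea;
}

/* Upper and lower 32-bit halves, for interfaces that take them separately. */
uint32_t ea_high(uint64_t ea)
{
    return (uint32_t)(ea >> 32);
}

uint32_t ea_low(uint64_t ea)
{
    return (uint32_t)(ea & 0xFFFFFFFFu);
}
```

On a 32-bit build ea_high is always zero for addresses obtained from ea_from_pointer, so code that passes both halves keeps working unchanged.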
The UFPR web site provides useful examples of these communication mechanisms, but keep in mind that to maximize performance you have to use the lower-level functions defined in libspe2.h, spu_intrinsics.h and cbe_mfc.h.