Found this interesting piece of information about why Intel chips are, somewhat accidentally, little-endian:

The ia32 is a little-endian architecture. When a chip is designed, the choice between the two byte orders is pretty much arbitrary. In the case of the ia32, the decision was forced by considerations of compatibility: all previous Intel chips are also little-endian, and Intel did not want incompatible data layouts between different chips, since that would have complicated the transition from one chip to another.

If you trace the history of Intel chips, the very first was the 4004. The 4004 was little-endian because it was the product of a research project that aimed to show that a single chip could duplicate the capabilities of an existing computer. The computer chosen happened to be little-endian, and the reason for that is interesting.

This was in the early days of computing (over forty years ago), and the machine being copied had delay-line memories. This memory technology stores data on a rotating electronic device, much like a hard disk in that you have to wait for the data stream to come around before you can access the byte you want. When doing multi-byte additions, it is convenient to access the least significant byte first, so that carries can be propagated to more significant bytes. This makes a little-endian arrangement more efficient, since otherwise you would have to wait a full rotation between bytes.
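The carry-propagation point can be made concrete with a small sketch (mine, not from the original post): adding two numbers stored as little-endian byte sequences needs only a single forward pass over memory, which is exactly the access pattern a serial delay-line memory gives you for free.

```python
def add_little_endian(a, b):
    """Add two equal-length little-endian byte sequences, returning the
    little-endian sum (one byte longer, to hold a final carry-out)."""
    result = []
    carry = 0
    for x, y in zip(a, b):       # bytes arrive least-significant first
        s = x + y + carry
        result.append(s & 0xFF)  # low 8 bits become this output byte
        carry = s >> 8           # carry flows into the NEXT byte we read
    result.append(carry)         # final carry-out
    return result

# 0x01FF + 0x0001 = 0x0200, stored least-significant byte first:
print(add_little_endian([0xFF, 0x01], [0x01, 0x00]))  # [0x00, 0x02, 0x00]
```

With big-endian storage the carry would flow *against* the direction the bytes stream out of memory, forcing a wait for the next rotation.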



Intel’s SDK for OpenCL Applications


The Intel SDK for OpenCL Applications 2012 provides a development environment for OpenCL applications on Intel Architecture for the Windows and Linux operating systems. The SDK includes code samples, development tools, an optimization guide, and support for optimization tools.

OpenCL is an open standard for a unified programming model supporting both CPU and processor graphics compute devices. It is designed to be used for highly data-parallel applications and for visual computing applications including video, media, and 3D content.

The Intel SDK for OpenCL Applications 2012 supports the OpenCL 1.1 full profile on 3rd-generation Intel® Core processors with Intel HD Graphics 4000/2500, across both the CPU and Intel HD Graphics.

Video Tutorials

Intel’s engineers presented a three-part webinar series on writing applications with this SDK. Here are the links:

  1. Getting Started with Intel SDK for OpenCL Applications
  2. Writing efficient code for OpenCL Applications
  3. Creating and Optimizing OpenCL Applications

This listing was originally posted here; however, the links on that page are dead, while the links above work.


Parallel Computing

Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently (“in parallel”). There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing further frequency scaling. As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors.

Parallelism can be broadly classified into three types:

  1. Instruction-level
  2. Data Parallel
  3. Task Parallel
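The difference between the last two types can be shown in a few lines. In this sketch (my own illustration, using Python threads), data parallelism applies the *same* operation to different pieces of the data, while task parallelism runs *different* operations concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

data = [1, 2, 3, 4, 5, 6, 7, 8]

with ThreadPoolExecutor() as pool:
    # Data parallelism: one operation (squaring), split across the elements.
    squares = list(pool.map(lambda x: x * x, data))

    # Task parallelism: two independent tasks (sum and max) run concurrently.
    total = pool.submit(sum, data)
    largest = pool.submit(max, data)

print(squares)                            # [1, 4, 9, 16, 25, 36, 49, 64]
print(total.result(), largest.result())  # 36 8
```

Instruction-level parallelism, by contrast, happens inside the processor (pipelining, superscalar issue) and is largely invisible at the source-code level.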



Intel and Micron announce world’s first 20nm 128Gbit NAND flash

Intel and Micron have announced that their joint venture, IM Flash Technologies, will launch 128Gbit NAND flash sometime in the second half of next year. In the meantime, we’ll have to make do with IMFT’s 20nm 64Gbit NAND flash, which the two companies announced has gone into mass production.

The 64Gbit parts are built on IMFT’s brand-new 20nm process, which should, for the time being, be the most advanced flash-memory manufacturing process.

20nm 128Gbit NAND flash



We Need More Than Multicore

In a recent article in HPC Source magazine, HPC consultant Wolfgang Gentzsch discusses the good, the bad, and the ugly of multicore processors. The good: their great performance potential, and recent software development environments that provide excellent support for multicore parallelization. The bad: you won’t really rewrite all the billions of lines of code out there, will you? Even if you wanted to, how many algorithms resist parallelization, bullheadedly, because they are simply serial? And the ugly: all efforts are for nothing when running even the greatest core-parallel codes in a multi-user, multi-job environment. And hybrid systems will further complicate the challenge of optimizing system utilization. And it’s all getting worse.
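The “simply serial” problem has a well-known formalization in Amdahl’s law: if a fraction s of a program cannot be parallelized, the speedup on n cores is at most 1 / (s + (1 − s)/n). A quick back-of-the-envelope calculation (my illustration, not from the article) shows how harsh that bound is:

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup when serial_fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Even with only 5% serial code, 128 cores yield under an 18x speedup,
# and no number of cores can ever beat 1 / 0.05 = 20x.
for n in (2, 16, 128):
    print(n, round(amdahl_speedup(0.05, n), 2))
```

This is why simply adding cores cannot substitute for rewriting (or replacing) serial algorithms.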

Since the first multicore announcements seven years ago, we have witnessed the release of 2-core, 4-core, 6-core, 8-core, 12-core and, with the latest AMD Interlagos and Fujitsu SPARC64 IXfx, 16-core processors. In 2012, organizations will be deploying large numbers of relatively low-cost 32-, 64-, even 128-core servers, and one can infer from processor roadmaps that core counts will continue rising at a rapid pace. Yes, Moore’s Law lives on.

