Sunday, October 17, 2010

Jee-Pee-Yoo :S

Today was pretty interesting, even though the thought of an 'extra class' on a Saturday is far from interesting! The teacher started off by giving this whole background of how processors are general-purpose, and how there are some operations that call for specialized processors. Like SIMD (Single Instruction, Multiple Data) operations - where the same instruction is applied to lots of data at once, which is basically what image/graphics processing boils down to. So this needs a GPU. Graphics Processing Unit.

My eyes opened wide. Only the previous night, I had been FULL of questions regarding GPUs, and how well they can be used where SIMD instructions are concerned. Turns out, GPUs are suited exactly for this SIMD style of work!

I mean, cool :D

Now I at least know that SIMD-style, data-parallel work needs a GPU for optimal performance. Next thing: how would a GPU be used alongside a, er, CPU :P. The answer lies with what NVIDIA came up with. You must be aware of NVIDIA because of their strong hold on the graphics card market! So they have come up with special GENERAL purpose GPUs ... (lol)... called GPGPUs (General-Purpose Graphics Processing Units).

How do they work?

They are basically co-processors, meant to sit in a PCI Express slot of the PC. They still handle SIMD-style work, BUT, they go well beyond graphics/image processing applications. They work for, say, astrophysics simulations, or ... um, fluid dynamics modeling - basically anyplace where a load of data is involved, and where a load of simulations are needed. They have a load of cores in them (hundreds of them, these days) so they are supposed to perform super fast! If you have a loop running from 1 to 1000, a normal processor (even with hyperthreading, multiple cores etc.) will only be able to run it sequentially (or run chunks of it, sequentially), grinding through the iterations one after another. But in a GPGPU, this computation is handled easily via the cores - each core takes up one instance of the loop. Runs it. Voila. 1000 iterations done, in (ideally) just ONE unit of time. Amazing.

Anyway, so that's done. The next question: how does one program a GPGPU..? The way that's being done at NVIDIA is through the CUDA architecture [pronounced coodaa] :). CUDA stands for Compute Unified Device Architecture. It lets programmers use CUDA C (an extension of C, if they're programming in C) to write code for this specific architecture, and get their things done.

Sir showed us a 'simple' example. Man, I can't tell you how confusing that 'simple' example was.. :'(. I mean, one has to have a GPGPU in the brain in order to understand how a parallel architecture works. I remember it had been hard enough just to keep track of loops when I was trying to write code for three-dimensional matrix algebra. Today, we actually saw (or tried to see :P).. how this multiplication:

[2 3 5 6]       [5 6 3 2]
[3 5 6 3]   X   [4 5 2 2]
[5 6 7 3]       [8 2 9 3]
[4 6 8 8]       [3 5 2 1]

is done.. in parallel. Now the example above seems easy. But it actually isn't.. especially when you have to divide the whole darn thing into little 2x2 matrix babies, then multiply the babies, add the babies, and put them in their respective places .. ahem... and then consider an N by N matrix, N being... say, a simple... 100 or so ...**casually**.... just think about THAT.

Life sure is challenging.


majworld said...

Seems interesting.. but my mind is too loaded with an assignment on wireless networks right now, so I'm unable to take in the technical terms in this post at the moment :) It's a bit of an advanced course, I guess. I studied computer architecture in my bachelor's, which covered MIPS processor design.. that was interesting..
Anyway, enjoy the challenge :p

Uni said...

:) What's your assignment about? I mean, what specifically?

It's advanced, yes. But when I go online, this seems like the most common, been-there-done-that thing of all!! And imagine, Sir said something like, 'Hardly 10 people in Karachi would know how to program in CUDA'
:S.. Weird.

MIPS yes, we are also studying that now... in Advanced Computer Architecture.

Thanks for dropping by!

majworld said...

We completed reading the MAC layer for wired and wireless, so it's a few questions on the IEEE 802.11 frame format, plus a few questions on other wireless MAC approaches.. but the time-consuming part is the writing, as plagiarism is checked strictly here, and I have to write with care to avoid even the unintentional kind.. it's the second assignment; in the first one I got full marks ;) and I still have to write a term paper as the term project too.

CUDA.. first time I've heard of it.. thankfully I'm not studying it :p..
And yes, we read all of MIPS.. and in the lab we also built its processor in Verilog.

Uni said...

Hmm, this brings back memories of undergrad!
It's so good plagiarism is checked strictly!

I have a term paper to write as well - but it's not due till December!

And good going about the marks. Keep it up!

CUDA is very interesting - I so wish I could become proficient in it! Simply because the instructions aren't difficult - they're just challenging to USE.. !

Thanks for the comment!