BR Algorithms Corner
Re: BR Algorithms Corner
If you are very much into mathematics and such, you may really enjoy programming in Haskell. (From a job market perspective it is a limited market, but many banks like Standard Chartered use Haskell more and more.) I do only very beginner-level programming in Haskell but absolutely love it. In fact, learning Haskell improved my programming in languages like C++ and Python as well. All in all, I think it's worth a look.
-
- BRFite
- Posts: 110
- Joined: 31 Oct 2010 20:55
Re: BR Algorithms Corner
chilaraiji, thanks for the information. I had heard about Haskell many years ago, but still do not have a clear idea about it. Could you give a hint about what functional programming is, and why 'C' or Python is not one, etc.?
chilarai wrote:If you are very much into mathematics and such , you may really enjoy programming in Haskell. (from job market perspective it is a limited market but many banks like standard chartered uses haskell more and more ). I do only very beginner level programming in haskell but love it absolutely. In fact learning haskell improved my programming in languages like C++, python as well. all in all i think its worth a look.
Re: BR Algorithms Corner
No ji for me please. Well, not that I am an expert, but here's my understanding.
In a pure functional programming language, the compiler can provide far more guarantees than, say, C or Python. In C, if we have a function declared as
int func1(int a, int b), it says that it takes two ints as arguments and returns an int.
We can have a similar function in Haskell with the type signature func2 :: Int -> Int -> Int (which means it takes an Int and an Int and returns an Int).
Now, in Haskell, functions are like mathematical functions: for the same arguments a function will always give the same result. This is guaranteed by the compiler. In C, on the other hand, the function can call rand() and return a different output for each invocation (or, as the cliched example goes, even launch rockets). So it is much easier to reason about Haskell programs.
That's the primary difference, and why Haskell is a pure functional language and C is not. But there are many other benefits:
1) It is very strongly typed: a boolean, for example, is either True or False. You can never have an integer in a boolean variable, unlike C where 0 can be false and 1 can be true. From my experience, such behaviour can lead to bugs which are very hard to track down.
2) It is a lazy language: only values which are required are evaluated. (This is not always an advantage, though, but there are ways to work around the problem.)
3) Haskell programs tend to be more concise than the equivalent C program, and they tend to be more readable.
4) More often than not, a Haskell program tends to work correctly once it compiles.
That being said, I do think it has a bit of a steep learning curve. It is easy to get started, but I have been struggling with some of the advanced concepts for a long time. But with your background in mathematics, you would definitely find the journey smoother.
Added later: this is an excellent read: http://learnyouahaskell.com/chapters
Re: BR Algorithms Corner
x-post (us chances to advance to next round)
- U.S. will not be guaranteed advancement unless it manages at least a draw against Germany
Three ways - calculate the chances with your own model:
100% way - Draw with, or beat, Germany.
(Some odds give about a 40% chance of that happening ... what do you think?)
Another 100% way - Lose to Germany and hope Ghana and Portugal draw (30% probability per some data).
Less than 100% way - Lose to Germany but edge out Portugal or Ghana (whichever team wins) on the tie-breaking rules.
The United States has a +1 goal differential, Ghana -1, Portugal -4.
So if the US loses and the Ghana/Portugal victor wins in a blowout, the US is out.
If Portugal is the victor, the combined blowout needed (assuming Germany wins 1-0 or 2-0) is less likely (only a few percent); both wins would have to be something like 3-0 and 3-1.
If Ghana is the winner, the US is OK if both results are 1-0; with higher scores, the US is out.
Might be fun to calculate each scenario.
(The Ghana/Portugal match should be fun to watch, because both teams want to win in a blowout.)
(The US and Germany both advance with a draw, or if either wins by a small margin.)
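The "calculate each scenario" idea is easy to script. Here is a rough Python sketch of my own (not from the thread): it assumes the standings going into the final round (US and Germany on 4 points, Ghana and Portugal on 1, with the goal differentials quoted above), plus assumed goals-scored totals for the tie-breaker; head-to-head rules are ignored, and exact ties are scored pessimistically against the US.

```python
# Sketch only. Assumed pre-match table: points (US 4, Germany 4, Ghana 1,
# Portugal 1), goal difference (US +1, Germany +4, Ghana -1, Portugal -4),
# goals scored (US 4, Germany 6, Ghana 3, Portugal 2). Germany's columns and
# the goals-scored figures are my own assumptions, not from the post.
# Tie-break order: points, then goal difference, then goals scored;
# head-to-head rules are ignored and exact ties count against the US.

def us_advances(us, ger, gha, por):
    """Final scores US-Germany (us, ger) and Ghana-Portugal (gha, por)."""
    table = {
        "US":       [4,  1, 4],
        "Germany":  [4,  4, 6],
        "Ghana":    [1, -1, 3],
        "Portugal": [1, -4, 2],
    }
    def play(a, b, ga, gb):
        table[a][1] += ga - gb; table[a][2] += ga
        table[b][1] += gb - ga; table[b][2] += gb
        if ga > gb:
            table[a][0] += 3
        elif gb > ga:
            table[b][0] += 3
        else:
            table[a][0] += 1; table[b][0] += 1
    play("US", "Germany", us, ger)
    play("Ghana", "Portugal", gha, por)
    # Pessimistic: on a full tie, the non-US team sorts above the US.
    ranking = sorted(table, reverse=True,
                     key=lambda t: (tuple(table[t]), t != "US"))
    return "US" in ranking[:2]

print(us_advances(1, 1, 0, 3))  # draw with Germany -> True, always through
print(us_advances(0, 2, 0, 3))  # lose 0-2 while Portugal wins 3-0 -> False
```

Looping the four score arguments over plausible ranges (weighted by some score model) would give the probabilities being guessed at above.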
Now the USA has about a 75-80% chance (depending on what data I use), up from about 2/3 before the match.
from a news source wrote:The United States was seconds away from defeating Portugal on Sunday when Michael Bradley, normally one of the steadiest American players, mishandled a ball in midfield and gave Portugal a last opportunity. Silvestre Varela took advantage, scoring on a header.
Re: BR Algorithms Corner
Actually, Python does support some functional programming paradigms as well. So does Perl.
Eric Demopheles wrote:chilaraiji, thanks for the information. I had heard about Haskell many years ago, but do not still have a clear idea about it. Could you give a hint about what is functional programming and why is 'C' or python not one, etc?chilarai wrote:If you are very much into mathematics and such , you may really enjoy programming in Haskell. (from job market perspective it is a limited market but many banks like standard chartered uses haskell more and more ). I do only very beginner level programming in haskell but love it absolutely. In fact learning haskell improved my programming in languages like C++, python as well. all in all i think its worth a look.
A few other guys and I are studying Haskell on the side at work (we meet once a week during lunch and use learnyouahaskell.com as a guide, as well as a course by a UPenn prof).
The idea behind pure functional languages is that they avoid having state (i.e. no variables) and objects are immutable (i.e. once an object is initialized, it cannot be modified). One way to think about this is:
Code: Select all
C code
=============
int c = 5;
c = 44;
c = c + 5;
Haskell code
==============
c = 5 -- Initializes it
c = 45 -- This doesn't work in haskell, as c has been initialized and the value has been set before already
c = c + 5 -- Again, doesn't work, as c is immutable once it has been initialized.
In fact, a Haskell "variable" behaves more like this C function:
Code: Select all
int c() {
    return 5;
}
In functional languages, the distinction between functions and variables is somewhat blurred, and functions are treated as first-class objects, just like variables and constants are in imperative languages. You can pass a function as an argument to another function and receive functions back from other functions (i.e. higher order functions). Also, quite a bit of Haskell code resembles mathematical notation for functions, which makes it somewhat familiar to math majors. For instance:
Code: Select all
factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n - 1)
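The same first-class-function idea works in Python too; here is a tiny sketch (the function names are my own, purely for illustration):

```python
def twice(f, x):
    """Higher-order function: takes a function f and applies it two times."""
    return f(f(x))

def make_adder(n):
    """Returns a brand-new function -- functions are values here too."""
    def add(x):
        return x + n
    return add

add5 = make_adder(5)
print(twice(add5, 10))             # 10 -> 15 -> 20
print(list(map(add5, [1, 2, 3])))  # passing a function into map: [6, 7, 8]
```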
One more interesting thing about haskell is that evaluation is lazy and stuff is not evaluated until it is needed. Therefore, the following code is valid in Haskell.
Code: Select all
y = x * 5 + 20
x = 23
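Python is not lazy by default, but generators give a rough analogy for what lazy evaluation feels like: values are only computed when something demands them. (This is an analogy I am adding, not how Haskell is implemented.)

```python
import itertools

def naturals():
    """Conceptually an infinite list; nothing runs until a value is pulled."""
    n = 0
    while True:
        yield n
        n += 1

# Only the first five elements are ever evaluated:
first_five = list(itertools.islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```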
As chilarai said, pure functions are those functions that will always return the same values, given the same arguments. E.g. here's some C code for a pure function (I'm using C because you said you're familiar with it).
Code: Select all
int sum(int x, int y) {
    return x + y;
}
Now let's look at an impure function or two (again using C):
Code: Select all
time_t get_current_time() {
    return time(NULL);
}

int get_random_number(int min, int max) {
    /* impure: depends on hidden state inside rand() */
    return rand() % (max - min + 1) + min;
}

void print_greeting(char *name) {
    printf("Hello %s\n", name);
    printf("How are you today?\n");
}
While Haskell (and other functional languages) does support impure operations, it requires the programmer to mark them specially (in Haskell, impure computations are wrapped in a 'monad', such as the IO monad).
Hope this piques your interest.
Re: BR Algorithms Corner
chilarai, ArmenT,
Thanks a lot for the explanation on functional programming. It was very helpful. And a lot of food for thought. So the "pure" functions cannot do I/O, access pointers, generate random numbers, etc., right?
Although Haskell is interesting from a computer science/programming perspective, here my motivation was the job market and suitability for scientific computing. So if I must learn a language, Python still seems to have the advantage.
Re: BR Algorithms Corner
Any function that has side-effects or maintains some internal state cannot be called a "pure" function.Eric Demopheles wrote:So the "pure" functions cannot do I/O, access pointers, generate random numbers, etc., right?
Python is not a bad language by any means. The only reason I learn multiple languages is that I find it alters my way of thinking about problems, and I become a better programmer as a result. For instance, after I read "Higher Order Perl", I actually started writing C code in a functional style and wrote a very interesting extensible module for our system, which is much admired by the other guys on the team for its ease of use and functionality.
Eric Demopheles wrote:Although Haskell is interesting from a computer science/programming perspective, here my motivation was the job market and suitability for scientific computing. So if I must learn a language, Python still seems to have got advantages.
Re: BR Algorithms Corner
For scientific computing, Python at the moment is the winner, what with some fantastic libraries like NumPy around.
Eric Demopheles wrote:
here my motivation was the job market and suitability for scientific computing. So if I must learn a language, Python still seems to have got advantages.
Re: BR Algorithms Corner
Depends on the definition of "large" and also if the type of dataset suits parallel processing. If it is just one or two files from a single server for a 10-minute period, probably easier to just bang out a standard python script right then and there. For larger datasets (e.g. files from an entire cluster of servers for a few hours worth of data), it is hadoop for me.pandyan wrote:for large datasets, do you use python on a big box or use pydoop style programming to divide and conquer
For the record, I hadn't heard of pydoop before. I just use the built in streaming jar to feed data to my mapper and reducer jobs written in python and it works just fine for my purposes. Looking at the install instructions on the pydoop site, it appears to be a fair amount of work to get it set up on a hadoop cluster (the install instructions on their site appear to assume that the hadoop cluster has only one node).
Re: BR Algorithms Corner
SaiK-ji, the problems are such that it wouldn't even occur to me that I could write it in C.SaiK wrote:in comparison c/c++, how does sage perform? let us say the same algorithm written in sage or in c/c++.
What follows is the perspective of an amateur programmer whose main attention is on math. The more computing-oriented issues can be answered by the gurus.
For example, suppose I want to find the factorial of a number. Let us say, we write it in C:
Code: Select all
#include <stdio.h>

int main()
{
    int c, n, fact = 1;

    printf("Enter a number to calculate its factorial\n");
    scanf("%d", &n);

    for (c = 1; c <= n; c++)
        fact = fact * c;

    /* note: a 32-bit int overflows for n > 12 */
    printf("Factorial of %d = %d\n", n, fact);

    return 0;
}
http://www.programmingsimplified.com/c- ... -factorial
Not because I can't write it myself, but for ease of posting.
Not only is the program long, I also have to compile it, debug any compilation errors, etc.
Compare the above to PARI/GP:
Code: Select all
factorial(x)
Or in Python:
Code: Select all
math.factorial(x)
This is just for the very simple calculation of factorial. We don't even want to imagine doing more involved stuff like testing whether a given number is prime, or elliptic curve computations, in 'C'.
Of course, for the same algorithm, 'C' is far faster.
But Python, PARI/GP, SAGE, etc. come pre-packaged with algorithms that are the best available, or at least something very close, usually written in C or C++ and compiled as binaries or modules.
In summary, the only reason I could think of using 'C' or C++ is to build libraries or binaries for PARI/GP, SAGE, Mathematica etc. themselves, or for other high performance purposes.
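One more convenience worth noting: the C program above stores the result in an int, which silently overflows once n exceeds 12 with a typical 32-bit int, while Python's integers grow as needed. A quick sketch:

```python
import math

# 12! still fits in a signed 32-bit int; 13! does not.
print(math.factorial(12))              # 479001600
print(math.factorial(13))              # 6227020800
print(math.factorial(13) > 2**31 - 1)  # True: the C version above would overflow here
```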
Re: BR Algorithms Corner
Sir,ArmenT wrote:Depends on the definition of "large" and also if the type of dataset suits parallel processing. If it is just one or two files from a single server for a 10-minute period, probably easier to just bang out a standard python script right then and there. For larger datasets (e.g. files from an entire cluster of servers for a few hours worth of data), it is hadoop for me.pandyan wrote:for large datasets, do you use python on a big box or use pydoop style programming to divide and conquer
For the record, I hadn't heard of pydoop before. I just use the built in streaming jar to feed data to my mapper and reducer jobs written in python and it works just fine for my purposes. Looking at the install instructions on the pydoop site, it appears to be a fair amount of work to get it set up on a hadoop cluster (the install instructions on their site appear to assume that the hadoop cluster has only one node).
This thread is the first time I am hearing of hadoop etc., so I got interested and looked it up a bit. It seems Java is used a lot for it. My belief about Java was/is that it has low performance because it is an interpreted language running on a virtual machine, with all the extra overhead that entails, and hence is unsuited for high performance work. A few posts back you said Java is used for high performance stuff when python is not enough. How could that be? Forgive me if this question is naive.
Re: BR Algorithms Corner
Java has improved in performance over the years and JIT (Just In Time) compilation technologies give the java runtime engine quite a performance boost. My comments were with respect to coding for hadoop jobs.Eric Demopheles wrote: Sir,
This thread is the first time I am hearing of hadoop etc. and I got interested and looked it up a bit. It seems Java is used a lot for it. My belief about Java was/is that it has low performance because it is an interpreted language run on a virtual machine, with all the extra overhead coming from it, and that hence it is unsuited for all high performance work. A few posts back you said Java is used for high performance stuff when python is not enough. How could it be? Forgive me if this question is naive.
Basically, the hadoop framework is written in java (for various reasons, including cross-platform portability; it is not unusual to have a hadoop cluster consisting of a hodge-podge of machines). Therefore, jobs written in java have access to most of hadoop's parts. They also offer a C++ interface (hadoop pipes) to write code in C++. However, these are not the only two options -- they also offer a streaming jar that streams data to and from jobs via stdin and stdout. When using the streaming jar interface, the hadoop jobs can be written in any language that can read from stdin and write to stdout. This means you can write hadoop jobs in perl, python, C, BASIC, ruby, R, whatever.

However, the overhead of the streaming jar has a price, because hadoop streams data through the jar, which writes to stdin, which is read by your program, which writes to stdout, which is read back by the streaming jar into hadoop. This can add something like 3x extra running time to your script, compared to writing it in java (this is per some benchmarks I ran a few years ago when we first started playing with hadoop. If you do a search for "hadoop" on this forum, you'll see me raving about it around 4 years ago, so it has been a while since I ran those benchmarks). Therefore, jobs which take say 1 min. to run in java take 3-4 minutes in perl or python. Also, running through the streaming interface uses a bit more memory and cannot handle really large datasets.

However, it is a lot easier to write map-reduce jobs in perl or python than it is to write one in java. Also, many of our operations guys understand perl better than java, so they usually write ad-hoc reporting jobs in perl (because they don't care that the report takes 3 minutes longer to run; it is still acceptably fast for them). So, if someone asks for some custom report on a few hours' or a day's worth of data, it is generally written in a scripting language like perl or python.
Our analytics/stats guys write their hadoop jobs using a language called R, because they're most familiar with it (The R and S languages are big with statisticians). For jobs that need to run over larger datasets (such as a month's worth) and crunch a lot of data, we generally write them in java for the extra performance + less memory usage (running out of RAM does cause kernel panics, so less RAM usage is a good thing).
In general, for hadoop:
run time of a job using the streaming jar > run time of a C++ job using hadoop pipes > run time of a java job.
However, it must be remembered that hadoop itself is implemented in java and that's why java is the best supported language within the hadoop framework. There are other map reduce frameworks written in other languages though. For instance, the famous one that really got the ball rolling big time is Google's map reduce framework, which is written in C++ and has interfaces for C++, python and java.
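To make the streaming model concrete, here is a hypothetical word-count mapper/reducer pair in Python (everything here is illustrative, not from the post; the streaming jar simply pipes lines into stdin and reads lines from stdout, with a sort on the key between the two phases):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit 'word<TAB>1' for every word seen."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(lines):
    """Reduce phase: input arrives sorted by key; sum the counts per word."""
    for word, group in groupby(lines, key=lambda l: l.split("\t", 1)[0]):
        total = sum(int(l.split("\t", 1)[1]) for l in group)
        yield "%s\t%d" % (word, total)

# In a real streaming job each function would be its own script reading
# sys.stdin and printing its output; hadoop does the sort in between.
# Locally we can simulate the whole pipeline:
mapped = sorted(mapper(["to be or not to be"]))
print(list(reducer(mapped)))
```

The same pair runs unchanged on one box or on a cluster; only the plumbing (stdin/stdout vs. the streaming jar) differs.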
-
- BRF Oldie
- Posts: 6046
- Joined: 11 May 2005 06:56
- Location: Doing Nijikaran, Udharikaran and Baazarikaran to Commies and Assorted Leftists
Re: BR Algorithms Corner
Yes. R is a godsend. It is just the free/open-source version of S+. Without FOSS stuff (SQLite works fine for me, plus Linux and R), my setup costs alone would have been a whopping amount just 10 years ago (think Oracle, Sun workstation and SAS), and there would have been no way for a boor abdul like me to get into bijiness.
Our analytics/stats guys write their hadoop jobs using a language called R, because they're most familiar with it (The R and S languages are big with statisticians)
Re: BR Algorithms Corner
ArmenT,
Sir, this explanation was very helpful in understanding the present-day status of Java. I just browsed and read a few posts on StackOverflow regarding this too.
By the way, do Android phone apps use JIT compilation?
While we are talking of StackOverflow, how helpful is it for a programmer in general? Addiction may be a problem..
Re: BR Algorithms Corner
ArmenT, is there a way you could use hadoop to leverage your bit array techniques?
Re: BR Algorithms Corner
Sorry, didn't see this earlier. Sir, I don't see where hadoop would give you an advantage for this. The whole idea of the bit array is to provide a quick lookup for a set of items which are sequential (with potential gaps in between). Hadoop is used for processing large volumes of data, where the processing operation can be parallelized. Apples and oranges really.SaiK wrote:ArmenT, is there a way you could use hadoop to leverage your bit array techniques?
Ok, here's a new fun topic of discussion. Something I learned from Knuth Vol III ("Sorting and Searching"). What is beautiful about it is its simplicity and the fact that this situation occurs often in code that many of us might have written.
Problem: You have an unsorted list of items (it can be an array, singly linked list, doubly linked list, tree etc.) and you need to search through it for an item. The code should return true if found and false if not.
The way that many people implement this (pseudo pythonic code, but should be understandable to anyone with a background in any programming language):
Code: Select all
result = False
N = length_of_array
i = 0
while i < N:
    if array[i] == item_to_find:
        result = True
        break
    i = i + 1
return result
Now, this is a classic O(N) algorithm, where the code linearly searches through the array (or linked list or whatever) until it either finds what it is looking for, or runs off the end of the array. Very easy to understand, and everyone has probably written something like this at some point.
Most people assume that there is only one comparison operation happening inside the loop, but the code actually performs two comparisons in every iteration of the loop: one to check whether we have found the item we are looking for, and a second to find out if we have reached the end of the array or not. The reason I call out this second one is that many people completely overlook the fact that it is happening (possibly because the check on the line with the "while" keyword is not indented with the rest of the loop). The same thing happens if you have a "for" loop instead of a "while" loop -- there are two comparison operations actually happening, but many people tend to assume there is only one.
Knuth presents an interesting variant in Book III of "The Art of Computer Programming". Basically, he says that if the data structure (array, list etc.) can have an additional item inserted at the end, we should add a "dummy value" (a.k.a. sentinel value) at the end. So what should this "dummy value" be? It should be the item we are searching for. So, the algorithm would be rewritten like this (again, pseudo-python, but easy to follow for anyone who has programmed in any language):
Code: Select all
result = False
N = length_of_array
i = 0
push array, item_to_find
while True:
    if array[i] == item_to_find:
        result = True
        break
    i = i + 1
if i == N:
    result = False
return result
Now this code looks similar to the previous one. It is also an O(N) algorithm, but the beauty of this code is that while it looks a bit larger, it actually performs 50% fewer checks inside the loop. Unlike the previous code, this one does not need to check inside the loop whether it has reached the end of the array or not, because it is always guaranteed to find the item it is looking for, as we have artificially placed it at the end of our array. We only need to check at the end to see if the item that was found is the dummy value we placed earlier or not. Therefore, we remove one comparison from inside the loop (which runs multiple times, since it is inside the loop) and add just one comparison outside the loop that happens only once. So while both algorithms are O(N), the second one is more efficient because it performs almost 50% fewer comparison operations overall.
The beauty of this is that it is such a simple modification to a linear search and is trivial to implement, but almost doubles the performance as the array size grows. It is also language agnostic -- if your programming language of choice doesn't support dynamic array resizing, you can always allocate a static array of a size one larger than what you need to hold your data and use that last element to hold your dummy value. And it can be used in other data structures as well: singly or doubly linked lists, trees etc.
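Here is a runnable Python version of both, with the comparisons counted explicitly (Python's own for/in would hide them; the point is the count, not the raw speed of this sketch):

```python
def linear_search(array, item):
    """Classic linear search; counts both comparisons made per iteration."""
    comparisons = 0
    i, n = 0, len(array)
    while True:
        comparisons += 1           # the i < n bounds check
        if i >= n:
            return False, comparisons
        comparisons += 1           # the equality check
        if array[i] == item:
            return True, comparisons
        i += 1

def sentinel_search(array, item):
    """Knuth's sentinel variant: append the item, drop the bounds check."""
    comparisons = 0
    n = len(array)
    array.append(item)             # the sentinel
    i = 0
    while True:
        comparisons += 1           # only the equality check per iteration
        if array[i] == item:
            break
        i += 1
    array.pop()                    # restore the array
    return i < n, comparisons

data = list(range(1000))
found1, c1 = linear_search(data, -1)    # worst case: item not present
found2, c2 = sentinel_search(data, -1)
print(found1, c1)   # False 2001
print(found2, c2)   # False 1001
```

For the not-found case on 1000 elements, the sentinel version does 1001 comparisons against 2001, matching the roughly-50% claim above.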
Re: BR Algorithms Corner
Makes sense, but where does the loop end? Sorry, I'm used to the curly braces.
On the hadoop question: I was wondering, if there are large chunks of block data distributed, would it be possible to read those chunks faster by using such mechanisms? Of course, I am blabbering with half knowledge. So if the key or index to a block of data can come from your encoded bits, can it be represented in a manner that leverages that lookup?
Maybe I am assuming such a use exists; I am struggling to come up with a good example.
Last edited by SaiK on 06 Jul 2014 03:50, edited 2 times in total.
Re: BR Algorithms Corner
For you sir, here's the first program written with curly braces showing where the blocks begin and end:SaiK wrote:makes sense, but where does the loop end?.. sorry used to the curly braces.
Code: Select all
result = False;
N = length_of_array;
i = 0;
while (i < N) {
    if (array[i] == item_to_find) {
        result = True;
        break;
    }
    i = i + 1;
}
return result;
And here is the second one:
Code: Select all
result = False;
N = length_of_array;
i = 0;
push(array, item_to_find);
while (True) {
    if (array[i] == item_to_find) {
        result = True;
        break;
    }
    i = i + 1;
}
if (i == N) {
    result = False;
}
return result;
Re: BR Algorithms Corner
Wouldn't that error out, or cause a buffer overrun, when i + 1 >= N?
Re: BR Algorithms Corner
Not unless you try to actually access an element outside the array range. One of the assumptions made by this algorithm is that you can add an extra element to the end of the array. For languages that support dynamic arrays (say C++, python, perl etc.), you can simply push an extra element to the end of the array. For languages that support static arrays (like C, FORTRAN, BASIC etc.), you simply declare the array size to be N+1 when you actually need to store N elements, and then use the last element to store the sentinel value. There's no question of a buffer overrun happening, as you are not accessing any elements outside the array bounds.
Re: BR Algorithms Corner
aah! I forgot to see the push()! thx
Re: BR Algorithms Corner
Regarding languages:
I think knowledge of one core object-oriented language and one scripting/interpretive language is good enough.
Object Oriented Language: Java or C++(C)
Java is very popular in Hadoop type applications; C++(C) is very popular in high performance computing. In terms of market demand and availability of developers Java >> C++.
Interpretive/Scripting Language: It used to be Perl; now Python is the norm.
Specialized Languages:
Functional Language like Haskell
Concurrent Language like Erlang
Re: BR Algorithms Corner
^ Can you explain what a concurrent language means? I thought concurrency is supported by all of the languages you mentioned above.
Re: BR Algorithms Corner
^^^
Other languages such as C, C++, etc. do support concurrency, but they do it using external libraries (e.g. libpthreads, OpenMP, Threading Building Blocks (TBB), etc.), or with some OS-level functions (CreateProcess(), execve(), mutexes, semaphores, etc.). Erlang does these natively, as part of the language. Erlang also hides a lot of the underlying low-level socket communication behind the language.
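For readers more comfortable with Python, Erlang's share-nothing, message-passing style can be roughly imitated with threads that communicate only through queues. This is just an illustrative sketch (all names are mine), not a claim about Erlang's actual runtime:

```python
import threading
import queue

# A rough imitation of Erlang-style share-nothing concurrency:
# the worker owns its state and is reachable only via messages.
# (Names are mine; a sketch only, not Erlang semantics.)

def counter_actor(inbox, outbox):
    total = 0                 # private state; never shared directly
    while True:
        msg = inbox.get()     # block until a message arrives
        if msg == "stop":
            outbox.put(total) # reply with the final state
            return
        total += msg

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=counter_actor, args=(inbox, outbox))
t.start()
for n in [1, 2, 3]:
    inbox.put(n)              # send messages, never touch total
inbox.put("stop")
t.join()
result = outbox.get()
print(result)  # 6
```

Because the only communication is through the queues, there is nothing to lock; that is the core of the "share nothing" argument made above.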
Re: BR Algorithms Corner
a quick chacha gave me http://www.slideshare.net/artancami/erlang-vs-java
so, language-based concurrency and share-nothing, I note.
would it be able to leverage features like Java NIO?
share-nothing means it is all message passing. should be cool for well-partitioned parallel activities that are not very dependent on each other.
Re: BR Algorithms Corner
If I give you the lottery numbers for each day from the last 10 years, can you come up with an algorithm for the top 20 numbers based on this data?
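Leaving aside whether past draws predict anything, the "top 20 numbers" part is just frequency counting. A sketch in Python with made-up draw data (collections.Counter does the work; the 6-of-49 format is an assumption of mine):

```python
from collections import Counter
import random

# Hypothetical historical data: ~10 years of daily 6-of-49 draws.
# (Entirely made up; real data would be read from a file instead.)
random.seed(42)
draws = [random.sample(range(1, 50), 6) for _ in range(3650)]

# Count how often each number has appeared across all draws.
counts = Counter(n for draw in draws for n in draw)

# The 20 most frequently drawn numbers.
top20 = [number for number, freq in counts.most_common(20)]
print(top20)
```

Of course, if the lottery is fair the draws are independent, so the top-20-by-frequency list has no predictive value; it only describes the past.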
Re: BR Algorithms Corner
An SDRE ivi lega (stats or CS) prafessor did just that a couple of years back.
Re: BR Algorithms Corner
ravi_g:
As promised, here are a few links to quantum computing.
An Introduction to Quantum Computing for Non-Physicists
The above article is quite good. The prereqs are a little bit of linear algebra: basics like bases, outer products, linear transformations and their superposition property. One can simply take certain things on faith and move on. Dirac's bra-ket notation is explained, which is a compact notation for certain permutation matrices (or, more generally, unitary transformations). It is a survey paper, 45 pp., and quite understandable if one follows the definitions and the formal manipulation from definitions. The article gives a lot of references and includes a good description of Shor's quantum algorithm for bounded-probability integer factorization. The paper does not explain how to build a quantum computer; for that one needs a lot of experimental physics, which is beyond me, as I do not have the faintest idea how one would go about building a quantum computer: entanglement, how many particles can be entangled without losing coherence, and for how long, etc. Those are all important for a physical realization of a quantum computer. There is a reference one can follow through. I myself am interested in the abstract QC model, i.e. assuming that such a QC is realizable, what kinds of functions can be computed, and the temporal and spatial resources required to perform the computation.
I have taken the above from the following link, which has a lot more links to tutorials. I would suggest reading the paper by Deutsch on the Quantum Turing Machine. Also important is the paper by Bernstein and Vazirani (the younger of the Vazirani brothers) on the Universal Quantum TM. The Toffoli gate is also an important concept to understand.
Quantum Computing Introductions and Tutorials
The above link has a link to Quantum Computation explained to my Mother. I am planning to read it once and then another time before leaving for dEsh to visit my mom.
I would like to inject a bit of caution at this point: it is not all fun and games, and we will not be home free with unlimited computational power at our beck and call if and when a QC is physically realized. The power of QCs can be harnessed only under some conditions (assuming a robust QC has been built and is operational, and all the nitty-gritty like programming languages have been defined and compilers built, along with reasonable input-output):
1. There is a quantum algorithm for the problem to be solved.
2. An exponential speed-up is enough to solve practical-sized problems.
As of now there are only two popular quantum algorithms: Shor's algorithm for factoring integers, and Lov Grover's algorithm, which speeds up certain search algorithms by a quadratic factor. Both are explained in the article in the first link above. To understand the second constraint, one has to understand the unbounded parallelism inherent in NTMs (Non-deterministic Turing Machines). The parallelism in an NTM is far beyond a simple exponential speed-up. Some critics (Prof. Gil Kalai of Yale is the foremost) feel that QCs are impractical for solving NP-hard problems. A simple argument will show that NP-complete problems would require (even with the exponential speed-up possible on a QC) a very large number of particles to be in an entangled state for a long enough time. On the other hand, others (Prof. Scott Aaronson of MIT, a student of U. Vazirani) feel that QCs will be realized and in use within 10 years or less.
In any case, they are nice abstract devices on the same footing as TMs and are fun to think about.
There may be several terminological mistakes in the above, but then I am not a QC person, so please excuse and correct.
HTH.
Re: BR Algorithms Corner
so how many quantum bits can be stored in one digital bit?
Re: BR Algorithms Corner
SaiK: The question has to be asked in reverse, i.e. how many bits can be stored in a qubit? (There is no such thing as a "digital bit" - it is either a digit or a bit, which is a "binary digit", with base 2 instead of base 10.) Then again, one has to define what one means by "storing". The explanation in Wikipedia using a Bloch sphere is quite lucid (but doesn't answer your specific question as modified by me). You still need to work out a few examples. The fun is when two (or more) qubits are entangled. Also, when you are looking at the qubit wiki page, please pay particular attention to qubit registers. Unitary transformations are of importance, as well as closed systems. The "closed system" assumption is a strong condition and has implications for the physical realizability of a QC of sufficient power (IOW, sufficient word length and entanglement maintained for a sufficient duration) to be able to solve practical problems which are solved routinely on current 64-bit desktops. One can solve large problem instances of NP-complete (let me add "NP-hard" so that algorithmic purists do not come down on me like a ton of bricks) problems to within a certain approximation, which is good enough for all practical purposes.
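To make the bits-in-a-qubit discussion a little more concrete: a single qubit's state can be written as two complex amplitudes (a, b) with |a|² + |b|² = 1, and a measurement collapses it to a classical 0 or 1. A minimal pure-Python sketch (illustrative only; function names are mine):

```python
import math

# A qubit state is a pair (a, b) with |a|^2 + |b|^2 = 1.
# Measurement yields 0 with probability |a|^2, 1 with |b|^2.
def probabilities(state):
    a, b = state
    return abs(a) ** 2, abs(b) ** 2

# The Hadamard gate, a simple unitary transformation:
# it maps |0> to the equal superposition (|0> + |1>) / sqrt(2).
def hadamard(state):
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

zero = (1.0, 0.0)       # the classical bit 0 written as a qubit
plus = hadamard(zero)   # equal superposition of 0 and 1
p0, p1 = probabilities(plus)
print(round(p0, 3), round(p1, 3))  # 0.5 0.5
```

So while the state itself is described by continuous parameters (the Bloch sphere picture), a single measurement extracts only one classical bit, which is why the "how many bits fit in a qubit" question hinges on the definition of "storing".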
IMVH(very humble)O, QCs will not be realized in the next 30 years (and AS not in the next 20 years). But then I am a wussy - not going to wager $100K to counter Prof. Scott Aaronson of MIT.
Re: BR Algorithms Corner
SBajwa wrote:If I give you lottery numbers of each day from last 10 years can you come up with an algorithm for the top 20 numbers based on this data?
My friend, who was using a Hidden Markov Model for his PeeChaddi in voice authentication, used his code to come up with lottery numbers. Both of us lost $60 in the process.
Re: BR Algorithms Corner
I'm unable to visualize the qubit info sphere, so let me ask questions from a states-and-control angle. How will you control a human-readable info bit (say 0 or 1) in qubit form, to move it from one state to another? Is this all due to quantum polarization, or can states also be managed using quantum jumps, leading to more quantum storage space? Just trying to get an idea: given a bit of solid-state space, if I could replace that space with a quantum storage bit device, what would be the equivalent information that can be stored in a quantum sphere?
kanpooshun onlee
Re: BR Algorithms Corner
Looking for suggestions on how to map a key to a set of values. Now, the problem is that the values are keys as well; let us call them peers. Given any peer, I need to map to the set of keys, excluding itself (the key on which the map function is called).
Which is the best way to store and access this?
key -> {keys},
or x1 -> {x1..xN} - {x1}, x2 -> {x1..xN} - {x2}, ..., xN -> {x1..xN} - {xN}.
Pick any language, as long as a few explanations are given. The question is about both the structuring and the fastest access algorithm. I am thinking about maps, but I may be wrong.
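A sketch of the two representations being asked about, in Python (identifiers are mine): either precompute key -> {all keys} - {key}, or keep one universal set and subtract on demand:

```python
# Two ways to answer "all peers except this one" (names are mine).
peers = {"x1", "x2", "x3", "x4"}

# Option 1: precompute a map key -> {all keys} - {key}.
# O(n^2) total space, O(1) lookup per key.
exclude_map = {p: peers - {p} for p in peers}

# Option 2: store only the universal set, subtract on demand.
# O(n) space, O(n) work per lookup (to build the result set).
def peers_of(key):
    return peers - {key}

print(sorted(exclude_map["x2"]))  # ['x1', 'x3', 'x4']
print(sorted(peers_of("x2")))     # ['x1', 'x3', 'x4']
```

Unless lookups vastly outnumber updates, option 2 is usually preferable: the n pre-built sets in option 1 all have to be rebuilt whenever a peer joins or leaves.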
Re: BR Algorithms Corner
Hash of hashes is one way to tackle it.
Re: BR Algorithms Corner
Hashing is usually only as good as the hashing algorithm, so the usual caveat applies regarding collisions. If you don't have much information about the key distribution, I suggest avoiding hashing.
If the set is static: use a sort followed by binary search, with the equivalence classes themselves stored in an array and each key in a class pointing to the same array.
If dynamic, you need a heap, with the equivalence classes stored in a dynamic array or a linked list. If the classes are small, a linked list has less overhead.
If you want the classes sorted, then keep them in a balanced binary tree for the dynamic case, or a simple sorted array for the static case. In either case, once you have the initial heap or the sorted array of the universal set, you can build these from it - no need to sort them separately.
A heap with union-find is another possibility if these are equivalence classes, i.e. disjoint sets.
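For the static case suggested above (sort once, then binary search), a small Python sketch using the standard bisect module (the data and names are mine):

```python
import bisect

# Static case: pay for one O(n log n) sort up front, then answer
# membership/position queries in O(log n). (Example data is mine.)
keys = [17, 3, 42, 8, 23]
sorted_keys = sorted(keys)

def lookup(key):
    """Return the index of key in sorted_keys, or -1 if absent."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return -1

print(sorted_keys)   # [3, 8, 17, 23, 42]
print(lookup(23))    # 3
print(lookup(99))    # -1
```

Unlike hashing, this needs no assumptions about the key distribution, which is the trade-off being made in the advice above.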
Last edited by Vayutuvan on 04 Aug 2014 21:30, edited 1 time in total.
Re: BR Algorithms Corner
ArmenT: thanks for mentioning sentinels. The sentinel trick is a good one to remember, especially if one is programming in C.
---
This might be of interest to some.
Python Patterns
The site has short pythonic mini-patterns. These are so short that they can be glanced through in one look. This is especially useful for me, as I am very bad with the syntax of languages.
Last edited by Vayutuvan on 05 Aug 2014 06:23, edited 1 time in total.
Re: BR Algorithms Corner
I liked each key pointing to the same array. Thanks! Space is not much of a concern for me now, as it would be just the extra space for a reference to the array for each key. A linked list is also interesting... perhaps doubly linked, so I can traverse left or right all the way to the edges.
xa-n, xa-1, ... <xa> ... xa+1, ...xa+m ? xa is key, -/+ for link traversal left or right.
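The doubly linked traversal described above, as a minimal Python sketch (the class and function names are mine):

```python
class Node:
    """A doubly linked node: traverse left via prev, right via next."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def link(values):
    """Build a doubly linked list; return its nodes in order."""
    nodes = [Node(v) for v in values]
    for left, right in zip(nodes, nodes[1:]):
        left.next = right
        right.prev = left
    return nodes

nodes = link(["x1", "x2", "x3", "x4"])
middle = nodes[1]            # the key node <xa>, here "x2"

# Walk rightward (+ direction) from the key to the edge.
rightward = []
n = middle.next
while n:
    rightward.append(n.value)
    n = n.next
print(rightward)  # ['x3', 'x4']
```

Walking leftward is symmetric, following prev until it is None; together the two walks enumerate all peers of the key without including the key itself.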
Re: BR Algorithms Corner
I have a problem, and would really appreciate some direction on this.
I have n tables taken from different sources (e.g. Walmart, Target, a mom-and-pop store, etc.) with identical descriptors such as Product Name, Product Price, Product Brand, Product Description, etc. Given that the same product can be listed in different ways by different sources (e.g. "iPhone 5s 16gb" vs "Black 16gb iPhone 5s"), how can I find common products across these n tables? One approach I have been exploring is the use of Bayesian classification.
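Alongside Bayesian classification, a simple baseline worth trying first is token-set matching: normalize each product name to a lowercase token set and compare with Jaccard similarity, which is insensitive to word order. A hedged sketch (the 0.6 threshold is an arbitrary choice of mine, and real data would also need brand/price tie-breaking):

```python
def tokens(name):
    """Normalize a product name into a lowercase token set."""
    return set(name.lower().split())

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| between token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def same_product(name1, name2, threshold=0.6):
    """Heuristic match on token overlap (threshold is my guess)."""
    return jaccard(tokens(name1), tokens(name2)) >= threshold

# "Black" is the only non-shared token, so the overlap is high.
print(same_product("iPhone 5s 16gb", "Black 16gb iPhone 5s"))  # True
print(same_product("iPhone 5s 16gb", "Galaxy S5 16gb"))        # False
```

A baseline like this gives labeled candidate pairs cheaply, which is also useful as training data if you later move to a proper Bayesian classifier.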
Re: BR Algorithms Corner
^ What do you mean by common products across "these n tables"?
Aren't we talking about proxy names or aliases for the products? The reference to the actual product is direct, no? Or is this within the scope of the alias words, and how to map them to the right product?