BR Algorithms Corner

Jaspreet · Post by **Jaspreet** » 14 Jun 2014 18:48

Inspired by BR Maths Corner, I would like to start an Algorithms Corner.

Many BRites are software developers and this thread would be an ideal ground for them to learn and contribute and help others.
Many software jobs these days have tough entry requirements starting with an initial phone interview, an online test, an algorithms & technical interview and finally an HR interview.

The expectation in this thread is not just to state an algorithm but also to discuss it in terms of intuition, performance and related tangential discussions that are the hallmark of any vibrant conversation.

Please feel free to make book suggestions.
Bit-fiddling algorithms to make calculations faster are also welcome.

[Note to moderators: Obviously, your word is the last, so if you think this thread doesn't belong in this forum, feel free to delete it. No problem.]

Jaspreet · Post by **Jaspreet** » 14 Jun 2014 18:58

I will set the tone by talking about a permutation algorithm.

The problem is: Given a set of n digits, find all the permutations.
For example, given 1 & 2, the permutations are
1 2
2 1

Given 1 2 3, the permutations are
1 2 3
1 3 2
3 1 2
2 1 3
2 3 1
3 2 1

The number of permutations is n!.

An algorithm to generate all n! perms is attributed to Narayana Pandita (14th century).

http://en.wikipedia.org/wiki/Narayana_Pandit

It is described here:
http://en.wikipedia.org/wiki/Permutatio ... phic_order

Though how he arrived at it and what's the intuition and insight behind, I don't know.

Jaspreet · Post by **Jaspreet** » 14 Jun 2014 19:02

Book suggestion.
One book that I've seen recommended is
Data Structures and Algorithms Made Easy by Narasimha Karumanchi

http://www.flipkart.com/data-structures ... yfq8sgvft3

If anyone has read it, their review is welcome.

Jaspreet · Post by **Jaspreet** » 14 Jun 2014 19:10

A question I was asked recently:

Given an m by n array of letters, create four letter words by adhering to the following:
1. Start from element 0, 0.
2. You may go up, down, left or right but not diagonally.
3. If you can't create a four letter word (maybe because the path you took reaches the top or bottom) you should go somewhere back in the middle or maybe at the start (you have to determine where to go) and start from there.

If you have thus enumerated all four letter words, how will you find they're in the dictionary.

Jaspreet · Post by **Jaspreet** » 14 Jun 2014 19:14

A variation of the above is this:
Given an n by m array of letters, find how many times a certain word occurs in it.
For example, the array might be
a f o r
b c f o
d r o r

how many times the word "for" occurs in it.
The algorithm should take any word and find its count.

ArmenT · Post by **ArmenT** » 14 Jun 2014 23:37

Jaspreet wrote:A question I was asked recently:

Given an m by n array of letters, create four letter words by adhering to the following:
1. Start from element 0, 0.
2. You may go up, down, left or right but not diagonally.
3. If you can't create a four letter word (maybe because the path you took reaches the top or bottom) you should go somewhere back in the middle or maybe at the start (you have to determine where to go) and start from there.

Never answered a pathing question in my life, but was pretty easy to hack. Here's a solution in python (with font resized to tiny, so as not to be a spoiler):

[code]
#!/usr/bin/env python

# Author: Armen Tamzarian, Date: 2014-06-14
# Use of this code for cheating at interviews is forbidden.

import sys
import pprint

# List of path directions
NONE = 0
RIGHT = 1
DOWN = 2
LEFT = 3
UP = 4

def init_lists():
"""Initializes an array of size mxn and returns its size"""
array = []
array.append(list("afor"))
array.append(list("bcfo"))
array.append(list("dror"))
return array

def generate_permutations(array):
"""Generate the permutations."""
max_rows = len(array) # We're assuming that this is not an empty array
max_cols = len(array[0]) # and we are also assuming that all rows have equal # of columns
answers = [] # We aren't planning to eliminate duplicates in the answer
for row in range(max_rows):
for col in range(max_cols):
tmp_list = []
permute(array, row, col, max_rows, max_cols, tmp_list)
answers += tmp_list
return answers

def permute(array, row, col, max_rows, max_cols, tmp_list, prev_direction = NONE, string = ''):
"""Generate a set of permutations of 4 characters max.
prev_direction indicates which direction our path came from, so it won't reverse back into the same
path that it came from.
"""
string += array[row][col]
if len(string) == 4:
tmp_list.append(string)
return
if row - 1 > 0 and prev_direction != DOWN:
permute(array, row - 1, col, max_rows, max_cols, tmp_list, UP, string)
if row + 1 < max_rows and prev_direction != UP:
permute(array, row + 1, col, max_rows, max_cols, tmp_list, DOWN, string)
if col - 1 > 0 and prev_direction != RIGHT:
permute(array, row, col - 1, max_rows, max_cols, tmp_list, LEFT, string)
if col + 1 < max_cols and prev_direction != LEFT:
permute(array, row, col + 1, max_rows, max_cols, tmp_list, RIGHT, string)
return tmp_list

if __name__ == "__main__":
array = init_lists()
answers = generate_permutations(array)
pprint.pprint(answers)[/code]

Jaspreet wrote:If you have thus enumerated all four letter words, how will you find they're in the dictionary.

Standard way to do this is to read all 4 letter words from dictionary into a dict (pythonic term. Other languages may call the same thing a "hash table", "associative array", "dictionary" etc.). Then go through the answer array and do a hash lookup for each word.

Another alternative is to read the 4 letter words from a dictionary into an array of strings and then sort them in alphabetical order. Then you can lookup words in this array using a binary search. (this is done this way in some code that we maintain and we've never bothered to change to hash tables because it is fast enough for us as-is)

Yet another alternative is to construct a patricia trie of 4 letter words from the dictionary (yes, I've used this a few times as well for a particular product of ours)

Yensoy maadi

ArmenT · Post by **ArmenT** » 14 Jun 2014 23:46

Jaspreet wrote:A variation of the above is this:
Given an n by m array of letters, find how many times a certain word occurs in it.
For example, the array might be
a f o r
b c f o
d r o r

how many times the word "for" occurs in it.
The algorithm should take any word and find its count.

Pretty easy to modify above code to do it.
After accepting the input word, the code just computes the length of the input word as N and then generates all combinations of N length strings (instead of length 4) and then counts how many times the input word is generated.

Note, the code above has a flaw that doesn't allow it to work quite correctly for strings greater than length 4. How to fix it to work for strings of any length is left as an exercise for the reader

.

As Thalaiva would say, "Jujubee!"

Vayutuvan · Post by **Vayutuvan** » 15 Jun 2014 22:25

ArmenT: Thanks for the PATRICIA Tries reference. I did know about Radix trees but did not know they are AKA PATRICIA tries.

Long back I attended a grad seminar lecture on tries by a professor from Padua who was working on sparse tries. He had some great ideas and was supposed to be at that time one of the leading researchers in tries.

Jaspreet · Post by **Jaspreet** » 16 Jun 2014 01:22

Hi Armen,
Thanks for spending time on the algorithm and then posting your solution.

My algorithm is as follows:
For each element in the array (or matrix)
... push into a stack
... remember that we visited this location. We need to remember this so that when we generate neighbours we don't include it.
... The letter under consideration is the first letter of the word.
... As long as the stack isn't empty
...... Look for a neighbour of the letter under consideration. The neighbour is the next letter in the word, e.g. 'o' if the letter under consider is 'f'.
...... If no such neighbour found
......... pop stack
......... the letter under consideration is the previous letter
.......Else if one of the neighbours is the next letter of the word
.........push it into the stack
.........remember we visited it
......... the letter under consideration is the next letter of the word
......If we found the word
.........increment word count
.........pop the stack
....... the letter under consideration is the previous letter
...
... At this point the stack is empty, before going to the next element of the array,
... we forget all points visited, so that we can correctly include them as neighbours of the elements we consider.

I have implemented this in C++, which I know better than any other language. I can post the code if someone is interested. Be warned though, it isn't as pretty as Python.

Edited for clarity.

Jaspreet · Post by **Jaspreet** » 16 Jun 2014 01:24

MatrimC,
Thanks for bringing my attention to PATRICIA Tries. I'll study it and see how I can incorporate it in my solution.

ArmenT · Post by **ArmenT** » 16 Jun 2014 10:58

Actually, there is another way you can lookup words real quickly without using hashes or tries (patricia trees/radix trees are not all that good for large numbers of shorter strings). Note that in the specific case, the word is only 4 letters long, so we can use a special case search structure (a.k.a. the humble array!)

Since there are 26 letters in the alphabet (treating lowercase and uppercase as the same), maximum number of combinations of 4 letter words is 26 ** 4 = 456976

Therefore, if we have an array that is around 456k long, we can hold every possible combination of 4 letter words. And we don't have to store the actual words, we can simply store 1 or 0 in the appropriate locations depending on whether a word occurs in the dictionary or not. (e.g. "aaaa" would cause array[0] = 1, "aaab" would cause array[1] = 1, "aaae" would cause array[4] = 1 etc.) Remember, memory is cheap these days, so a 456k array of bytes is chump change.

However, if we wish to optimize for memory, we can actually use individual bits instead of bytes to store 1 or 0. Therefore, our array size will become 456976 / 8 = 57122 bytes, which is less than 64K of memory (which ought to be enough for anyone, per Bill Gates

).

Now our lookup becomes simple. For instance, we want to see if "face" is in the dictionary or not. So we proceed as follows:
'f' is 6th letter of alphabet and first letter in the word.
'a' is 1st letter of alphabet and second letter in the word
'c' is 3rd letter of alphabet and 3rd letter in the word
'e' is 5th letter of alphabet and 4th letter in the word.
Therefore, its position in the lookup array is (6 - 1) * 26^3 + (1-1)*26^2 + (3 - 1)*26 + (5-1) = 87936
Therefore, we just look up the 87936th bit in our dictionary (i.e) bit 0 of byte 10992 in our array (10992 = 87936 / 8 ) and see if the bit is set to 1 or 0. If the bit is set to 1, we know the word is in the dictionary and if it is set to 0, we know it does not exist.

While this is a specific algorithm for 4 letter words only, it may be faster than some general purpose hashing algorithms or patricia tries, since all we are doing is computing the array index (and part of that computation can be parallelized) and then checking the bit corresponding to that index.

@Jaspreet: Would like to see your C++ solution as well. Please post away (For the record, I actually program in C, C++ and java mostly at work.)

Jaspreet · Post by **Jaspreet** » 16 Jun 2014 20:25

My C++ solution is here.
It compiles under Visual Studio 2013.

Code (and design) review is welcome. But this is probably not the kind of code I'd write in production environment. This was only meant to illustrate an algorithm.

Recommended starting point is GetCountOf() function. Others are helper functions.

[Edit: I found a bug in it and am fixing it.]

SaiK · Post by **SaiK** » 17 Jun 2014 17:44

matrimc+1. thanks for the learning here folks. good infos.
having 26 root elements, we can have the whole inglicks dictionary built.

--just realized my 13" ultra 4k display is not good for reading tiny fonts.. but thanks to fingering and zooming (an algorithm world in a different class I guess).

ArmenT · Post by **ArmenT** » 17 Jun 2014 21:34

SaiK wrote: --just realized my 13" ultra 4k display is not good for reading tiny fonts.. but thanks to fingering and zooming (an algorithm world in a different class I guess).

Hit Ctrl - Shift - + multiple times to increase font size (+ key being the one above =, not the one of the right side of the keyboard). Then hit Ctrl - 0 to go back to regular font size.

Vayutuvan · Post by **Vayutuvan** » 17 Jun 2014 22:24

ArmenT: that bit array trick is nice. But of course the space grows exponentially on the length of the strings.

---

I will post a couple of links to ready made code - one public domain one in Python and another gpl in Java - by well known professors in algorithms.

Hopefully their is as good as ArmenT and jaspreet's code.

They have lot of algorithms which are usually covered in a second UG/beginning grad course implemented in these libraries.

Also found an axis paper of 2000 by Knuth called dancing links. Idea is simple and reduces constants in backtracking searches. Useful in tiling problems which we have been discussing in math dhaga. Since exact cover can be reduced to tiling, there are no known polynomial time algorithms. So even Knuth would not be able to solve these problems very fast.

SaiK · Post by **SaiK** » 17 Jun 2014 23:49

can that bit array be extended to bigger words? would it become inefficient bit space waste if all the combination does not lead to a lookup number?

context: I am looking at random words from dictionary.

Vayutuvan · Post by **Vayutuvan** » 18 Jun 2014 01:02

Dancing links paper by Knuth: Original paper (?) and some interesting applications of DLX (Dancing Links)

Dancing Links by Knuth (arXiv PDF) (other formats, eg. PS, also available).

There is a wiki article on Dancing Links as well. Look at Bipartite matching (Dulmhage-Mehndelsson decomposition which is used as a final step in one of the most used graph partitioning heuristics for sparse linear system direct solvers, parallelization by domain decomposition, and in data mining among other applications). Also Hypergraphs can be represented as bipartite graphs. Next reference by Savage talk about bipartite matching among others.

Vayutuvan · Post by **Vayutuvan** » 18 Jun 2014 01:29

Fred Savage's Algorithms is available online along with Java implementations and data sets.

Might be of interest to pandyan and vina.

Finding negative cost cycles, eg. currency arbitrage or other kinds of arbitrage. Two points to remember, namely (1) To solve real life stock market arbitrage problems such simple algorithms will not do as the input data is dynamic and statistical in nature with the distributions or known, partially known, or completely unknown other than a few sample points and (2) These can be used as the starting point with several other components and can act as the kernels for solving these problems fast (provided formulating these problems in this framework even makes sense).

Algorithms by Fred Savage

Vayutuvan · Post by **Vayutuvan** » 18 Jun 2014 01:36

Python library public domain (do as you please but no warranties) code by David Eppstein (well known algorithms prof at UC Irvine).

PADS
public domain algorithms (in Python) by David Eppstein
http://www.ics.uci.edu/~eppstein/PADS/
[includes implementation of Edmonds nonbipartite matching algorithm
and Hopcroft-Karp for bipartite matching]

Enjoy.

SaiK · Post by **SaiK** » 18 Jun 2014 02:17

now you guys are making me learn snake language.

ArmenT · Post by **ArmenT** » 18 Jun 2014 21:28

matrimc wrote:ArmenT: that bit array trick is nice. But of course the space grows exponentially on the length of the strings.

Yep, that's why I said it was a "specific algorithm"

. I'm aware that memory grows exponentially as the # of letters in the word increases, but for 4 letters, it does stay tolerably small! However, it must be noted that since it doesn't store the actual words, if you compare it to a data-structure that actually does store the words to lookup, then it *may* actually be more memory efficient as well (subject to certain conditions!):
1. Assuming C style string storage, 4 letter words take up 5 bytes (4 for the letters, 1 for the ASCII NUL byte (\0)). Other languages use other ways to store strings (e.g. store length at the beginning), but they generally take up more memory as well.
2. If you have to store 11425 four letter words or more, then the bit array becomes more efficient (because, space used for 11425 words is 11425 *5 = 57125 which is greater than 57122 bytes that the bit array takes)
3. This is assuming that the data structure has no other storage requirements. For a sorted array, this is exactly the # of bytes required, but for a hash structure or a PATRICIA trie, they may have additional pointer variables and such, so the actual number of words that they can store before they start to exceed a bit array's memory requirements becomes even less.

SaiK wrote:can that bit array be extended to bigger words? would it become inefficient bit space waste if all the combination does not lead to a lookup number?

Well, it could, but the memory requirements start rising rapidly as the length of the words increase. By the way, there is no combination that does not lead to a lookup number, as the bits cover every possible combination of letters. All it does is tell us if such a word exists in the dictionary or not (1 if it exists, 0 if not). Of course, there will be a lot more 0 bits than 1 bits because of how words are in the English language. Granted, there is a bit of wasted space, but as I pointed out above, once the # of words to lookup goes past a certain number, the bit array may actually become more memory-efficient.

On the other hand, I wouldn't give a pass to the humble array so easily. For instance, if you have a requirement to keep track of social security numbers instead of four letter words. Social security numbers are numeric and 9 digits long. If you had to keep a table of all the social security numbers assigned in the US, the bit array technique will possibly be more memory efficient and speed efficient compared to other TFTA data structures (e.g. you need only < 125 MB to store every possible social security # if using a bit array. Now compare that to a hash table, PATRICIA tree or whatever and see how many numbers you can store before you exceed that size). Lookups will also be a lot faster, as the index calculation is simply a couple of bit shifts and a subtraction

. Of course, as I noted before, this is a special case algorithm to solve a specific problem.

Vayutuvan · Post by **Vayutuvan** » 18 Jun 2014 21:53

ArmenT: I would not dismiss your special case algorithmic trick that quickly. As part of a solution to a complete problem it could be important. Please see Knuth's DLX paper above for how he is able to reduce constants for lots of different problems.

---

Here one trick I don't know whether people have seen or not.

There is an array A of length n which has two parts - A1 if length n1 and A2 of length n2. A1 is in the beginning starting at 0 and A2 right after A1 abutting it. How much extra space is required to interchange the positions of the two parts? Iow, we need A to be A2 followed by A1. We still need to maintain efficiency, I.e. O(n) asymptotic time complexity.

Amber G. · Post by **Amber G.** » 20 Jun 2014 00:07

Okay, here is something people might like to try for fun... or share their results..

WC 2014 has 32 teams. Some of the results we know... let us see if we can make a model and predict results. Since there are many permutations/combinations
one will need a computer to run the simulation.

Here is my model ...
It is simple, but I think it is as good as anything else I have seen.

First stage:

I assigned a vector (3 dimensional - three components ) for each team. The components are Midfield (or overall), defense, attack. (Let us call it M,D,A)
(To assign these value needs some thinking, I took the values (or comments from experts) from betting market and assigned a value for each team)

Second stage:

I developed a simple vector function (matrix) which takes these values and for Win, Draw, loose.

(Again, simply speaking, I have three scalar functions (or one vector) such as
W(M1,D1,A1;M2,D2,A2) is the prob. of win of team 1 (with values M1D1A1) over team 2 (M2,D2,A2)

(Again my model was simple - One can make any nice function from best guess)

(Again you may like to have even a scalar function using your best guess to assign W,L,D prob)

Third stage:

Run simulation of games (use the WC schedule for pairings) about few thousand times...

See what the result produces..

I have made it so it is interactive (that is I put in the results of the games already played)...
I modify (M/A/D) of the team (from what I can translate from reading expert views - The sample size is too small to use historic data)

And I run the simulation... which is relatively simple,.

(I know very similar algorithm(s) are run by betting houses etc...or college games (where the sample data is large enough to judge the accuracy of the model)

I keep it simple...

My vector has only three dim. ( But I guess even 1 dim - scalar- function may be good enough)

For vector function, (or Tensor - to use correct terminology) - one can check, say Stress tensor (http://en.wikipedia.org/wiki/Cauchy_stress_tensor
type analysis, if one is not familiar with, how such functions are used,.

If there is interest... I will put a running score ...

For example, the function predicts (Uru- Eng) game going on right now..

England win - 37%, Loss 34% Draw 29% (This is essentially close to most people say)

And at present, the model predicts about 1 in 3 chances that England will qualify for the next round. (again this is close to what most will say)

Edited Later: (Almost when the game URU-ENG has finished - URU is winning 2-1) (Looking at the game, the prob. function looks quite ok)

ArmenT · Post by **ArmenT** » 20 Jun 2014 11:20

matrimc wrote: Here one trick I don't know whether people have seen or not.

There is an array A of length n which has two parts - A1 if length n1 and A2 of length n2. A1 is in the beginning starting at 0 and A2 right after A1 abutting it. How much extra space is required to interchange the positions of the two parts? Iow, we need A to be A2 followed by A1. We still need to maintain efficiency, I.e. O(n) asymptotic time complexity.

My naive guess would be we need abs(n1 - n2) + 1, at least to be reasonably efficient (possibly a couple more vars to keep track of head and tail, especially if we're using a circular buffer as the temp working location). It can be done with just 1 extra variable, but will then require some extra linear scans to shift stuff downwards.

Rahul Mehta · Post by **Rahul Mehta** » 20 Jun 2014 12:28

In a town , there are 2N + 1 persons.

Each one is either a liar or a truthful

A truthful will always speak truth, while a liar will speak lie or truth.

A truthful will never conspire with liar or truthful. While two liars may conspire

Each one in town knows who is liar and who is truthful.

Now you are entering the town, and you have no information on who is liar and who is truthful.

You can ask any number of questions to anyone. But only question you are allowed to ask A is "is B truthful?" And his answer will be YES or NO.

Assume worst case scenario. Then whats the minimum number of questions you need to ask to know who is truthful and who is liar?.

Rahul Mehta · Post by **Rahul Mehta** » 20 Jun 2014 12:32

Puzzle you know :

1. A town had 100 couples and no singles.

2. Every wife knows whose husband is a cheater, but doesnt know about his own husband,

3. A priest came to town and told all wives that "at least one husband is cheating on you".

4. On 10th day, 10 wives killed their respective 10 husbands.

You know how they deduced.

======

New puzzle :

1. to 3. same

4. Next day, all 100 wives killed their respective 100 husbands.

How did they deduced the "fact"?

Amber G. · Post by **Amber G.** » 20 Jun 2014 21:33

Hello Rahulji,

These (or very similar) questions are some what well known (and may have been appeared in BRF's math dhaga) but interesting nevertheless.

But born in a country which gave the concept of ZERO to the world, I find that there may be some sloppiness in presenting the question(s)..

For example, what if there is ONLY one person in the village? (2N+1=1 when N=0).. You DO NOT say that the population is more than 1.
(In that case "is B truthful" may not make any sense - and no matter how many question(s) one asks, one can not know)

Also, what if number of truth-tellers is ZERO. (All are liars - your question does not rule that condition out)...Again in this case, one can not find out for sure, no matter how many questions one asks

(Interesting similar problem exists, in your formation of the problem, when number of liars = 0)

***

Second problem is very will known. but again here it would have been nice if the question was asked in some-what non-sloppy way...

For example ..
There is NO logical reason to assume that priest is telling the truth. (And assuming that ALL people will believe he is telling the truth)...

NO logical reason to assume that a spouse has no option but to kill his/her spouse the VERY day she can deduce it from logic. (In fact it is incredible to believe a person who understands mathematical concept of Induction will be so dumb that she does not have a clue of what is happening ..)

(This problem has appeared MANY MANY times, I have never seen described in such a sloppy-way )

Regards.

Vayutuvan · Post by **Vayutuvan** » 21 Jun 2014 01:24

RM's couples problem is also known as mathematicians wearing black or white hats. By the way I have a nice handbook on Induction. Several problems with solutions on the application on induction. Will post if I find some interesting problems (algorithmic in nature - here, otherwise in Math dhaga).

Vayutuvan · Post by **Vayutuvan** » 21 Jun 2014 01:32

ArmenT: You'r answer of requireing only one extra variable is correct several linear scans are not required.

Hint1:Assume there is subroutine to reverse which uses a call to swap which does swapping using one extra temporary variable.

Hint2:[Use the following identities on reverse function

1. R(A1+X+A2) = R(A2) + R(X) + R(A1)
2. R(R(Ax)) = Ax

where + is a concat operation.

Eric Demopheles · Post by **Eric Demopheles** » 22 Jun 2014 11:46

I have actually a question here. What is the most useful language for scientific computing for a casual user of linux who is comfortable with the command line?

I mean a language for general programming tasks too; not something like PARI/GP or SAGE.

I am leaning towards Python, but would love to hear other recommendations. How long it would remain available in the long term, is also a consideration.

ArmenT · Post by **ArmenT** » 22 Jun 2014 12:41

matrimc wrote:ArmenT: You'r answer of requireing only one extra variable is correct several linear scans are not required.

Hint1:Assume there is subroutine to reverse which uses a call to swap which does swapping using one extra temporary variable.

Hint2:[Use the following identities on reverse function

1. R(A1+X+A2) = R(A2) + R(X) + R(A1)
2. R(R(Ax)) = Ax

where + is a concat operation.

Yep, that's exactly what I came up with too. By the way, I said "extra" scans, not "several". Basically, I wasn't sure if making a second pass to reverse the set to the final result counts as O(N) or O(2N), strictly speaking

. Using a ring buffer only makes a single pass down the set.

Vayutuvan · Post by **Vayutuvan** » 23 Jun 2014 05:07

ArmenT: I figured you came up with the answer. By the way this works even if there is a non-empty subarray in the middle of A1 and A2. Big O notation hides the constant so any number of constant scans k are OK in that they are all O(n) since all of them are O(n) still.

I have an example of an application for your bit array representation in Bioinformatics which otherwise might not do very well with hashing as there could be too many collisions. I don't know that would be an contrived example or not. Let me think about it a little to see how can out it forward.

Vayutuvan · Post by **Vayutuvan** » 23 Jun 2014 05:20

Eric Demopheles wrote:I have actually a question here. What is the most useful language for scientific computing for a casual user of linux who is comfortable with the command line?

I mean a language for general programming tasks too; not something like PARI/GP or SAGE.

I am leaning towards Python, but would love to hear other recommendations. How long it would remain available in the long term, is also a consideration.

EricD:

Python is certainly very popular but not necessarily all that great (from a computer language design perspective).

I am curious as to why not sage(math). If you are leaning towards Python, then sage is a perfect fit (especially so if my guess that you are an Algebraist (Geometry or Topology) is correct). It may be worth while to post your requirements to sage support list and see if they have suggestions as to what you want is there already in sage or how you can quickly put together what you want from their stock packages plus any third party packages. If your interest is in solving moderate sized problems (numerically as opposed to symbolically) one option could be sage + numpy/scipy. But large problems would require your own algorithms of course which you would be able to hook into sage (or even plain Python) after writing them in C/C++ which is my favorite for large scale scientific programs - former for algorithms which are written by a small group of people and the latter for applications/products which solve a set of general problems in some scientific area.

FWIW

SaiK · Post by **SaiK** » 23 Jun 2014 07:48

in comparison c/c++, how does sage perform? let us say the same algorithm written in sage or in c/c++.

Vayutuvan · Post by **Vayutuvan** » 23 Jun 2014 08:46

sage has internal timing and they from time to time do publish timings for some small instances of problems. But "algorithm" for computing what?

Several algebraic algorithms are exponential or O(n^k) where k is large. I will post some links from mathoverflow (or is it CS overflow, I forget) where there was lengthy discussion on what algorithms are there which are polynomial but of high degree.

Eric Demopheles · Post by **Eric Demopheles** » 23 Jun 2014 08:48

matrimc wrote:
Python is certainly very popular but not necessarily all that great (from a computer language design perspective).

I am curious as to why not sage(math).

Sage is enough for math purposes. The desire for a general programming language has other origins.

The reason is for job market purposes. You know how academia job-safety is during early career. The tension is quite hair-graying.

It takes some weight off the mind to know that there is a viable backup option to find employment in the software industry. But also I don't want to go too far away and I want to be able to return to academia at the earliest -- I wanted to conserve brain and not get too far into software engineering proper. So I was asking about python, which is linked tightly to SAGE, to google, etc.. As of now, I can "if necessary" write small scripts in python. The programming language I know truly and properly is C, which is unfortunately too cumbersome outside systems programming.

ArmenT · Post by **ArmenT** » 23 Jun 2014 09:40

^^^
Interestingly, we use C quite a bit at where I work and I'm no systems engineer. We also use python a fair bit, but mostly as a glue language and to write quick management scripts and hadoop jobs.

Eric Demopheles · Post by **Eric Demopheles** » 23 Jun 2014 11:03

ArmenT wrote:^^^
Interestingly, we use C quite a bit at where I work and I'm no systems engineer. We also use python a fair bit, but mostly as a glue language and to write quick management scripts and hadoop jobs.

The thing is that, for simple things, there is a lot to write in C that can be done in a couple of lines in python.

Examples:

1. You want to read a file consisting of a list of names, and sort these names into an array.

2. You want to connect to gmail and send an e-mail with certain content.

3. You want to take a list of names and group them into random pairs.

Etc.

It is my humble opinion that these were all much easier to do in python, despite any performance hits.

ArmenT · Post by **ArmenT** » 23 Jun 2014 12:26

Absolutely sir. We use python and perl for a lot of tasks ourselves and only use C or java for the higher performance stuff.

Rahul Mehta · Post by **Rahul Mehta** » 23 Jun 2014 12:33

The 10 Hardest Logic Puzzles Ever Created

http://www.conceptispuzzles.com/index.a ... rticle/424

2. The Hardest Logic Puzzle Ever

Three gods A, B, and C are called, in no particular order, True, False, and Random. True always speaks truly, False always speaks falsely, but whether Random speaks truly or falsely is a completely random matter. Your task is to determine the identities of A, B, and C by asking three yes-no questions; each question must be put to exactly one god. The gods understand English, but will answer all questions in their own language, in which the words for yes and no are da and ja, in some order. You do not know which word means which.

American philosopher and logician George Boolos described the above riddle which was devised by Raymond Smullyan and published it in the Harvard Review of Philosophy in 1996. Boolos called it "The Hardest Logic Puzzle Ever". The original article can be downloaded here. You can read about making this puzzle even harder on the Physics arXiv Blog.