Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 11th, 2010 at 4:44:57 PM permalink
I saw the movie The Social Network about a week ago. It was quite good, I might add. As those who saw the movie will know, while still a Harvard student, Facebook creator Mark Zuckerberg created a site in which he posted pairs of pictures of female Harvard students and asked students to choose which one was more attractive. The object, I think, was to give each girl a rating, as judged by the users of the site.

It got me to thinking. Suppose you had x such pictures, and wished to sort them in attractiveness order yourself. What would be the most efficient strategy? You are limited to comparing two pictures at a time. I'm open to how "efficient" should be defined, but I would suggest minimizing either expected matches required, or maximum matches.

I'll hold off on posting my strategy for the moment. However the following table will show the minimum and maximum number of pairings it would produce.

Faces Min matches Max matches
2 1 1
4 5 5
8 15 20
16 49 76
32 169 288
64 609 1104
128 2273 4288
256 8705 16832
512 33921 66560
1024 133633 264448
2048 529921 1053696
4096 2109441 4205568
8192 8415233 16801792
16384 33611777 67162112
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
October 11th, 2010 at 5:01:38 PM permalink
I would expect that in this type of subjective ranking/sorting, the issue of consistency would be more problematic than the issue of efficiency. By this I mean that if I rank A higher than B and B higher than C, I am still not certain that I would necessarily rank A higher than C in a direct comparison.

Edit: Or that I would necessarily rank A higher than B if I repeated the comparison.
larwiz1
larwiz1
  • Threads: 15
  • Posts: 58
Joined: Sep 16, 2010
October 11th, 2010 at 5:38:50 PM permalink
I may not understand what is required, but here goes, looking at your table.

Suppose we have 3 (A,B,C). I would compare A and B and put them in order. I would then compare C to the best (most attractive), say A. If C is more attractive, then I would have my minimum matches (2), with order CAB. If C is not more attractive, then I would compare C and B and put them in order, for a max match number of 3.

Now for four (A,B,C,D), I would take the same strategy. Compare A and B and put them in order, then compare C to the best (assume A). If C is most attractive, then my order is CAB (at this point). I then would compare D to C. If D is more attractive, my work is done with a minimum of 3 matches and a DCAB order. If D is not more attractive, I would compare D to A; if D is more attractive, then my work is done (4 matches), with an order of CDAB; if not, then a 5th match will determine the D-B order.

I would do the same strategy with more faces. It seems it is possible that the minimum could be the number of faces minus 1, however improbable.
MathExtremist
MathExtremist
  • Threads: 88
  • Posts: 6526
Joined: Aug 31, 2010
October 11th, 2010 at 7:32:30 PM permalink
Quote: Wizard


It got me to thinking. Suppose you had a x such pictures, and wished to sort them in attractiveness order yourself. What would be the most efficient strategy? You are limited to comparing two pictures at a time. I'm open to how "efficient" should be defined, but I would suggest minimizing either expected matches required, or maximum matches.



You're asking about sorting algorithms and computational complexity, two fundamental topics in computer science. Start here:

http://en.wikipedia.org/wiki/Sorting_algorithm

"Efficiency" depends on several factors like whether you care more about runtime or storage space, or what you know about your data (e.g. is it nearly-sorted or very out-of-order).

More info: the presumption in your question is that you have a collection of N objects and a binary-comparison function f(a, b) which returns >0 if a > b, 0 if a == b, and <0 if a < b. If you don't have that, for example you sometimes rank a > b and other times a < b (entirely possible when dealing with "attractiveness"), then it's a different problem that doesn't map well to existing algorithms.
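A minimal sketch of such a three-way comparator in C++ (the function name and the idea of numeric ratings are illustrative assumptions, not anything from the original site):

```cpp
#include <cassert>

// Hypothetical three-way comparator over numeric "attractiveness" scores.
// Returns >0 if a > b, 0 if a == b, and <0 if a < b -- the contract every
// comparison-based sorting algorithm assumes holds consistently across calls.
int compareRatings(double a, double b) {
    if (a > b) return 1;
    if (a < b) return -1;
    return 0;
}
```

The consistency problem Doc raised amounts to this function returning different signs for the same pair on different calls, which breaks that contract.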
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
FinsRule
FinsRule
  • Threads: 128
  • Posts: 3914
Joined: Dec 23, 2009
October 11th, 2010 at 7:50:57 PM permalink
This is only slightly related to the original question, but, I believe that attractiveness is normally distributed. If average attractiveness is a 5 out of 10, I believe the standard deviation is 1.5. I think 95% of people fall into the 2-8 category. Meaning that only 1 in 40 women would really be considered a 9 or a 10.

My wife asked me what she was. I told her that of course she's a 10.
Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 11th, 2010 at 8:22:27 PM permalink
Quote: MathExtremist

You're asking about sorting algorithms and computational complexity, two fundamental topics in computer science. Start here:

http://en.wikipedia.org/wiki/Sorting_algorithm...



Thanks, good answer. I was trying to do something like a merge sort, but my method was not optimal. I'll need some more time, but hope to answer my own question tomorrow.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
October 11th, 2010 at 8:36:55 PM permalink
Quote: MathExtremist

... the presumption in your question is that you have a collection of N objects and a binary-comparison function f(a, b) which returns >0 if a > b, 0 if a == b, and <0 if a < b. If you don't have that, for example you sometimes rank a > b and other times a < b (entirely possible when dealing with "attractiveness"), then it's a different problem that doesn't map well to existing algorithms.

Hmmm. I think that just might be the mathematician's way of saying what I tried to convey about the issue of consistency in my earlier post. Or did I misunderstand once again?
mkl654321
mkl654321
  • Threads: 65
  • Posts: 3412
Joined: Aug 8, 2010
October 11th, 2010 at 9:23:46 PM permalink
This seems so obvious to me. And I can do it without getting all mathy:

Take two pictures. Rank them in order of attractiveness.

Take the third picture, and compare it to the bottom picture. If Pic 3 is lower in attractiveness than the bottom picture, put it at the bottom. If higher, compare it to the top picture, and place Pic 3 either in the middle or above the top picture, according to the results of the comparison. You now have three pictures in ranked order.

Take Pic 4 and compare it to the middle picture. This will tell you whether to move it up or down. In each case, compare it to the next picture in the relevant direction, and use that comparison to place Pic 4 either between the middle and that pic, or at the top or bottom. You now have four ranked pics.

Repeat this procedure with each successive pic, starting as close to the middle as possible. If a given comparison preserves the existing position of the new pic, leave that pic there. For example:

Seven pics have been ranked. Take Pic 8 and compare it to the fourth-ranked pic. If it is more attractive, compare it to the third-ranked pic. Repeat until the pic stops moving up, then place the pic in that position. Same procedure if the initial comparison shows the new pic to be less attractive than the middle pic; keep moving down until it stops, then place the new pic in that position.

No math necessary.
The fact that a believer is happier than a skeptic is no more to the point than the fact that a drunken man is happier than a sober one. The happiness of credulity is a cheap and dangerous quality.---George Bernard Shaw
Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 11th, 2010 at 10:00:52 PM permalink
Quote: mkl654321

This seems so obvious to me. And I can do it without getting all mathy...



That is an acceptable way to do it. It would be more efficient than the standard bubble sort. However, the question is can you prove it minimizes the expected or maximum number of comparisons?
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
mkl654321
mkl654321
  • Threads: 65
  • Posts: 3412
Joined: Aug 8, 2010
October 11th, 2010 at 10:39:44 PM permalink
Quote: Wizard

That is an acceptable way to do it. It would be more efficient than the standard bubble sort. However, the question is can you prove it minimizes the expected or maximum number of comparisons?



Not mathematically, but it would seem that making each new placement by comparing that picture with the current middle-ranking picture, and then moving up or down as needed, would minimize the number of comparisons, as any new entrant is most likely to wind up as near the middle of the existing group as possible (if there are eight existing ranked pictures, the new picture would be more likely to wind up #5 in the resultant group of nine than in any other position).

To put it another way, a comparison with the middle of the existing group gives the most information; comparing a new entrant with the top or the bottom ranking picture gives minimum information; so the optimal comparison method is to start in the middle.
The fact that a believer is happier than a skeptic is no more to the point than the fact that a drunken man is happier than a sober one. The happiness of credulity is a cheap and dangerous quality.---George Bernard Shaw
Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 11th, 2010 at 10:47:38 PM permalink
I've been trying to understand the quicksort and heapsort, but am not there yet. I'm all ears if there is a particular method you recommend that is easy to understand, and also minimizes the maximum number of comparisons.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
jackblack21
jackblack21
  • Threads: 5
  • Posts: 31
Joined: Oct 9, 2010
October 11th, 2010 at 11:10:18 PM permalink
Quote: mkl654321

Not mathematically, but it would seem that making each new placement by comparing that picture with the current middle-ranking picture, and then moving up or down as needed, would minimize the number of comparisons, as any new entrant is most likely to wind up as near the middle of the existing group as possible (if there are eight existing ranked pictures, the new picture would be more likely to wind up #5 in the resultant group of nine than in any other position).

To put it another way, a comparison with the middle of the existing group gives the most information; comparing a new entrant with the top or the bottom ranking picture gives minimum information; so the optimal comparison method is to start in the middle.



It's been a while since I did any programming, but this sounds correct to me. To clarify further, if you already have 100 pictures ranked then compare the new picture to #50. If it ranks higher, compare to #75. If it ranks lower, compare to #63, and so on. Awe heck, just listen to any redneck play the high-low game on the John Boy and Billy Show and you get the idea.
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 4:57:42 AM permalink
Quote: Wizard

I've been trying to understand the quicksort and heapsort, but am not there yet.




You don't want any of the comparison-based algorithms; they will not be optimal in your case. The lower bound on the order of a comparison sort is n*log(n), but a good distribution algorithm is usually linear.
If your set of ratings is discrete (say, 1 to 10), you want counting sort. If it is not, then a bucket sort is probably the best choice.
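A sketch of the counting sort for the discrete case (this assumes each picture already carries an integer rating from 1 to 10, which is exactly the assumption that sidesteps pairwise comparisons):

```cpp
#include <vector>
#include <cassert>

// Counting sort: tally how many pictures got each rating, then read the
// tallies back out from best to worst. Linear time, no pairwise comparisons
// of pictures against each other.
std::vector<int> countingSortRatings(const std::vector<int>& ratings) {
    int count[11] = {0};                 // count[r] = how many pictures rated r
    for (int r : ratings) ++count[r];
    std::vector<int> sorted;
    for (int r = 10; r >= 1; --r)        // best ratings first
        sorted.insert(sorted.end(), count[r], r);
    return sorted;
}
```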
"When two people always agree one of them is unnecessary"
Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 12th, 2010 at 6:49:19 AM permalink
Quote: jackblack21

It's been a while since I did any programming, but this sounds correct to me. To clarify further, if you already have 100 pictures ranked then compare the new picture to #50. If it ranks higher, compare to #75. If it ranks lower, compare to #63, and so on. Awe heck, just listen to any redneck play the high-low game on the John Boy and Billy Show and you get the idea.



Okay, let's call this the mkl654321 sort. I show the maximum number of comparisons to sort 100 items is 573. This can be found as the sum for i=1 to 100 of roundup(log(i,2)), where log(i,2) is the log of i to the base 2. That is less than the maximum of other sorts, which should be 100*log(100,2) = 664.39.

Note: Corrected. Function was previously incorrectly expressed as sum for i=1 to 100 of int(log(i,2)).
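The formula can be checked directly; here is a small C++ sketch (illustrative, not the Wizard's actual calculation):

```cpp
#include <cassert>

// Worst case for binary insertion: the i-th item placed needs at most
// ceil(log2(i)) comparisons, so sum that for i = 1..n. For n = 100 the
// total is 573, matching the figure quoted above.
long worstCaseBinaryInsertion(int n) {
    long total = 0;
    for (int i = 1; i <= n; ++i) {
        long c = 0;
        while ((1L << c) < i) ++c;   // c = ceil(log2(i)), no floating point
        total += c;
    }
    return total;
}
```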
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
October 12th, 2010 at 6:57:09 AM permalink
Edit: delete.

I overlooked some intervening posts.
DorothyGale
DorothyGale
  • Threads: 40
  • Posts: 639
Joined: Nov 23, 2009
October 12th, 2010 at 7:22:18 AM permalink
Quote: Wizard

I've been trying to understand the quicksort and heapsort, but am not there yet. I'm all ears if there is a particular method you recommend that is easy to understand, and also minimizes the maximum number of comparisons.


Depending on the data, you may find that "bin sort" is the most efficient. For sorting integers, nothing comes close. And it has 0 comparisons.

--Dorothy
"Who would have thought a good little girl like you could destroy my beautiful wickedness!"
Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 12th, 2010 at 7:29:15 AM permalink
Quote: DorothyGale

Depending on the data, you may find that "bin sort" is the most efficient. For sorting integers, nothing comes close. And it has 0 comparisons.



Care to explain how it works? For example, how would you sort the Munchkins in order of height?
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 7:31:51 AM permalink
Quote: Wizard

Okay, let's call this the mkl654321 sort. I show the maximum number of comparisons to sort 100 items is 573. This can be found as the sum for i=1 to 100 of int(log(i,2)), where log(i,2) is the log of i to the base 2. That is much more than the maximum of other sorts which should be 100*ln(100)=460.5.



Something is wrong with this estimate. Sum(log(i)) for i<=100 cannot be greater than 100*log(100), because it has 100 elements, each being less than log(100).


Also, remember that n*log(n) is just the order of the algorithm, not an estimate of the actual number of operations. It does not mean that you can actually achieve the result in that many comparisons (the key meaning of the order is not how quickly you can sort a given array, but rather how badly your speed will degrade as the size of the array grows). At the very least, there needs to be a constant factored in: C*n*log(n), which makes the base of the logarithm irrelevant, among other things.

The performance of the "modified mkl" sort (where you keep dividing halves into smaller halves instead of resorting to sequential comparison after looking at the middle element, as in his original version) is actually in line with the other comparison-based algorithms, because it is proportional to n*log(n).
"When two people always agree one of them is unnecessary"
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
October 12th, 2010 at 7:35:27 AM permalink
Quote: Wizard

Quote: DorothyGale

Depending on the data, you may find that "bin sort" is the most efficient. For sorting integers, nothing comes close. And it has 0 comparisons.

Care to explain how it works? For example, how would you sort the Munchkins in order of height?



I think I sense a joke coming on. Maybe not.
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 7:37:02 AM permalink
Quote: Wizard

Care to explain how it works? For example, how would you sort the Munchkins in order of height?


It's a simplification of a bucket/counting sort where you can also assume, that all values are unique (ideally, sequential integers).
It's not true that it requires 0 comparisons. Depending on the implementation and the properties of the data set (and on what exactly you call a "comparison"), you will need between N and k*N comparisons (k being the size of the value universe) - a standard figure for the "distribution sort" family.
"When two people always agree one of them is unnecessary"
Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 12th, 2010 at 7:51:14 AM permalink
Quote: weaselman

Something is wrong with this estimate. Sum(log(i)) for i<=100 cannot be greater than 100*log(100), because it has 100 elements, each being less than log(100).



You're right. I incorrectly wrote int(log(x,2)), when I should have written roundup(log(x,2)).
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 7:52:41 AM permalink
Quote: Wizard

You're right. I incorrectly wrote int(log(x,2)), when I should have written roundup(log(x,2)).



Don't see how it matters. They are still all less than roundup(log(100)) :)
In fact, that's roughly how they usually arrive at n*log(n) for the order estimate for those "divide and conquer" algorithms - they are all roughly equivalent in performance - you have to deal with about log(n) partitions each of the n times.
"When two people always agree one of them is unnecessary"
Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 12th, 2010 at 7:54:03 AM permalink
Quote: weaselman

Don't see how it matters. The are still all less than roundup(log(100)) :)



Log to what base? It should be base 2.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 7:56:54 AM permalink
The base doesn't matter (see my earlier post about it), but by way of how those estimates are calculated, it is indeed, usually 2.
"When two people always agree one of them is unnecessary"
DorothyGale
DorothyGale
  • Threads: 40
  • Posts: 639
Joined: Nov 23, 2009
October 12th, 2010 at 8:22:29 AM permalink
Quote: Wizard

Care to explain how it works? For example, how would you sort the Munchkins in order of height?


This is the old sort that used to be how punch cards with SSNs were sorted. Just picture a bunch of cards running through a sorting machine. They get dumped in bins 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 according to their 1's digit. Then the cards are gathered, the machine is set to the 10's digit and they get put through again. Then the cards are gathered, the machine is set to the 100's digit and they get put through again. Repeat until you've reached the 9th digit for SSNs.

If there are N numbers and the largest has M digits, then the running time is N*M. Since M is bounded in all practical applications, this is O(N) and it kicks the butt of heapsort, mergesort and quicksort, and it has no comparisons.

This is trivial to explain with a piece of paper, sitting in front of you. It would take about 2 minutes and you would completely get it.

--Dorothy
"Who would have thought a good little girl like you could destroy my beautiful wickedness!"
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 8:30:01 AM permalink
Quote: DorothyGale



If there are N numbers and the largest has M digits, then the running time is N*M. Since M is bounded in all practical applications, this is O(N) and it kicks the butt of heapsort, mergesort and quicksort, and it has no comparisons.



What you describe is called radix sort, not bin sort. "Looking at the digit" is a comparison. And there are (in case of SSN numbers) 4.5*N of them on average
"When two people always agree one of them is unnecessary"
DorothyGale
DorothyGale
  • Threads: 40
  • Posts: 639
Joined: Nov 23, 2009
October 12th, 2010 at 8:34:57 AM permalink
Quote: weaselman

What you describe is called radix sort, not bin sort. "Looking at the digit" is a comparison. And there are (in case of SSN numbers) 4.5*N of them on average


A comparison means to compare two values to be sorted and determine if they are <, =, or >. This is not a comparison in any sense. At no time in this sort are two data values compared.

For example, if one were to determine the frequency of a bunch of positive integers, x, y, z, etc., in an array, one would simply type freq[x]++, freq[y]++, freq[z]++ ... this is not a comparison in any sense, any more than x++ is a comparison because it needs to know what x is before adding 1 to it.

Yes, it is "radix." It is also called "bucket sort." My bad. The main issue with this sort is not the time requirement, it is the space requirement.

#include <iostream>
using namespace std;
const int MAX = 100;

void fillArray(int a[], int size, int& numberUsed);
int bucketSort(int b[], int size);
void printArray(int c[], int size);

int main() {
    int arraytosort[MAX], numberUsed;
    cout << "This program reads in numbers and sorts them "
         << "using bucket sort.\n";
    fillArray(arraytosort, MAX, numberUsed);
    cout << "------------------------------------------" << endl;
    printArray(arraytosort, numberUsed);
    int howmany = bucketSort(arraytosort, numberUsed);
    cout << "Number of steps to sort array: " << howmany << endl;
    return 0;
}

void fillArray(int a[], int size, int& numberUsed) {
    cout << "Enter up to " << size << " nonnegative whole numbers.\n"
         << "Mark the end of the list with a negative number.\n";
    int next, index = 0;
    cin >> next;
    while ((next >= 0) && (index < size)) {
        a[index] = next;
        index++;
        cin >> next;
    }
    numberUsed = index;
}

int bucketSort(int b[], int size) {
    int steps = 0;
    int max = 0;
    int bucket[10][size];
    int temp;
    int depth[10]; // the depth of the bucket
    for (int ctr = 0; ctr < size; ctr++) {
        if (b[ctr] > max)
            max = b[ctr];
        steps++;
    }

    for (int power = 1; power <= max; power *= 10) {
        for (int ctr = 0; ctr < 10; ctr++) {
            steps++;
            depth[ctr] = 0;
        }
        for (int ctr = 0; ctr < size; ctr++) {
            temp = (b[ctr] % (power * 10)) / power; // the bin
            steps++;
            bucket[temp][depth[temp]++] = b[ctr];
            steps++;
        } // distribution pass
        int index = 0;
        for (int ctr = 0; ctr < 10; ctr++) {
            for (int ctr1 = 0; ctr1 < depth[ctr]; ctr1++) {
                b[index++] = bucket[ctr][ctr1];
                steps++;
            }
        } // gathering pass
        printArray(b, size);
    }
    return steps;
}

void printArray(int c[], int size) {
    for (int ctr = 0; ctr < size; ctr++) {
        cout << c[ctr] << " ";
    }
    cout << endl;
}





-Dorothy
"Who would have thought a good little girl like you could destroy my beautiful wickedness!"
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 11:20:05 AM permalink
Quote: DorothyGale

A comparison means to compare two values to be sorted and determine if they are <, =, or >. This is not a comparison in any sense. At no time in this sort are two data values compared.



Well, you and me must be looking at different dictionaries :) In my dictionary checking if the first digit is a 1 is "comparison" ...


Quote:

Yes, it is "radix." It is also called "bucket sort."



No, not quite. Bucket sort is yet a different beast. Another variety of a distribution sort more suitable for use on continuous data set.
"When two people always agree one of them is unnecessary"
Doc
Doc
  • Threads: 46
  • Posts: 7287
Joined: Feb 27, 2010
October 12th, 2010 at 11:43:43 AM permalink
Quote: weaselman

Well, you and me must be looking at different dictionaries :) In my dictionary checking if the first digit is a 1 is "comparison" ...

Well, here's a partial defense of Dorothy: In the original problem that the Wizard presented, the comparisons were between two of the members of the population. In Dorothy's comments, none of the items are directly compared with each other (no comparisons, in at least one sense) and they are only observed to determine whether they individually have some characteristic (last digit "1"?, Yes or no?). I think this last "characteristic observation" could be related to looking at a picture and judging whether a girl is drop-dead gorgeous (Yes or no?), butt ugly (Yes or no?), red headed (Yes or no?), etc., as opposed to comparing/ranking one specific picture vs. another specific one.
Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 12th, 2010 at 12:11:09 PM permalink
Quote: weaselman

The base doesn't matter (see my earlier post about it), but by way of how those estimates are calculated, it is indeed, usually 2.



My sum was 573. 100*log(100,2) = 664.39. So I still don't see your point.
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 12th, 2010 at 12:48:46 PM permalink
I submit for your consideration that the sort that both minimizes maximum comparisons and is easiest to explain is the merge sort (http://www.sorting-algorithms.com/merge-sort). Here is how it works.

  1. Divide entire list into groups of two. There may be a group of one left over.
  2. Sort all groups of 2 with the smallest element first.
  3. Merge two pairs into a single group. See below how to do a merge. There may be a singleton group left over.
  4. Keep merging pairs of groups together until there is just a single sorted group.


How to do a merge.

  1. You should start with two sorted groups, and an empty merged group. The sorted groups should have the smallest elements on top, and the largest on the bottom.
  2. Compare the top two members of each group. Move the smaller one to the merged group.
  3. Repeat step two, and put the next member next in line in the merged group.
  4. Keep repeating step 3 until one group is exhausted. Then move the rest of the other group to the end of the merged group.


Let's look at an example. Consider the following list.

270
258
547
556
73
401
579
829
614


First break it into groups of two:

270,258
547,556
73,401
579,829
614

Next, sort each group. Only the first group needed to be reversed.

258,270
547,556
73,401
579,829
614

Next, merge the first two groups. The smallest members of each group are 258 and 547. Move the 258 into the merged group. Next compare the next two smallest members, 270 and 547. 270 is smaller, so move that next in line in the merged group. There is nothing left in the first group, so the second group can be appended to the end of the merged group, which now looks like 258,270,547,556.

Repeat for the next two groups, which will result in 73,401,579,829.

614 is still a group unto itself.

Now merge the first two groups (258,270,547,556) and (73,401,579,829). This will result in 73,258,270,401,547,556,579,829.

Next merge the singleton group of 614 into the group of 8. That will result in 73,258,270,401,547,556,579,614,829.

The following table shows the maximum number of comparisons for various group sizes. Note that the maximum is always less than or equal to n*log(n,2), where n is the number of items, and log(n,2) is the log to the base 2 of n.

Items Maximum Comparisons n*log(n,2)
1 0 0
2 1 2
4 5 8
8 17 24
16 49 64
32 129 160
64 321 384
128 769 896
256 1,793 2,048
512 4,097 4,608
1,024 9,217 10,240
2,048 20,481 22,528
4,096 45,057 49,152
8,192 98,305 106,496
16,384 212,993 229,376
32,768 458,753 491,520
65,536 983,041 1,048,576
131,072 2,097,153 2,228,224
262,144 4,456,449 4,718,592
524,288 9,437,185 9,961,472
1,048,576 19,922,945 20,971,520
2,097,152 41,943,041 44,040,192
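The "Maximum Comparisons" column follows the standard merge sort worst-case recurrence: merging two runs of combined length n costs at most n-1 comparisons, so W(n) = W(ceil(n/2)) + W(floor(n/2)) + n - 1 with W(1) = 0. A quick C++ check (illustrative):

```cpp
#include <cassert>

// Worst-case comparison count for merge sort: sort each half in the worst
// case, then pay at most n - 1 comparisons for the final merge.
long worstCaseMergeSort(long n) {
    if (n <= 1) return 0;
    return worstCaseMergeSort(n / 2) + worstCaseMergeSort(n - n / 2) + n - 1;
}
```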
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 12:57:29 PM permalink
Quote: Wizard

My sum was 573. 100*log(100,2) = 664.39. So I still don't see your point.


You said initially that the sum was greater than 100*log(100), and my point was that it could not be so, because every member of the sum is less than log(100).
"When two people always agree one of them is unnecessary"
DorothyGale
DorothyGale
  • Threads: 40
  • Posts: 639
Joined: Nov 23, 2009
October 12th, 2010 at 1:02:26 PM permalink
Quote: weaselman

Well, you and me must be looking at different dictionaries :) In my dictionary checking if the first digit is a 1 is "comparison" ...

No, this is absolutely not the definition of "comparison" as used in asymptotic analysis of sorting algorithms. Please, I know you are a smart guy, and you certainly know your sorts ...

Radix Sort

"radix sorts have resurfaced as an alternative to other high performance comparison-based sorting algorithms (like heapsort and mergesort) that require Ω(n · log n) comparisons, where n is the number of items to be sorted. Comparison sorts can do no better than Ω(n · log n) execution time but offer the flexibility of being able to sort with respect to more complicated orderings than a lexicographic one; however, this ability is of little importance in many practical applications."

The issues with radix sorts are that they require lexicographic orderings and a lot of space. There are 0 comparisons.

--Dorothy
"Who would have thought a good little girl like you could destroy my beautiful wickedness!"
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 1:06:31 PM permalink
Quote: Wizard

I submit for your consideration the sort that both minimizes maximum comparisons, and is easiest to explain



I don't see how it is easier to explain than the radix sort (what Dorothy suggested) or a counting sort (an even simpler variation of the same idea) ... That may be subjective, but what's definitely true is that either of those requires fewer comparisons than merge sort (which is a great algorithm, the best choice in most real-life scenarios, btw, just not for this task).

Quote: DorothyGale

No, this is absolutely not the definition of "comparison" as used in asymptotic analysis of sorting algorithms. Please, I know you are a smart guy, and you certainly know your sorts ...




"Used by" who? Technically, the term "comparison" is not (should not) be used for assymptotic analysis of sorting algorithms at all, particularly, because of its ambiguity. A proper term, that's actually used for this king of thing is "operations".

The wikipedia article you quoted (hardly a huge authority on the proper terminology btw) seems to use the two terms interchangeably, which is usually fine in popular (and even professional) literature, as long as everybody understands what we are talking about.
"When two people always agree one of them is unnecessary"
Wizard
Administrator
Wizard
  • Threads: 1491
  • Posts: 26435
Joined: Oct 14, 2009
October 12th, 2010 at 2:32:12 PM permalink
Quote: weaselman

I don't see how it is easier to explain than the radix sort (what Dorothy suggested) or a counting sort (an even simpler variation of the same idea) ... That may be subjective, but what's definitely true is that either of those requires fewer comparisons than merge sort (which is a great algorithm, the best choice in most real-life scenarios, btw, just not for this task).



Those sorts don't meet my condition of comparing two elements in the sample against each other.

Quote: weaselman

You said initially that the some was greater than the 100*log(100), and my point was that it could not be so, because every member of the sum was < log(100).



I was confused because the Wiki page on sorting algorithms didn't indicate the base in their table. I assumed it was base e. That is what I thought you're supposed to do when the base isn't specified. While I'm on the topic, it is one of my pet peeves when people just write "log" without indicating the base. Also, has anyone ever heard of the function lg for the log to the base 2?
"For with much wisdom comes much sorrow." -- Ecclesiastes 1:18 (NIV)
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 2:42:35 PM permalink
Quote: Wizard

Those sorts don't meet my condition of comparing two elements in the sample against each other.


Ah, I missed that restriction.
In that case, yes, mergesort is the best option.
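For concreteness, here is a minimal sketch (my own, in Python, not from any poster) of a merge sort that counts exactly the pairwise "which is more attractive?" matches it asks for, which is the quantity the Wizard's table is tracking:

```python
def merge_sort(items, counter):
    """Sort items using only pairwise comparisons; counter[0] tracks them."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # one pairwise match between two elements
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # leftovers need no further comparisons
    merged.extend(right[j:])
    return merged

counter = [0]
result = merge_sort([5, 2, 8, 1, 9, 3], counter)
# result is sorted; counter[0] holds the number of comparisons used
```

The worst case for n items is n*ceil(lg n) - 2^ceil(lg n) + 1 comparisons (11 for the six items above), which is why mergesort fits the "two pictures at a time" restriction so well.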
"When two people always agree one of them is unnecessary"
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 5:39:53 PM permalink
Quote: Wizard


I was confused because the Wiki page on sorting algorithms didn't indicate the base in its table. I assumed it was base e, since that is what I thought you're supposed to assume when the base isn't specified. While I'm on the topic, it is one of my pet peeves when people just write "log" without indicating the base.



It is customary not to specify the logarithmic base when talking about algorithm complexity. The expression behind the O() only has meaning in the sense that the amount of "work" required to fulfill the task is roughly proportional to it, and because all logarithms are proportional to each other, it makes no difference which base you specify.
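That proportionality is just the change-of-base identity log_b(n) = log_c(n) / log_c(b), easy to check directly (a quick sketch of my own, in Python):

```python
import math

# log base 2 and log base 10 of the same number differ only by a
# constant factor, 1 / log10(2), regardless of n -- so inside O()
# the choice of base only changes the hidden constant, never the growth rate
for n in (10.0, 100.0, 1000.0):
    ratio = math.log2(n) / math.log10(n)
    assert abs(ratio - 1 / math.log10(2)) < 1e-9
```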

Quote:

Also, has anyone ever heard of the function lg for the log to the base 2?


I think lg usually means base 10.
"When two people always agree one of them is unnecessary"
MathExtremist
MathExtremist
  • Threads: 88
  • Posts: 6526
Joined: Aug 31, 2010
October 12th, 2010 at 6:26:13 PM permalink
Quote: mkl654321

This seems so obvious to me. And I can do it without getting all mathy:

Take two pictures. Rank them in order of attractiveness.

Take the third picture, and compare it to the bottom picture. If Pic 3 is lower in attractiveness than the bottom picture, put it at the bottom. If higher, compare it to the top picture, and place Pic 3 either in the middle or above the top picture, according to the results of the comparison. You now have three pictures in ranked order.

Take Pic 4 and compare it to the middle picture. This will tell you whether to move it up or down. In each case, compare it to the next picture in the relevant direction, and use that comparison to place Pic 4 either between the middle and that pic, or at the top or bottom. You now have four ranked pics.

Repeat this procedure with each successive pic, starting as close to the middle as possible. If a given comparison preserves the existing position of the new pic, leave that pic there. For example:

Seven pics have been ranked. Take Pic 8 and compare it to the fourth-ranked pic. If it is more attractive, compare it to the third-ranked pic. Repeat until the pic stops moving up, then place the pic in that position. Same procedure if the initial comparison shows the new pic to be less attractive than the middle pic; keep moving down until it stops, then place the new pic in that position.

No math necessary.



It's not math that's the issue, it's data structures. Typically, the goal is to sort an array of objects such that array[0] is the worst and array[n] is the best (or vice-versa). Unfortunately, there's no good way to insert into an array -- you have to move everything past the insert point down one spot. Consider a full row of seats in a theater. If you inserted me into seat 4, everyone in seats 4 .. n would have to move down one, respectively, to seats 5 .. n+1. That operation has a cost inside a computer, and that's what computational complexity is about -- measuring (and minimizing) that cost with more efficient algorithms and data structures.
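mkl654321's procedure is essentially binary insertion sort; a sketch of my own (using Python's bisect module, which does the "start near the middle" search) makes both halves of the trade-off visible: finding the slot is cheap, but the array insert still pays the theater-row shifting cost.

```python
import bisect

def binary_insertion_sort(pics):
    """Place each new pic among those already ranked, mkl654321-style."""
    ranked = []
    for pic in pics:
        # binary search finds the slot in O(log n) comparisons...
        slot = bisect.bisect(ranked, pic)
        # ...but insert() still shifts every later element down one spot,
        # like the full theater row: O(n) moves per insertion
        ranked.insert(slot, pic)
    return ranked

binary_insertion_sort([7, 3, 9, 1])  # → [1, 3, 7, 9]
```

So the comparison count is excellent, but the total work is still O(n^2) because of the shifting, which is exactly the data-structure cost described above.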
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
MathExtremist
MathExtremist
  • Threads: 88
  • Posts: 6526
Joined: Aug 31, 2010
October 12th, 2010 at 6:30:29 PM permalink
Quote: weaselman

Well, you and I must be looking at different dictionaries :) In my dictionary, checking if the first digit is a 1 is a "comparison" ... .



Here, "comparison" means "binary comparison", that is, f(x,y). Radix sorts work differently by using additional knowledge of the data (in this case a range between 1..9) beyond just greater-than or less-than. The reason that's important is that often you're traversing linked lists or arrays, and the only thing you can look at is the current and next elements. Beyond binary comparisons, you have to start writing ternary, quaternary (I love that word) or more-parametered functions to perform a sort "operation", and it turns out to reduce to multiple binary comparisons anyway. Inside assembly language there's just the binary CMP operator, so whatever you do at a higher level basically doesn't matter.
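To illustrate the distinction, a counting-sort sketch (my own, in Python): it never evaluates f(x, y) on two elements, it only inspects each element alone against the known value range.

```python
def counting_sort(scores, max_score=9):
    """Sort small known-range integers with zero pairwise comparisons."""
    buckets = [0] * (max_score + 1)
    for s in scores:            # each element is inspected alone,
        buckets[s] += 1         # never compared against another element
    out = []
    for value, count in enumerate(buckets):
        out.extend([value] * count)
    return out

counting_sort([3, 1, 4, 1, 5, 9, 2, 6])  # → [1, 1, 2, 3, 4, 5, 6, 9]
```

That is why it sidesteps the comparison-sort lower bound, and also why it fails the Wizard's condition: the pictures have no known numeric keys until after you've judged them.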
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
MathExtremist
MathExtremist
  • Threads: 88
  • Posts: 6526
Joined: Aug 31, 2010
October 12th, 2010 at 6:40:03 PM permalink
Quote: Wizard

While I'm on the topic, it is one of my pet peeves when people just write "log" without indicating the base. Also, has anyone ever heard of the function lg for the log to the base 2?



Yes, as I learned it:
log = log_10
ln = log_e
lg = log_2

But then I looked it up on Wikipedia, and it turns out that lg is used differently depending on what discipline you're in, and that lg(n) for log_2(n) isn't ISO compliant. Oh well.
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
mkl654321
mkl654321
  • Threads: 65
  • Posts: 3412
Joined: Aug 8, 2010
October 12th, 2010 at 7:14:08 PM permalink
Quote: MathExtremist

It's not math that's the issue, it's data structures. Typically, the goal is to sort an array of objects such that array[0] is the worst and array[n] is the best (or vice-versa). Unfortunately, there's no good way to insert into an array -- you have to move everything past the insert point down one spot. Consider a full row of seats in a theater. If you inserted me into seat 4, everyone in seats 4 .. n would have to move down one, respectively, to seats 5 .. n+1. That operation has a cost inside a computer, and that's what computational complexity is about -- measuring (and minimizing) that cost with more efficient algorithms and data structures.



You might be saying the same thing as I am in a different way--computers/computational algorithms are actually pretty crappy tools for solving problems like these. Obviously, if you were doing this on a tabletop with actual pictures, you could easily shuffle the "array" around any way you wanted.

It's sort of like, "program a computer to make a grilled cheese sandwich". Well before you were past your 1000th line of code, and still wrestling with unforeseen complexities, your six-year-old son would be sitting in front of the TV with a grilled cheese sandwich and a glass of milk. Many relatively simple human algorithms don't translate very well to computational ones; for one thing, we prune our decision trees much more severely than computers do.
The fact that a believer is happier than a skeptic is no more to the point than the fact that a drunken man is happier than a sober one. The happiness of credulity is a cheap and dangerous quality.---George Bernard Shaw
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
October 12th, 2010 at 8:15:33 PM permalink
Quote: mkl654321

Obviously, if you were doing this on a tabletop with actual pictures, you could easily shuffle the "array" around any way you wanted.


I'd love to watch you sort 10000 pictures on your tabletop ...
"When two people always agree one of them is unnecessary"
MathExtremist
MathExtremist
  • Threads: 88
  • Posts: 6526
Joined: Aug 31, 2010
October 12th, 2010 at 9:32:28 PM permalink
Quote: mkl654321

It's sort of like, "program a computer to make a grilled cheese sandwich". Well before you were past your 1000th line of code, and still wrestling with unforeseen complexities, your six year old son would be sitting in front on the TV with a grilled cheese sandwich and a glass of milk. Many relatively simple human algorithms don't translate very well to computational ones; for one thing, we prune our decision trees much more severely than computers do.



There's a funny coincidence. In my first week of freshman-year computer science, the professor was describing what an algorithm was and was not. He used the example of "give me an algorithm for making a peanut-butter-and-jelly sandwich". Then he called on students, who responded with "place the bread on a flat surface" or "put the peanut butter on the bread", etc. And then he reached for a piece of bread, held it against the wall, watched it fall to the ground when he let go, and put the jar of peanut butter on top of the bread as it lay on the floor.

The point, of course, is that a set of instructions is very different from an algorithm. In an algorithm, there is basically no context - the computer doesn't know anything other than what you tell it and what low-level operations it already knows how to do, so the scientist has to be specific enough to get the job done using only the tools at his/her disposal. You can't assume that "flat surface" means something like a cutting board that has previously been positioned horizontally on a countertop. The wall is also a flat surface. So when you say "compare these two pictures and tell me which picture is more attractive", I already know what attractive means to me but a computer doesn't unless you tell it how. Not that it hasn't been done:
Discover Magazine, 2000
"In my own case, when it seemed to me after a long illness that death was close at hand, I found no little solace in playing constantly at dice." -- Girolamo Cardano, 1563
jvg1981
jvg1981
  • Threads: 0
  • Posts: 2
Joined: Sep 12, 2010
November 4th, 2010 at 10:05:00 PM permalink
If all you have access to is a "black box" comparison function, then any sorting algorithm's maximum number of comparisons will be at least ⌈log2(n!)⌉ when sorting n elements. This is because there are n! possible arrangements of the n elements, exactly one of which must be correct, and each comparison can only be guaranteed to cut the number of possibilities by at most a factor of 2. However, this bound isn't always achievable (e.g. ⌈log2(13!)⌉ = 33 but up to 34 comparisons can be required to sort a set of 13 elements). Merge sort is asymptotically good, but I doubt it always achieves optimality in terms of maximum number of comparisons needed.
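That bound is easy to tabulate; a short sketch (my own, in Python) reproduces the n = 13 example and also matches the small entries in the Wizard's table:

```python
import math

def comparison_lower_bound(n):
    """Minimum worst-case pairwise comparisons any sort needs for n items."""
    # n! possible orderings; each yes/no comparison can at best halve
    # the set of orderings still consistent with the answers so far
    return math.ceil(math.log2(math.factorial(n)))

comparison_lower_bound(13)  # → 33, though 34 comparisons are actually needed
comparison_lower_bound(4)   # → 5, matching the table's minimum for 4 faces
```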
  • Jump to: