Showing posts with label algorithm. Show all posts
Showing posts with label algorithm. Show all posts

Tuesday, 2 July 2013

Find the Median of a Large Set of Number

Task:
There are 10G disordered numbers, you need to find out the median. You have 2G bytes memory.

Solution:

The simplest method is sorting all numbers then find the middle item:

1  2  3  … 49  50  51  … 98  99  100

There are two defects:

1. Not necessary to sort all numbers
2. Occupies too much memory

Improved:

Create a data table which count the numbers in separated ranges:
(After first scan)

Range 1-20 21-40 41-60 61-80 81-100
Count 20 20 20 20 20

The median must in range [41-60] while the offset is 10 which means the 10th item in the range block is the median

Next, program will scan data set again, and only copy 20 ranged numbers into memory. Do a sort and grab the 10th item.

=======

Assume the numbers are unsigned 64bit integer. So each number occupies 8 bytes. 2G ram could handle no more than 256M numbers each time. The 10G number set could be separated into 40 groups.

Then, define several ranges from 0 to 2^64 -1, looks like arithmetic sequence:

( Pseudo code )

int64 a = 2^64 / 4096;

int64[4096] b ;

Array b statistics the volume of numbers hit in the range. In current code, each b in the array maps a range of 2^64 / 4096 = 2^52

For instance, if there is a number 68846541654532100.

68846541654532100 / (2^52) = 15.287

will lead to:

b[15]++;

 

The program do a initial process, read 256M numbers each time, for 40 main loops. In each loop, compare function will find out the range of each number and update the b[4096]. When loops done, we will get a full record b[4096].

Find the Median range:

int64 sum=0;
int64 prev=0;
int i;
for (i=0 ; i < 4096 ; i++)
{
  prev=sum;
  sum += b[i];
  if (sum >= 2^64 / 2)
    break;
}
int64 offset=(2^64 /2 – prev);
print( i );

Median must located in:
from  i*a  to  (i+1)*a-1
and must be the offset=(2^64 /2 – prev) item in the target block.

Program will scan 10G number set again, but only pick the numbers in the specific range and write them to file system.

IF the final file is less than 2G, sort the content, and pick the ‘offset’ item.

IF the final file is greater than 2G, separate it again as previous steps but smaller range. Do not forget to translate the offset.

Friday, 28 June 2013

Monty Hall problem (Three-Door problem)

http://en.wikipedia.org/wiki/Monty_Hall_problem

image

The most well known statement of the problem is:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice? (Whitaker, 1990, as quoted by vos Savant 1990a)

Vos Savant's response was:

Yes; you should switch. The first door has a 1/3 chance of winning, but the second door has a 2/3 chance. Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?

image

image

So the ratio between WIN vs LOSE should be 66.6 : 33.3

I spend some time to write a simulator to check. (code is a bit ugly but it works)

Game class:

/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package montyhallproblem;

/**
*
* @author celeron533
*/
public class Game {

public Door[] doors;

public Game() {
this.doors = new Door[]{new Door(), new Door(), new Door()};//create doors
doors[(int) (Math.random() * 3)].item = Item.car;//allocate the prize
}
public int playerInitial;

public void playerInitialSelection(int s) {
playerInitial = s;
}

public int hostOpenGoatDoor() {
if (doors[playerInitial].item == Item.car) {
//open one door of the rest
int open;
do {
open = (int) (Math.random() * 3);
} while (open == playerInitial);
doors[open].opened = true;
return open;
} else {//when player select a goat door, host open the left goat door
for (int i = 0; i < 3; i++) {
if (i == playerInitial) {//skip player selection
continue;
} else {
if (doors[i].item == Item.car) {//skip car door
continue;
} else {//open
doors[i].opened = true;
return i;
}
}
}
}
return -1;
}

public class Door {

public boolean opened = false;
public Item item = Item.goat;
}

public enum Item {
car, goat
}

@Override
public String toString() {
String result = "select_" + playerInitial + " ";
for (int i = 0; i < 3; i++) {
result += doors[i].item + "_" + doors[i].opened + " ";
}
return result;
}
}


Main class:


/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package montyhallproblem;

/**
*
* @author celeron533
*/
public class MontyHallProblem {

/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
Game game;
int playerSelect, hostOpen;
int keepSelectionWins = 0;
int switchSelectionWins = 0;
for (int i = 0; i < 10000000; i++) {
game = new Game();
playerSelect = (int) (Math.random() * 3);
game.playerInitialSelection(playerSelect);
hostOpen = game.hostOpenGoatDoor();
//Statistical results
if (game.doors[playerSelect].item == Game.Item.car) {
keepSelectionWins++;
} else {
switchSelectionWins++;
}
//System.out.println(game.toString());
}
System.out.println("keepSelectionWins: " + keepSelectionWins);
System.out.println("switchSelectionWins:" + switchSelectionWins);
}
}


 


And the perfect result:


run:
keepSelectionWins: 3333349
switchSelectionWins:6666651
BUILD SUCCESSFUL (total time: 3 seconds)


image