rxwine
rxwine
  • Threads: 212
  • Posts: 12220
Joined: Feb 28, 2010
February 20th, 2011 at 1:57:59 AM permalink
Though not to be confused with life.

Let's take Watson and modify it just a bit.

Two things he has already

1. First we know he has some ability to make sense of language. But if needed, maybe you could add one of the better Turing Test programs that have been developed.
2. He can be loaded with plenty of data

Second: things we need to add:

3. Give Watson the ability to record time with events and keep a historical timeline. Shouldn't be too difficult with the internal clock and date, so it can tell you, in response to a question about what it was doing two hours ago: "I was in sleep mode," or "John Programmer was asking me a question," or "I was playing Jeopardy two days ago," or even "I was unconscious (turned off) between 0100 and 0730 hrs."

4. Once it has a historical timeline, give it the ability to look at what it was doing in the past, and to respond to suggestions about what it may or may not be doing in the future based on its past activities: "I will probably be answering questions tomorrow," or "I will be turned off." (It is shut down over the weekend.)

5. Program it to respond in the affirmative to any question about consciousness. And if you tell Watson that his programmer has simply programmed him to say he is conscious, perhaps he will respond that maybe your programmer has also programmed you to respond that you are conscious.
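The timeline idea in items 3 and 4 above can be sketched in a few lines. This is only an illustration of the mechanism; the activities and timestamps are made up:

```python
import time

# Sketch of the proposed timeline: log timestamped activities, then answer
# "what were you doing at time T?" by finding the last entry at or before T.
timeline = []  # list of (start_time, activity), kept in chronological order

def log_activity(activity, t=None):
    timeline.append((t if t is not None else time.time(), activity))

def activity_at(t):
    """Return the activity whose start time is the latest one at or before t."""
    current = "no record"
    for start, activity in timeline:
        if start <= t:
            current = activity
    return current

log_activity("sleep mode", t=100)
log_activity("answering John Programmer's question", t=200)
print(activity_at(150))  # sleep mode
```

Extending `activity_at` to future times (item 4) would just mean looking for recurring patterns in the same log.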

SO, is it easy to disprove consciousness here? What would be the criteria for doing so?
There's no secret. Just know what you're talking about before you open your mouth.
P90
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 20th, 2011 at 5:03:15 AM permalink
Quote: rxwine

1. First we know he has some ability to make sense of language. But if needed, maybe you could add one of the better Turing Test programs that have been developed.
2. He can be loaded with plenty of data
...
SO, is it easy to disprove consciousness here? What would be the criteria for doing so?


From what I understand, no computer even comes close to passing the Turing test. The best they can do is stall you for a few questions by being intentionally obtuse or imitating an illiterate. So the Turing test works so far.
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
rxwine
rxwine
  • Threads: 212
  • Posts: 12220
Joined: Feb 28, 2010
February 20th, 2011 at 5:09:06 AM permalink
Quote: P90

From what I understand, no computer even comes close to passing the Turing test. The best they can do is stall you for a few questions by being intentionally obtuse or imitating an illiterate. So the Turing test works so far.



I believe the Turing test standard is not being able to distinguish a human from a computer. However, couldn't some form of consciousness meet a lower standard?
There's no secret. Just know what you're talking about before you open your mouth.
P90
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 20th, 2011 at 5:19:20 AM permalink
Quote: rxwine

I believe the Turing test standard is not being able to distinguish a human from a computer. However, couldn't some form of consciousness meet a lower standard?


A Turing test is perfectly capable of doing that. CAPTCHA is not, but CAPTCHA is designed first and foremost to be quick and easy to implement. The original and correct variant of the Turing test consists of chatting with a computer and a human, not knowing which is which, where the computer's goal is to imitate a human and the human's goal is to help the interrogator reach the correct conclusion. No program so far comes even close to being capable of that.

There are chatterbots that tell you "versatile answers" ("Why are you asking?") to questions they can't answer, but they could just as well be implemented using a simple Edison cylinder with a four-letter word and a three-letter word spelled a number of ways. That does not constitute passing the Turing test.

For simplicity, if we don't have a control subject, just one clause has to be added: the machine needs to imitate a literate, mentally sound, cooperative human. This isn't much of an addition, since again even a simple Edison cylinder can imitate an uncooperative human ("Talk to my attorney"), an illiterate ("No entiendo") or a mentally disabled one ("Torr with moo-moo").
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
rxwine
rxwine
  • Threads: 212
  • Posts: 12220
Joined: Feb 28, 2010
February 20th, 2011 at 5:37:10 AM permalink
For a lower standard, how about a 5 or 6 year old child? They can be uncooperative, and not know what is being asked. But consciousness. Yes. Yes?
There's no secret. Just know what you're talking about before you open your mouth.
Nareed
Nareed
  • Threads: 373
  • Posts: 11413
Joined: Nov 11, 2009
February 20th, 2011 at 6:38:18 AM permalink
Plug in the computer and tell it "do what you want." If it does nothing without instructions or programming, it's not conscious.
Donald Trump is a fucking criminal
P90
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 20th, 2011 at 6:55:59 AM permalink
Passing the Turing test is not proof of consciousness, only intelligence. Not passing the Turing test is an indication of lack of consciousness, but not the reverse.

As for cooperation, Turing's original challenge had two subjects, each trying to convince you he's human (in the computer-detection variant). To put it into a simple sci-fi scenario: there are a human and his evil robotic twin before you, and to kill the evil robot and not the human, you question them to tell which is which. It's quite clear that a human answering "have you talked about it with your therapist?" to your questions would not live long, so programs that claim to pass the test only do so by inventing a completely different test to suit what they can pass.

To come close to a 5-6 year old, I think that could be possible already, just a lot harder than the simple programs employed. Jerry Logan, perhaps a decade down the road. But properly passing the test, that's much harder.



By the way, while on the subject of Watson: most people have forgotten it by now (seeing how impressed they were by Watson), but about two decades ago text parsers were the standard means of input in computer games. You would ask the computer questions and it would respond; tell it what you want to do and your character would do it. It was not even a novelty but the norm, and it was all done by typing in plain English. Admittedly, such parsers were far from perfect and not fond of complex sentences, but games in those days were written by two guys in a shed, with all the design, coding, writing, visuals and audio done by the same 1-2 people, and parser coding wasn't always at the top of the list.

I'm quite confident that a computer beating Jeopardy could already have been built back then, two decades ago. It would take more human effort to do the coding, having less brute force, but it could be done. You don't need an internet dump on it; just enter a good encyclopedia and a few lists of trivia, like the winning sports teams that shows like to ask about so much. Even on those computers, it would still beat humans to the button and in answer percentage. You would just have to use real programming, as taught in pre-Java colleges, instead of new-agey million-answer cloud fuzzing that seemed to do more to showcase how much power the computer has to waste than to actually solve the questions.
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
Nareed
Nareed
  • Threads: 373
  • Posts: 11413
Joined: Nov 11, 2009
February 20th, 2011 at 7:43:02 AM permalink
Quote: P90

By the way, while on the subject of Watson - most people have forgotten it by now (seeing how impressed they were by Watson), but about two decades ago text parsers were the standard means of input in computer games. You would ask the computer questions and it would respond; tell it what you want to do and your character would do it.



Hardly. Text games, like Zork and just about everything else Infocom ever did, employed a list of text commands the game understood and carried out. You used terms like "up," "north," "back" for directions and "take lamp," or "use axe on mirror" for actions.

But the game responded with phrases, not just statements. If you referred to something not on the scene, say "hit grue with hammer," it would say "there's no grue here." If you referred to something not in the game, it would say "I don't know what a slot machine is." If you wanted to go in a direction that was not available, it would say "You can't go there," or "Down is more likely."

So it gave the impression it could respond to language, but it didn't really.

Oh, if you used curse words it would chastise you. I've heard of a game that would get offended and refuse to play if you said the f word three times.
Donald Trump is a fucking criminal
P90
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 20th, 2011 at 8:08:25 AM permalink
Quote: Nareed

Hardly. Text games, like Zork and just about everything else Infocom ever did, employed a list of text commands the game understood and carried out. You used terms like "up," "north," "back" for directions and "take lamp," or "use axe on mirror" for actions.


Well, Zork is not exactly the be-all and end-all of games. It's from 1979 - three decades ago, not two. A decade later, games had better parsing capabilities. You could use more natural language, with "Fill the bottle with water" or "Pour some water into the bottle" producing the same result. Not always, but in some. That is text parsing, not just command input (as you would do in a compiler).


Quote: Nareed

But the game responded with phrases, not just statements. If you referred to something not on the scene, say "hit grue with hammer," it would say "there's no grue here." If you referred to something not in the game, it would say "I don't know what a slot machine is." If you wanted to go in a direction that was not available, it would say "You can't go there," or "Down is more likely."
So it gave the impression it could respond to language, but it didn't really.


However, neither does ELIZA, nor other quasi-Turing-passing chatbots. They just look for any familiar word and churn out a tangentially relevant phrase from their list, or a versatile non-reply otherwise. Neither, for that matter, does Watson; it just delivers a more relevant word instead of a generic phrase.

If anything, game text parsers should be placed a lot higher than chatbots on the Turing scale, because they actually, at least partially, understood you most of the time, instead of just stalling like chatbots do. Computer quasi-intelligence is not a processing-power problem; it's a coding problem.


P.S. Which is a sad thing, since unlike processing power, programming skills are declining, which makes it that much less likely that someday I'll be able to tell my computer "Shut up and do what you're told!" and have it understand this enormously complex request instead of popping up another "are you sure?" question box. I just hope there is a special circle of Hell reserved for programmers who add unnecessary question boxes to programs, where they have to give a hundred confirmations each time they want to take a leak.
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
progrocker
progrocker
  • Threads: 4
  • Posts: 303
Joined: Feb 21, 2010
February 20th, 2011 at 8:41:23 AM permalink
I'm not saying I agree with it, but during the courses for my minor in philosophy Searle's 'Chinese Room Argument' was the most common thing brought up to refute the possibility of strong AI.

http://plato.stanford.edu/entries/chinese-room/

If I agree with anything from him, I would say that passing a Turing test alone would not prove consciousness. I would like to see the AI at least make some moral decisions and rationalize them before admitting it may be 'alive'.
Solo venimos, solo nos vamos. Y aqui nos juntamos, juntos que estamos.
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 20th, 2011 at 8:58:44 AM permalink
Quote: P90


I'm quite confident that a computer beating Jeopardy could already be built back then, two decades ago. It would take more human effort to do the coding, having less brute force, but it could be done. You don't need an internet dump on it, just enter a good encyclopedia and a few lists of trivia like winning sports teams that shows like to ask about so much. Even on these computers, it would still beat humans to the button and to the answer percentage. You would just have to use real programming, as taught in pre-java colleges, instead of new-agey million answers cloud fuzzing that seemed to do more to showcase how much power the computer has to waste than to actually solve the questions.


No, there was simply no technology back then to even put enough memory into a computer to store all the goodies (32-bit CPUs could only address up to 4 GB of memory). But even if there were, it would take days if not months for a twenty-year-old computer to find and retrieve the correct answer.

Parsing (and "understanding") the question is actually quite trivial (and it already was 20 years ago, for those games you mentioned), especially in English (some languages are much harder in this respect, notably Russian or Polish). It is finding the answer that is the real achievement in Watson.

BTW, I was quite disappointed they did not use speech recognition and settled on feeding the clues to Watson electronically. I think it would be way more impressive if Watson could actually hear and understand the questions on its own. It would also eliminate the suggestion that he had an advantage because he knew exactly when it was OK to buzz.


Also, unlike many modern "techniques" in Computer Science, the "cloud fuzzing" stuff is the real thing. If you have two computers, you can use them to solve problems twice as fast as you could with one. If you think that is not impressive, try that with people.
"When two people always agree one of them is unnecessary"
P90
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 20th, 2011 at 9:45:37 AM permalink
Quote: weaselman

No, there was simply no technology back then to even put enough memory into a computer to store all the goodies (32-bit cpus could only address up to 4 gig of memory).


You only need 20-40 megabytes to store a good encyclopedia. It would of course take gigabytes to store the sum of useful human knowledge (Backdoor Sluts Nine in full HD is not), but you don't need its entirety, just encyclopedic info and some trivia. 120-160 MB should do it. That is just two period low-cost consumer hard drives.

Even 120-160 MB is actually overkill, which would only be needed if you were entering just mildly cleaned OCR-ed books. A large book takes 450-800 kilobytes in 8-bit format (6-bit, or better yet 5-bit with number and capital prefixes, would be much more appropriate for the task), and 200-250 books is more than most people read in their entire lives. A purpose-written database would take somewhere closer to 20-40 megabytes, but as of 1991 it would be cheaper to add more power than to hire people to purpose-write a database.
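The 5-bit idea can be sketched roughly like this; the alphabet and the trailing control symbols are invented for illustration (a real scheme would define proper prefix codes for capitals and digits):

```python
# Rough sketch of 5-bit text packing: 26 letters + space + a few control
# codes fit into 32 symbols, so each character costs 5 bits instead of 8.
# The last few ALPHABET entries stand in for hypothetical prefix codes
# (capital shift, digit shift, etc.).
ALPHABET = "abcdefghijklmnopqrstuvwxyz ^#.,"

def pack5(text):
    """Pack a string of ALPHABET symbols into 5 bits per symbol."""
    bits, nbits = 0, 0
    out = bytearray()
    for ch in text:
        bits = (bits << 5) | ALPHABET.index(ch)
        nbits += 5
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:  # flush the remaining partial byte
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

packed = pack5("the quick brown fox")
print(len(packed))  # 12 bytes for 19 characters, vs 19 bytes in 8-bit
```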

Forget 32 bits; I'm talking about 16-bit computing. A cheap 1990 desktop, which were still called desktops because they could actually fit on a desk and even have a monitor on top, is all it takes to beat even a well-read human in terms of encyclopedic knowledge. All that's left is to write good algorithms for sorting through it. That one is harder, but harder in terms of human effort, not data storage.



Quote: weaselman

But, even if there was, it would take days if not months for a twenty-year-old computer to find and retrieve the correct answer.


Yes, it would indeed take hours to sort through all the information (I did have a few old computers; just hours, in fact - it took less time to fully copy a period drive than a modern one). But only new-age web coders would need to sort through all the information. With optimized algorithms, milliseconds would be all it takes to find the proper section of data, then find it on the HDD and work through that.
Think in terms of "more or less?" number-guessing games, not in terms of almost purposely wasteful total content search. Highly indexed directory and file structures, rejection algorithms and a sequential approach. In time, as the Java cancer infiltrates formal education, that knowledge will be lost, but it still exists even today and certainly did in 1990.
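The "more or less?" halving can be made concrete with a sorted index and binary search; a minimal sketch (the titles are made up):

```python
import bisect

# A sorted index supports "more or less?" halving: a million entries need
# at most about 20 probes (2**20 > 1e6), so even with a slow disk seek per
# probe, a lookup stays far under a second.
titles = sorted(["airport", "battle", "city", "hero", "locomotive", "village"])

def lookup(key):
    """Binary-search the sorted index; return the position, or None if absent."""
    i = bisect.bisect_left(titles, key)
    return i if i < len(titles) and titles[i] == key else None

print(lookup("hero"))  # 3
```

On a real drive each probe would be a seek into an index file rather than a list access, but the probe count is the same.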


Quote: weaselman

Also, unlike many modern "techniques" in the Computer Science, the "cloud fuzzing" stuff is the real thing. If you have two computers, you can use them to solve problems twice as fast as you could do with one.


I was actually referring to the tag cloud, not to cloud computing. Some of the illustrations for Watson's algorithms looked a lot like them.
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 20th, 2011 at 10:15:03 AM permalink
Quote: P90

You only need 20-40 megabytes to store a good encyclopedia. It would of course take gigabytes to store the sum of useful human knowledge (Backdoor Sluts Nine in full HD is not), but you don't need its entirety, just encyclopedic info and some trivia. 120-160 MB should do it. That is just two period low-cost consumer hard drives.



The printed size of wikipedia is about 11 Gigabytes. But that's not all.
You can't store plain text. It has to be structured data with extensive indexes and cross-references over it for quick access. You also can't stop at the level of documents, because simply finding the appropriate document isn't going to be enough to answer the question, and some questions may require parts of more than one document and more cross-referencing. I would expect that the amount of disk space wikipedia uses is at least a terabyte, but that's still not enough (because it is only indexed at the document level); you'd need at least 100 times more space. 100 terabytes would be a conservative estimate (Watson uses about 3.5 times more).

Moreover, you can't store that data on disk, because the access time would make it hopeless to ever access it in real time. It has to be in memory, and, like I said, twenty-year-old computers could not possibly address more than 4 GB.


Quote:

Yes, it would indeed take hours to sort through all the information


More like months, if you are talking about disk storage.

Quote:

But only new age web coders would need to sort through all the information. With optimized algorithms, milliseconds would be all it takes to find the proper section of data, then find it on the HDD and work through that.


Well ... All I can think of as a response to this, is - no, you are wrong.

Quote:

Think in terms of "more or less?" number guessing games, not in terms of almost purposely wasteful total content search.



Even the "more or less" thing (b-tree index) can be brutal when you are talking about massive amounts of data.
But the real question is how you go about implementing the "more or less" search to answer questions like "which city has two airports, one named after a World War II hero, and another one after a World War II battle"?

Quote:

I was actually referring to the tag cloud, not to cloud computing. Some of the illustrations for Watson's algorithms looked a lot like them.


Yes, the "tag cloud" is a way to index documents to implement those "more or less" queries you are referring to. You parse the clue into keywords, then retrieve the documents tagged with those keywords using the b-trees (several sequential queries), sort and weigh them by relevance, intersect the resulting sets (about 5-10 sets, about 100K documents each), sort and weigh again, and then mine the top documents for fragments relevant to the question.
I don't quite understand what approach, other than this, you have in mind when suggesting we abandon the "tag clouds".
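The pipeline described above (keywords, tagged documents, set intersection) can be sketched with a toy inverted index; the documents are invented, and real systems would add the relevance weighing this omits:

```python
from collections import defaultdict

# Toy version of the retrieval step: build an inverted index mapping each
# word to the set of document ids containing it, then answer a clue by
# intersecting the posting sets of its keywords.
docs = {
    1: "city with two airports named after a war hero",
    2: "the battle of midway in world war two",
    3: "chicago o'hare airport named after butch o'hare",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def query(*keywords):
    """Return ids of documents containing every keyword."""
    postings = [index[k] for k in keywords]
    return set.intersection(*postings) if postings else set()

print(query("airport", "named"))  # {3}
```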


If you are not convinced, think about why something as trivial as searching for tickets from Boston to Las Vegas (something immeasurably simpler than the natural-language search Watson does) takes a minute or more on engines like expedia.com or itasoftware.com. And don't say it's because they are written badly. It may be so, but they are the best there are out there. If you really know how they can be made better, it's almost like having a 100% winning betting system - you can make billions on your knowledge.
"When two people always agree one of them is unnecessary"
P90
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 20th, 2011 at 10:46:41 AM permalink
Quote: weaselman

The printed size of wikipedia is about 11 Gigabytes. But that's not all.


And 99% of wikipedia is taken by data *you don't need* to answer these questions.


Quote: weaselman

I would expect that the amount of disk space wikipedia uses is at least a Terabyte, but that's still not enough


Well, obviously. You need to store not only User Talk, but also the history of each User Talk page for every edit, which occurs every time someone posts something.
Wikipedia is first and foremost a social network, one which just manages to channel a tiny fraction of the human time normally wasted in social networks into doing something useful. A tiny fraction. Still beats this or just about any other forum, though.

Almost all of that is data, not information. You could reduce wikipedia to a few gigabytes and nothing of value would be lost. But even that much is far more than overkill.


Quote: weaselman

Even the "more or less" thing (b-tree index) can be brutal when you are talking about massive amounts of data.
But the real question is how do you go about implementing the "more or less" search to answer questions like "which city has two airports, one named after World War II hero, and another one after a World War || battle"?


Which is why you erase massive amounts of data and only store a relatively small (but massive in absolute terms) selection of useful information. At least 99% of the data in Wikipedia would only ever serve to slow you down. Take a paper encyclopedia as an example of something that is already somewhat overkill.

In answer to your specific question, you would go into the Cities section, then filter only cities that have two airports [or more] [for the second run]. From these, you would compile a list of airports. Then you would retrieve the list of WWII heroes and the list of WWII battles, and run them against your list of airports. All that can be done in well under a second even on an 80386 CPU, even accounting for the disk data requests.

Involving WWII makes it a very simple question for a computer. If the airports were named after something exotic that doesn't have a list, go into the Airports section from the list and look for mentions of the origin clue in those airports. If that fails: 1) Log a request to fire some of the people who wrote your Airports section, because the origin of the name is pretty much mandatory for an encyclopedic article; 2) Run through articles associated with the origin clue and search for mentions of airports, iterating for a search through linked articles.
You could actually take your time even searching through a tape drive, because the humans probably won't know the answer anyway.
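The list-filtering steps described above reduce to a couple of set intersections; a sketch with tiny invented lists (the real ones would come from the stored encyclopedia):

```python
# Sketch of the proposed lookup: filter cities with two or more airports,
# then check the airport names against the WWII hero and battle lists.
# The data here is an illustrative subset, not a real database.
city_airports = {
    "Chicago": {"O'Hare", "Midway"},
    "Houston": {"Bush Intercontinental", "Hobby"},
    "Omaha": {"Eppley Airfield"},
}
wwii_heroes = {"O'Hare", "Doolittle"}
wwii_battles = {"Midway", "Coral Sea"}

def find_city():
    for city, airports in city_airports.items():
        if len(airports) < 2:  # first filter: two or more airports
            continue
        # second filter: one airport from each list
        if airports & wwii_heroes and airports & wwii_battles:
            return city

print(find_city())  # Chicago
```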


Quote: weaselman

Yes, the "tag cloud" is a way to index documents to implement those "more or less" queries you are referring to. You parse the clue into key words, then retrieve the documents, tagged with those keywords using the b-trees (several sequential queries), sort and weigh them by relevance, intersect the resulting sets (about 5-10 sets, about 100K documents each), sort and weigh again, and the mine the top documents for fragments, relevant to the question.


Do you? Not the computer you would program - you; is that what you do when you are asked a question from Jeopardy?
If so, does your memory even store 500,000 encyclopedia-article-sized text documents (it doesn't)? Does your brain manage to retrieve them within even the 30-millisecond access time of a long-obsolete hard drive (it doesn't)? If not, it's the wrong approach.

A processing-efficient approach would involve the computer first actually recognizing the question, and then doing a relevant search for parameters determined to be of importance. Your approach is coding-efficient, that is, easier to do than a processing-efficient one. However, such approaches utilize processing power less efficiently than a steam locomotive delivering mail on stone tablets utilizes the life efforts of the creatures that turned into the coal it burns.


When an avionics suite prioritizes targets, it performs identification, probabilistic when necessary, and employs an algorithm, not a Google search on associated keywords. This is why all of the F-22's software takes less space than a cell phone game and runs on CIPs with approximately the processing power of a Pentium III. This is why the QNX Neutrino kernel that runs just about everything from fridges to nuclear power plants takes only 48 kilobytes. This is why computers had wide military, industrial and commercial uses long before China started shipping containers of fifty-buck trillion-transistor chips and India started churning out a cheap coder workforce.


Quote: weaselman

If you are not convinced, think about why something as trivial as searching for tickets from Boston to Las Vegas (something immeasurably simpler than natural language search like what Watson does) takes a minute or more on engines like expedia.com or itasoftware.com.


Well, I could think, but there's no need to, as I know it's primarily due to latencies - communicating multiple consecutive queries over extremely high-latency internet links.

On the other hand, think about why something as non-trivial as optimizing all the members of a complex-shaped building spaceframe takes less than a second on a computer that couldn't run your cell phone's menu animations.
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
thecesspit
thecesspit
  • Threads: 53
  • Posts: 5936
Joined: Apr 19, 2010
February 20th, 2011 at 10:54:34 AM permalink
Quote: weaselman

Also, unlike many modern "techniques" in the Computer Science, the "cloud fuzzing" stuff is the real thing. If you have two computers, you can use them to solve problems twice as fast as you could do with one. If you think that is not impressive, try that with people.



Slightly less than twice as fast... only slightly, but there is some overhead in parallelism.
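That overhead can be put into numbers with Amdahl's law plus a per-node coordination cost; the fractions below are made up for illustration:

```python
# Amdahl's law with an extra per-node coordination overhead term: a serial
# fraction plus coordination cost keeps two machines from being exactly
# twice as fast as one. The 0.95 and 0.01 figures are illustrative.
def speedup(n, parallel_fraction=0.95, overhead_per_node=0.01):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n + overhead_per_node * n)

print(round(speedup(2), 2))  # 1.83 -- slightly less than 2x
```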
"Then you can admire the real gambler, who has neither eaten, slept, thought nor lived, he has so smarted under the scourge of his martingale, so suffered on the rack of his desire for a coup at trente-et-quarante" - Honore de Balzac, 1829
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 20th, 2011 at 1:49:11 PM permalink
Quote: P90


And 99% of wikipedia is taken by data *you don't need* to answer these questions.


How do you know? And how do you tell which data you need without knowing the questions?


Quote:

Well, obviously. You need to store not only User Talk, but also histories of User Talk pages for each and every edit, which occurs every time someone posts something.



No, I did not include user talk and histories. Just the main documents and indexes.

Quote:

You could reduce wikipedia to a few gigabytes and nothing of value will be lost. But even that much is far more than overkill.


It is easier said than done. If you actually tried to do that, you'd quickly find out that the savings you get from the titanic effort of scanning every single page and removing the noise are negligible.
Simply running the pages through gzip (modern databases store data compressed anyway) would give you more benefit than going through them with a fine-tooth comb and trying to weed out stuff you don't need. And then, chances are, you'll make more than one mistake and lose a lot of useful information if you try doing that.
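A quick check of that claim with Python's zlib (the gzip algorithm). The sample text is synthetic and highly repetitive, so it compresses far more than real article prose would, but the point stands:

```python
import zlib

# Compress a block of (synthetic, repetitive) text and compare sizes.
# Real encyclopedia prose compresses less dramatically than this sample,
# but still by a large factor compared with hand-pruning sentences.
text = ("The quick brown fox jumps over the lazy dog. " * 200).encode()
packed = zlib.compress(text, level=9)
print(len(text), len(packed))  # raw size vs compressed size
```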


Quote:

Which is why you erase massive amounts of data and only store a relatively small (but massive in absolute terms) selection of useful information. At least 99% of the data in Wikipedia would only ever serve to slow you down. Take a paper encyclopedia for an example of being somewhat overkill.


Well. Why don't you show me? Pick a page of wikipedia, say this one, and try to make it significantly (as in 99%) smaller without losing any potentially useful information.


Quote:

In answer to your specific question, you would go into the Cities section


That's a tag.

Quote:

, then filter only cities that have two airports [or more] [for the second run].


Are you suggesting indexing all cities by the number of airports they have? How about by the number of people? By area? By number of statues? By oldest building? By number of buildings built since 1900? Since 1895? Number of people under 30 years old? Under 25? Number of sunny days per year? Average temperature?
This is what I was saying earlier - real-life taxonomy is virtually infinite. You can't categorize everything by every possible category you will ever need.

Quote:

Then you would retrieve the list of WWII heroes and the list of WWII battles.


These are tags too.



Quote:

Do you? Not the computer you would program - you; is that what you do when you are asked a question from Jeopardy?
If so, does your memory even store 500,000 encyclopedia article sized text documents (it doesn't)? Does your brain manage to retrieve them in even the 30-millisecond access time of a long obsolete hard drive (it doesn't)? If not, it's a wrong approach.



The human brain works very differently from a computer. That's also what I was saying earlier. If Watson could "think" the way people do, it would not need 3,000 CPUs and a room full of electronics. This is exactly the point - we don't know how people's brains do what they do. What we do know is that it is nothing like a computer algorithm.

Quote:

Your approach is coding-efficient, that is, easier to do than a processing-efficient one.


"My" approach is the only one I know (and I am a professional in the field). If you can suggest something different, then like I said before, don't keep it secret; there are probably billions to be made.


Quote:

Well, I could think, but there's no need to, as I know it's primarily due to latencies - communicating multiple consecutive queries over extremely high-latency internet links.



No, not at all. Expedia only searches its local cache for real-time queries. ITA is the information vendor for other engines; it also searches its own local storage. In either case, the query never hits the internet after the initial request reaches the edge server (which usually takes under 0.5 seconds).


Quote:

On the other hand, think about why something as non-trivial as optimizing all members of a complex-shaped building spaceframe takes less than a second on a computer that couldn't run your cell phone's menu animation.


What you just said sounds foreign to me. I don't know what a "building spaceframe" is, what its "members" are, or what it takes to "optimize" them. But if it really takes under a second, I'll dare say you are wrong in thinking that it is not trivial (from a computational-complexity standpoint).
"When two people always agree one of them is unnecessary"
P90
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 20th, 2011 at 2:36:09 PM permalink
Quote: weaselman

How do you know? And how do you tell which data you need without knowing the questions?


You don't. You just act as if wikipedia never existed and use a real encyclopedia as the primary source instead.

Quote: weaselman

It is easier said than done. If you actually tried to do that, you'd quickly find out that the savings you are getting from titanic efforts of scanning every single page and removing the noise are negligible.


That's, again, why wikipedia will not be stored on gold discs sent into space to greet alien civilizations. The very notion of removing noise from wikipedia is oxymoronic.


Quote:

Well. Why don't you show me? Pick a page of wikipedia, say this one, and try to make it significantly (as in 99%) smaller without losing any potentially useful information.


Well, okay. I will use the Random Page function to be fair in the picks.

1. http://en.wikipedia.org/wiki/Rzewuszyce
Reduced to: An entry in the Settlements.Poland.Villages list
2. http://en.wikipedia.org/wiki/Coquitlam_Public_Library
Reduced to: Deleted
3. http://en.wikipedia.org/wiki/Tropical_Storm_Allison
Reduced to: "Tropical Storm Allison, Location: Texas, Time: 2001.06.04-2001.06.18..." <a few more figures and factoids>
4. http://en.wikipedia.org/wiki/Ma%C4%9Fara
Reduced to: An entry in the Settlements.Azerbaijan.Villages list
5. http://en.wikipedia.org/wiki/Clarence_W._Russell
Reduced to: Deleted
6. http://en.wikipedia.org/wiki/British_Rail_Class_936
Reduced to: An entry in the Locomotives.BR.EMU list
7. http://en.wikipedia.org/wiki/The_War_Prayer_%28Babylon_5%29
Reduced to: Deleted

Oh, and as for your link.
8. http://en.wikipedia.org/wiki/Quantum_computer
Reduced to: List entry

It's a game show aimed at people with two-digit IQ. They couldn't correctly tell what a computer is, much less a quantum computer. The show wants to appeal to its audiences, so it won't be asking in-depth questions about quantum computers.


Quote:

Are you suggesting to index all cities by the number of airports they have? How about by the number of people? By area? By number of statues? By oldest building? By number of buildings built since 1900? Since 1895? Number of people under 30 years old? Under 25? Number of sunny days per year? Average temperature?


Yes and no. But not via indexing. Merely reduce the overwhelming majority of cities to a specifications table. Then, separate the cities list into important, relevant, borderline relevant, irrelevant. Delete all irrelevant American cities and all borderline relevant and irrelevant non-American cities.

In each numerical entry in the spec sheet, pick the top 10 and the bottom 10, and store them as a separate records sheet. Delete middle of the road entries for low relevance cities.

Don't try to store everything. Delete stuff. Since the show is US-centric, you don't even need to store a tenth as much information about other countries as a paper encyclopedia does. You aren't trying to preserve the sum total of humanity's knowledge (which would still be far less than most people think, and could fit on one magnetic tape cassette). Just win a game show. Even after you discard 99% of human-collected data as useless altogether, over 99% of legitimate information is still useless for this purpose. Only a tiny subset is useful.

Once you delete all non-computer-readable data and organize the rest in a computer-optimized form (which does not require a trillion redundant indexes), you'll suddenly find that the processing power required is less than an insignificant fraction of what would be required with a brute force approach.
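The reduce-then-prune scheme described above can be sketched in a few lines. A minimal sketch, assuming invented city records (every name and figure below is made up for illustration):

```python
# Sketch of the reduce-then-prune idea: keep a compact spec table per city,
# plus a separate "extremes" sheet holding only the top/bottom entries for
# each numeric field. All records below are invented for illustration.

cities = {
    "Springfield": {"population": 116000,  "airports": 1, "relevance": "relevant"},
    "Metroville":  {"population": 2400000, "airports": 3, "relevance": "important"},
    "Dusty Gulch": {"population": 900,     "airports": 0, "relevance": "irrelevant"},
    "Lakeside":    {"population": 45000,   "airports": 0, "relevance": "borderline"},
}

# Step 1: delete irrelevant entries outright.
kept = {name: rec for name, rec in cities.items()
        if rec["relevance"] != "irrelevant"}

# Step 2: for each numeric field, record only the extremes
# (top/bottom N; N=1 here for brevity).
def extremes(records, field, n=1):
    ranked = sorted(records, key=lambda name: records[name][field])
    return {"bottom": ranked[:n], "top": ranked[-n:]}

sheet = {field: extremes(kept, field) for field in ("population", "airports")}
```

The point of the extremes sheet is that superlative-style clues ("the largest...", "the only city without...") only ever touch the ends of each ranking, so the middle of the table can be discarded for low-relevance entries.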


Quote:

"My" approach is the only one I know (and I am a professional in the field). If you can suggest something different, like I said before, don't keep it secret, there are, probably, billions to be made.


I'm afraid "my" approach has been the standard for decades - perhaps not in the field of game show winning or ticket search, but in industrial computing equipment. Computing power hasn't been so cheap until very recently, and so neither were the brute force approaches you are advocating. They are easier to implement, and perhaps even cost-effective, all things considered, but inefficient in computational terms.


Quote:

What you just said sounds foreign to me. I don't know what a "building spaceframe" is, what its "members" are, or what it takes to "optimize" them. But if it really takes under a second, I dare say you are wrong in thinking that it is not trivial (from a computing-complexity standpoint).


It's an engineering problem.
One you have to solve every time you are building a skyscraper or just any complex steel-framed structure.
One that is incredibly difficult to solve without computers and takes months of meticulous human work.

It was important enough that computer solutions for it were developed before java and other shitlanguages. That is the only reason it's solved so quickly, and your airline ticket search takes so long. The problem of airline ticket search is a simple filter-and-search procedure, thousands of times less complex, from a computing-complexity standpoint, than the problem of finite element analysis, and could be solved by a script kid, unlike FEA. The difference comes from the fact that FEA was addressed by programmers and ticket search by script kids.
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 20th, 2011 at 5:05:55 PM permalink
Quote: P90

You don't. You just act as if wikipedia has never existed and use a real encyclopedia as the primary source instead.



"real" encyclopedia holding the same amount of information would be about the same size. I only used wikipedia, because it was easiest to come up with a number for the size.

Quote:


Well, okay. I will use the Random Page function to be fair in the picks.

1. http://en.wikipedia.org/wiki/Rzewuszyce
Reduced to: An entry in the Settlements.Poland.Villages list


This won't help to answer the question "which village is located 47 km (29 mi) west of the regional capital Kielce". That is just one of many possibilities you have lost by compressing the page by about a factor of 3 (a far cry from 99%).

Quote:


2. http://en.wikipedia.org/wiki/Coquitlam_Public_Library
Reduced to: Deleted



Deleted? Really? Do I have to comment on this? :)

Quote:


3. http://en.wikipedia.org/wiki/Tropical_Storm_Allison
Reduced to: "Tropical Storm Allison, Location: Texas, Time: 2001.06.04-2001.06.18..." <a few more figures and factoids>


It was clever of you to include "<a few more figures and factoids>" in your answer, so that there is no possible way I can show you what you missed. But I bet the very fact that you felt the need to do it has already shown you that what you are suggesting is far more complicated than you are making it look.

Quote:

5. http://en.wikipedia.org/wiki/Clarence_W._Russell
Reduced to: Deleted



Yeah, right ...


Quote:

Yes and no. But not via indexing. Merely reduce the overwhelming majority of cities to a specifications table. Then, separate the cities list into important, relevant, borderline relevant, irrelevant. Delete all irrelevant American cities and all borderline relevant and irrelevant non-American cities.



Yeah ... Relevant to what? Remember, you don't know the question ahead of time. You would have to determine the relevance of every city to everything else that ever existed in the universe. And not just cities. You will need to determine the relevance of everything that has ever existed to everything else.


Quote:

Don't try to store everything. Delete stuff. Since the show is US-centric, you don't even need to store a tenth as much information about other countries as a paper encyclopedia does.



Well, what you are describing is called heuristics. You are trying to come up with arbitrary rules to reduce the amount of information.
I could pick holes in it all night, but what's the point.
Ok, so, instead of just everything, you have to describe the relationship of everything in the US to everything else. This is still infinite for all intents and purposes, so the restriction doesn't matter.

Quote:



I'm afraid "my" approach has been the standard for decades - perhaps not in the field of game show winning or ticket search, but in industrial computing equipment.


I am afraid you are mistaken here ... or else I am a total moron, along with my stupid employers, who have been paying me big bucks for several decades (which is a possibility, but it would take a lot more than "I say so" from you to convince me of it). I have never heard of anything reminiscent of "your" approach, and I am considered what you call "an expert" in this field. If you really think you have "an approach", different from what I have been describing, please do share, I am really curious to know what you have invented.

Quote:


It's an engineering problem.
One you have to solve every time you are building a skyscraper or just any complex steel-framed structure.
One that is incredibly difficult to solve without computers and takes months of meticulous human work.



What is difficult to solve without computers isn't necessarily a non-trivial computing problem. Take blackjack or poker strategy calculations. Without a computer, they would be very tedious, time-consuming and resource-hungry problems to solve. But writing a computer program for them is a problem for a first-year student, and, once programmed, a computer will come up with the answer in a second or two.
I still don't know what engineering problem you are talking about, but if you are saying that it can be solved in under a second on a cell phone, it is a trivial computational problem. Playing Jeopardy, on the other hand, is not.

Bottom line is, what's trivial for a human may be a real challenge for the computer, and vice versa.

Quote:

It was important enough that computer solutions for it were developed before java and other shitlanguages.



I started programming in Fortran and Assembly, and have used Pascal, Lisp, Prolog, Erlang, Haskell, COBOL, C, C++, Java, Python, Perl, and PHP, among others.
Not sure which of these you are referring to as "shitlanguages". In my "professional opinion", there is no real difference beyond semantics and syntax.

Quote:

That is the only reason it's solved so quickly, and your airline ticket search takes so long.



The airline search isn't in java.
But again, if you think you know a way to make it faster (in any language), please share. I am dead serious: you and I could make really big money on this. It is one of the hottest problems on the whole internet right now. You sound a lot like those guys who show up every now and then claiming they have finally found a guaranteed betting strategy for roulette.
"When two people always agree one of them is unnecessary"
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 20th, 2011 at 8:53:24 PM permalink
Quote: weaselman

"real" encyclopedia holding the same amount of information would be about the same size.


The very point is that you don't need that amount of information.
I don't normally watch game shows, so I can't give a representative pattern of what Jeopardy questions tend to be about, but I've watched a few episodes and read a few transcripts, and got some impression of the general direction.

Quote: weaselman

This won't help to answer the question "which village is located 47 km (29 mi) west of the regional capital Kielce".


Actually an entry with geographical coordinates (in digital form, never store plaintext) would help one do just that.
However, what are the chances you will be asked that question on Jeopardy?
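For what it's worth, a coordinate entry really can answer distance-style clues with a simple great-circle filter. A minimal sketch: Kielce's coordinates below are approximate but real; the village records are invented:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in km on a sphere of Earth's mean radius.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

KIELCE = (50.87, 20.63)  # approximate coordinates of Kielce, Poland

# Invented village records: name -> (lat, lon)
villages = {
    "VillageA": (50.88, 20.00),  # made-up point roughly 44 km west of Kielce
    "VillageB": (51.50, 21.90),  # made-up point far to the northeast
}

def within(center, records, lo_km, hi_km):
    # Keep only records whose distance from `center` falls in [lo_km, hi_km].
    return [name for name, (lat, lon) in records.items()
            if lo_km <= haversine_km(center[0], center[1], lat, lon) <= hi_km]

candidates = within(KIELCE, villages, 40, 55)  # villages ~47 km from Kielce
```

A band of tolerance around the clue's figure, rather than an exact match, absorbs rounding in both the stored coordinates and the clue itself.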

Quote: weaselman

It was clever of you to include "<a few more figures and factoids>" in your answer, so that there is no possible way I can show you what you missed.


Big subject. It's one of those that would be worth having an article about, because it is: 1) Recent, 2) US-centric, 3) Highly public.
However, you would not store a big plaintext article, you would reduce it to what can be expected to be Jeopardy clues.


Quote: weaselman

Yeah ... Relevant to what? Remember, you don't know the question ahead of time. You would have to determine the relevance of every city to everything else that ever existed in the universe. And not just cities. You will need to determine the relevance of everything that has ever existed to everything else.


No, you don't have to determine everything. The show asks about things that are publicly and relatively widely known. You know for a fact it will not be asking you to prove the Steiner Theorem, so you don't have to store even a mention of it, much less the proof.

Don't think of it as preserving sum total of knowledge, think of it as cheating on a test. Grow your data collection from the ground up, by adding what is necessary, not by taking everything and filtering it. The machine doesn't need to be perfect; just beat puny hunams.


Quote:

I have never heard of anything reminiscent of "your" approach, and I am considered what you call "an expert" in this field. If you really think you have "an approach", different from what I have been describing, please do share, I am really curious to know what you have invented.


Well, if you insist, but I'm afraid it is not my invention.
When a computer program needs to find variable X[15,87], it doesn't sift through the entire memory to find an entry marked "Variable X[15,87]" and read it (which is what full text search is like). It keeps a pointer to array X, adds a displacement determined by index, and goes there.
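The contrast being drawn here can be sketched as two lookups over the same data: a scan whose cost grows with the size of the store, versus direct addressing whose cost is constant. The array layout below is purely illustrative:

```python
# Two ways to answer "what is X[15,87]?" over the same data.

ROW_LEN = 100

# 1) Scan-style: walk every labelled record until one matches.
#    Cost grows with the size of the store (like full-text search).
def scan_lookup(records, label):
    for rec_label, value in records:
        if rec_label == label:
            return value
    return None

# 2) Direct addressing: compute where the value lives and go there.
#    Constant cost regardless of store size (base pointer + displacement).
def direct_lookup(flat_array, row, col):
    return flat_array[row * ROW_LEN + col]

# Build a toy store both ways: value at (row, col) is row*ROW_LEN + col.
flat = [row * ROW_LEN + col for row in range(100) for col in range(ROW_LEN)]
records = [(f"X[{i // ROW_LEN},{i % ROW_LEN}]", v) for i, v in enumerate(flat)]
```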

When a script kid needs to sort through millions of variables (without the advantage of programmer-written libraries), he writes a bubble algorithm and then says "Well, it's millions of variables!" when it takes hours on a high-end server. When a programmer needs to do the same, he selects and implements a mathematically optimized approach, which then takes milliseconds on a rusty 16-bit desktop. To someone born today, who thinks of CPU cycles and RAM bytes as too cheap to meter, the latter will never occur, and he'll just accept as a fact of life that it indeed takes 45 megabytes and a gigahertz CPU to run Tetris.
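The sorting contrast above, as a sketch: both routines return identical output, but bubble sort does O(n²) work while Python's built-in sort (standing in here for the "mathematically optimized approach") does O(n log n):

```python
import random

def bubble_sort(xs):
    # The naive version: O(n^2) comparisons and swaps.
    xs = list(xs)
    for end in range(len(xs) - 1, 0, -1):
        for i in range(end):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

data = [random.randrange(10**6) for _ in range(1000)]

# Same answer either way; only the asymptotic cost differs.
assert bubble_sort(data) == sorted(data)
```

At a thousand elements the difference is invisible; at millions of elements the quadratic version takes hours while the O(n log n) one still finishes in seconds.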

Storing a dump of plaintext data is NOT a processing-efficient approach. A processing-efficient approach is to recognize and categorize the question and apply an optimized algorithm for solving it, using pre-optimized databases designed explicitly for, and together with, the algorithms. There are only so many categories a game show can do, only so many ways it can ask a question, and only so many ways needed to address them.
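One reading of "recognize and categorize the question, then apply an optimized algorithm" is a dispatch table keyed by category. The categories and handlers below are invented placeholders, not anything Watson actually uses:

```python
# Hypothetical category handlers, each backed by its own purpose-built store.
def lookup_settlement(clue):
    # Placeholder: would consult a purpose-built settlements table.
    return "SomeVillage"

def lookup_date(clue):
    # Placeholder: would consult an events timeline.
    return "June 2001"

HANDLERS = {
    "GEOGRAPHY": lookup_settlement,
    "DATES": lookup_date,
}

def answer(category, clue):
    handler = HANDLERS.get(category)
    if handler is None:
        return None  # category outside the prepared set: no guess
    return handler(clue)
```

The dispatch itself is a constant-time table lookup; all the cleverness lives in the per-category stores and algorithms behind each handler.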

Full text search is a script kid approach, and you know it better than I do (since actual coding is the part my job requires me to delegate). You wouldn't be storing wiki dumps in 16-bit Unicode and doing iterative full text searches if each byte of RAM cost $1 and came out of your stock value.

Since RAM doesn't cost $1 per byte anymore, however, there are no billions to be made here, only billions to be lost. Sometimes it's just cheaper to throw an extra 44.9 MB of drive space, an extra 1023.25 MB of RAM and an extra 995 MHz of CPU clock at the task, and let the programming be done by a bubble-sorting student on his lunch break, than it is to hire a skilled programmer to write a computing-efficient implementation.

If winning a war against The Axis Of Evil had depended on making a Jeopardy-beating computer, we would have done it by the 1980s. Since it doesn't, we only got around to it now. Furthermore, if Jeopardy weren't an American show but, say, a Kenyan one, we would only get around to it sometime in the 2050s. And the programmer who did it would be just as convincing in insisting it couldn't possibly have been done in the 2010s, because there was not enough processing power in the whole 2010s world to implement the eight trillion virtual pseudo-neurons he used in his solution.



Quote:

What is difficult to solve without computers isn't necessarily a non-trivial computing problem. Take blackjack or poker strategy calculations. Without a computer, they would be very tedious, time-consuming and resource-hungry problems to solve. But writing a computer program for them is a problem for a first-year student, and, once programmed, a computer will come up with the answer in a second or two.


Well, let me elaborate then. Finite element analysis is one of those problems that are incredibly difficult to solve even with computers. Only the very best of modern students could have a shot at writing such a program, and it would run for hours, because they wouldn't care to (or be able to) optimize the discretization. Fortunately, FEA routines were written back when computers weren't yet considered white goods and programmers still knew more than PHP.
They didn't have Sandy Bridge, so they had to make do with the computers they had, and so they put effort into making these routines fast.
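For concreteness, here is a minimal 1D finite-element sketch: an axial bar fixed at one end, discretized into two elements. Real FEA codes handle meshing, 2D/3D elements, and sparse solvers, none of which appear here; the analytic tip displacement F·L/(E·A) serves as a check:

```python
# Minimal 1D FEA sketch: an axial bar fixed at the left end with a point
# force F at the right end. Two equal two-node elements; E*A = 1.0,
# total length 2.0. Analytic tip displacement is F*L/(E*A) = 2.0 for F = 1.0.

N_ELEMS = 2
EA = 1.0          # product of Young's modulus and cross-section area
ELEM_LEN = 1.0    # length of each element
F_TIP = 1.0       # axial force applied at the free end

n_nodes = N_ELEMS + 1
k = EA / ELEM_LEN  # stiffness of one element

# Assemble the global stiffness matrix from 2x2 element matrices.
K = [[0.0] * n_nodes for _ in range(n_nodes)]
for e in range(N_ELEMS):
    K[e][e] += k
    K[e][e + 1] -= k
    K[e + 1][e] -= k
    K[e + 1][e + 1] += k

# Boundary condition u[0] = 0: drop the first row and column.
A = [row[1:] for row in K[1:]]
b = [0.0] * (n_nodes - 1)
b[-1] = F_TIP

def gauss_solve(A, b):
    # Plain Gaussian elimination with partial pivoting; fine at this size.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

u = [0.0] + gauss_solve(A, b)  # nodal displacements; u[0] is the fixed end
```

The computational weight in real FEA comes from mesh size (millions of unknowns) and solver choice, not from the assembly logic, which is as simple as shown.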


Quote:

I started programming in Fortran and Assembly, and have used Pascal, Lisp, Prolog, Erlang, Haskell, COBOL, C, C++, Java, Python, Perl, and PHP, among others.
Not sure which of these you are referring to as "shitlanguages".


Of course you're sure. If you started with Assembler, you know just as well about the high-level language bloatware cancer.
It's not about syntax, there are only so many ways to do it.
Can you point me to a java program that can produce textured 3D-looking graphics on an 80386?
A java program that delivers a demoscene-competitive audiovisual clip in 64 kilobytes, or, say, ever won Breakpoint?
A java program that can manage over 4,000 active torrents, all in a 380KB package, taking just 1% of a single CPU core and 31MB of RAM?
A java program that runs real-time operations in any industrial or military embedded system?


Quote: weaselman

But again, if you think you know a way to make it faster (in any language), please share.


If you insist. But it's a secret, don't tell anybody.
Dedicate the entire server's resources to just finding your ticket, instead of sharing them between multiple users (accept requests from weaselman, else respond host unreachable). Presto!
boymimbo
  • Threads: 17
  • Posts: 5994
Joined: Nov 12, 2009
February 21st, 2011 at 6:35:30 AM permalink
Expedia comes back with a search from BOS to LAX in about 13 seconds. They took a great deal of pride in being the fastest out there and had a mega-project back when I was working there to change all of their code from one language to another to quicken the search results. They came up with caching algorithms and other methods to speed up the search.
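A caching layer of the kind described might look like this minimal TTL-cache sketch. The interface is invented; production fare caches are far more elaborate:

```python
import time

class TTLCache:
    """Cache fare-search results for a short window, then recompute."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get(self, key, compute):
        now = time.monotonic()
        hit = self.store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                 # fresh cached result
        value = compute()                 # slow path: run the real search
        self.store[key] = (now + self.ttl, value)
        return value

calls = []
def fake_search():
    # Stand-in for the slow fare-database query; counts invocations.
    calls.append(1)
    return ["BOS-LAX $199"]

cache = TTLCache(ttl_seconds=60)
cache.get(("BOS", "LAX"), fake_search)
cache.get(("BOS", "LAX"), fake_search)  # served from cache; no second search
```

The trade-off boymimbo describes follows directly: a longer TTL means faster responses but a higher chance the cached fare is stale by the time the user clicks on it.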

The problem with airline sites is that most fares are stored on one slow database, so most of that time is spent accessing it. The access must be in real time, because airfares are still based on a simple one-letter code with a number for the seats available in that code. Each code has its own restrictions and rules that the search provider is responsible for decoding (i.e., a V fare might be the lowest fare, but it must be booked 15 days in advance and must not include a stopover in Denver).
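The fare-rule decoding described above amounts to a filter over booking codes. A hedged sketch with invented rules (the V-fare restrictions mirror the example in the post; the others are made up):

```python
from datetime import date

# Invented fare rules in the spirit described: each one-letter booking code
# carries an advance-purchase minimum and a set of forbidden stopovers.
FARE_RULES = {
    "V": {"min_advance_days": 15, "forbidden_stopovers": {"DEN"}},
    "Q": {"min_advance_days": 7,  "forbidden_stopovers": set()},
    "Y": {"min_advance_days": 0,  "forbidden_stopovers": set()},
}

def eligible_codes(book_date, depart_date, stopovers):
    # Keep only the codes whose rules the proposed itinerary satisfies.
    days_ahead = (depart_date - book_date).days
    return [code for code, rule in FARE_RULES.items()
            if days_ahead >= rule["min_advance_days"]
            and not (set(stopovers) & rule["forbidden_stopovers"])]

# Booked 10 days out with a Denver stopover: the V fare fails both rules.
codes = eligible_codes(date(2011, 2, 21), date(2011, 3, 3), ["DEN"])
```

Real fare rules run to many more dimensions (day-of-week, seasonality, combinability), but each is still just another predicate in this filter.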
----- You want the truth! You can't handle the truth!
ItsCalledSoccer
  • Threads: 42
  • Posts: 735
Joined: Aug 30, 2010
February 21st, 2011 at 8:44:33 AM permalink
Not much to add to the "is a brain a computer" or the "computers can do what brains do if they're sufficiently advanced" facets of the discussion.

But I think that computers are fundamentally different than brains and will never duplicate them. Yes, there are resemblances, and yes, there are computers that can work on some problems much faster than brains. But just because we build a machine that out-performs a human in some regard doesn't mean the machine is the same as a human. Hell, we've been building things that are stronger, faster, larger, etc. than humans for thousands of years. The computer is just another invention along that same line.

Computers do what they're told. If they're told garbage, they spit out garbage. Some answers on the Jeopardy challenge illustrated that. While Watson's wrong answer had some logic to it, and while humans can be wrong too, it was still a series of algorithms, not original thought, that brought that answer about.

But the brain does more than think. It emotes. Computers can't do this, even sufficiently advanced ones we might imagine ever being invented. A computer doesn't love to the detriment of its own well-being, but humans do all the time. They don't anger to the point of murder ... humans do. They don't lust, they don't greed, they don't hope, etc., etc., etc.

It can be argued whether the inability to do these things is a superiority or an inferiority of a computer. But it cannot be denied that, no matter how impressively Watson answers questions, it does not emote. There's something in the biology that is fundamentally different from the programming algorithms.

While cybernetics may or may not be in the future, it is not now, and we only have guesses as to whether or not any future cybernetic man-machine will hope and love and lust and greed while at the same time answering Jeopardy questions (or whatever else).
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 21st, 2011 at 9:41:54 AM permalink
Quote: P90

The very point is that you don't need that amount of information.



You were the one who suggested putting an encyclopedia into a computer. Now you are saying you don't need it. Well, suggest something else then.

Quote:

Actually an entry with geographical coordinates (in digital form, never store plaintext) would help one do just that.



So, you would have to calculate the distance from every city in the world to that point? How fast do you think that would be?

Quote:

However, what are the chances you will be asked that question on Jeopardy?



I don't know. Do you? Are you suggesting to think of every possible question that can be ever asked, and estimate the odds of that happening?

Quote:

However, you would not store a big plaintext article, you would reduce it to what can be expected to be Jeopardy clues.



Right. That's what you tried to do just now, and failed.

Quote:

No, you don't have to determine everything. The show asks about things that are publicly and relatively widely known. You can know it will not be asking you to prove the Steiner theorem, so you don't have to store it.



Sure, you can name a thing or two it probably will not ask. But how do you create a list of all the things that can be asked?


Quote:

Well, if you insist, but I'm afraid it is not my invention.
When a computer program needs to find variable X[15,87], it doesn't sift through the entire memory to find an entry marked "Variable X[15,87]" and read it (which is what full text search is like). It keeps a pointer to array X, adds a displacement determined by index, and goes there.



Yes, that's exactly what it does. But what does it have to do with the discussion?

Quote:

When a script kid needs to sort through millions of variables (without the advantage of programmer-written libraries), he writes a bubble algorithm and then says "Well, it's millions of variables!" when it takes hours on a high-end server.


Actually, a bubble sort of a few million numbers would not take more than a second or two, even on a not-so-high-end server. But, again, what does it have to do with anything?

Quote:

When a programmer needs to do the same, he selects and implements a mathematically optimized approach, which then takes milliseconds on a rusty 16-bit desktop. To someone born today, who thinks of CPU cycles and RAM bytes as too cheap to meter, the latter will never occur, and he'll just accept as a fact of life that it indeed takes 45 megabytes and a gigahertz CPU to run Tetris.



Are you suggesting that today's programmers only use bubble sort? Seriously?


Quote:

Storing a dump of plaintext data is NOT a processing-efficient approach.


Right. It isn't. That's why I told you that it would not work. You'd need a lot more storage space than what's required to just store plain text.

Quote:

A processing-efficient approach is to recognize and categorize the question


That would be the tag cloud you were criticizing in the beginning.

Quote:

and apply an optimized algorithm for solving it.


And what exactly would that algorithm be?

Quote:

There are only so many categories a game show can do, only so many ways it can ask a question, and only so many ways needed to address them.


How many? Care to name them all? Give me a list.

Quote:

Full text search is a script kid approach, and you know it better than I do.



No, I don't. And after having asked you repeatedly to name another one, I still haven't heard of an alternative. Simply saying that something is "bad" and calling the professionals who invented it "kids" doesn't make it so. If you want to be taken seriously, you have to suggest a better alternative. So far, you have not done that.


Quote:

Well, let me elaborate then. Finite element analysis is one of those problems that are incredibly difficult to solve even with computers. Only the very best of modern students could have a shot at writing such a program.


I know what FEA is. I was writing programs for it back in my first year of college. It wasn't particularly hard. There is some interesting mathematics involved, but once the algorithm is formulated, it is a trivial programming task. No, it won't run for hours if the programmer is competent.



Quote:

Of course you're sure. If you started with Assembler, you know just as well about the high-level language bloatware cancer.
It's not about syntax, there are only so many ways to do it.
Can you point me to a java program that can produce textured 3D-looking graphics on an 80386?


Yes. Autocad and Pro/Engineer both had java applets back in the 90s that would run inside a browser and create visualizations of user models right on the desktop.

Quote:

A java program that delivers a demoscene-competitive audiovisual clip in 64 kilobytes, or, say, ever won Breakpoint?


On a 386? I don't think you could do that in any language. But I don't know what "demoscene" or "Breakpoint" are, so I may be wrong (just guessing, really). If you show me a program in any language that does what you are talking about, I can probably rewrite it in java to do the same thing. Once again, only syntax and semantics differ between languages. What you can do in one language, you can certainly do in another.

Quote:

A java program that can manage over 4,000 active torrents, all in a 380KB package, taking just 1% of a single CPU core and 31MB of RAM?



Sure. Several popular torrent trackers are written in java.

Quote:

A java program that runs real-time operations in any industrial or military embedded system?



Yes. That's exactly what java was originally invented for. There are many embedded systems controlled by java.


Quote:

Dedicate the entire server's resources to just finding your ticket, instead of sharing them between multiple users (accept requests from weaselman, else respond host unreachable). Presto!



"Entire server"? ITA executes a fare search request on about 2000 servers. Yes, they are shared ... between all 3 or so simultaneous queries they receive concurrently.
"When two people always agree one of them is unnecessary"
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 21st, 2011 at 10:05:55 AM permalink
Quote: boymimbo

Expedia comes back with a search from BOS to LAX in about 13 seconds. They took a great deal of pride in being the fastest out there and had a mega-project back when I was working there to change all of their code from one language to another to quicken the search results. They came up with caching algorithms and other methods to speed up the search.



Expedia is not the fastest out there. If anything, it is the slowest. That is because they insist on doing a really deep search, finding all the possible results, and they have a quite small, very short-lived cache of fares (so that, when you click on the flight you like, you don't often get a message that it is unavailable, unlike providers like cheapoair, where that happens quite often). The fastest service is ITA, but it is very inaccurate, especially when it comes to interline solutions or split-ticketing. That is because their search optimisations are very heavily based on trimming the "least-promising" paths (sorta like what P90 is talking about), which results in getting fewer results faster.


Quote:

The problem with airline sites is that most fares are stored on one slow database, so most of that time is spent accessing it. The access must be in real time, because airfares are still based on a simple one-letter code with a number for the seats available in that code.


That's how it was about 10 years ago, before ITA came around. They have their own availability engine that calculates fare availability in real time, without consulting the GDS. Expedia, AFAIK, has its own engine as well (and, I think, always did, at least since it spun off of the IAC).

Quote:

Each code has its own restrictions and rules that the search provider is responsible for decoding (i.e., a V fare might be the lowest fare, but it must be booked 15 days in advance and must not include a stopover in Denver).



Yes, that's a complicated calculation. But nothing at all compared to being able to answer any general question in under 3 seconds. I just used this example to show P90 that not everything that looks simple really is simple.
"When two people always agree one of them is unnecessary"
rxwine
  • Threads: 212
  • Posts: 12220
Joined: Feb 28, 2010
February 21st, 2011 at 2:41:30 PM permalink
Quote: ItsCalledSoccer

But the brain does more than think. It emotes. Computers can't do this, even sufficiently advanced ones we might imagine ever being invented. A computer doesn't love to the detriment of its own well-being, but humans do all the time. They don't anger to the point of murder ... humans do. They don't lust, they don't greed, they don't hope, etc., etc., etc.



True. Though you could put an ape who's been trained for a few years in symbolic language on the other side of a terminal. He has lust, greed, and hope for a banana for whatever advantage that is - perhaps survival, which is not trivial for hanging around, I guess.

Not sure where I'm going with this, but now I'm hungry.
There's no secret. Just know what you're talking about before you open your mouth.
ItsCalledSoccer
  • Threads: 42
  • Posts: 735
Joined: Aug 30, 2010
February 21st, 2011 at 2:50:37 PM permalink
Quote: rxwine

True. Though you could put an ape who's been trained for a few years in symbolic language on the other side of a terminal. He has lust, greed, and hope for a banana for whatever advantage that is - perhaps survival, which is not trivial for hanging around, I guess.

Not sure where I'm going with this, but now I'm hungry.



Heh ... made me laugh a little!

But apes are biological, which is what I meant, only I didn't use any animal other than humans.

FWIW, survival behavior is universal among biological beings, but only in humans is sacrifice - meaning, behavior to your detriment for the benefit of another in conflict with your survival instinct - prevalent. Maybe other species "sacrifice" in this sense, but since we don't speak ape-talk (or whatever-other-species-talk) and we don't really know any ape- (or whatever-) psychology, we don't really know if the ape (or whatever) is truly being sacrificial, contrary to its own survival, or if we just anthropomorphize its behaviors.

Oh, computers don't sacrifice, either, so there's another difference with the biology.
JerryLogan
  • Threads: 26
  • Posts: 1344
Joined: Jun 28, 2010
February 21st, 2011 at 3:34:22 PM permalink
Imagine ME getting involved in a discussion like this?
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 21st, 2011 at 3:54:14 PM permalink
Quote: JerryLogan

Imagine ME getting involved in a discussion like this?



Yeah ... Dreadful.
"When two people always agree one of them is unnecessary"
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 21st, 2011 at 4:02:45 PM permalink
Quote: ItsCalledSoccer


FWIW, survival behavior is universal among biological beings, but only in humans is sacrifice - meaning, behavior to your detriment for the benefit of another in conflict with your survival instinct - prevalent.



This is not true at all. Bees will attack the intruder and die (they die if they sting someone) to protect the queen. This is only one example, first that comes to mind, it happens all over the place. I would say, self-sacrifice is actually less evident in humans than in other species (it makes sense actually, because it goes against reason, and other species can't reason). But when it comes to sacrificing others in the name of a higher purpose, humans are indeed the unique species in the whole animal kingdom. No other animal does that.
"When two people always agree one of them is unnecessary"
ItsCalledSoccer
  • Threads: 42
  • Posts: 735
Joined: Aug 30, 2010
February 21st, 2011 at 5:09:01 PM permalink
Quote: weaselman

This is not true at all. Bees will attack the intruder and die (they die if they sting someone) to protect the queen. This is only one example, first that comes to mind, it happens all over the place. I would say, self-sacrifice is actually less evident in humans than in other species (it makes sense actually, because it goes against reason, and other species can't reason). But when it comes to sacrificing others in the name of a higher purpose, humans are indeed the unique species in the whole animal kingdom. No other animal does that.



I think this could be Exhibit A for "What Anthropomorphism Looks Like."

I don't know this because I'm not a bee, but I don't think bees KNOW, before they sting something, that they will die if they do that. You don't know, either, because you've never spoken to a bee to discover its psychology. Anything you think you KNOW is really just observing their behavior, comparing it to a human's, and saying the bee is doing it for the same reason a human would. In other words, Exhibit A.

Bees sting under other conditions than to protect the queen, i.e., as a defense mechanism ... see also "reach into a glass jar with a bee in there and see if you don't get stung." KNOWING you'll die if you sting something, wouldn't you try to escape or, if you couldn't, just sit there in the glass jar rather than sting?

But they don't KNOW. They don't KNOW they're "protecting the queen." They're acting on millennia of evolutionary genetic imprinting. You observe it and say, "That bee died to protect the queen," but only because you KNOW it will die. You're assigning both a choice and a heroic motive to the bee. In other words, anthropomorphizing. The bee doesn't have a choice, and it doesn't have a motive. It's all raw instinct.

As to the comment on human self-sacrifice, nobody disputes that humans aren't always noble. But sometimes they are ... see also Flight 93 passengers KNOWING they will die by bringing down the plane in rural Pennsylvania. Humans, I think, are the only animals where choice enters into it; i.e., morality. I don't think bees have "morality" in that sense, only an evolutionary imprint as to what works best for the survival of the species.
rxwine
  • Threads: 212
  • Posts: 12220
Joined: Feb 28, 2010
February 21st, 2011 at 5:53:16 PM permalink
Quote: ItsCalledSoccer

As to the comment on human self-sacrifice, nobody disputes that humans aren't always noble. But sometimes they are ... see also Flight 93 passengers KNOWING they will die by bringing down the plane in rural Pennsylvania. Humans, I think, are the only animals where choice enters into it; i.e., morality. I don't think bees have "morality" in that sense, only an evolutionary imprint as to what works best for the survival of the species.



Hmm, well, I thought of elephants almost immediately on thinking of altruistic animals. There's even a section on altruism in this wiki entry. (although it occurred to me from watching nature shows about elephants before I even looked it up)

Elephant intelligence
There's no secret. Just know what you're talking about before you open your mouth.
ItsCalledSoccer
  • Threads: 42
  • Posts: 735
Joined: Aug 30, 2010
February 21st, 2011 at 6:59:02 PM permalink
Quote: rxwine

Hmm, well, I thought of elephants almost immediately on thinking of altruistic animals. There's even a section on altruism in this wiki entry. (although it occurred to me from watching nature shows about elephants before I even looked it up)

Elephant intelligence



Same song, second verse.

You (not rxwine-you, but generic-you) observe behavior and make the implicit assumption that, "hey, since a noble human would walk backwards to avoid harming a human, the elephant MUST have walked backwards for the same reasons!" Exhibit B of "What Anthropomorphism Looks Like."

I don't doubt an elephant's intelligence. I doubt that it has morality. There could be a zillion reasons why one particular elephant walked backwards at one particular time, the most unlikely of which is some sort of elephant-morality shared by all elephants.

Honestly, believing an elephant has morality in the same sense we do is, to greatly understate it, a stretch. IMHO, it takes a great deal less faith to believe that a god created the universe than it does to assign morality to elephants.

But let's just say it does (even though it doesn't). How would it communicate its "psychology" to you? How can you even trust your own observations? It's been said that, if a space alien tried to guess mankind's morality by just observing him, it would be way off on its conclusions ... until it spoke with us to know our psychology. So, how can you say that elephants are, in the same sense humans are, altruistic without making the infinite leap into the Faith of the Elephant Morality?
mkl654321
  • Threads: 65
  • Posts: 3412
Joined: Aug 8, 2010
February 21st, 2011 at 7:04:27 PM permalink
Quote: ItsCalledSoccer

FWIW, survival behavior is universal among biological beings, but only in humans is sacrifice - meaning, behavior to your detriment for the benefit of another in conflict with your survival instinct - prevalent. .



Not that I disagree with your basic point, but altruistic behavior IS in accordance with survival instinct--in terms of passing along your genetic material. Altruistic behavior isn't confined to humans, either; not only primates, but birds, whales, elephants, and many other social species exhibit self-sacrificing behavior. The consideration of survival of the group may outweigh the survival of the individual, in terms of maximizing reproductive strategies. Also, nonfatal altruistic behavior can be rewarded/rewarding, if the group practices reciprocal altruism.
The fact that a believer is happier than a skeptic is no more to the point than the fact that a drunken man is happier than a sober one. The happiness of credulity is a cheap and dangerous quality.---George Bernard Shaw
rxwine
  • Threads: 212
  • Posts: 12220
Joined: Feb 28, 2010
February 21st, 2011 at 7:38:52 PM permalink
Quote: ItsCalledSoccer

Honestly, believing an elephant has morality in the same sense we do is, to greatly understate it, a stretch. IMHO, it takes a great deal less faith to believe that a god created the universe than it does to assign morality to elephants.



Altruism is just a subset, though, of all morality or moral reasoning. When an animal shows some reasoning ability (in order to solve problems), I wouldn't entirely rule out altruism based on certain actions. Since I can see an elephant, and I can't see god, I'm going with elephant altruism as the more easily believable.
There's no secret. Just know what you're talking about before you open your mouth.
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 21st, 2011 at 8:43:48 PM permalink
Quote: weaselman

You were the one who suggested putting an encyclopedia into a computer. Now you are saying you don't need it. Well, suggest something else then.


Regular sized encyclopedia, not wikipedia. Even a normal encyclopedia is overkill.

Quote: weaselman

So, you would have to calculate the distance from every city in the world to that point? How fast do you think that would be?


Of course not. You would use only the Settlements.Poland list (at least as first resort).
How fast would that be? Let's see... 3-5 operations, at 30-50 million operations per second, for say 10,000 settlements (you'd have fewer)... About a millisecond, I/O not included. There will be a few delays, and if the list is not stored in RAM you'd need to read it; that's plus one drive latency.
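P90's millisecond estimate is easy to sanity-check with a sketch. Everything below is hypothetical illustration - the `Settlements.Poland`-style list, the sample coordinates, and the 5 km tolerance are invented for this example, not anything Watson or the show actually uses:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical stand-in for a "Settlements.Poland" table: (name, lat, lon).
# A real list would hold ~10,000 rows; the scan below is still only a few
# arithmetic operations per row, which is where the millisecond figure comes from.
SETTLEMENTS_POLAND = [
    ("Gdansk", 54.35, 18.65),
    ("Gdynia", 54.52, 18.53),
    ("Sopot", 54.44, 18.56),
    ("Malbork", 54.00, 19.05),
]

def city_at_distance(origin, target_km, tolerance_km=5.0):
    """Linear scan for the settlement whose distance from origin is closest
    to target_km, rejecting the best match if it is off by more than the tolerance."""
    o = next(s for s in SETTLEMENTS_POLAND if s[0] == origin)
    best, best_err = None, float("inf")
    for name, lat, lon in SETTLEMENTS_POLAND:
        if name == origin:
            continue
        err = abs(haversine_km(o[1], o[2], lat, lon) - target_km)
        if err < best_err:
            best, best_err = name, err
    return best if best_err <= tolerance_km else None
```

With these sample coordinates, `city_at_distance("Gdansk", 47)` picks out Malbork; the point is only that a per-row distance check is a handful of arithmetic operations, so a 10,000-row scan fits the millisecond ballpark even on old hardware.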


Quote: weaselman

I don't know. Do you? Are you suggesting to think of every possible question that can be ever asked, and estimate the odds of that happening?


Why are you insisting on approaching the problem from the wrong end - taking everything and reducing it to a subset? You only need to analyze the list of questions asked on Jeopardy before to estimate the most likely categories and types of questions. It's what a human does when preparing for a game show as well, not thinking of every possible question.

Quote: weaselman

Sure, you can name a thing or two it, probably, will not ask. But how do you create a list of all things that can be asked?


Same thing, wrong end. You only need things that are most likely to be asked.
How? An encyclopedia will contain virtually all the answers, the rest will be found in other public sources, like lists of sports competition results. Now you need to rewrite it explicitly for the computer.


Quote:

Are you suggesting that today's programmers only use bubble sort?


Of course not; it is just an example of how the same problem can be solved with a lot of computer power but very little coding effort, or with far less computer power but more coding effort.
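That bubble-sort trade-off can be made concrete. The sketch below is purely illustrative: the naive quadratic sort costs minutes of coding and burns cycles on large inputs, while the library `sorted()` (Timsort, roughly O(n log n)) represents the opposite trade-off - someone else's large coding effort saving machine time:

```python
def bubble_sort(xs):
    """O(n^2) comparisons: minimal coding effort, heavy on the machine for big n."""
    xs = list(xs)  # work on a copy; don't mutate the caller's list
    for i in range(len(xs)):
        swapped = False
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
                swapped = True
        if not swapped:  # tiny extra effort that wins big on nearly-sorted input
            break
    return xs
```

Both `bubble_sort(data)` and `sorted(data)` produce identical output; only the cost curve differs, which is the whole point of the analogy.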


Quote:

That would be the tag cloud you were criticizing in the beginning.


Tag cloud is one approach. The approach I was talking about is making the computer recognize the question. Read the input according to English grammar, then assign to it the exact category the question actually is in, and search for an answer to the question actually asked.

How exactly? That's a million dollar question. But it's a question of programming, not processing power.


Quote: weaselman

How many? Care to name them all? Give me a list.


Ask someone who actually watches Jeopardy. I think the list includes cities, decades, and something else.


Quote: weaselman

No, I don't. And after having asked you repeatedly to name another one, I still haven't heard of an alternative.


As I said, the computing-efficient alternative would be to understand the question and do an optimized search for the answer using a purpose-written database. It's not *better*, just less demanding on the computer and more demanding on the human team.


Quote: weaselman

I know what FEA is. I have been writing programs for it back in my first year in college. It wasn't particularly hard. There is some interesting mathematics involved, but once the algorithm is formulated, it is a trivial programming task. No, it' won't run for hours, if the programmer is competent.


Precisely the point: if the programmer is competent. One of the challenges in FEA (one that made it so hard at first - OK, I exaggerated here) is optimizing the automatic discretization algorithms so that the program doesn't take more computing power than a System/360 has to offer, while still delivering not only sufficient precision but high confidence in that precision.


Quote: weaselman

On 386? I don't think you could do that in any language. But I don't know what "demoscene" or "Breakpoint" are, so, I maybe wrong (just guessing really). If you show me a program in any language that does what you are talking about, I can, probably, rewrite it in java to do the same thing.


One old example
A newer one
http://pouet.net/prodlist.php?platform=Windows&type=64k&order=thumbup - The main database

These are not for 386 (although there are categories for computers as old as Commodore 64 as well), but most are at least a few years old.

Pay attention not just to what is shown, but to how it is, by competition rules, contained in 65,536 bytes or less - executable code itself, models, textures, sound, animation, et cetera. Try to estimate how much space it would take ordinarily, using normal model, texture, music formats, et cetera.
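The trick P90 is pointing at is procedural generation: ship a formula instead of the data it produces. A toy Python sketch of the principle (real 64k intros are hand-tuned native code; this texture formula is invented purely for illustration):

```python
import math

def procedural_texture(size=256):
    """Build a size x size grayscale 'plasma' texture from pure math.
    A stored bitmap would cost size*size bytes; the formula costs a few
    dozen bytes of code, which is the whole point of 4k/64k intros."""
    pix = bytearray(size * size)
    for y in range(size):
        for x in range(size):
            v = (math.sin(x / 9.0) + math.sin(y / 13.0)
                 + math.sin((x + y) / 17.0))  # v is always in (-3, 3)
            pix[y * size + x] = int((v + 3.0) / 6.0 * 255)
    return bytes(pix)

tex = procedural_texture()  # 65,536 bytes of texture, regenerated on demand
```

The 256x256 texture here is 64 KB of pixel data produced by about ten lines of code; demos apply the same idea to models, music, and animation, which is how everything fits into the 65,536-byte limit.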


Quote: weaselman

Once again, it's only syntax and semantics that are different between different languages. What you can do in one language, you can certainly do in another.


Usually. The bottleneck here will be JVM and its inability to deliver sufficient performance and use some low-level tricks that are employed in these demos. There were some demos written in java, largely to see if it can be done, but they were all invariably slow and unimpressive.

The tricks that let demos fit into 4k or 64k aren't all that hard to figure out, and they show how much difference in required storage space there is between the regular approach done with "storage is cheap" in mind and using some finesse. Of course, the raw brute force approach, rendering the video and compressing it, would take even more space than regular models and textures.


Quote: weaselman

Sure. Several popular torrent trackers are written in java.


Ah, so you know. Good. Now compare how many resources are consumed by Vuze and by uTorrent.
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 22nd, 2011 at 6:06:02 AM permalink
Quote: ItsCalledSoccer



I think this could be Exhibit A for "What Anthropomorphism Looks Like."


It seems that you don't really know what "anthropomorphism" is.


Quote:

I don't know this because I'm not a bee, but I don't think bees KNOW, before they sting something, that they will die if they do that.



It depends on what you mean by "know". They "know" as much as a dog "knows" it will die if it puts its head on the tracks before a train.

Quote:


Anything you think you KNOW is really just observing their behavior, comparing it to a human's, and saying the bee is doing it for the same reason a human would. In other words, Exhibit A.



Actually, I don't know for what reason a human would do it. To begin with, I don't think, that every human would. But I do know the reason for the bee - it's called instinct.


Quote:

Bees sting under other conditions than to protect the queen, i.e., as a defense mechanism ...


Actually, no, they don't, and no it isn't. Think about it from evolutionary perspective - how much sense does a "defense mechanism" make if you die after invoking it?

Quote:

see also "reach into a glass jar with a bee in there and see if you don't get stung." KNOWING you'll die if you sting something, wouldn't you try to escape or, if you couldn't, just sit there in the glass jar rather than sting?



A bee won't sting in this situation, unless you scare it and make it panic.

Quote:

They're acting on millennia of evolutionary genetic imprinting.


And you aren't?

Quote:

You observe it and say, "That bee died to protect the queen," but only because you KNOW it will die.



The bee did die to protect the queen. It doesn't matter what I know or even what it knows. It's a fact. You are theorizing on WHY it died to protect the queen, as if it mattered, but it is actually irrelevant.

Quote:

You're assigning both a choice and a heroic motive to the bee.


No, it's you who is doing that. I am not assigning anything, just stating the fact.

Quote:

Humans, I think, are the only animals where choice enters into it; i.e., morality. I don't think bees have "morality" in that sense, only an evolutionary imprint as to what works best for the survival of the species.


Bees don't have morality, that's not in dispute. They do have self-sacrifice though. Self-sacrifice and morality are not the same thing. The former is a type of action; the latter is a motivation. A bee's motivation for self-sacrifice is different from (and far more efficient than) a human's. Human morality, in a way, is an evolutionary imprint as well (how do you tell the difference between, for example, morality and maternal instinct?), but the bees' instinct has been evolving longer than our morality, and is, for that reason, a lot stronger and more efficient.

One could argue that all "morality" is, is a form of a "young" instinct, not yet fully formed. As evolution progresses, the "correct" choices will get imprinted in human brains just as the choice to protect the queen is imprinted in the bees', so that we will no longer have to agonize over every decision involving moral trade-offs like we do now.
"When two people always agree one of them is unnecessary"
ItsCalledSoccer
  • Threads: 42
  • Posts: 735
Joined: Aug 30, 2010
February 22nd, 2011 at 6:20:57 AM permalink
Quote: rxwine

Altruism is just a subset though of all morality or moral reasoning. When an animal shows some reasoning ability (in order to solve problems) I wouldn't entirely rule out altruism based on certain actions. As I can see an elephant, and I can't see god - I'm going with the elephant altruism as more easily believable.



Not to intentionally frustrate you, but ... same song, third verse. Altruism is a subset of all morality and moral reasoning ... for humans. Assigning it to animals is anthropomorphizing them.

At the end of the day, we observe animal behavior. We don't know their psychology, and we don't know how they think because the most we can do is analyze brain waves and ... umm ... compare them to humans, and say that "because humans waves mean this, it MUST also mean the same thing for animals" (which is yet again anthropomorphizing).

Yes, animals can learn behavior, both from humans and from other animals. But are they reasoning any further than, "if I do that, my elephant-mother will hurt me"? I would say no. The elephant doesn't ask why, it doesn't complain about doing it. I would go further ... that they don't even think as far as my posit, but just change behavior to avoid pain.

This also goes for a monkey who "learns" sign language. It doesn't know it's communicating. It knows that it gets a banana when it moves its arms and hands a certain way. Stop giving it bananas, and it will stop "communicating." We've been training animals for millennia ... why we suddenly say "hey, the monkey speaks sign language" when we're just doing what we've always been doing is beyond me.

So while, in a scientific, "it's not impossible" sense, I can't disagree that there *might* be altruism in the human-morality sense there, it's at best wishful conjecture in the same category as the existence of extraterrestrial life: something lots of people want to be true, but has not actually been discovered.

You, my forum friend, are a man of great faith! Let's hope you give it in the right object, and if that object is elephant-altruism, you win!

(Come back and quote me this post if extraterrestrial life is ever discovered and I'll revisit.)
ItsCalledSoccer
  • Threads: 42
  • Posts: 735
Joined: Aug 30, 2010
February 22nd, 2011 at 6:28:02 AM permalink
Fun with quote-parsing, part one ...

Quote: weaselman

I



Quote: weaselman

am



Quote: weaselman

an idiot.



Don't be so hard on yourself, weez!

Seriously, if you won't take on the discussion in its whole, there's really no reason to try to have a discussion with you. Besides being dishonest, over-parsing makes your posts hard to read. If the whole is too large, pick a part and leave the other parts alone. And, it leads to silly things like ...

Quote: weaselman

Self-sacrifice and morality are not the same thing.



Which is a little like saying, Las Vegas and Nevada are not the same thing ... which is true, but one is part of the larger other. Hardly a foundation to build a counter-argument, but rather a distinction without a difference, obfuscatory, and argumentative.
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 22nd, 2011 at 6:41:02 AM permalink
Quote: P90

Regular sized encyclopedia, not wikipedia. Even a normal encyclopedia is overkill.



We've been there. I said that a "regular sized" encyclopedia would be about the same size; I only used the wikipedia example because it was easiest to come up with a number for the size. You then said you did not need an encyclopedia. I said - what else? Now you are saying "encyclopedia" :)
If it is "an overkill", suggest something else. Once again, it was you who suggested an encyclopedia in the first place. If you want to back out now, I don't mind, but you'll have to come up with the alternative first.


Quote:

Of course not. You would use only the Settlements.Poland list (at least as first resort).


The question does not mention Poland.

Quote:

How fast would that be? Let's see... three operations, at 30 million operations per second


What do you call an "operation"? How many "operations" do you think calculating of a distance requires?
BTW, 80386 was just shy of 11 million instructions per second, not "30 million operations" (1 simple arithmetic operation is about 5-6 instructions).

Quote:

About a millisecond, I/O not included. Will be a few delays, and if the list is not stored in RAM, you'd need to read it, that's plus one drive latency.


I am sorry, this is so naive, I can't even comment on it. "One drive latency"? WTF is that?
Even if it was in RAM, you don't think it's free to access it, do you? How about locating it there?
You keep forgetting, that your computer is supposed to be able to answer any question, not just the particular one you chose to look at this moment (if it was the latter, all you'd need to do is precompute a matrix of distances and throw it into memory for lightning-fast response to any question like this).



Quote:

Why are you insisting on approaching the problem from the wrong end - taking everything and reducing it to a subset?


Because that's the approach you suggested - take a good encyclopedia, and put it into memory. Then, when I showed you the hopelessness of that (on a 20-year old computer), you started talking about reducing it to a subset. Now I am insisting on it all of a sudden? No, I am not. I am simply saying that what you are suggesting is impossible.


Quote:

You only need to analyze the list of questions asked on Jeopardy before to estimate the most likely categories and types of questions.


What are the chances the same question will be asked again?

Quote:

It's what a human does when preparing for a game show as well, not thinking of every possible question.



Yeah ... the human brain works very differently from computers. I think we already discussed this at length.


Quote:

Tag cloud is one approach.


Yes. It is the one you criticized at first ... and then suggested as your own alternative :)

Quote:

The approach I was talking about is making the computer recognize the question.


This has no meaning. You have to describe how you want to make it recognize the question, and what you mean by "recognize" in this context. Tag cloud is one such description - the question is analyzed for key words, which are then cross-checked against the tags assigned to documents, and a list of the most relevant documents is composed. This is what "recognize the question" commonly means when talking about computers. You evidently want it to mean something else, but have yet to explain what it is.
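A minimal sketch of the tag-cloud style matching weaselman describes: keywords from the question are intersected with the tags on each document, and documents are ranked by overlap. The tiny index and stopword list below are invented for illustration, not any real system's data:

```python
# Hypothetical mini-index: document id -> set of tags assigned to it.
INDEX = {
    "doc_gdansk":   {"city", "poland", "gdansk", "baltic"},
    "doc_jeopardy": {"quiz", "show", "television"},
    "doc_bees":     {"bee", "queen", "sting", "hive"},
}

STOPWORDS = {"what", "is", "the", "a", "an", "from", "located", "why", "do", "to"}

def relevant_docs(question):
    """Rank documents by how many question keywords match their tags."""
    words = {w.strip("?,.").lower() for w in question.split()}
    keywords = words - STOPWORDS
    scores = {doc: len(keywords & tags) for doc, tags in INDEX.items()}
    # Keep only documents with at least one matching tag, best match first.
    return sorted((d for d, s in scores.items() if s > 0),
                  key=lambda d: -scores[d])
```

Note that this is exactly the full-text-ish, every-keyword approach P90 objects to: every non-stopword in the question gets checked against every document's tags, regardless of what the question is actually asking.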

Quote:

How exactly? That's a million dollar question. But it's a question of programming, not processing power.


Well, yeah ... In the same sense as finding the philosopher stone formula, the meaning of life and the loss-proof betting system, and also breaking the RSA cipher, finding extraterrestrial life, or an odd perfect number are all "questions of programming, not processing power".

Quote:


Ask someone who actually watches Jeopardy. I think the list includes cities, decades, and something else.


Three categories? You suggest to keep the list of all documents broken into three categories? Correct?
Well ... Let's see how it works ... How about this question: "His victims include Sirius Black and Mad Eye Moody, and you can't mention his name"? I suppose you will have to mine your "something else" category for the answer? Something tells me it will be way larger than the other two :)


Quote:

As I said, the computing-efficient alternative would be to understand the question and do an optimized search for the answer


Yes, you did say that. You just never explained what it means


Quote:

Precisely the point: if the programmer is competent.


Are you suggesting that all programmers in IBM are incompetent, or just those that worked on Watson?
When you say "incompetent", do you mean "less competent than I am" or "doesn't use bubble sort" or do you have some other competency criteria?

Quote:

Usually. The bottleneck here will be JVM and its inability to deliver sufficient performance


Do you have (recent) benchmarks you base your opinion on, or do you just rely on your global view that all programmers are incompetent, and therefore conclude that those who wrote the JVM are too, and thus JVM must offer bad performance? How about the guys who create the C or C++ compilers and optimizers? If they tend to use bubble-sort, the resulting code won't be very fast either. Before you say it - no, modern C/C++ compilers are not the same ones that were written 40 years ago by the "founding fathers".

Quote:

Now compare how much resources are consumed by Vuze and uTorrent.



I was actually talking about torrent trackers, not clients. The client part of a torrent app is really thin; there isn't much to program there. Vuze is a pig because it has so much extra stuff that no one needs - like search, and bookmarks, and images, and what not (I don't know very much of the functionality). If you want to see a good torrent client in java, look up BitLet. It's a very thin java app that runs as a browser applet. There aren't very many torrent clients in java because, like I said, the app is so simple that it makes java an overkill (not in terms of performance, but from a programming standpoint) - I believe in the theory that there is the right tool for every task.
"When two people always agree one of them is unnecessary"
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 22nd, 2011 at 6:49:07 AM permalink
Quote: ItsCalledSoccer


I am an idiot.


I explained my position to you. If you don't like discussing it, you can just not respond. There is no point in descending to kindergarten-style behavior. I am not JerryLogan; you can't provoke me.
"When two people always agree one of them is unnecessary"
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 22nd, 2011 at 7:36:16 AM permalink
Quote: weaselman

We've been there. I said that a "regular sized" encyclopedia would be about the same size, I only used the wikipedia example because it was easiest to come up with a number for the size.


A regular sized encyclopedia would be about 25*800kB=20MB in size. If stored completely and in eight-bit format.

Quote: weaselman

The question does not mention Poland.


The question mentioned it being 47km from a specific Polish city.

Quote: weaselman

I am sorry, this is so naive, I can't even comment on it. "One drive latency"? WTF is that?


You need to access just one file, if you don't have it in RAM already.

Quote: weaselman

You keep forgetting, that your computer is supposed to be able to answer any question


No, it's not. It's supposed to be able to answer enough questions to win the game.

That is the difference you keep ignoring. Perfection is not required (neither has it been achieved), only a sufficient hit percentage to make good use of its quick trigger finger and get ahead of the humans. You don't need all the data in the world, you don't need to be able to answer any question, just a sufficient percentage.




Quote: weaselman

Because that's the approach you suggested - take a good encyclopedia, and put it into memory. Then, when I showed you the hopelessness of that (on a 20-year old computer), you started talking about reducing it to a subset.


Actually, you didn't exactly do that, but rather insisted on storing as much data as wikipedia contains.

I'm elaborating on how specifically you could change the encyclopedia to suit the computer better, because that's the level of detail we've gone into.

Quote: weaselman

What are the chances the same question will be asked again?


Similar, not same.
I don't regularly watch the show, but they seem to be quite close to unity.

Quote: weaselman

Yeah ... the human brain works very differently from computers. I think we already discussed this at length.


Humans also have far fewer intellectual resources than computers: much smaller digital information capacity, much longer latencies. What humans do is utilize their small capacity much better at certain tasks than computers do.
Even as the amount of processing power increases greatly, computers aren't getting much better at these tasks. Their limitation is not FLOPS and terabytes, but programming that is incomparably less efficient at these tasks than the workings of a human brain.

Quote: weaselman

Yes. It is the one you criticized at first ... and then suggested as your own alternative :)


I didn't suggest it. Tag cloud is a very specific thing, and that's not it. I suggested an algorithmic approach. You don't do full-text searches for every word in the question. You categorize the question and select an algorithm for solving it.
For example, for the question "What city is located 47km from Gdansk?" you would apply the settlement search algorithm, starting by looking up the location of "Gdansk". The words "what", "located", "from" would never be processed in any way other than for determining the question type.
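A sketch of that categorize-then-dispatch idea. The patterns and category names below are invented for illustration; a real system would need far more robust grammar handling than two regular expressions:

```python
import re

# Hypothetical category patterns: recognize the question *type* first,
# then hand the captured arguments to a purpose-built solver instead of
# running a keyword search over every word in the question.
PATTERNS = [
    ("distance_from_city", re.compile(
        r"what (?:city|town) is located (\d+)\s*km from (\w+)", re.I)),
    ("decade_of_event", re.compile(
        r"in what decade (?:did|was) (.+?)\??$", re.I)),
]

def categorize(question):
    """Return (category, captured arguments), or ('unknown', ()) if no match."""
    for name, pattern in PATTERNS:
        m = pattern.search(question)
        if m:
            return name, m.groups()
    return "unknown", ()
```

Once categorized, `("distance_from_city", ("47", "Gdansk"))` goes straight to a settlement-search routine; the function words never reach the search at all, which is the computing-cost saving P90 is arguing for.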

Quote: weaselman

This has no meaning. You have to describe how you want to make it recognize the question, and what you mean by "recognize" in this context. Tag cloud is one such description - the question is analyzed for key words, which are then cross-checked against the tags assigned to documents, and a list of the most relevant documents is composed.


Yes. Which is what you don't do if you want to minimize computing cost. You have to read the question's grammar, which is difficult to do reliably, but is not considered impossible, unlike the things you listed below.

Quote: weaselman

Three categories? You suggest to keep the list of all documents broken into three categories?


Now you are just cocking about...

Quote: weaselman

Yes, you did say that. You just never explained what it means


Figure out what task should be performed. In the example above, the task is to locate a settlement located 47km away from Gdansk.

Quote: weaselman

If you want to see a good torrent client, look up BitLet. It's a very thin java app that runs as a browser applet.


I don't know. Can it seed several thousand torrents at once, via multiple public and private trackers and a large DHT network, at megabyte speeds, while using an unnoticeable fraction of a modern desktop's resources and providing minimal effect on user internet experience?

Quote: weaselman

Are you suggesting that all programmers in IBM are incompetent, or just those that worked on Watson?


I'm suggesting that programmers that worked on Watson preferred to use a large amount of computing power, which was available to them, to solve the task with the amount of human effort that was available to them.

Programmers in the 1960s, if they had had to approach this problem (for instance, if the ownership of non-aligned nations were to be decided by a game of Jeopardy involving computers), would have approached it differently. That much should not even be in question.

From what I understood of your position, if you were to approach this problem in the 1960s (1970s, 1980s), your response would be "Sorry, not enough computer power. Let's surrender for now and ask for a rematch when we have at least 10 TB of storage and 1 TFLOPS."

My position is that the approach used by Watson does not solve the problem with the minimum possible amount of computer resources, and, were this problem as important as the one before Colossus, it could have been solved with far less advanced hardware than Watson uses to solve it.
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
rxwine
rxwine
  • Threads: 212
  • Posts: 12220
Joined: Feb 28, 2010
February 22nd, 2011 at 7:39:47 AM permalink
Quote: ItsCalledSoccer

Not to intentionally frustrate you, but ... same song, third verse. Altruism is a subset of all morality and moral reasoning ... for humans. Assigning it to animals is anthropomorphizing them.

At the end of the day, we observe animal behavior. We don't know their psychology, and we don't know how they think, because the most we can do is analyze brain waves and ... umm ... compare them to humans, and say that "because human waves mean this, it MUST also mean the same thing for animals" (which is yet again anthropomorphizing).



Well, see, I was trying to simplify the moral reasoning (so I could apply it).

Possibly the simplest altruistic act an elephant might make is a slight side step of one foot while walking, to avoid squashing a rodent. I can't prove that it knows the consequences of squashing live objects, or that the benefit it actually derives is not getting something squishy under its foot; but on the other hand, it doesn't require Kantian reasoning to suppose that the elephant avoids squashing a rodent even though it gains nothing by it. I base this, of course, on an animal that has some provable cognition of a higher order than most.

I'll stand by that assertion without too much worry that it's unreasonable.
There's no secret. Just know what you're talking about before you open your mouth.
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 22nd, 2011 at 5:40:12 PM permalink
Quote: P90

A regular sized encyclopedia would be about 25*800kB=20MB in size. If stored completely and in eight-bit format.



Where are you getting these numbers? What is 25? What is 800? Why not 20*750?
Encyclopedia Britannica, 11th edition, contains 44 million words. Conservatively allowing 20 bytes per word, it is 880 Megabytes. But you can't just use a pile of documents. At the very least, you need indexes for direct access, which will be about 5 times as much space, making it about 4G. Assigning multiple categories (tags) to the documents, as you suggested earlier, would also increase the size, but, more importantly, require (lots of) additional indexing; generally, an extensively indexed database uses about 100 times as much space as the data it stores. That is about half-Terabyte already, but you are missing lots of data. All the latest movies, music hits, cartoons, books, and news will need to be added. You don't think it's a lot? Think again. The data tends to lose relevance and get forgotten as it ages. There are at least as many events and facts considered more or less "important" that happened between the last edition of Britannica and now as there are described in it, if not more. You'll need to at least double the space, to a Terabyte.

But what do you do with unstructured plain text? You can't possibly expect to parse and analyze all relevant documents in real time, and you can't store them as a plain collection of words; you need to impose structure upon them. Doing that will require at least 100 times as much space as storing unstructured data. Here we are, in the same 100-terabyte ballpark we had with wikipedia.
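For what it's worth, the back-of-envelope arithmetic above can be reproduced in a few lines. The multipliers (20 bytes per word, roughly 5x for direct-access indexes, roughly 100x for an extensively indexed database) are the rough figures asserted in the post, not measured values; on these exact numbers the 100x step comes out near 88 GB:

```python
# Back-of-envelope storage estimate, reproducing the rough multipliers
# asserted above (these are the post's assumptions, not measurements).

WORDS = 44_000_000        # approx. word count of Britannica, 11th ed.
BYTES_PER_WORD = 20       # conservative, incl. spaces and punctuation

raw = WORDS * BYTES_PER_WORD   # plain text
indexed = raw * 5              # ~5x extra for direct-access indexes
heavy = raw * 100              # extensively indexed database, ~100x the data

GB = 10**9
print(f"raw text: {raw / GB:.2f} GB")             # 0.88 GB
print(f"basic indexes alone: {indexed / GB:.2f} GB")  # 4.40 GB
print(f"heavily indexed: {heavy / GB:.0f} GB")    # 88 GB
```

Multiplying that further for post-Britannica material and imposed structure, as the post does, is what pushes the estimate into the 100-terabyte ballpark.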

The fact of the matter is, Watson's programmers are not the "script kids" you were referring to earlier. If they needed 300T of memory, that should tell you something. OK, maybe they were wasteful and could have saved some if they were on a tighter budget. But even if they saved a whole two thirds (66% - that's really, really a lot!), they'd still require a hundred terabytes.



Quote:

You need to access just one file, if you don't have it in RAM already.


How do you know which file you need to access?


Quote:


No, it's not. It's supposed to be able to answer enough questions to win the game.



Yeah, it is. If your program cannot theoretically answer every question (perhaps incorrectly, but algorithmically), you have no hope of winning the game.

Quote:

you don't need to be able to answer any question, just a sufficient percentage.


Fine. Substitute "a sufficient percentage of everything that ever existed" for "everything that ever existed". What does it change qualitatively? I asked you to show me your "reduction" approach on one sample page, and you could not. I asked you to show me the list of categories you considered "relevant", and you named ... three.
The question remains: how, by what criteria, and by what method, are you suggesting to come up with the list of things that are worth knowing?

Quote:


Actually, you didn't exactly do that, but rather insisted on storing as much data as wikipedia contains.



I did not do what? You suggested putting an encyclopedia into a computer. I used wikipedia as an example because it does not matter. Some encyclopedia might contain half as much data, some might be twice as large, but qualitatively it makes no difference; the answer stays the same - we are talking about hundreds of terabytes.

Quote:

I'm elaborating on how specifically you could change the encyclopedia to suit the computer better because we went into elaborating.



You are? I have not noticed you doing that, despite my repeated requests.


Quote:

Humans also have much less intellectual resources than computers.
Much smaller digital information capacity, much longer latencies. What humans do is utilize their small capacity much better at certain tasks than computers do.



Yes. That was exactly my point.

Quote:


Even as the amount of processing power is increasing greatly, computers aren't getting much better at these tasks. Their limitation is not FLOPS and terabytes, but programming that is incomparably less efficient at these tasks than the workings of a human brain are.



That is exactly my point too. We don't know how humans think; we can make an algorithm, but we cannot build a model of the human brain. We can program a computer to do similar things, but it does it very differently than the brain, requiring different kinds (and amounts) of resources.



Quote:

I didn't suggest it. Tag cloud is a very specific thing, and that's not it.
I suggested an algorithmic approach. You don't do full-text searches for every word in the question. You categorize the question and select an algorithm for solving it.



That is not an "algorithmic approach", that is "wishful thinking" :)
Come on, you do know what "algorithm" is, don't you?


Quote:

For example, for a question "What city is located 47km from Gdansk?" you would apply the settlement search algorithm, starting by locating "Gdansk".


How do you know Gdansk is a settlement?

Quote:

Words "what", "located", "from" would never be processed in any way other than for determining the question.


This is actually wrong. You have to process "what", because it tells you what to look for. "Whose city is located 47km ..." is a very different question. You need "located" because it specifies the question in a very important way - it is the only indication that you are looking for geographic information. And "from" is also important in a similar way. "What city is located 47 km below ..." is asking something very different. And don't tell me there is nothing below Gdansk - you don't know that!


Quote:


Yes. Which is what you don't do if you want to minimize computing cost. You have to read the question's grammar, which is difficult to do reliably, but is not considered impossible, unlike the things you listed below.



Actually, grammatical analysis is the easiest part of what needs to be done. You have told me many times what you "don't do", but all my requests to show what you do do go unheard.
Look, I described "my approach" to you in a way that you can program, treating questions as input. That is an algorithm. All you keep doing is saying "to answer such-and-such question, I'd do such-and-such thing". This isn't saying much. You have to come up with an algorithm that does not know in advance which question it is asked.

Quote:


Quote: weaselman

Three categories? You suggest keeping the list of all documents broken into three categories?


Now you are just cocking about...



I am not getting the sarcasm. You said that only some categories are relevant, and others should be ignored. I asked you which ones, and you said "geography, decades and something else".
Those are the three categories I am referring to.

Quote:

I don't know. Can it seed several thousand torrents at once, via multiple public and private trackers and a large DHT network, at megabyte speeds, while using an unnoticeable fraction of a modern desktop's resources and having minimal effect on the user's internet experience?


Yeah, it's actually a lot easier than it sounds (because most of the work is actually done by the OS kernel).


Quote:


I'm suggesting that programmers that worked on Watson preferred to use a large amount of computing power, which was available to them, to solve the task with the amount of human effort that was available to them.


Preferred it to what? I am guessing that they, like me, simply did not know any other approaches, and either did not know they should have consulted you, or, like me, had spoken to you but could never begin to understand what it is you are suggesting they should have done.

Quote:

Programmers in the 1960s, if they had to approach this problem (for instance, the ownership of unaligned nations was to be decided by a game of Jeopardy involving computers), would have approached it differently. That much should not even be in question.


Yes, they would definitely have had to approach it differently (there was simply no technology to approach it in this way). And they would probably have failed to solve it, too. Either that, or we would be living in a very different world now - the world described by Asimov et al., where machines actually can think. I mean, that's what would have happened if those programmers in the 60s, faced with this task, had not failed, but indeed invented a way to solve it using the technology available to them.

Quote:


From what I understood of your position, if you were to approach this problem in the 1960s (1970s, 1980s), your response would be "Sorry, not enough computer power. Let's surrender for now and ask for a rematch when we have at least 10 TB of storage and 1 TFLOPS."



Something like that, yes. Granted, I was much younger then, but still ... I was never really interested in pursuing theoretically impossible goals for the kicks of it. When I was a student, it was popular to be looking for a proof of Fermat's Theorem. I never did that. I knew it could probably be proven, but I also knew that I did not have enough resources for the task; it was a waste of effort. I have also frequented some math forums in the past, and every now and then somebody would show up there claiming they had found a formula to generate prime numbers. I would always tell them: if you really think you have it, stop bragging about it, go to the RSA challenge website, and make yourself a few million bucks. I have the same thing to tell you. If you can make something like Watson run on a modern PC (forget the 386), there is some real money to be made. I am willing to pay you for the prototype. I am not rich, but you don't need a lot - if the thing works, there will be investors lining up in no time. How about $1000? Will you accept the challenge?
"When two people always agree one of them is unnecessary"
Keyser
Keyser
  • Threads: 35
  • Posts: 2106
Joined: Apr 16, 2010
February 22nd, 2011 at 5:49:58 PM permalink
If you build a complex enough computer then life will leak into it.
P90
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 23rd, 2011 at 12:51:40 AM permalink
Quote: weaselman

Conservatively allowing 20 bytes per word, it is 880 Megabytes.


That's very conservative of you.
Just about everywhere else, a word is considered to be 5 characters.

Quote: weaselman

indexes for direct access, which will be about 5 times as much space

Quote: weaselman

extensively indexed database uses about 100 times as much space as the data it stores.

Quote: weaselman

you need to impose structure upon them. Doing that will require at least 100 times as much space


Now you are just pumping up the volume, applying the same multiplier over and over again to suit the desired result.
Why not impose another set of indexes upon these indexes for a further 100 times more space?

Quote: weaselman

That is about half-Terabyte already, but you are missing lots of data.


Which you don't need, and neither did you need the above in the first place.

Quote: weaselman

Yeah, it is. If your program cannot theoretically answer every question (perhaps incorrectly, but algorithmically), you have no hope of winning the game.


Did humans answer every single question correctly? Or did Watson even buzz on every single question? No, but it won the game.

Quote: weaselman

I asked you to show me the list of categories you considered "relevant", and you named ... three.
The question remains: how, by what criteria, and by what method, are you suggesting to come up with the list of things that are worth knowing?


I named just two categories, actually, not three.
The method is this: you take transcripts of every previous episode and make a list of categories that come up, particularly ones that come up more than once.

Quote: weaselman

You are? I have not noticed you doing that, despite my repeated requests.


Actually I did, but your response absurdly interpreted "cities, decades, and something else" as representing three categories rather than a non-exhaustive list of examples.

Quote: weaselman

That is exactly my point too. We don't know how humans think,


We can learn. We can theorize. We can test out these theories and come up with approximations.
Quote: weaselman

We can program a computer to do similar things, but it does it very differently than the brain, requiring different kinds (and amounts) of resources.


A perfectly programmed computer would require no more resources than a human brain for the same problem. This perfection cannot even be approached, of course, and we don't know exactly how many resources a human brain has, although, as said above, estimates can be made.

Quote: weaselman

How do you know Gdansk is a settlement?


1) Deduce from the sentence that it is a location, then find "Gdansk" in the list of locations.
2) If that does not succeed, view "Gdansk" entry in the main index.

And no, that does not imply a billion different indices. In fact, the locations list can be just a subset of the main index, not a separate one. The only additional index really necessary is an alphabetic index, to allow near-instant lookup of "Gdansk" or any other word for that matter. From that point on, you have all the information you need to proceed with solving the question.
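A toy sketch of that two-step lookup, with a hypothetical two-entry index standing in for a real knowledge base (the entries and field names here are made up for illustration):

```python
# Toy illustration of the lookup order described above:
# 1) try the locations subset first, 2) fall back to the main index.
# The entries are hypothetical stand-ins for a real knowledge base.

main_index = {
    "gdansk": {"type": "settlement", "country": "Poland"},
    "jeopardy": {"type": "tv show"},
}
# "Locations" as a subset of the main index, not a separate structure:
locations = {k: v for k, v in main_index.items() if v["type"] == "settlement"}

def lookup(term):
    term = term.lower()
    if term in locations:        # step 1: the sentence suggests a location
        return locations[term]
    return main_index.get(term)  # step 2: alphabetic main-index lookup

print(lookup("Gdansk"))   # {'type': 'settlement', 'country': 'Poland'}
```

Real parsing and a full-size index are of course far harder; this only shows the lookup order being argued about.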


Quote: weaselman

You have to process "what", because it tells you what to look for. "Whose city is located 47km ..." is a very different question. You need "located" because it specifies the question in a very important way - it is the only indication that you are looking for geographic information. And "from" is also important in a similar way.


Yes. Exactly. You have to process them to determine what question is being asked.

But not run a full-text search on the terms "what", "from" and "located". Process the question as a sentence, not as a simple collection of words.


Quote: weaselman

Look, I described "my approach" to you in a way that you can program, treating questions as input. That is an algorithm. All you keep doing is saying "to answer such-and-such question, I'd do such-and-such thing". This isn't saying much. You have to come up with an algorithm that does not know in advance which question it is asked.


Or you have to come up with an algorithm that can determine which question it is asked, say, 95% of the time.


Quote: weaselman

Preferred it to what? I am guessing that they, like me, simply did not know any other approaches


I seriously hope they did. Because there are other approaches. Ranging from the low-performance approach I mentioned to the extremely computing-demanding approach of building a simulated neural network.

If you are denying the very existence of other approaches, imagine a highly simplified version of the show. The only questions asked [99% of the time] are to name the largest city in one of 50 states, to name one of 50 states knowing only one of its three largest cities, or to name one of said 150 cities knowing only two other cities in the same state. Would you still insist the only (or even the best) approach to solve this problem would be to run full-text searches on every word in the question, and/or that the minimum required database size is ((50+150)*20)*5*100*100=200,000,000 bytes?

The difference between that and the full version is, much like you said above, quantitative. More questions, more categories, more trivia referred to. You can argue the relative efficiency of other approaches; you cannot argue their nonexistence.
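For concreteness, the simplified show described above comes down to a table lookup. A hypothetical sketch with two sample states (a full table would have 50 entries of the same shape):

```python
# Sketch of the simplified show described above, with a small sample
# of states. Cities are listed largest-first.

state_cities = {
    "Texas": ["Houston", "San Antonio", "Dallas"],
    "Ohio": ["Columbus", "Cleveland", "Cincinnati"],
}

def largest_city(state):
    return state_cities[state][0]

def state_of(city):
    for state, cities in state_cities.items():
        if city in cities:
            return state
    return None

def third_city(city_a, city_b):
    # Name the remaining top-3 city, given two others from the same state.
    state = state_of(city_a)
    if state and state == state_of(city_b):
        rest = [c for c in state_cities[state] if c not in (city_a, city_b)]
        return rest[0] if rest else None
    return None

print(largest_city("Texas"))            # Houston
print(state_of("Cleveland"))            # Ohio
print(third_city("Dallas", "Houston"))  # San Antonio
```

The point stands either way: whether this scales to the real show is the open question, but the approach plainly exists.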


Quote: weaselman

How about $1000? Will you accept the challenge?


How about a billion for starters? Using less computer power will require more human effort to deliver better optimized code.

Some hundred years down the line it might well require 500 exabytes to solve the problem, but it will literally be doable by a single person on a lunch break. Perhaps one of your great-grandchildren will even conclusively prove that it's impossible to do with less than 400 exabytes.


Quote: weaselman

Yes, they would definitely have had to approach it differently (there was simply no technology to approach it in this way). And they would probably have failed to solve it, too. Either that, or we would be living in a very different world now - the world described by Asimov et al., where machines actually can think.


Or, perhaps, a world where search engines can deliver specific answers to "where is Albuquerque", "what does antidisestablishmentarianism mean" or "what is 17*86" questions. Wow, what a world that would be!
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 23rd, 2011 at 5:28:25 AM permalink
Quote: P90


Just about everywhere else, a word is considered to be 5 characters.



I have not been to "everywhere else". I just took a random page of text, counted the number of characters, and divided that by the number of words. Don't forget that you also need to count spaces and punctuation.


Quote:


Now you are just pumping the volume, applying the same over and over again, to suit the desired result.
Why not impose another set of indexes upon these indexes for a further 100 times more space?


Yes, there are lots of indexes that are needed. I am not "pumping it up", it's an inherent property of high-dimensional databases.
Imagine a phone book. If you want people entries to be accessible by either first name or last name, or both, you need two indexes.
If you want to index by three attributes, and their combinations, you need four, etc. It grows exponentially, and very quickly.
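The combinatorial point is easy to demonstrate: a composite index can only serve lookups on its leftmost prefixes, so one index cannot cover every combination of attributes a query might filter on. A small sketch with a hypothetical three-attribute phone book:

```python
# Why attribute combinations inflate index counts: a composite index on
# (a, b, c) serves only lookups on its prefixes: (a), (a, b), (a, b, c).

from itertools import combinations

def prefixes(index_cols):
    return {index_cols[:i] for i in range(1, len(index_cols) + 1)}

attrs = ("last", "first", "city")

# All non-empty attribute subsets a query might filter on:
subsets = {frozenset(c) for r in range(1, len(attrs) + 1)
           for c in combinations(attrs, r)}
print(len(subsets))   # 7 possible query shapes for 3 attributes

# A single composite index covers only its prefixes: 3 of the 7.
covered = {frozenset(p) for p in prefixes(attrs)}
print(len(covered))   # 3
```

Covering the remaining query shapes takes additional indexes, which is why the index count (and space) climbs quickly as attributes are added.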


Quote:

Which you don't need, and neither did you need the above in the first place.



I am explaining to you how these things are done in practice. You keep saying "you don't need this, you don't need that", but are not suggesting any other way to do it. Well ...
Yes, I do need this, and I also needed the above very much.


Quote:

Did humans answer every single question correctly? Or did Watson even buzz on every single question? No, but it won the game.



I think the consensus is that the humans would have answered all the questions if Watson had given them a chance.
No, Watson did not buzz in on every question, but that does not matter. I said your program had to be capable of answering every question (perhaps some incorrectly) as a necessary condition of winning the game.


Quote:


I named just two categories, actually, not three.



I asked you to name all of them ...

Quote:

The method is this: you take transcripts of every previous episode and make a list of categories that come up, particularly ones that come up more than once.



I think there are very few categories that came up more than once. Your approach is actually good for picking categories that will not come up. Except that it is hopeless, because it will let you exclude a few hundred entries out of a set of many trillions.

Since you don't watch the show, here are a few sample categories for you to get a taste of it:

Before & After,
Rhyme Time,
Stupid Answers,
Name's the Same.


Quote:

Actually I did, but your response was absurdly interpreting "cities, decades, and something else" as representing three categories rather than a non-exhaustive list of examples.



I asked you to name all the categories. You named two, and "something else". So I had a choice between thinking that you have a habit of intentionally ignoring questions you can't answer and silently substituting them with different ones, or thinking that "something else" is a category. I chose the option that was more favorable to you.


Quote:

A perfectly programmed computer would require no more resources than a human brain for the same problem.


This is the root of your mistake.
No, it would not. A computer, in the commonly accepted sense of the word, would require much, much, much more resources.
What you call "a perfectly programmed computer" seems to be nothing other than the human brain. Computers cannot think like people.

Quote:


This perfection cannot even be approached, of course, and we don't know exactly how many resources a human brain has, although, as said above, estimates can be made.



And no, this is not about unapproachable perfection. We are not even trying to approach that "perfection" by making computers better, and polishing programming techniques. We are not getting any closer, and never will (not in this field of science). We are moving in a totally different direction. To give you an analogy, a plane does not fly like a bird. At all. Not even a little bit. It can fly as high (higher), as fast (faster), and as long (longer), but it just has nothing to do with how birds fly - not because it is imperfect, but because it uses completely different techniques. Same with computers. They are nothing like human brains, and they are not getting any closer; they are a completely different apparatus.
Quote:


But not run a full-text search on the terms "what", "from" and "located". Process the question as a sentence, not as a simple collection of words.



No, of course not. I never suggested that you run full-text search on articles and prepositions. Like the earlier incident with naming the list of categories, this leaves the impression that you are arguing with somebody else (yourself?) instead of me ...


Quote:

Or you have to come up with an algorithm that can determine which question it is asked, say, 95% of the time.


No, you have to come up with it.
But first you have to define what "which question" means in "computer terms". Computer terms are numbers; in other words, your first step should be to come up with an algorithm mapping any possible question (OK, OK, 95% of all possible questions - how many is 95% of infinity?) to one or more numbers.
Do you have that algorithm? Can you describe it?


Quote:


I seriously hope they did. Because there are other approaches. Ranging from the low-performance approach I mentioned to the extremely computing-demanding approach of building a simulated neural network.



1. There is no "low-performance approach you mentioned". Just because you mention something, does not mean it exists.
2. Look up neural networks. They are not what you think they are. In any event, they did not exist in the 80s. And they won't run on a 386.




Quote:


If you are denying the very existence of other approaches, imagine a highly simplified version of the show.


I don't deny the existence of other approaches for a simplified problem. I am talking about this problem.

Quote:

The only questions asked [99% of the time] are to name the largest city in one of 50 states, to name one of 50 states knowing only one of its three largest cities, or to name one of said 150 cities knowing only two other cities in the same state. Would you still insist the only (or even the best) approach to solve this problem would be to run full-text searches on every word in the question, and/or that the minimum required database size is ((50+150)*20)*5*100*100=200,000,000 bytes?



Yes, this one would be a nice problem for a college freshman to program on his grandfather's Commodore 64. So what?


Quote:

How about a billion? Using less computer power will require more human effort to deliver more optimized code.



Like I said, I don't have a billion, but it will be worth your while: if this works, you will make much more than just a measly billion. You will also become the most famous and highly regarded figure in Computer Science, a figure on the scale of Einstein and Stephen Hawking (or Babbage and Turing, if you prefer, but your achievement would be way bigger than theirs).



Quote:

Or, perhaps, a world where search engines can deliver specific answers to "where is Albuquerque", "what does antidisestablishmentarianism mean" or "what is 17*86" questions. Wow, what a world that would be!



Oh, that's trivial. These are done pretty well in our own world:

Where is Albuquerque
The "antidises..." thingy
What is 17*86 (if you have firefox, try typing this question in the search bar, it's even cooler that way :))

Like I told you before. Computers are not people. Just because something looks hard to you doesn't mean it's hard for a computer, and the other way around.
"When two people always agree one of them is unnecessary"
P90
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 23rd, 2011 at 6:44:13 AM permalink
Quote: weaselman

I have not been to "everywhere else". I just took a random page of text, counted the number of characters, and divided that by the number of words. Don't forget that you also need to count spaces and punctuation.


Well, when "word" is used as a unit of measurement, it is not considered to be 20 characters, but five.
http://en.wikipedia.org/wiki/Words_per_minute


Quote: weaselman

Yes, there are lots of indexes that are needed. I am not "pumping it up", it's an inherent property of high-dimensional databases.
Imagine a phone book. If you want people entries to be accessible by either first name or last name, or both, you need two indexes.
If you want to index by three attributes, and their combinations, you need four, etc. It grows exponentially, and very quickly.


It does. Which can lead to ridiculously large databases. But indexing everything by every possible parameter is not the only method of building databases. There comes a point where choices have to be made to limit indexing to only the most used parameters. Otherwise the databases would grow infinitely.


Quote: weaselman

I think the consensus is that the humans would have answered all the questions if Watson had given them a chance.
No, Watson did not buzz in on every question, but that does not matter. I said your program had to be capable of answering every question (perhaps some incorrectly) as a necessary condition of winning the game.


But they answered some of them wrong.
Adding a capability to deliver a low-confidence answer is not that hard. After all, even the full-text search approach I berated could be used - but as a last resort.


Quote:

I don't think there are any categories that came up more than once. Your approach is actually good for picking categories that will not come up. Except that it is hopeless, because it will let you exclude a few hundred entries out of a set of many trillions.


Are you sure? It's hard to imagine that the Watson episode was just so lucky as to be the first and only one ever to hit a category as generic as "US cities". Nonetheless, even if the categories are indeed changed every time, they still have to repeat at least partially (as in "Canadian cities", etc.), enough to be of use.


Quote:

I asked you to name all the categories.


That's an overreaching question. It would take me hours of watching the show (which I have no interest in watching) just to compile a fraction.
One thing I did notice watching the show, though, is that it does not require deep encyclopedic knowledge of its participants so much as largely generic trivia.


Quote:

What you call "a perfectly programmed computer" seems to be nothing other that the human brain. Computers cannot think like people.


Not necessarily nothing other than the human brain. Computers are better at some things than people. Data retrieval is one of the things a computer could potentially do better, thanks to perfect memory. The show's format complicates it, but only to a point.


Quote:

And no, this is not about unapproachable perfection. We are not even trying to approach that "perfection" by making computers better, and polishing programming techniques. We are not getting any closer, and never will (not in this field of science). We are moving in a totally different direction.


We aren't trying to do it, indeed. We find it cheaper to throw more silicon at our problems than to develop high-efficiency algorithms. Most modern software doesn't have even a small fraction of a steam engine's efficiency; it just wastes storage space and cycles.

If we had hit the brick wall of processing power and storage density, say, in the 1990s, we would have seen things moving in a different direction.


Quote:

No, you have to come up with it.
But first you have to define what "which question" means in "computer terms". Computer terms are numbers, in other words, your first step should be to come up with an algorithm, mapping any possible question (ok, ok 95% of all possible questions - how many is 95% of infinity?) to one or more numbers.


No, not 95% of infinity. It's closer to mapping the central 95% of the normal distribution. Either tail of the curve can go to infinity, but 95% of the area falls into a definable and finite subset.

If you develop and perfect a set of algorithms that solves 95% of all the previously asked questions, chances are good you'll beat 90% in the next episode as well. It would be a massive undertaking, the scale of the effort trumped only by its uselessness, but compiled program code even for thousands of question formats still takes less space than databases sorted by 50,000 parameters.


Quote:

2. Look up neural networks. They are not what you think they are. In any event, they did not exist in the 80s. And they won't run on a 386.


I'm aware of what they are and how far out of reach ones of considerable ability are. That was a move in the opposite direction - into the far future, where the problem will [if infinite progress continues, etc.] be solvable with almost no human effort (using AI), but with another dozen orders of magnitude more computing effort than was used for the Watson solution.


Quote:

I don't deny the existence of other approaches for a simplified problem. I am talking about this problem.


Which is still a problem of finite complexity. The extension of the solution for the simplified problem to a broader set of potential questions would be a valid potential approach. What can be argued is relative computing requirements and hit percentages, but not the nonexistence of other approaches.

Take chess for another example of multiple approaches. The brute force approach for chess, analyzing all possible outcomes, is far more computationally expensive than we have capacity in the entire world. But chess has been beaten by computers, through decades of algorithm development, with an insignificant fraction of the computing power a brute force approach requires. Could Jeopardy be? It's impossible to tell, just like it used to be impossible to tell if chess could be beaten before programs started to become competitive (not only due to extra MIPS, but just as much due to better algorithms). But it's not impossible, that much is certain.
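The chess point can be made concrete with a toy sketch (my own illustration, not any real engine): plain minimax and alpha-beta pruning compute the same value for the same hand-built game tree, but alpha-beta skips whole subtrees, so it visits fewer nodes.

```python
# Toy game tree search: nested lists are internal nodes, ints are leaf
# evaluations. Both searches return the same value; alpha-beta prunes.

def minimax(node, maximizing, counter):
    counter[0] += 1                    # count every node visited
    if isinstance(node, int):          # leaf: static evaluation
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        val = minimax(child, not maximizing, counter)
        best = max(best, val) if maximizing else min(best, val)
    return best

def alphabeta(node, alpha, beta, maximizing, counter):
    counter[0] += 1
    if isinstance(node, int):
        return node
    if maximizing:
        for child in node:
            alpha = max(alpha, alphabeta(child, alpha, beta, False, counter))
            if alpha >= beta:
                break                  # prune: this line is already refuted
        return alpha
    else:
        for child in node:
            beta = min(beta, alphabeta(child, alpha, beta, True, counter))
            if alpha >= beta:
                break
        return beta

tree = [[3, 5], [2, [9, 1]], [0, 4]]   # small hand-made position
n1, n2 = [0], [0]
v1 = minimax(tree, True, n1)
v2 = alphabeta(tree, float("-inf"), float("inf"), True, n2)
print(v1, v2, n1[0], n2[0])            # 3 3 12 8
assert v1 == v2 and n2[0] < n1[0]
```

Real engines add move ordering, transposition tables and evaluation heuristics on top of this, which is where the decades of algorithm development went.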


Quote:

Like I said, I don't have a billion, but it will be worth your while: if this works, you will make much more than just a measly billion.


Teaching people how to solve a thousand dollar problem by throwing a billion dollars at it? No, I don't think it would be very profitable.


Quote:

Oh, that's trivial. These are done pretty well in our own world:
Where is Albuquerque
The "antidises..." thingy
What is 17*86 (if you have firefox, try typing this question in the search bar, it's even cooler that way :))
Like I told you before. Computers are not people. Just because something looks hard to you doesn't mean it's hard for a computer, and the other way around.


Of course. That was the point - the world would be just like this one, because computers already can directly answer questions like these.
Currently the list of questions google answers directly, rather than as simple search results, is rather short, but it's certainly not a limitation of insufficient processing power, rather of the small human effort invested in it. It's just that nobody cares all that much; users are satisfied enough with regular search.
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
thecesspit
thecesspit
  • Threads: 53
  • Posts: 5936
Joined: Apr 19, 2010
February 23rd, 2011 at 8:56:16 AM permalink
"1. There is no "low-performance approach you mentioned". Just because you mention something, does not mean it exists.
2. Look up neural networks. It is not what you think it is. In any event, they did not exist in the 80s. And they won't run on 386."

Neural networks were around in the late 80's. The defining paper was written in the mid-80's.

You can program one on a 386. Not to the level you guys are arguing about, but I'd be surprised if you couldn't, as one of the only successful C++ projects I ever wrote was a 2-layer neural network on a Windows 3.1 486 box. And I'm awful at coding. I wrote one to try and predict football (soccer) results. It did a rather good job of hitting the same percentages as the general numbers.
"Then you can admire the real gambler, who has neither eaten, slept, thought nor lived, he has so smarted under the scourge of his martingale, so suffered on the rack of his desire for a coup at trente-et-quarante" - Honore de Balzac, 1829
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
February 23rd, 2011 at 9:30:36 AM permalink
Quote: P90

Well, when "word" is used as a unit of measurement, it is not considered to be 20 characters, but five.
http://en.wikipedia.org/wiki/Words_per_minute



I am not using it as a unit of measurement. I just used the number of words in Britannica to estimate its size, by taking a page of text, dividing its size by the number of words, and then multiplying by the number of words in Britannica.
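That method of estimation can be sketched in a few lines; the word count and bytes-per-word figures below are rough assumptions for illustration, not exact values:

```python
# Back-of-the-envelope estimate of Britannica's plaintext size:
# (average bytes per word from a sample page) * (total word count).
# Both figures are rough assumptions, not measured values.

words_in_britannica = 44_000_000   # commonly cited ballpark word count
avg_bytes_per_word = 6             # ~5 letters plus a separator in 8-bit text

size_bytes = words_in_britannica * avg_bytes_per_word
print(f"~{size_bytes / 1_000_000:.0f} MB of plaintext")  # ~264 MB
```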

Quote:


It does. Which can lead to ridiculously large databases. But indexing everything by every possible parameter is not the only method of building databases.


I did not say you had to index everything by all combinations of parameters, that would be impossible. But a reasonable number of indexes to support real time queries will still take much more space than the data itself.

Quote:


Quote: weaselman

I think, the consensus is that humans would answer all questions if Watson gave them a chance.
No, Watson did not buzz in on every question, but that does not matter. I said your program had to be capable of answering every question (perhaps, some incorrectly) as a necessary condition of winning the game.


But they answered some of them wrong.



I think, the consensus is that humans would answer all questions correctly if Watson gave them a chance.

Quote:

Adding a capability to deliver a low-confidence answer is not that hard. After all, even the full text search approach I berated could be used - but as the last resort.


Yes, Watson does have that capability. If you need to support full text search, you need resources required to support it. Last resort or not is irrelevant.


Quote:


Quote:

I don't think there are any categories that came up more than once. Your approach is actually good for picking categories that will not come up. Except that it is hopeless, because it will let you exclude a few hundred entries out of a set of many trillions.


Are you sure? It's hard to imagine that the Watson episode was just so lucky as to be the first and only one to ever hit a category as generic as "US cities".


Some categories are more generic than others, but that's the exception rather than the rule. More often than not, categories look like the ones in my example above that you chose to ignore.
Note also, that even "U.S. Cities" is not usually as generic as it looks to you. A question like "An author of a book about the boy who made a chore of painting a fence a profitable enterprise was born in this city" is something that perfectly fits that category. There can be even more levels of indirection.

Quote:


That's an overreaching question. It would take me hours of watching the show (which I have no interest in watching) just to compile a fraction.


Exactly. That's the whole point. What you suggested is not simply hard to do. It is impossible. Let me, again, draw your attention to the samples from my previous post. The possibilities are infinite. That's what makes the game interesting. It is not a trivia game per se. A big part of finding the answer is unwinding the long chain of associations.


Quote:

Not necessarily nothing other than human brain. Computers are better at some things than people.


Yes, computers are better at some things. Exactly for the reason that they do not function like a human brain. They are worse at other things for the same reason. Just like planes and birds. A plane can fly faster and higher than a bird, but it can't descend vertically, and it needs huge airfields to land. That is not because it is imperfect, but because it uses a completely different technology.

Quote:

Data retrieval is one of the things a computer could potentially do better, thanks to perfect memory.


I think, you are confusing data retrieval with storage capacity. These things are usually negatively correlated.

Quote:

We aren't indeed trying to do it. We find it cheaper to throw more silicon at our problems than to develop high-efficiency algorithms.


No, not at all. This is not about money. We simply don't know how to do it differently. That's why I keep saying - if you think you do, you should market your approach, make it known, get some publications. If you are right, you are the only person on Earth who has it.

Quote:


No, not 95% of infinity. It's closer to mapping the center 95% of the normal distribution. Either end of the curve can go to infinity, but 95% of the area falls into a definable and finite subset.



What are you basing your assertions on?

Quote:


If you develop and perfect a set of algorithms that solves 95% of all the previously asked questions, chances are good you'll beat 90% in the next episode as well. It would be a massive undertaking, the scale of the effort trumped only by its uselessness


Uselessness? It would be the biggest scientific breakthrough since ... I dunno ... Newton, perhaps.
The applications of this technology are everywhere. You'd really become the richest and most famous person in the world if you could come close to what you are talking about.


Quote:

I'm aware of what they are and how far out of reach ones of considerable ability are. That was a move in the opposite direction - into the far future.


What's "a move into the far future"? Neural networks are being applied very heavily in the present. There is nothing "futuristic" about them (except, maybe, the name).

Quote:


Quote:

I don't deny the existence of other approaches for a simplified problem. I am talking about this problem.


Which is still a problem of finite complexity.


Yes, it is a problem of finite complexity (Watson did solve it). Just of an immeasurably larger one than everything you are talking about.

Quote:

The extension of the solution for the simplified problem to a broader set of potential questions would be a valid potential approach.


"Potential"? Maybe. "Practical"? Not at all.

Quote:

What can be argued is relative computing requirements and hit percentages, but not the nonexistence of other approaches.


The "argument" for other approaches would be to suggest one. Simply saying that they might potentially exist isn't it.
I asked you to substantiate your claims several times, but you are not responding.
You said "come up with a list of categories", but failed to enumerate them. You said "understand the question", but failed to suggest an algorithm suitable for that. You said "then apply an optimized algorithm", but never named what it would be. I gave you a few sample categories to make the discussion more substantial, but you ignored them and stuck to "US cities". This is not an "argument". It might deserve to be called a supposition, but you are insisting on it too much for that.



Quote:

Take chess for another example of multiple approaches. The brute force approach for chess, analyzing all possible outcomes, is far more computationally expensive than we have capacity in the entire world. But chess has been beaten by computers, through decades of algorithm development, with an insignificant fraction of the computing power a brute force approach requires.


It is not an example of multiple approaches, not at all. There was never a brute force approach (it was understood by everyone that it was impossible). The tree of the game is trimmed to limit the number of combinations to consider.
At the beginning, the computers were slow, and the amount of storage was small, so they could only consider a few variants in reasonable time, and they also lacked the knowledge base to do the trimming intelligently. Now, they are many orders of magnitude faster, so there are many more branches of the tree they can consider, and they also have massive databases, containing every game ever played, that allow them to do the trimming a lot more efficiently.
This is by no means a different approach than what it was twenty years ago. The approach is the same, but more resources are available to throw at it. This is actually exactly the same situation as Watson - 20 years ago, you could have built a computer that would answer some (categories of) questions some of the time. But to be able to do what Watson does, you need today's technology, just like you can't expect a 386 to win a match against Kasparov.


Quote:

Teaching people how to solve a thousand dollar problem by throwing a billion dollars at it? No, I don't think it would be very profitable.


"A thousand dollar problem"? Which "thousand dollar problem" are you referring to? Getting instant medical diagnoses? Unlimited advances in scientific research? Real time communications without language barrier? Unmanned libraries, stores, any consumer-facing businesses? I can go on and on ...
"A thousand dollar problem"? I think, you just don't understand what you are after. This is a FREAKING REVOLUTION.

Quote:

Of course. That was the point - the world would be just like this one, because computers already can directly answer questions like these.


Do you think google uses less resources than Watson to answer those questions?
Also, perhaps, it would make sense for you to watch a few episodes of Jeopardy to get a feel of the questions (and categories) that appear there. They are usually nothing like the ones you suggested.
"When two people always agree one of them is unnecessary"
P90
P90
  • Threads: 12
  • Posts: 1703
Joined: Jan 8, 2011
February 23rd, 2011 at 11:10:00 AM permalink
Quote: weaselman

I did not say you had to index everything by all combinations of parameters, that would be impossible. But a reasonable number of indexes to support real time queries will still take much more space than the data itself.


More, but not 50,000 times more, if you are at least trying to limit the storage space used.

Quote: weaselman

I think, the consensus is that humans would answer all questions correctly if Watson gave them a chance.


That was not the case. I've seen humans answering questions incorrectly, including in that Watson episode. IIRC it was the rally question - Watson answered "auto racing", a human answered something wrong. There was also one where they all were wrong, IIRC.
And in other episodes, there were quite a few wrong answers as well. It wasn't even a 90% hit percentage, based on what I've seen, even with top contestants.


Quote: weaselman

Yes, Watson does have that capability. If you need to support full text search, you need resources required to support it. Last resort or not is irrelevant.


If you only keep a reasonable amount of plaintext data - say, 100-200 megabit*, which is around what it takes for Jeopardy, since their questions tend to be really shallow, common-knowledge stuff, designed so that the average American would often think "wow, I knew that" - then these resources are not at all enormous. Even with indexes added.

*I'm using megabits, because storing text for computer search in 8-bit format should border on criminal negligence. There are different ways, it can be 5-bit encoding, it can be dictionary encoding, etc.
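A minimal sketch of that 5-bit idea (my own illustration; a real scheme would also need escape codes for digits, capitals and so on): 32 symbols fit in 5 bits each, cutting 8-bit plaintext by 37.5% before any dictionary tricks are applied.

```python
# Pack lowercase text into 5 bits per character. 32 symbols cover the
# letters plus a few separators, so 8 characters fit in 5 bytes.

ALPHABET = "abcdefghijklmnopqrstuvwxyz .,'-?"   # exactly 32 symbols
CODE = {ch: i for i, ch in enumerate(ALPHABET)}

def pack(text: str) -> bytes:
    bits, nbits = 0, 0
    out = bytearray()
    for ch in text:
        bits = (bits << 5) | CODE[ch]           # append a 5-bit code
        nbits += 5
        while nbits >= 8:                       # flush full bytes
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)  # pad the final byte
    return bytes(out)

text = "what is toronto"
packed = pack(text)
print(len(text), "chars ->", len(packed), "bytes")  # 15 chars -> 10 bytes
```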


Quote: weaselman

Note also, that even "U.S. Cities" is not usually as generic as it looks to you. A question like "An author of a book about the boy who made a chore of painting a fence a profitable enterprise was born in this city" is something that perfectly fits that category. There can be even more levels of indirection.


There can be. In the two Watson episodes, most questions were quite straightforward, though.


Quote: weaselman

Exactly. That's the whole point. What you suggested is not simply hard to do. It is impossible. Let me, again, draw your attention to the samples from my previous post. The possibilities are infinite.


Then let me, just as well, reiterate what was posted in the previous post: you only need to hit the center of the curve. On questions you get right, you win by having a quicker trigger finger. You can even afford lower accuracy than humans, because when you do hit, you win, while humans depend on their button-pressing skills.

Watson didn't just win the game, it aced it. Obliterated, walked over, Terminator going through a kindergarten. It was like shooting two unarmed humans with a 14.5x114mm machinegun.
There is a term for that: overkill. Because not only would a single shot per target from that gun suffice, but even a shot from a much smaller gun like a 9mm pistol. Watson's performance went far above and beyond what was required to just win.

And, what's also to be noted, the times Watson won, it won with a large margin of confidence (as could be seen in the episode where they showed its three candidate answers). The times it didn't win, it lost with an even larger margin. This indicates that increasing its processing power and storage capacity, say, 1,000-fold would have a negligible effect, if any at all, on the results, the programming being the same. Conversely, even with the same programming, a major reduction in performance and capacity could be afforded with negligible effect on the results. There were very few, if any, marginal cases. Either it was 99%-ish confidence and it hit, or it was 30% confidence and it didn't, or it was 99%-ish confidence, it hit, and was proven wrong. A computer is only as good as its software, and if anything, Watson was a perfect case in point. Even a 1,000,000-fold increase in computing performance would not change the outcome in any of these cases, and neither would a major (albeit not as overwhelming) decrease. That is with the same or similar programming, which, while effective, was also overkill.


Quote: weaselman

Yes, computers are better at some things. Exactly for the reason that they do not function like a human brain. They are worse at other things for the same reason. Just like planes and birds. A plane can fly faster and higher than a bird, but it can't descend vertically, and it needs huge airfields to land.


Can't resist calling your attention to the AV-8, Yak-38 and F-35B.


Quote: weaselman

That's why I keep saying - if you think you do, you should market your approach, make it known, get some publications. If you are right, you are the only person on Earth who has it.


Aside from a hundred thousand or so other persons who have it, of course. Admittedly, it's not used for solving Jeopardy, more like for questions "This aircraft has a 1200F exhaust temperature moving at 420kts" - "What is, Fishbed?", but nonetheless.


Quote: weaselman

Uselessness? It would be the biggest scientific breakthrough since ... I dunno ...


The toilet snorkel?


Quote: weaselman

The applications of this technology are everywhere.


No, just in a few places. Customizing algorithms to questions likely to be asked on a particular show is a dead-end waste of effort that has no utility outside the show. Since it's hard to argue the show has any utility either, coders' time would be better spent making geek porn.
Places where this 'technology' is needed or practical already use it. A dinosaur set of algorithms designed to solve a specific narrow task with strict limitations on computer resources is worth about as much today as the invention of storing only the last two digits of the current year.


Quote: weaselman

What's "a move into far future"?


Human-level AI.


Quote: weaselman

"Potential"? Maybe. "Practical"? Not at all.


Practical, not. Possible to implement with less computer resources, yes.


Quote: weaselman

The "argument" for other approaches would be to suggest one.


As I did. And outlined the general principles. I'm not going to make the leap to actually dedicating the remainder of my life to attempting to implement it. I could give example solutions to a simplified subset of the problem, but if you want a complete solution without any simplifications, no luck here - Watson's software wasn't created on a lunch break either.


Quote: weaselman

It is not an example of multiple approaches, not at all. There was never a brute force approach (it was understood by everyone, that it was impossible).


Of course there was. For a short while in the 1970s, it was the winning approach: you brute-searched as deep as you had time to. It not only was there, it outperformed chess programs that tried to trim the tree. How come? Early algorithms for the latter were so poorly developed that their utility was negative. It took years of development and the involvement of professional chess grandmasters to develop algorithms that were better than brute force.


Quote: weaselman

This is by no means a different approach than what it was twenty years ago. The approach is the same, but more resources are available to throw at it.


No. No and no and no.

Chess engines have evolved massively, particularly in terms of decreasing computing requirements for given performance. Rybka is to original Computer Chess what the 0.953L engine in Kawasaki Z1000 is to the 0.953L engine in Benz Patent Motorwagen.


Quote: weaselman

"A thousand dollar problem"? Which "thousand dollar problem" are you referring to? Getting instant medical diagnoses? Unlimited advances in scientific research?


Virtual reality? Eternal life? Cubic time? Faster than light travel? Zero interest mortgages? Vulcan death grip? Nuclear pumped X-ray lasers? Unlimited penis length? Parallel universes? Operating thetans in garlic cheese sauce?


Quote: weaselman

Do you think google uses less resources than Watson to answer those questions?


Yes, certainly. Watson's uptime is worth, I think, at least a few grand an hour. Electricity alone is probably over a hundred. Since google permits me to use its services for the lousy cost of a couple text ad displays, I suspect its costs per query are much lower than Watson's, even considering that Watson does its queries in sub-second times as well.
Resist ANFO Boston PRISM Stormfront IRA Freedom CIA Obama
thecesspit
thecesspit
  • Threads: 53
  • Posts: 5936
Joined: Apr 19, 2010
February 23rd, 2011 at 11:27:50 AM permalink
There are two different approaches here... do you write custom software to beat Jeopardy (yawn), or software that can generally parse a question and answer it better than a human being with the same access to the same resources, and in a much shorter time?

The latter would be useful. The former, well, not so much. I don't think you'd learn much useful from the former, though it's an interesting exercise.
"Then you can admire the real gambler, who has neither eaten, slept, thought nor lived, he has so smarted under the scourge of his martingale, so suffered on the rack of his desire for a coup at trente-et-quarante" - Honore de Balzac, 1829