Reversing our searching habits
"Power searching without google"
Fravia's contribution 
for the Recon 2006 conference 
(Montreal, Canada - 16 June 2006)
This file dwells @ http://www.searchlores.org/reversing_our_searching_habits.htm
Introduction 
  
Today's quarry     ~     Look ma, no google!
Sliders, SEOs beasts,  and Sinking google     ~    
Google weaknesses (and strengths)
The golden rules of searching     ~     
Conclusions 
(Assignments     ~    
Forms)
Behold reversers' foresight, tremble before their insider knowledge...
Thank-you for the invitation. Please excuse three shortcomings of mine: 
1) English is not even my first foreign language, so bear with me 
for my language shortcomings
2) I'm not and 
I'm not going to be politically correct, so bear with me for my attitude shortcomings
3) I won't use any boring powerpoint style presentation, 
 so bear with me for my formal shortcomings
 
       I am happy to speak -once again- to a bunch of reversers. I like reversers. Today I intend to discuss with you some 'alternative'  
       searching techniques. As a proof of concept we will search  some assembly books, given that 
       assembly knowledge is a most powerful weapon, and in  doing so 
       we will, maybe, demonstrate a) that everything is on the web; b) that you can  search effectively even without google; 
       c) that  some "alternative searching paths" can be quite useful for seekers.
       Of course we will find anything: whatever. Any book you fancy is there for the take. In fact 
anything that may have been digitized:
        books, music, films, software, is 
       there for the take. And not only "tangible" targets. 
Solutions ar there for the take. 
       Ideas. Methods. Techniques. Line of attacks. 
       Tactics.  
        Approaches. Strategies...
    But we won't overdo it today. We'll just search some assembly-related books. 
       And we will find our targets, of course: we will find our books. 
      
Note that somebody, some time ago, told me -politely- that we should not download  books that  are patented, or have not  yet 
       been released in the public domain. OK. I can understand that, as silly and strange as it sounds: in fact patents and 
       copyrights are just ways to negate all freedom of speech. Remember Freenet: 
       
You cannot 
      guarantee freedom of speech if you enforce copyright law...
 
       But "today let's obey", for the sake of pur experiment, and
       therefore we will just have an innocent look at our targets 
on line. Don't download forbidden fruits, please,  
       this whole talk is just an exercise, 
       aimed at empowering reversers,  giving them some sprinkles of 
web-seeking cosmic power. 
       You should become seekers, not leechers.
(In general) reversers are a  bunch of 'matter of fact' fellows: their most 
peculiar endowment is the capacity to reverse any "coded reality" around them.
In  a world of codes noone is supposed to understand, reversers 
are among the very few that have the decoding knowledge or -maybe even better- the capacity 
to find out any necessary  
	decoding knowledge...
 
As a consequence of the useless 'commercialisation' and hypermarketing of the web, some important searching tools, like the 
main search 
engines, are less and less capable of delivering useful results. This is -alas- true for google as well, whose results have kept getting 
worse  over the last months, mainly due to the growing spamming and link-farming activities of the 
SEO beasts.
Since google is the most used among the main search engines, today's talk will concentrate on google's limits, 
but I surely do not intend to blame only google for problems that in most cases all other search engines have as well.  
 
Note that google-marketing hypes, like google being 'your friend', or google being 'the mother 
of all modern search engines', with a 'nice brand' face and a 'commitment  to the 
community of searchers', do not mean anything to us. Frills and hypes never work with reversers. 
Google is just another commercial bastard without soul...  
exactly like microsoft. The only difference is that google is (still) waay more useful than microsoft for knowledge gathering purposes.
We use search engines as starting tools, 
in order to gather knowledge, and we use  tools only as long as they are useful and 'deliver' results. 
As soon as they become useless, we ditch them without regrets and switch to different ones. 
This happened five years ago with 
altavista, once upon a time the 
favourite seekers' search engine (note that altavista still has a 
powerful & useful Boolean 
NEAR operator, btw), 
and it may happen anew and again with google. 
Panta rei.
The real point is that the 
main search engines are 
just ONE among 
many possible ways to search the web. As we will see together today. And the main search engines 
are not the  friendliest tools for searching purposes either... just remember the very REASONS they exist...
Privacy concerns are only a small part of the "non friendliness" problem of all main search engines: 
  search engines data provide a look into people's privacy, 
but privacy awareness & consumer protection have not kept (and will never keep) 
  pace with this: if you allow private companies to collect 
 such reams of data, 
any bogus attorney with half an excuse will easily get hold of them. 
Just imagine the volume of personal information that  a search engine 
can provide: every search query you've ever 
made on a given computer with a given browser.
 (That's btw a good reason by itself to use    not only 
different searching approaches, but also 
different browsers and 
different internet access providers (in general you are 
 better served 
wardriving and    
 avoiding 
your own 
provider as much as you can :-) ...
Searching with google, using a GSM or 
buying stuff with your credit card is always the same sad story: 
you just smear your own data around for everybody and his dog to see, gather, collate and use against you.
But seekers know how 
to maintain some (relative) anonymity...

.  
And seekers also know how 
to maintain some (relative) speed wen searching. Your browser (
Opera) is of tantamount importance for this, 
and can do wonders, if correctly trimmed...

.
The real problem -when searching- is not anonymity and it is not speed: it is the 
relevance, 
coherence and 
reliability 
of our searching results.
 
Unfortunately we are  now in a phase where even clueless bystanders notice how google is getting spammed much too much. 
It's usefulness as the "best and only" search engine has been therefore severely reduced.
This is sad, but it does not matter very much: 
we began long ago to prepare  
alternative paths to web-knowledge. Let's present some of them -today- to 
this community of distinguished reversers. 
      
  
  
| Today's quarry (the simple, direct way to shoot a quarry down) | top | 
O gosh, I didn't know all these books were already in the public domain! 
Ah! Assembly! The most dangerous weapon a reverser can master!
The following books about assembly could represent an appropriate "quarry" for today's example queries:
   
-                                                                                
Assembly Language Step-by-Step: Programming with
DOS and Linux, 
by Jeff Duntemann. (John Wiley, ISBN:0471375233), 2000
A fairly old text to begin with; should be available all over the web -   
   Linux Assembly Language Programming by Bob Neveln. (Prentice Hall,  ISBN: 0130879401),  2000
   Another fairly old text, hence relatively easy to find.
 - 
The Art of Assembly Language by Randall Hyde (a MASM expert). (No starch press, ISBN 1886411972), 2003 
Our target is the 'published edition' of this book, which is also freely available 
on the web in html, pdf and chm format
(This book should not be confounded with "The Art of Disassembly", by my friend Zero and alia)  - 
Professional Assembly Language by Richard Blum. (Wrox, ISBN: 0764579010), 2005
 
	-    Disassembling Code IDA Pro and SoftICE by Vlad Pirogov (A-List, ISBN: 1931769516), 2005
		the last two examples will show how  even relatively recent books can  easily be found.
	
	   
	
	Now, of course
	we all know that 'printed' books on such matters are often both obsolete and   superficial, 
	and we also know  that the best knowledge (and material) about assembly is to be found   instead 
	in some 'grey areas' of the web, for instance visiting 
Iczelion's wondrous 
	site  or my friend Woodman's  
incredible repositories. 
	
This query is just a proof of concept, the specific quarries don't matter that much: you'll be able to adapt the  following 
	approaches to OTHER, different, targets of yours...  
	
	    replace for instance "assembly" with "python" and you'll obtain a library of python books instead. Or whatever. 
	    The approaches and the techniques we will see  together 
	    are important, the targets themselves are irrelevant.
	
	
As you will see another interesting side effect of a correct web-seeking approach 
	is  that often, when searching for something, you will find on the same servers 
	
many other targets related to your topic that you did not even know existed. Imagine you are  
	retrieving a book inside a library, so that you can have a look at the books -related to the same topic- that are 
	
physically  located on the same shelf, next to your target 
	when you arrive to pick it up...
 
         Searching for 
books on the web is -most of the time- extremely simple. 
       Please note that -in general- you can find most targets just inputting a title "tel quel" in any search engine: for instance a simple 
       query 
"Reversing: Secrets of Reverse Engineering"
       will  almost immediately give us 
Eilam's   brick (complete), 
       quite useful -if you bother to print it- to fix the height of your screen.
       
        
Therefore, let's begin with our 
first target 
       simply inputting 
       
Assembly Language Step-by-Step: Programming with DOS and Linux
       
        in google 
tel quel... and whoop!  
       We receive  SERPs full of clowns that want to "sell" us this target :-(  
       
Now -in theory- I should say 'see how useless is google', and show you how to find this book without using google.
       Unfortunately (for the purposes of this talk) adding a simple 
&as_filetype=pdf 
       to this 'tel quel' searchstring we can fetch through google 
       the  complete pdf 
       version of this target 
at once, 
       	much too easy! Go figure! 
         
         hxxp://www.coltech.vnu.edu.vn/ttmt/ebooks/John.Wiley.And.Sons.Assembly.Language.Step-by-Step.Programming.with.DOS.and.Linux.Second.Edition.iNT.eBook-DDU.pdf
        
       Well what can we do? This was too easy. Since we landed here, let's have a look at the folder where we found this 
       target. Let's  'peel back the URL-onion', something you should routinely do, and -see- we'll land here:
http://www.coltech.vnu.edu.vn/ttmt/ebooks/
   Woah! A whole collection. 
Quod erat demonstrandi. And now we have already found our 
second book: 
Linux 
	Assembly Language Programming, Prentice Hall.  Two bingos with just one arrow.
For our 
third target  
The Art of Assembly Language by Randall Hyde, a simple google 
tel quel 
query won't apparently deliver any useful results.
Good. Let's try an alternative, non-google, 
(classical) approach to this kind of quarry using MSN Search and its very useful sliders and looking for one of the many possible 
file 
	repository target, for instance 'rapidshare':
                                  
http://search.msn.com/results.aspx?q=%7Bfrsh%3D94%7D+%7Bmtch%3D69%7D+%7Bpopl%3D33%7D+rapidshare+%22Art+of+Assembly+language%22&FORM=QBRE
Please notice the 
{frsh=94} {mtch=69} {popl=33} in the 
querystring...
The rapidshare query worked: here we already have 
The Art of Assembly 
	Language, our third target (beta draft: do not distribute). It is worth noting that
	 this result was retrievable through google as well, just massaging 
	the querystring a little: 
ebook "The Art Of Assembly Language" "No Starch Press".
	
	Another possible approach is to use a phrase from the text itself: a text-snippet. 
	In google: 
"Hello, World of Assembly Language", 
	or in yahoo: 
"Hello, World of Assembly Language". 
	This is the preferred method to fetch on-line html (or pdf) editions (also non *.zip, *.rar or *.chm files). 
	
The "complete snippet" trick is most useful for 
	fiction books. You want Harry Potter? Have it: 
	
"There was a definite end-of-the-holidays gloom in the air when Harry awoke next  morning. Heavy rain was still splattering against the window as he got dressed in  jeans and a sweatshirt; they would change into their school robes on the Hogwarts  Express."
	 and it does not have to be so long, of course: 
"There was a definite end-of-the-holidays gloom" 
	 will suffice, albeit including some beastly spammers in the SERPs. 
   
   Let's now try our 
fourth example: 
Professional Assembly Language by Richard Blum, searching 
   in google 
   
tel quel 
   we fetch a lot of commercial noise, but with the simplest trick (adding a geographical limitation) if we search
   
   
tel quel in russia, we 
   fetch this target at once.  It may be worth noting in this context that when perusing 
   your search results it is ALWAYS a good idea 
to open the cached version of a target first,  
   and only then consider using the original link. This saves bandwidth (most crap goes away), 
   saves time (interesting servers are usually slow or overloaded), and does not unnecesaryly increase the popularity of the target URL...

 
     So, in this case, a possible link (among many) would be 
   
http://66.249.93.104/search?q=cache:45kcLLqeyd0J:avaxhome.ru/ebooks/professional_assembly_language_2.html.
   
     
  
             Finally, for our 
fifth title, we just try a "chinese" search: 
             
 "Disassembling Code IDA Pro and SoftICE", 
             note the     
&lr=lang_zh-CN snippet.
             And  
bingo!  
A possible alternative for our fifth title 
             is the 
 adding "htm" or "html" searching trick. 
             Here an example with yahoo: 
disassembling-code-ida-pro-and-softice.html 
     
     
   Note that we could also search 
all our titles together with a nice "potpourri" approach. 
   
Potpourri searches     
        
        At times simply guessing that interesting places MUST have all your targets on the same page can
        be useful... in order to find interesting places 
and your targets :-) 
        
Here a "potpourri"  example:  
         
"Disassembling Code : IDA Pro and SoftICE" "Professional Assembly Language"
               
  
   
     Anyway, as you have seen, any and every book is at seekers' disposal.
       
      
    (
Caveat: 
Please download these books only if you are positively sure they have been released 
on the public domain. 
Real seekers do not need to waste harddisk space downloading dubious stuff. Besides it could even be constructed as 'illegal' by the beastly patents 
holders and their political lackeys. Downloading is not necessary! Seekers will always find again and again on the fly 
-and consult on line- 
whatever they fancy).
      
Ok, ok. It was too easy. Much too easy. In order to continue 
let's imagine for a moment that it would NOT have been so excessively easy to find these books through such simple searches. 
Imagine we could not find them.  Imagine we will not find them again on those specific URLs. 
After all 
the web is a quicksand and the specific locations where we found these  targets could 
soon disappear (and in fact probably will, after today's talk :-)
  
  
Sprinkles of cosmic searching power 
Let's  find again our 'assembly' targets WITHOUT using the simple querystrings above
	
- We could go for the  format.
 For obvious reasons, given that our quarry are books, the most probable formats will be 
.chm,
.pdf,
.rar (or .zip or .ace). 
Of course we could 
take advantage of this through google as well: -inurl:htm -inurl:html intitle:”index of” +(“/ebooks”|”/book”) +(chm|pdf|zip) +”assembly”.
    Here a similar query for yahoo: +(“/ebooks”|”/book”) +”assembly” intitle:"index of" 
    (note the &vf=pdf inside the querystring).
Let's examine these two querystrings step by step... 
 
- We could go for the  name.
A banal yahoo/rapidshare search:  "Professional Assembly Language" rapidshare will 
give us not only some locations, but also some (much more valuable) names: 
- Wrox.Professional.Assembly.Language.Jan.2005.eBook-DDU.zip.html   (DDU for "Day Day Up")
	
(hxxp://rapidshare.de/files/993649/Wrox.Professional.Assembly.Language.Jan.2005. eBook -DDU.zip.html)
                                               - WPALJ2005-DDU.pdf.html
                                              	 (hxxp://rapidshare.de/files/7456769/WPALJ2005-DDU.pdf.html)
                                             - W.P.A.L.rar 
(hxxp://rapidshare.de/files/3119399/W.P.A.L.rar.html)
                                            	 - WPAL.rar 
(hxxp://rapidshare.de/files/2595619/WPAL.rar.html)
                                            	 
                                            		Now, even if these specific repositories' URLs disappear, and they will, you already 
                                            		have  the golden  
                                            		names, and hence the keys to find these files again and again whenever 
                                            		you want.     Note also that some guessing can be useful 
                                            		when searching a target on the web. WPAL is clearly the acronym of  
                                            		Wrox.Professional.Assembly.Language, but the author name, Richard Blum, could also be used, hence
                                            		
            RBlum.rar,    RBlum.zip,    RBlum.pdf are all possible 
            valid options.
            And indeed, again and for the last time with google: (RBlum.rar OR RBlum.zip OR RBlum.pdf)
  
 
- We could do it like  lamers.
  	
  	The lamers' way: using P2P and torrents
 Using P2P specific search engines  
     is a somehow banal, but often useful way to find your targets.
     Let's exempli gratia give filedonkey a chance:   
     let's just input assembly and see what we fish. See?
     Another "lame but valid" possibility is of course to check what torrents can deliver 
     when you search -say- assembly.
     Anyhow real seekers        deprecate   P2P and Torrents "searches". It is always 
     much quicker  and more elegant    
     to fetch your targets from URL locations    nobody is visiting, than to slowly download them from overwhelmed servers together with 
     a zillion other zombies doing the same thing.
     
      
   
      
      
 
- 
	
	We could search  elsewhere
	
	
		  
		
		- 
FTP
 
Using ftp specific search engines is in many cases a good way to quickly find your targets.
Let's give our lithuanian friends a chance:   let's just input professional assembly.
Here we have two results: 217.16.23.33 root/books/edocs/Wrox (in rar format) and        85.30.196.165 root/pub/Info/Books/_ebuki.powernews.ru_/_downloads_/downloads.ebuki.apvs.ru/Wrox (in pdf format).
  
  We could also try Dalian's ftp: assembly        
                           
   
  
    
-  
	
	Going local (Homepages, Usenet, topic-related messageboards & Webrings)
 
 At times it can be useful to "throw a narrow net" in the searchscape, going  "specific".
 
 Searching webrings you'll discover that there's a
 x86 Assembly Language Webring and  
 the     Win32Asm ring. 
 On usenet you have comp . lang . asm . x86 , 
  alt . lang . asm and
 alt . os . assembly (this last -alas- spammed to death).
 
      
 You could also Search private homepages 
 or AOL 
 for "assembly language".   
 A further very good idea is to individuate  
 relevant and authoritative topic-related messageboards. There you may even find topic-related specific search engines, 
 like  for instance community.reverse-engineering.net 
 and   woodman's.
 It is worth underlining that on the web much knowledge can be found in "grey areas" that are completely outside the "academic circuits". 
 In fact if you are  interested in assembly you'll soon realise that the four books we have chosen as targets today are 
  obsolete 
 and quite irrelevant if compared with the knowledge you can gather visiting the messageboards pointed out above.
 
   
   
    - Going regional
  
    Going "regional" is ALWAYS a very good idea when searching. We have already seen how adding a simple .ru to 
    our queries can help. But WHERE should we search? Which are the, how should I say? the "less copyright-obsessed" countries? Here on the right you can 
    see a interesting 
    "piracy subdivision"  published a week ago by The 
  Economist...
  
   
      Ok, ok, I know: obviously this kind of bogus "home made" researches, promoted 
      by US-lobbyist Robert Holleyman's  "Business Software  alliance", are clearly just intended to scare 
      the pants off some corporate clown in order to scrap some money, yet   
       since we are reversers, we may as well reverse such data for our own purposes... :-) 
      And look! As you can see,      Vietnam, Zimbabwe, Indonesia, China, Pakistan, Kazakistan, Ukraine, Cameroon, 
      Russia, Bolivia, Paraguay and Algeria seem indeed to have a more relaxed attitude towards the  
        "patents mafiosi". Good to know :-) 
      Here the relevant country codes: 
             .vn, .zw, .id, .cn, 
             .pk, .kz, .ua, .cm, 
             .ru, .bo, .py and  .dz, 
             that we could use to restrict searches to such relaxed places.
              
             Of course some of these countries are just "local niches" with next to inexistent activity and 
             extremely weak signals, and can be ignored. 
              
             Let's say that -in general-    
             .vn(Vietnam),
             .id (Indonesia),   .cn (China),  .pk (Pakistan),
                      .ua (Ukraine) and   .ru (Russia) look promising enough.  We may add
                      to these countries  
                      -out of experience-   
                      Iran, Korea, Bulgaria and India (.ir, .kr, .bg and .in).
                     
                      So let's go local: let's visit China,  
                      where we can find following this link, 
                      some other interesting assembly books.
                       Of course we should also have a look 
                      in    
                      Vietnam, 
                      in    Russia/Ukraine (where 
                      we will at once retrieve our Target and 
                      even other related books), 
                      and here is how you would search in KOREA with 
                      
                        MSNSearch. 
                        
                        
  This is all just academically speaking, duh. Once again: seekers don't need to download anything from 
                        the web, since they can always find their targets again and again if and when needed :-P
                        
                      
  
   
   |    |       
 | 
 
     
- 
IRC channels and blogs
 
Searching through IRC channels and blogs can be -for some targets- quite useful.       However the ratio noise/signal is
quite bad on these channels, and therefore IRC-searching and blog-searching is -mostly- a 
waste of time if compared with more effective searching 
techniques. 
    
After all, and behind the hype, blogs are just messageboards where only the Author can start a thread.
I'll just point you to some 
blogs search engines and to some 
IRC search engines like 
this one. Nuff said.           
   
         
     - 
Various "uncommon" search engines
     
        
        At times simply switching to less known (but interesting) search engines can cut mustard. 
        
Here a assembly-related search with 
         kartoo
              
              and here another search using gigablast.
        
            
           
            
            
       
       
        
 
          
  
| Sliders, SEO beasts,  and Sinking google | top | 
Google alone and you'r never done
Many clueless zombies consider "searching the web" tantamount to digit one term inside google and then clicking enter.
 
In fact such a simplistic approach is as wrong as it may get. And not only for the "one-termness" of it.
The real problem is that google covers only a tiny part of the web. 
  
The power of its servers and the beautiful simplicity of its interface 
notwithstanding, google is only one of the many main search engines, and its database, 
while currently already past the 
20 billions sites mark, covers at most one third of the visible web (and less than 1/50th of the "invisible" one). In fact 
Yahoo's database is bigger (albeit saddled with a lot of useless commercial crap).
 
The web is just too big for a single search index alone, and  still growing quickly. Moreover 
the "invisible" web content "the real bulk of the web" is hidden behind 
firewalls or commercial services (that will try to restrict access 
asking for "subscriptions" 
or "money" or a "valid id").  
In order to access (part of) it you will need to use techniques that go from stalking to social 
engineering, 
through trolling and passwords breaking. |        | 
  | 
  
Note also that 
links (the food used by all search engines, and especially by google, in their algos) 
does not convey any real 
meaning. 
In fact a link, 
per se, does not mean nothing, you can just count 
the number of links, as most search engines in fact do, and then try to decide what those links really mean using 
a bunch of  "best-guess" algos. A rather crude 
approach if you ask me: this gives all advantages to the beastly spammers, none to the users.
Of course some correct developments are already under way: 
users -rather than spammers- should be able to influence the ranking of search results and some 
search engines (MSNsearch's 
sliders 
and Yahoo's 
philtron) already 
provide users with such a possibility 
of influencing -at least rudimentary- their own ranking algorithms. 
A classical case is the infamous 'popularity' ranking criterion. This you should by all means slide to insignificancy, since 
it is 
eo ipso tantamount to crappiness.
 Contrary to 
what search engines' algos designer still seem to believe, we  immediately 
TURN POPULARITY DOWN if given half a 
chance, since what -say-  some idiots in Idaho are massively looking for 
has no relevance whatsoever for  humans with brains.   "Sites popular for zombies" are exactly those you never need. Remember the beautiful 
 old trick of adding a 
-".com" specification to all your searchstrings per default: 
 all the com sites will disappear: good riddance.
 
 For search engines that do not allow any algo fine-tuning, a possible approach is the "
yo-yo" approach: 
 jumping  from the start 
 onto lower 
SERPs.
 
The old "best-guess" and link related algos are what makes life so easy for 
the beastly spammers:
 google's results, after many years of legendary quality, are
nowadays  being spammed more and more by the beastly commercial clowns that call themselves  "Search engines optimizers".
 
Using a plethora of methods (cloaking, doorway pages, hidden text, blog-farms, you name it) 
these criminals deny everybody the possibility to gather  real knowledge 
 in order to push up 
their crap commercial sites into the first positions of the 
SERPs.
In fact, searchers are directly affected by these criminal deeds. As even SEO-spammers  
managed to admit: 
 "searchers have something to gain if they obtain the search results that best match their queries and, 
 consequently, something to lose if they cannot do this".
SEOs, these "Judas of the web" sell -for money- their knowledge and insights  of search  algos' weaknesses in order to purposely   
deliver 
dubious and crap results to our queries...  
Quelle vulgarité! 
Acquiring a working knowledge 
of the many alternative searching paths is 
eo ipso useful and may already now allow even beginners 
to find valuable results more quickly  and reliably.
Such alternatives can soon prove 
even more crucial for Internet searching purposes:
 while google may not be yet a sinking boat, anyone can  
see how much water is already entering through its many holes. 
So we have 
to reverse our own searching habits.
Instead of just using google à la "va banque" every time  we 
begin a search, we should 
carefully consider 
how and where  we start our searches, delve a little more 
inside 
our searches own specific requirements, else we will waste too much time 
on irrelevant side paths... 
    
And since we are all reversers, an ancient and savvy race, 
incredibly "apt to adapt", we'll be able to reverse first and foremost ourselves and 
our own working habits. 
Google has its weaknesses and its strength, let's analyze them.
  
| Google weaknesses (and strengths) | top | 
 Le plus fort est celui qui n'oublie pas sa faiblesse
Google is/was/became the best search engine because of its clean interface in a 
frantically commercial on screen world, and BECAUSE it didn't pollute its results with advertisements and BECAUSE 
it didn't practice  any censorship whatsoever on the results. 
It is -alas- now losing ground on both last terms.
Try any search for mp3s, for instance, and you'll see at once both advertisements and censorship at work... 
 
The third 'raison d'être', its effective and simple interface, still survives  somehow,
 its more and more frequent "cartoonish" cracks on the querymask logo  notwithstanding.
More annoying is the fact that 
today up to half of each SERP screen is dedicated to paid ads, compared to the ad-free original "Old-Google".
 
Google's (relative) cleanliness was 
so powerfully convincing that many rivals went "back" to a similar clean approach, ditching their useless heavy-commercial portals 
(compare on 
alexa the evolution of 
Yahoo's portal... 

 )
The biggest weakness of google, is that it's 'patented ranking algos' are now pretty well known.
Their 'secret combination' of 'thousand of algos' was all just hype from the very beginning, and  
their ranking approach -never really hidden- is now well known 
by countless commercial spammers, thus making it a liability rather than an asset.
In the main search engines panorama there are at the moment hundreds of different 
prototypes and companies that all utilize more or less the same algos.  Yet even slight 
variations can make the difference: their results overlap only 
for a small part (around 1/4 of the SERPs do overlap). This is where the depth and freshness of 
the supporting database plays a bigger role than the cleverness of the ranking algos.
In fact (as per spring 2006) only four  contenders:  
Google, Yahoo, Microsoft and Teoma/AskJeeves  have enough muscles  to guarantee a relatively 
useful and regular indexing of the web. 
But -as we have seen- these four cover together, at best, just one half of the visible web and a tiny part of the invisible one.
This makes it extremely important to use alternative approaches when searching. 
Since google is still a useful and powerful quick search engine, and since it 
owns the whole archive of newsgroup postings, we will never be able to ditch google 
completely anyway... 
 
 There's also a google bias towards "established sites", due to its 
links algos:  
         if you are searching content that is likely to have been on the Internet for a LONG TIME, 
         google is a good choice. On the other hand, if you are looking for "fresh" content, you better use MSNsearch (or even 
         good ole altavista). 
Google's real strength is its "quality database" of useful sites. It is not a matter of the quantity of sites listed, it is a matter of  quality. 
Yahoo's database, while bigger than google's, hosts an abominable amount of ".com" sites (five times the amount of google) which 
heavily skew all results towards irrelevance. 
But how do you judge results? How do you prepare your search? 
Knowledge of some basic searching rules can help.
   
| The golden rules of searching | top | 
Quaerite et invenietis
      
       There are some basic rules for seekers. Of course things are different depending from the KIND of search you are 
       performing. There are rules for 
long term web searching 
and rules for  
short term web searching ...
 
       
       But almost every query can be subdivided into the following steps: 
think, find, refine, evaluate, collate.
      (
The 
Finder 
Reverses 
Every 
Corner)
       
       	
- 
           	 THINK about your query
              
                   Seekers do not "plunge" into 
                   a search out of the blue. Like artists, they visualize the correct result 
                   before they begin. The 'perfect' answer is driving their queries. The perfect answer creates 
                   the correct question(s)
                   What kind of results do you want? Books? Doctoral thesis? Images?
                 News?  Biographies? How many results do you want? Three hundred pages of material? 
                 One single authoritative book? A dozen pdf-articles? 
                A short and concise essay?
                   Obviously you cannot be an expert in all single field of  
                   any and every query you will launch. But you must 
                   be an expert in the field of finding the right resources for 
                   each and every kind of query.
                   
A seeker needs TWO skills: 
                   to formulate a question correctly and   to know where to look. 
                   And this means knowing which 
                   resources you should use for your searches. And this means you must first of all know how to search those very 
                   resources you should use for your searches.
                     In fact each 'part' of the 
                   web requires a different approach. For instance, searches on usenet, on    
                   blogs or on ftp servers  are 
                   not ruled by the same 
                    lore. 
                   Also each  kind of target, each  quarry, 
                   requires a different approach:   for instance 
                   when searching 
                   news, images or books.  
                   
You must also decide if  for a given query you will have to use 
                   combing techniques like stalking, 
                  luring     or trolling.
                   
Before even beginning, think about your query: prepare your question(s)
                    for the perfect result and decide which resources you will use.
           
 
- 
	     FIND what you are looking for
 Easier said than done, I know, I know.  
	           In fact this  very complex step is  at the same time the whole point of the exercise, duh. 
	           However, depending on the previous "thinking about your query" step, you will at least already know where you should be looking for and 
	           what kind of techniques you'll have to use.
 
	           A general advice is to comb as much as you can, i.e.  
	           use the knowledge that others have gathered, search those that have already searched, 
	           and do not 'reinvent the wheel' at every query.  
A second generally useful advice is to go 'regional' as much a you can, that is to use 
	           		information and resources that are located on the same plane (geographically, temporally, academically, conceptually)  as 
	           		your quarry.
	           		
Anyway, if your      question has been formulated 
	           	correctly and  if you already know where to look, the 'finding' part will not 
	           		be too hard.   
	           		 
 
- 
	    REFINE while searching
 
	    
Your queries are usually either too wide or too narrow. Usually -in fact- they are too wide. If a subject is too wide, 
as it is most of the time, you have to limit and narrow your search. 
Using boolean operators (AND and NOT or + and -) will narrow the search adding and/or eliminating terms. You can also limit 
your query temporally (for instance only 2005/2006), geographically 
(for instance only .ru) or formally (for instance only .pdf 
files).   
These limits allow you to restrict results to items meeting specific criteria. I.e.:
 a particular type (newspaper articles, journal articles, complete books, small snippets of text); 
 a particular language (English, German, Spanish, Russian, Italian, French, etc.);  
a target published or produced within a particular time frame (2000-2004)
	     
  
 - 
	    EVALUATE your results
  This is easier said than done, again. The evaluation 
	  phase is of paramount importance, but -alas- far from being simple. 
Whatever you are looking for, 
	    you are bound to find very good quality results, good quality results,    average quality results and poor quality (or no quality at all) results. 
	    This is not only due to the spam, but to the simple fact that the web allows   anyone to publish anything he fancies.
	    A possible approach to evaluation is to use as a rough evaluation guide the seven 
old classical questions: quis, quid, ubi, quibus 
auxiliis, cur, quomodo, 
quando: Who, What, Where, Helped by whom, Why, How and When. 
Well, first of all you should maybe ask yourself why the heck  you need to search for something at all :-) 
Continue only if you have an answer to this fundamental question.
If you  manage to answer the fundamental question and continue,  
whenever you find  a result, it is useful to ask yourself  for evaluation purposes 
the whole bag of classic questions.  Does not take long and helps a lot. 
Let's begin: quis WHO is the Author (and therefore, given his 
biography,  
what qualifies him to write about the matter at hand);    quid WHAT is in fact the result you found 
(a complete  explanation, 
a proof of concept, a small addition, an hypothetical solution...);  check   ubi WHERE you did find your result (look at the URL, 
look at the server, look at the links pointing to it...); quibus 
auxiliis, WHO helped the Author? (look at who OWNS the server hosting the result, look at eventual references, links, etc...);
         and ask yourself cur: WHY the result has been produced and put on the web;  quomodo HOW the result 
         has been produced (again, similar to quid/what: years of research or one half-afternoon sudden jerk?);  
         and finally 
         quando: WHEN was the result produced/published/updated, when was the web site created/updated.  
         (Archive  org may prove 
         invaluable for such dating purposes. Note that you can also retrieve a site for specific date of the 
         past).
         Simply answering the seven classical questions  will already allow you to proceed towards a proper evaluation of a set of results. 
          
Finally a word about those "ready-made" evaluations you can find on the web. Should you use them? Yep, Cum grano salis.  
          
          First of all there's a tendency to ignore "grey areas" of the web when evaluating targets. Some seem to believe that  
          a pdf file should automagically be eo ipso more worthy than an html file, independently from its actual content. No way. 
          A text is not worth 
          anything just because it has been printed and published in a book.  
          Its worthiness is always and only intrinsic.  Many ready-made evaluations on the web 
          are blinded by frills' bells and excessive 'formal-bowing' and  utterly 
          incapable of judging content at face value.
          
          It may also be worth noting that -in general- east european places (.ru, 
                     .bg, .cz etc.) are (still) 
                     "culturally" less commercially oriented and therefore 
                     offer more "sound" valid evaluations of books/software/targets, instead of the bogus fanbois "evaluations" 
                     that are purposely  planted on -say- amazon or ebay.    In fact you can hardly find a non-paid -sorry- non-biased review 
                     or comparison of certain products on the west-side of the web.
                      
 
	    
	  - 
	    COLLATE your results
 
	   Ok, you have performed a long search. Gathered tons of results. Painstakingly weeded out bogus and crap sites, understood which are the 
	   most important authoritative results... and now you stop your search and go to sleep satisfied.
	   This is a serious mistake. A query is not finished when you have found your results. Most 
	   will be lost if you don't COLLATE your results, squeezing the most authoritative 
	   results into a coherent and valid interpretation. A 'conclusion' of sort.
	   
Systematic record keeping is OF PARAMOUNT IMPORTANCE when searching. A classical mistake 
 is to 'forget' to keep records during  complex searches.
For this purpose I suggest you simply use the NOTE function in Opera: just highlight the target 
text you are interested in, 
rightclick, and then choose copy to note (or use the keyboard shortcuts, either
 CTRL+SHIFT+C or CTRL+ALT+E depending on the version of Opera you'r using): 
*the URL* of the page you'r viewing at that moment *and the date* will be automatically stored in your note 
*together with the highlighted text*.
You may want to create ad hoc note folders (for instance 
"research_assembly_books_29MAY2006") and, at the 
end of your search, before switching the computer off and go to sleep, just move all your 
related notes inside some correctly named folders. 
Opera's Notes are just text format, very easy to edit, 
cat, search or prune.
Alternatively use something to take notes, even a pen and a sheet of paper will do. DO NOT rely on your memory alone (or on your 
extraordinary seeking capabilities to re-find at once what you may have lost :-)
If you do, you will regret it. Sooner 
than you believe.
Once you create some crumbs-paths of well kept records, collating the results will be 
a quick and easy process.
    
 
- A final note about your "searching environment"
  Listening to music while searching   is NOT a good idea,  chatting 
	while searching is NOT a good idea, being interrupted while refining a query is verry, verry bad: 
Always search & seek in a quiet and relaxed environment, 
with as few disturbances as possible. No music, no telephone, no skype, no email distractions, no IRC, no chat 
(and of course no TV, duh). 
A serene, calm atmosphere, will allow you to 
take full advantage of your seeking efforts in an optimal way.  
Serenity CREATES serendipity. 
This does not have to mean soberness, austerity or ascetics, though. 
If you fancy something to drink, have it ready before starting, and always chose excellent products: wines like 
pomerol or refosco or  
the most finest teas (Darjeeling second flush, for instance). 
Seekers can often fetch 
such seemingly expensive items for next to nothing using the  old 
usual   barcode tricks. 
Let's cut it short: 
Consider yourself a monk of the early middle ages, sitting in his  peaceful cell, seeking old forgotten 
knowledge, sipping good wines, while barbarians and zombies are burning everything in sight and torturing each other
 not far from the abbey's walls...  consider yourself a monk of the early middle ages among barbarians 
because this 
is exactly what you are and that is exactly what is happening nowadays :-(
	    
	       
	    	     
 
 
  
Quaeras ut possis, quando non quis ut velis
  The various techniques described above can and should be used together with the 
main 
search engines:       on the ever moving web-quicksands 
it does not make much sense to give a "list of links" to places where you can 
"alternatively search". 
Of course there are various important 
non commercial databases, like Infomine (
http://infomine.ucr.edu), 
Librarians Internet Index (
http://lii.org), The Internet Public Library (
http://www.ipl.org/) Resource Discovery Network (
http://rdn.ac.uk), 
Academic Info (
http://www.academicinfo.net/), The Front (for journals: 
http://www.arxiv.org/multi?group=math&%2Ffind=Search)
and finally the best one of all: The Open Directory Project (
http://dmoz.org).
These are of course all  
possible alternatives to the main search engines approaches.   
Yet lists of links are and remain just  lists of links. Bound to decay into 
obsolescence. It is much better to describe the different APPROACHES, 
that will remain  valid for many many years even on our extremely 'quicksandish' web-environment. 
We have seen some of these approaches during our search for 
assembly books, let's quickly summarize them again:
- 
All the various main search engines (a "treasure chest" for seekers)
 - 
Regional searching: the paths beneath the horizon
 - 
Combing the web: dos and do-nots (usenet, messageboards, blogs, irc, ftp...)
 - 
Unorthodox searching: guessing, stalking, social engineering, luring, trolling, klebing & more 
 
  
| aap | Regional searching:  paths beneath an anglo-centric horizon
  | top | 
      our vietnamese, chinese, persian and russian friends... 
 
  
| aap | Combing the web: dos and do nots
  | top | 
        "potpourri" searching (not only for books, but also for 
music, or
        many 
other things. 
        And note that with both searches we are 
        "yo-yoing" to page 10 or 8 of the SERPs to avoid spammers nested in the first pages  :-) 
        
snippet searching,
 "html adding", name-guessing and all the other tricks. 
Messageboards, usenet, ftp, blogs, rapidshare alike repositories, 
        torrents... 
 
  
| aap | Unorthodox searching: guessing, stalking, social engineering, luring, trolling, klebing & more
  | top | 
         ... 
 
Tools for seekers
Using 
wget for fun and pleasure
wget, which  
exists for windows as well,  
 is a file retrieval tool that can be used via FTP or via HTTP  (the two most widely used Internet protocols). 
Used mostly in order to mirror websites, it can also be used to find files across the web. 
Wget supports proxy servers, and most of the features are configurable. An invaluable tool for searchers.
 
Nil perpetuum, pauca diuturna sunt
Two assignments, one easy, one not.
		1) For the "lazy occasional searcher" a simple and easy assignment:
(just in order to practices the various techniques explained above):  find 
Python 2.1 Bible, by Dave Brueck & Stephen Tanner, 
(Wiley, ISBN: 0764548077, 2001). 
Be careful: around the web there are few "unencrypted" pdf editions and many "encrypted" pdf editions. Obviously 
you  want to copy, use, extract whatever you fancy. 
So find the   "unencrypted" edition of this target. 
This search should take you at most 10 minutes.
	
2)
For the "serious seeker warrior" a more complex assignment: Find data to stalk and highlight 
the wrongdoings of 
these dangerous clowns... 
 
You can see how naïvely direct they can -and do- propose to 
sell disinformation services for dictators & dictatorships  to be used 
against democratic media and against citizens at large.
 
As usual there's no shortage of scumbags, ready to sell their informatics skills to governments, military establishments 
or private buyers for whatever 
dirty and undemocratic use or purpose.
There's instead, alas, a shortage of reversers that would care countering and denouncing this.  
Using google AND especially 
using some of the alternative approaches you have seen today,
 this search/stalking exercise can be accomplished in less than one week.
    And now I'm finished. 
    
Thank-you for your patience. Any questions?
  
 
  
 
 SEARCHING THE PAST
(DISAPPEARED SITES)
http://webdev.archive.org/ 
~ The 'Wayback' machine at Alexa: explore the Net as it was!
Visit The 'Wayback' machine at Alexa, 
or try your luck with the form below. 
Alternatively, learn how to navigate through
 [Google's cache]! 
  Alternatively a new US-centric "preservation" 
  project 
Webcapture is coming along.     
       
       
  
| aap | All the various main search engines.   | top | 
A quick tour of the 
main search engines...  
    
 
 
 
 
     
back to portal  
 
back to top
(c) 1952-2032: [fravia+], 
all rights reserved, all wrongs reversed