Intermediate Search is an internal search engine for your website, very similar to Matt Wright's Simple Search. This script expands on the simple one by taking into account new Meta tags, such as keywords and description. The list of hits for a particular query is also weighted by relevance. Intermediate Search allows for complex but easy-to-use Boolean operators, grouping, case control, string-vs-word queries, and basic wildcard searches. It includes both a setup guide for webmasters and a searching guide for visitors.
View the Code (text file)
- Search our Website
- Note that this script will search the demonstration directory of Genesis as well. If you'd like to see how the script deals with custom pages, you can create them here and then run a search.
Download the Code and Instructions (zip file)
Intermediate Search Version 1.0 had two bugs. First, it performed all searches case sensitively (1.0 Beta did not, but a feature added to the full version had the side effect of changing case sensitivity). Second, Boolean operators did not function when used as the first term of the search. Both bugs have been fixed.
On a related note, a Boolean operator that appears as the only term or the last term produces strange output. "NOT" or "OR" returns no files if it isn't followed by a real search term, while "AND" returns all files when it is the final or only term.
We have successfully ported this search engine to Windows NT and designed an infinite-depth recursive procedure for finding files. Later versions will have these features, but for now we're focusing on AXS 2.0.
The new "+" version allows Meta tags to have multiple spaces between the attributes and gives a meaningful error message when the base directory is specified incorrectly.
A searchable site is much preferred by visitors, especially those who know what they're looking for (of course, no search engine can replace a well-structured index). The ability to use highly specific search language to retrieve a sorted list of hits makes Intermediate Search the tool of choice for scanning archives of mailing lists, Usenet posts, or other archives which lack a logical table of contents.
In addition, Intermediate Search provides the webmaster with critical information about what his visitors are looking for, and whether or not they find it - a copy of all search terms and the number of hits generated is saved for his use.
To run Intermediate Search, you'll need a web page that allows custom CGI and a UNIX-based webserver that is compatible with Apache version 1.1 (most are).
Depending on the power of your server, Intermediate Search will begin to slow down excessively when scanning more than a hundred large documents. If you need to scan a larger set, you may wish to try Simple Search.
This script is fairly easy to install. Download the search.cgi script and open it in a text editor.
First we have to tell the script which files to search, and where they can be found. The basedir variable should be the path to your web directory - remember to include the trailing slash:
    $basedir = '/usr/www/users/xav/';
If you don't know which directory you're in, telnet to your webserver, head over to your web directory, and type "pwd" (no quotes). The server will return your directory location.
Next we provide the base URL that corresponds to files in the above directory. Again, remember to include the trailing slash:
    $baseurl = 'http://www.xav.com/';
Now we describe, in as much detail as we'd like, which files to search from the base directory. Wildcards are allowed, so if you want a simple search that goes three levels deep on .htm and .html files, you'd enter:
    @files = ('*.html', '*/*.html', '*/*/*.html', '*.htm', '*/*.htm', '*/*/*.htm');
On the other hand, if you want people to look only in your scripts and freeware directories, and not in secure or www_logs, you could enter:
    @files = ('*.html', 'scripts/*.html', 'freeware/*.html');
Intermediate Search saves a copy of all search terms used on the site. This gives the webmaster valuable feedback on what people are looking for. The summary_file variable should hold the system location of this file; we recommend that it be in a secure directory so that the rest of the world cannot read it:
    $summary_file = '/www/zoltan/secure/summaries.html';
Intermediate Search will now operate fine on your web site. The next few variables are optional (but recommended). Variables link_url and link_title will be used to create a hyperlink back to your main page. They can be just about anything you'd like, such as:
    $link_url = 'http://www.milosevic.org/~zoltan/index.html';
    $link_title = "Zoltan's Home Page";
The java_toys variable determines whether or not we'll use JavaScript on the search page. A small bit of JavaScript code causes the cursor to blink in the main search window. Most systems should deal with this fine, but you can turn it off if you'd like with:
    $java_toys = 'off';
As we move towards more JavaScript- and graphics-enhanced interfaces, we've started to include pictures. The search script includes a picture of an E3 search-type aircraft. You can download the picture here, or you can just reference our local copy. Enter the URL of the picture in the corresponding variable:
    $searchpict = 'http://www.xav.com/images/search.gif';
You could also enter an alternate picture to be used on the search page, or edit it out of the HTML block.
Because this script calls itself, it needs to know either its relative filename or full URL. You should only need to change this value if you've renamed the script:
    $cgi_url = 'search.cgi';
Intermediate Search will sort files based on how many times a given search term appears in the text. The more times it appears, the more weight it is given and the higher it occurs in the list. Special weight is given when search words appear in the title, filename, Meta keywords, or Meta description. By default, a filename or title match counts as two word hits, a match on a Meta keyword counts as four words in the document, and a match in the Meta description counts as two. These can be customized if you like with the following array:
    ($name_x, $title_x, $keywords_x, $description_x) = (2,2,4,2);
Now that we've set all the variables, transfer this script in ASCII format to your web server. Give the file read and execute permissions by typing "chmod 755 search.cgi" from the telnet prompt. If you set permissions with Wsftp or a similar client, mode 755 corresponds to "owner read write execute, group read execute, all read execute".
Next we need to make the summaries file writable - type "chmod 777 summaries.html" (no quotes) from telnet. This is equivalent to giving everyone all permissions.
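As a rough illustration of the relevance weighting described above, the sketch below scores a single document for one search term. Only the four multiplier names and their default values come from the configuration line; the `count_hits` and `score_document` routines, the hash layout, and the use of case-insensitive substring matching are assumptions made for the example, not the script's actual code.

```perl
use strict;
use warnings;

# Multipliers from the configuration line (defaults shown).
my ($name_x, $title_x, $keywords_x, $description_x) = (2, 2, 4, 2);

# Count case-insensitive occurrences of $term in $text.
sub count_hits {
    my ($term, $text) = @_;
    my $count = () = $text =~ /\Q$term\E/gi;
    return $count;
}

# Hypothetical scoring routine: each hit in the body counts once, while
# hits in the filename, title, keywords, and description are multiplied.
sub score_document {
    my ($term, $doc) = @_;
    return count_hits($term, $doc->{body})
         + $name_x        * count_hits($term, $doc->{name})
         + $title_x       * count_hits($term, $doc->{title})
         + $keywords_x    * count_hits($term, $doc->{keywords})
         + $description_x * count_hits($term, $doc->{description});
}

# Made-up document used only to exercise the routine.
my %doc = (
    name        => 'perl-tips.html',
    title       => 'Perl Tips',
    keywords    => 'perl, cgi',
    description => 'Short notes on CGI scripting.',
    body        => 'Perl makes CGI easy. Perl is everywhere.',
);

# 2 body hits + 2 (filename) + 2 (title) + 4 (keywords) + 0 (description)
print score_document('perl', \%doc), "\n";   # prints 10
```

Raising `$keywords_x` in this model pushes pages whose Meta keywords match the query further up the list, which is the effect the customization is meant to have.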
There are two ways to search your site. People can either go to the script URL and enter their searches, or you can have a form on another page that triggers the script. In your HTML, you could add:
<B>Search Our Site</B><BR>
Enter your search terms below, or visit our
<A HREF="http://myprovider.com/~me/search.cgi">search page</A>
<FORM METHOD=POST ACTION="http://myprovider.com/~me/search.cgi">
<INPUT TYPE=TEXT NAME="terms">
<INPUT TYPE=SUBMIT VALUE="Search">
</FORM>
If you get a "malformed header" or "premature end of script headers" message, it may be because the script was transferred as a binary file at some point (which scrambles the hidden end-of-line characters and confuses the server - always transfer scripts in ASCII format). If you open the file with Pico, create and delete a line, and then save it, the problem usually goes away.
The most common problem is incorrectly set permissions. Make sure the script is readable and executable by everyone (type "chmod 755 search.cgi"). Also make sure that the summaries file is writable (chmod 777 summaries.html).
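If Pico is not available, the end-of-line repair can be approximated with a few lines of Perl. This is a minimal sketch that assumes the damage consists of carriage returns left in front of each newline by a binary-mode transfer from a DOS or Windows machine; the helper names `strip_carriage_returns` and `fix_file` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: convert CRLF line endings to plain Unix newlines.
sub strip_carriage_returns {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;
    return $text;
}

# Rewrite one file in place with Unix line endings.
sub fix_file {
    my ($path) = @_;

    open(my $in, '<', $path) or die "Cannot read $path: $!";
    binmode($in);
    local $/;                      # slurp mode: read the whole file at once
    my $text = <$in>;
    close($in);

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    binmode($out);
    print {$out} strip_carriage_returns($text);
    close($out);
}

# Fix every file named on the command line; does nothing without arguments.
fix_file($_) for @ARGV;
```

Saved under a name of your choosing (say, fixeol.pl), it would be run as "perl fixeol.pl search.cgi". Keep a backup copy of the script first, since the file is overwritten.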
In addition, when you open the file in a text editor to set the configurable options, the editor might wrap long lines, which will prevent the script from working. We've tried to make all the lines 70 characters or less but some long commands went over that limit. Scroll through the script and make sure that no commands have been interrupted in mid-line.
On the other hand, if the script works fine but does not return any search hits, double-check your file specifications ($basedir and @files).
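To check a file specification, it can help to list what the patterns actually match. The sketch below assumes that each `@files` pattern is expanded relative to `$basedir` using shell-style globbing; that is a guess at the mechanism rather than a description of the script's internals, and `matching_files` is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: expand each pattern relative to $basedir and
# return the readable plain files, as paths relative to $basedir.
sub matching_files {
    my ($basedir, @patterns) = @_;
    my @found;
    for my $pattern (@patterns) {
        for my $path (bsd_glob("$basedir$pattern")) {
            next unless -f $path && -r $path;
            push @found, substr($path, length($basedir));
        }
    }
    return @found;
}

# Example values from the setup section; substitute your own.
my $basedir = '/usr/www/users/xav/';
my @files   = ('*.html', '*/*.html', '*/*/*.html');

my @found = matching_files($basedir, @files);
print scalar(@found), " file(s) matched\n";
print "  $_\n" for @found;
```

If this reports zero files, the usual suspects are a missing trailing slash on `$basedir`, a wrong directory, or patterns that do not reach deep enough into the directory tree.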
Fluid Dynamics will provide free, limited support via email for this script. Send requests for assistance to noc@xav.com. Please include the relevant URLs in your message, and cut & paste the telnet response to the "perl -w search.cgi" command if possible. Note that we will neither support nor respond to those operating sites whose material offends (adult sites & the like).
Custom installation is available at a reasonable rate; custom coding currently runs $40/hour and we have an it-works-or-it's-free guarantee. Installing this script would take less than an hour.
Hosting Intermediate Search for Remote Sites
Although Intermediate Search will only search local files, you can create a text-only mirror site on your website, and redirect searches to the true site. For example, you could set:
    $basedir = '/www/zoltan/etc/mirror/';
    $baseurl = 'http://www.geocities.com/SomePlace/1522/';
Under this configuration, visitors will be redirected to the corresponding file on the remote server based on a hit from the local mirror directory. Because the mirror only requires tag-stripped HTML documents (no images, sounds, and other bother), mirrors can be very space-friendly.
Searching Your WebServer
Similar to the example above, there is no code-based requirement that you own the files that are searched. If you want to search every page on your webserver, or documents in a neighbor's directory, you could set:
    $basedir = '/www/';
    $baseurl = 'http://www.myprovider.org/';
Under this configuration, you'll be able to see all files on others' websites, including those protected by .htaccess restriction. If you want to see the contents of those files (even password locked ones), we have developed tools for doing so.
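Both configurations above rely on a local path under `$basedir` being translated into a URL under `$baseurl`. A minimal sketch of that translation follows; it assumes the mapping is a plain prefix swap, and `path_to_url` is a hypothetical helper rather than a routine taken from the script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a local file path under $basedir into the
# public URL under $baseurl. Returns undef for paths outside $basedir.
sub path_to_url {
    my ($path, $basedir, $baseurl) = @_;
    return undef unless index($path, $basedir) == 0;
    return $baseurl . substr($path, length($basedir));
}

# Values from the mirror example; the file name is made up.
my $basedir = '/www/zoltan/etc/mirror/';
my $baseurl = 'http://www.geocities.com/SomePlace/1522/';

print path_to_url('/www/zoltan/etc/mirror/recipes/stew.html',
                  $basedir, $baseurl), "\n";
# prints http://www.geocities.com/SomePlace/1522/recipes/stew.html
```

Under this model the trailing slashes matter: if one variable ends in a slash and the other does not, the resulting URLs gain or lose a separator at the join.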
Intermediate Search is freeware. There are no restrictions on its use (save for the U.S. export restrictions below), nor are any warranties made or implied about its durability or fitness for a particular purpose.
While use is unrestricted, distribution requires the consent of the copyright holder, Fluid Dynamics. For purposes here, installing Intermediate Search for your customers isn't considered distribution, but including it in an archive of freeware CGI scripts is. Just write to us and ask permission; we'll probably say yes.
To receive notification via email when an updated version is released, send an email message to noc@xav.com to that effect. To help drive new updates, please send any suggestions and bug reports to the same address.
Primary credit goes to Matt Wright for the outline of code and the file listing module. Additional credit is due to Jeff Carnahan for the code which provides the "Last Modified" times for files. Some credit goes to Altavista for their query syntax and output format.
Those who point out glaring errors in the code will receive mention in this section. For example, Wender Hwang of Interact, Linda White of Three Rivers Free-Net, and Marcel Da Silva of Spagnol's Wine & Beermaking Supplies all pointed out the error in case sensitivity. The bug involving first-term Boolean operators was pointed out by Alexander Goncharov of NW Nexus. Linda White also pointed out the bugs involving multiple blank spaces in Meta tags.
© 1997, Fluid Dynamics