
Hunting Data on the ftp Line with Archie


Is there anyone alive who doesn’t thrill to the idea of exploration? While we can’t be Indiana Jones, it is possible to search for temples of software treasure on the Internet. To computer fans, a new program or an interesting data file can be as exciting as any golden idol. And, besides, we never ever have to run into snakes!

In our last go-around, I explained the basics of using ftp (file-transfer protocol) to find and download files from Internet sites. Thanks to sites that allow anonymous ftp logins, it’s possible for anyone with net access to download files… if you know where to find them.

While ftp lets you search around in a public directory for files, if the files you’re looking for aren’t there, hunting for them won’t do you much good. Worse still, looking for a specific file can be like looking for a needle in a haystack. The virtual universe of cyberpunk science fiction is already with us, in that the Internet really can be a confusing and confounding maze.

Fortunately, for all would-be network explorers, there’s a faithful guide to lead you through the deepest, darkest network nodes: archie. Archie is a database program that does the hard work of finding and indexing files throughout the Internet. Instead of tracking down files on your own, you can contact an archie site and let archie find your elusive quarry.

Archie sites are scattered across the world. [See Table 1] While you can use any archie server, it’s fair play to use the one for your geographical area. Not only should this take up less online time, but by spreading out the load, the service is more likely to avoid being overburdened.

To use archie, you have three choices. The most popular one is to use telnet to contact the closest Internet site with a copy of archie. To perform a search, you’d go through a process like this:

telnet archie.sura.net

When you have a connection, log in as “archie.” You will not be asked for a password. The next thing you’ll see is a prompt like:

archie:

From here on out you’ll be talking to the archie program rather than the operating system the way you normally do with telnet.

The first command you should enter is:

archie: show

This displays all of archie’s settings. For the most part you can ignore these, but two you may care about are pager and search.

Pager tells archie whether to dump its results straight to the screen or to wait for a carriage return or a press of the space bar after filling each screenful. If you’re unable to scroll your screen back to catch something that just disappeared off the top of your display, turn the pager on with:

archie: set pager

Search defines how archie goes about looking for your entry. Different archies have different search defaults. For example, some default to case-sensitive searches, while others don’t give a hoot whether your search string is in all caps or all lower case.

If archie reports a search value of “exact,” you’ll know that it’s currently set to find only exact, case-sensitive matches. That is to say, a search for CHess would not find Chess. File names being what they are, that probably isn’t what you want.

You can set this parameter to your liking with the command:

archie: set search X

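X can have one of four values. Exact we’ve already seen; your other choices are regex, sub, and subcase. When you use regex, the search string works like a Unix regular expression. In short, wildcards are allowed, although they follow Unix rules rather than DOS ones: a period matches any single character, and * means “any number of the preceding character.” I could search for Ch.* and get all references to Chess or Chessmate. Unfortunately, because regex searches are case sensitive, I’d also miss CHESS.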

If you search using sub, your expedition will find any program names that contain your search phrase regardless of case. For instance, a search with “CH” would now uncover Chess and CHESS. It’s also going to dig up files with words like patches or PATCHES in the title. Sub is very powerful but it can deluge you with false hits, i.e. files that contain your search string but have nothing to do with what you’re actually looking for.

Subcase works like sub but is case sensitive. Regardless of which method or methods you use, the most important thing you can do before beginning an archie hunt is to decide how you want archie to look for a file. You’ll not only save your own time, you’ll save everyone’s time.
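To make the difference between the four search modes concrete, here is a small Python model of how each one decides whether a file name counts as a hit. It is a sketch under stated assumptions, not archie’s own code: in particular, it assumes regex searches match anywhere in the name unless you anchor them with ^ or $, and the function name archie_match is made up for illustration.

```python
import re


def archie_match(mode: str, pattern: str, filename: str) -> bool:
    """Approximate how each archie search mode judges a file name.

    Illustrative model only; archie's real matching code may differ.
    """
    if mode == "exact":
        # The whole name must match character for character, case included.
        return filename == pattern
    if mode == "regex":
        # Case-sensitive regular expression, assumed to match anywhere
        # in the name unless anchored with ^ or $.
        return re.search(pattern, filename) is not None
    if mode == "sub":
        # Case-insensitive substring test: broad, and prone to false hits.
        return pattern.lower() in filename.lower()
    if mode == "subcase":
        # Case-sensitive substring test.
        return pattern in filename
    raise ValueError(f"unknown search mode: {mode}")


names = ["Chess", "CHESS", "Chessmate", "patches", "PATCHES"]
for mode, pattern in [("exact", "Chess"), ("regex", "Ch.*"),
                      ("sub", "CH"), ("subcase", "Ch")]:
    hits = [n for n in names if archie_match(mode, pattern, n)]
    print(mode, pattern, hits)
```

Run against the sample names, sub turns up all five, including both spellings of patches, while exact finds only Chess.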

Now, you’re finally ready to start a search. Searching is done by simply entering:

archie: prog search_string

Archie then gives you a running score of how many hits it’s found and what percentage of the database it has searched. After that, archie reports on the files that meet your specifications.

This is done by presenting you with a listing describing each file’s location. Each record spans several lines. At the top is the name of the site holding the file. This is followed by the file’s directory and finally the file’s name, along with its size and date.
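Each record boils down to five pieces of information, and the first three are exactly what you need for the anonymous ftp step. The Python sketch below models a record and turns it into an ftp:// URL and the matching ftp commands. The class, its field names, and the sample site ftp.example.edu are all hypothetical; archie itself prints these records as plain text.

```python
from dataclasses import dataclass


@dataclass
class ArchieHit:
    """One archie result record (hypothetical model of the printed listing)."""
    site: str       # host that holds the file
    directory: str  # directory on that host
    name: str       # file name
    size: int       # size in bytes
    date: str       # date as archie printed it

    def ftp_url(self) -> str:
        # Join site, directory, and name into an ftp:// URL (RFC 1738 syntax).
        parts = [p for p in (self.directory.strip("/"), self.name) if p]
        return f"ftp://{self.site}/" + "/".join(parts)

    def ftp_commands(self) -> list[str]:
        # Commands for an anonymous ftp session; after "open" the ftp
        # client asks for a name, where you answer "anonymous".
        return [
            f"open {self.site}",
            f"cd {self.directory}",
            "binary",            # compressed files must move in binary mode
            f"get {self.name}",
        ]


hit = ArchieHit("ftp.example.edu", "/pub/games", "chess.tar.Z", 48213, "Jan 10 1993")
print(hit.ftp_url())  # ftp://ftp.example.edu/pub/games/chess.tar.Z
for command in hit.ftp_commands():
    print(command)
```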

Without telnet access, you may still be able to use archie with a local archie client. In this case, you’d just run archie from your system’s command line. This is the best way to use archie since it saves wear and tear on the network.

You’re not out of the hunt even without telnet or a local archie: archie can also be used by mail. To do this, send a message to the user archie at an archie site. For example:

archie@archie.rutgers.edu

In the body of the message you enter archie commands. A sample search message could look like this:

path you@your.mail.address
prog what_ever_it_is_you’re_looking_for
quit

That’s it. The first line tells the system where to send its findings. Archie can read your From: line and would normally send its results there, but it’s safer to use the path command, because From: lines can be misleading.

The prog line gives archie its search orders. The default for mail searches is regex, but you can change it with the same set search command you’d use in a telnet session. Quit does what it sounds like. If there were nothing else in your message, you wouldn’t need it, but since many people have .signature files that are automatically added to the end of their messages, it’s a good idea to use quit.
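Because a stray typo or an unwanted signature can derail a mail search, it can help to generate the message. This Python sketch uses the standard library’s email.message.EmailMessage to build the request described above, with an optional set search line. The function name and its parameters are hypothetical; the default address is the Rutgers server mentioned earlier.

```python
from email.message import EmailMessage
from typing import Optional


def archie_mail_query(reply_to: str, search: str, mode: Optional[str] = None,
                      server: str = "archie@archie.rutgers.edu") -> EmailMessage:
    """Build an archie-by-mail search request (illustrative sketch)."""
    lines = [f"path {reply_to}"]            # where archie should send results
    if mode is not None:
        lines.append(f"set search {mode}")  # override the regex default
    lines.append(f"prog {search}")          # the actual search
    lines.append("quit")                    # stop before any .signature
    msg = EmailMessage()
    msg["To"] = server
    # No Subject is set: all of the commands live in the body.
    msg.set_content("\n".join(lines))
    return msg


print(archie_mail_query("you@your.mail.address", "chess", mode="sub").get_content())
```

To actually send it, you would hand the message to smtplib.SMTP(...).send_message(), pointed at your own mail host.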

Once you have the file name and location, you can use anonymous ftp to retrieve the file. Suppose, however, you don’t have ftp access? Sometimes you can use mail to obtain files even without full Internet access.

There are three ways of doing this. Some servers are set up to respond specifically to mail requests for files. Others, especially on BITnet, run listserv servers that use a different method for file requests but, like the mail servers, can only send files stored on their own systems.

In a mail request, you get a file by sending a message to the site with your request in the subject line. Such a request might look like:

To: mail-server@some_machine.somewhere.edu
Subject: send /directory/file_name

And that’s all there is to it. You put absolutely nothing in the text part of the message.

For a BITnet system, the request goes in the body of the message instead:

To: listserv@an_ibm_mainframe.somewhere.com

get file_name file_type

Since BITnet runs on IBM VM systems, where every file has both a file name and a file type, the file type is mandatory.
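The two request styles differ only in where the command goes: the Subject line for a mail server, the body for a listserv. Here is a minimal Python sketch of both, again using email.message.EmailMessage; the function names are hypothetical and the addresses are the placeholders from the examples above.

```python
from email.message import EmailMessage


def mail_server_request(server: str, path: str) -> EmailMessage:
    """Mail-server style: the whole request goes in the Subject line."""
    msg = EmailMessage()
    msg["To"] = server
    msg["Subject"] = f"send {path}"
    # Deliberately no body: the request is entirely in the Subject.
    return msg


def listserv_request(server: str, file_name: str, file_type: str) -> EmailMessage:
    """BITnet listserv style: 'get name type' goes in the message body."""
    if not file_name or not file_type:
        # VM files always have both parts, so refuse a request missing either.
        raise ValueError("listserv requests need a file name and a file type")
    msg = EmailMessage()
    msg["To"] = server
    msg.set_content(f"get {file_name} {file_type}")
    return msg


print(mail_server_request("mail-server@some_machine.somewhere.edu",
                          "/directory/file_name")["Subject"])
print(listserv_request("listserv@an_ibm_mainframe.somewhere.com",
                       "file_name", "file_type").get_content())
```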

More generally useful are ftpmail application gateways. The most popular one is ftpmail@decwrl.dec.com. Just to make things annoying, it uses ftp syntax rather than anything that looks like the first two methods. A sample request:

To: ftpmail@decwrl.dec.com
Subject: File Hunt
connect system.somewhere.com
chdir pub/file_location
get file
quit

The subject is actually irrelevant, but it can help you remember what that large, mysterious-looking letter in your mailbox is. The important part is that you enter ftp commands in the body, one command per line (be careful of typos!).
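Since a single typo can sink an ftpmail request, you might build the command list programmatically. The Python sketch below assembles the same four-command message shown above; the function name is hypothetical, the host and paths are the article’s placeholders, and the check that refuses line breaks in arguments is an added safeguard, since a stray newline would slip an extra command into the request.

```python
from email.message import EmailMessage


def ftpmail_request(host: str, directory: str, filename: str,
                    gateway: str = "ftpmail@decwrl.dec.com") -> EmailMessage:
    """Build an ftpmail request: ftp commands, one per line, in the body."""
    for value in (host, directory, filename):
        if "\n" in value or "\r" in value:
            raise ValueError(f"line break in argument: {value!r}")
    commands = [
        f"connect {host}",
        f"chdir {directory}",
        f"get {filename}",
        "quit",
    ]
    msg = EmailMessage()
    msg["To"] = gateway
    msg["Subject"] = "File Hunt"  # ignored by the gateway; a note to yourself
    msg.set_content("\n".join(commands))
    return msg


print(ftpmail_request("system.somewhere.com", "pub/file_location", "file").get_content())
```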

Presuming everything went right, you can expect to see your file in anywhere from a few hours to a few days. Speaking of getting it right, you can always send the single command “help” to archie or to any of the mail servers to get more help.

A version of this story was published in Feb. 1993 in Computer Shopper.
