Uniform Resource Locators

By Randy D. Ralph, MLIS, Ph.D.

In place 1996.  Last update December 29, 2000.  Copyright © 1996 - 2002 Randy D. Ralph

The Structure of URLs

A Uniform Resource Locator (URL - pronounced Earl, as in "Duke of...") is nothing more than an Internet address. Don't let the format throw you. Basically, it's set up like this:

    HOW          WHERE                WHO/WHAT
    Protocol     Host Domain Name     Directory Path
    http://      www.mydomain.net     ~rdralph/rdralph/

The PROTOCOL (http://) is the HOW

The HOST DOMAIN NAME (www.mydomain.net) is the WHERE

The DIRECTORY PATH (~rdralph/rdralph/) is the WHO/WHAT

So the whole URL is:

http://www.mydomain.net/~rdralph/rdralph/

Meaning: take me to the rdralph directory in rdralph's account at www.mydomain.net using the http (hypertext transfer) protocol and load the default page.
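The HOW / WHERE / WHO-WHAT breakdown above can be tried out in a few lines of Python. This is only a minimal sketch: it uses the standard library's urllib.parse module, and the function name describe_url and the dictionary keys are made up here for illustration.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into the HOW / WHERE / WHO-WHAT parts (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "how": parts.scheme,      # the protocol, e.g. "http"
        "where": parts.hostname,  # the host domain name
        "what": parts.path,       # the directory path, including its leading slash
    }

info = describe_url("http://www.mydomain.net/~rdralph/rdralph/")
print(info)
# {'how': 'http', 'where': 'www.mydomain.net', 'what': '/~rdralph/rdralph/'}
```

Note that the parser reports the path with the slash that separates it from the host name.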


Common Internet Protocols

Several protocols are commonly used on the Internet to reach the sites that support them. They are listed in the table below. The protocol which supports the World Wide Web - just one component of the Internet - is http, the hypertext transfer protocol.

Protocol:  file:///
Type:      Local File
Example:   file:///c|/netscape/bookmark.htm
Function:  Loads a local file from your PC or from a network.

Protocol:  ftp://
Type:      File Transfer Protocol
Example:   ftp://oak.oakland.edu/SimTel/
Function:  Opens a file transfer session that allows you to download and upload (if allowed) between your local PC and the remote computer. Anonymous access may be permitted. Some FTP sites will require valid accounts.

Protocol:  gopher://
Type:      Gopher
Example:   gopher://gopher.msen.com/
Function:  Gopher predates the WWW and was very popular not long ago. It has largely been superseded by the WWW but gopher sites still store a great deal of electronic text information. For pure text gopher is still the best.

Protocol:  http://
Type:      HyperText Transfer Protocol
Example:   http://www.uncg.edu/~bucknall/tim/
Function:  The protocol for transfer of hypertext documents written in HTML and Java. The primary protocol for the WWW.

Protocol:  mailto:
Type:      Email
Example:   mailto:rdralph@netstrider.com
Function:  This protocol calls up the browser's email screen and posts the completed message to the email address provided. The browser needs to be set up properly to identify the email server and the identity of the sender.

Protocol:  news:
Type:      Newsgroups
Example:   news:comp.infosystems
Function:  Provides access to Bitnet, Usenet and other newsgroup systems. You need to know the name of the newsgroup you want to access. Your Internet service provider has to allow access. Some newsgroups are not for everyone.

Protocol:  telnet://
Type:      Telnet
Example:   telnet://steffi.uncg.edu/
Function:  Telnet provides a link to a remote computer. In many cases you'll need an account to login. In others, you may be allowed to login as a guest or with a special visitor's ID. You'll need to know login procedures. You'll also need to have a telnet application set up for your browser.

Protocol:  wais://
Type:      Wide Area Information Server
Example:   wais://isr.umd.edu:8002/
Function:  Opens a connection to a searchable online database or information service. WAIS sites are generally reachable via http:// nowadays since direct access requires a browser WAIS proxy.
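
To see how the protocol part differs from one example to the next, the sketch below runs a few of the example URLs from the table through Python's standard urllib.parse module. The helper name scheme_host_port is an assumption made for this illustration. Notice that mailto: and file:/// URLs carry no host name, and that the wais example names a port number after the host.

```python
from urllib.parse import urlsplit

def scheme_host_port(url):
    """Return (protocol, host, port) for a URL; host and port may be None."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port

# Example URLs copied from the protocol table
examples = [
    "file:///c|/netscape/bookmark.htm",
    "ftp://oak.oakland.edu/SimTel/",
    "mailto:rdralph@netstrider.com",
    "wais://isr.umd.edu:8002/",
]

for url in examples:
    print(url, "->", scheme_host_port(url))
```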

NOTE: Most computer systems reachable via the Internet will be running some flavor of Unix. Unix systems are generally case-sensitive; that is, upper and lower case make a difference to them, so observe and copy URLs carefully, noting upper and lower case letters. Also remember that all computers are literal unto the death, so you'll have to get the URL exactly right or it's no go. If you get an error message from your browser when you enter a URL yourself, check it carefully for upper and lower case letters and remember to include all the colons (:), slashes (/) and tildes (~) in the right places.
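
The case-sensitivity warning applies mainly to the path. The protocol and the host domain name are not case-sensitive, while the path is passed to the server as typed. The sketch below illustrates this under the assumption that the server treats paths as case-sensitive, as Unix hosts generally do; the function name same_resource is hypothetical.

```python
from urllib.parse import urlsplit

def same_resource(url_a, url_b):
    """Compare two URLs: protocol and host ignore case, the path does not."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (
        a.scheme == b.scheme            # urlsplit already lowercases the protocol
        and a.hostname == b.hostname    # .hostname is returned in lower case
        and a.port == b.port
        and a.path == b.path            # compared exactly, case included
    )

print(same_resource("HTTP://WWW.UNCG.EDU/~bucknall/tim/",
                    "http://www.uncg.edu/~bucknall/tim/"))   # True
print(same_resource("http://www.uncg.edu/~Bucknall/tim/",
                    "http://www.uncg.edu/~bucknall/tim/"))   # False
```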


Host Domain Names

Host Domain Names can look daunting but if you understand the structure and naming conventions they start to make a lot of sense. Generally, they have the form:

    Service/Machine     Location     Domain
    www                 uncg         edu

Example: www.uncg.edu

The only really tricky part is the Service or Machine name. Most WWW hosts use the service name WWW - it makes sense. The Location name is almost always mnemonic - an abbreviation of the location name or an acronym for it. A lot of the time the location name is not abbreviated at all. The Domain gives you some interesting information about the WWW site, especially if it's located within the United States. Below is a table of common domain acronyms used for WWW sites in the United States:

    Domain     Description
    .com       Commercial or corporate sites.
    .edu       Educational institutions.
    .gov       Government sites.
    .mil       Military sites.
    .org       Sites of associations, organizations, etc.
    .net       Network sites.

So, it's clear that the domain can tell you what type of site you can expect to be visiting.
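
As a rough illustration, the Service/Location/Domain reading and the domain table can be combined into a small lookup. This sketch assumes the host name has the simple three-part form shown above (many real host names do not); the names DOMAIN_TYPES, split_host and site_type are invented for the example.

```python
# Descriptions taken from the domain table above
DOMAIN_TYPES = {
    "com": "Commercial or corporate sites",
    "edu": "Educational institutions",
    "gov": "Government sites",
    "mil": "Military sites",
    "org": "Sites of associations, organizations, etc.",
    "net": "Network sites",
}

def split_host(host):
    """Split a host name into (service, location, domain)."""
    labels = host.lower().split(".")
    # First label is the service/machine, last is the domain,
    # anything in between is treated as the location.
    return labels[0], ".".join(labels[1:-1]), labels[-1]

def site_type(host):
    """Look up what kind of site the domain suggests."""
    return DOMAIN_TYPES.get(split_host(host)[2], "unknown")

print(split_host("www.uncg.edu"))  # ('www', 'uncg', 'edu')
print(site_type("www.uncg.edu"))   # Educational institutions
```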

Some sites in the United States and most sites in Europe and elsewhere use a geographical approach in their domains. The last two letters of the host domain name of a WWW site outside the United States will often tell you the country. Standard two-letter country codes are used. For example: fi = Finland, za = South Africa, uk = The United Kingdom, etc.

Sites in the United States that use the geographical approach can look pretty complex. For example, the domain for Guilford Technical Community College is: technet.gtcc.cc.nc.us - meaning, technet at Guilford Technical Community College, community colleges, North Carolina, United States. So, even the most complicated-looking ones do make sense.
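
Geographical names read most naturally from right to left, from the country down to the individual machine. The sketch below shows that reading for the example above. It knows only the country codes mentioned in this section, and the names COUNTRY_CODES and read_geographic_host are assumptions made for illustration.

```python
# Only the country codes mentioned in this section; the real list is much longer
COUNTRY_CODES = {
    "fi": "Finland",
    "za": "South Africa",
    "uk": "The United Kingdom",
    "us": "United States",
}

def read_geographic_host(host):
    """Return (country, remaining labels ordered from broadest to most specific)."""
    labels = host.lower().split(".")
    country = COUNTRY_CODES.get(labels[-1])   # None if the code is not in the table
    return country, list(reversed(labels[:-1]))

print(read_geographic_host("technet.gtcc.cc.nc.us"))
# ('United States', ['nc', 'cc', 'gtcc', 'technet'])
```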

Armed with this information you should be able to guess at the proper Domain Name for a WWW site. Try making one up for Stanford University or the IBM Corporation, for example. Try it!


Paths in URLs

There's not much to say about this. You've just got to know the correct path to the information you want at the site. This information is almost always provided to you correctly as part of the link you followed to get to a site, but remember that HTML developers are human and make errors, too. The paths to information at Internet sites really can't be divined, even with a crystal ball, except, possibly, when it comes to the names of people, and even that's problematic.

By now you've probably seen the tilde - ~. The tilde - pronounced like the name "Tilda" - generally precedes the name of a directory assigned to a person (although it's not uncommon for people to use "cute" aliases on the Internet instead of their real names). In the URL http://www.netunlimited.net/~rdralph/, for example, the rdralph part indicates an account name associated with the author - Randy D. Ralph. As you become more familiar with the WWW you'll begin to recognize and understand how to navigate using partial paths. Do watch out for the tilde - leaving it out is one of the most common errors in keying URLs.
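
The tilde convention is easy to check for mechanically. The sketch below pulls the account name out of a URL whose path begins with a tilde directory; the function name account_name is hypothetical, and the code assumes the tilde directory is the first component of the path, as in the example above.

```python
from urllib.parse import urlsplit

def account_name(url):
    """Return the name after a leading ~ in the path, or None if there is none."""
    path = urlsplit(url).path
    first = path.lstrip("/").split("/", 1)[0]   # first component of the path
    if first.startswith("~") and len(first) > 1:
        return first[1:]
    return None

print(account_name("http://www.netunlimited.net/~rdralph/"))  # rdralph
print(account_name("http://www.netunlimited.net/rdralph/"))   # None
```

The second call shows why a missing tilde matters: without it the path names an ordinary directory, not a personal account.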


If you have any questions, suggestions or comments please contact:

Randy D. Ralph
333 Washington Drive
Clemmons, NC 27012-7258

Email: rdralph@netstrider.com