The World Wide Web for the Clueless

Or, a really really basic overview of how things work

...without all the jargon


Originally written in 1995. Last revised 2004.

Real Quickly: What's the Web?

Let's start with a clichéd one-line description of what the World Wide Web is. It's basically a lot of different files (all over the world) that are linked to each other, so that you can look at a file that has a link to another file and then follow that link to read the next file. (These days, it's not just files but useful programs, too.)

What makes this so powerful is that these files can contain graphics, or snippets of animation or music, and they can contain information that you normally would have to find with special programs like ftp or gopher (which is fine for nerds, but is a real pain otherwise). Or they can let you use statistical programs, user surveys, games, and so on. These days all this is taken for granted, but pre-1994, things were rather different!

There. Now on to things that are more interesting, like ...

What's the deal with Internet Explorer, Netscape, and Web Servers?

OK, so I have to use a little jargon.

Internet Explorer and Mozilla allow you to view World Wide Web files. They provide the pretty little window with all the neat buttons. Competitors that do basically the same thing include Netscape, Mosaic, Opera, and Lynx (Lynx is pure text, and hence not much fun unless you're stuck dialing in over a phone line and can't get graphics anyway). Explorer, Netscape, and their kin are clients, of a type called web browsers (just a type of computer program). They're like different types of TV sets that show you what's being broadcast. (Not a great analogy, but there you have it.) To extend the TV set analogy just a bit, some clients allow certain features that others don't, sort of like the way that some TVs have stereo sound and others don't.

If Internet Explorer and Netscape are like types of TV set, at the other end is the broadcasting station, the server. The server basically lets other people (all the people out there with their clients) look at a set of files --- sort of like how a broadcasting station lets viewers see TV programs. There were/are different types of servers, ranging from the now defunct Plexus (which was written in a language called Perl), to NCSA's old HTTPD (a C-based program), to Apache, to commercial servers like Netsite.

Now, the actual files, called pages (or TV programs, if you will), sit around on a machine that the server knows how to get to (like videotapes of a TV show sitting around inside VCRs), waiting for someone to come look at them (maybe something like the video-watching system you find in some hotels). And if you happen to own a file (the videocassette), you can always rewrite it if you want to. Your primary personal page is often called a home page, and corporations and schools often have their own home pages as well.

(These days, many pages are a bit more complex - they are more like computer programs themselves, assembling pieces together or sometimes using specialized software to produce moving images and so on.)

So, just to go over this again, you have a web server (broadcasting station) that sends home pages (TV shows) over to your client (TV set). There! Not so bad so far, eh?

There are many, many servers all over the world, and their pages point to other servers. That's why it's like a giant web.

Web Server Names, HTML, and URLS

[Everything from here on was new in 2002!]

First off, web servers have unique names so that people can tell them apart. For example, www.cnn.com is a web server, as is www.nytimes.com, and www.mit.edu, and so are news.bbc.co.uk (in the United Kingdom), www.asahi.co.jp (in Japan), and so on. (The "www." helps identify a web server, but not all web servers use those letters - as an example, Yahoo has news.yahoo.com, maps.yahoo.com, store.yahoo.com, shopping.yahoo.com, and www.yahoo.com, and probably others - each one is completely different.) If the name is different, it may be a different web server (e.g., for a while web.mit.edu and www.mit.edu were totally different), or it may be the same web server with multiple "aliases" that all go to the same place. (Note: for the name of a web server, lower case or upper case doesn't matter - WWW.CNN.Com is the same as www.cnn.com. However, capitalization does matter later on.)

You've probably heard of URLs ... A URL is a "Uniform Resource Locator," a long fancy phrase for what is essentially a street address for a particular house - or in this case a web page or file.

You may be aware that on most computers, files and programs are stored in folders (also known as directories). Hence, in Drive C, you might have a folder called "FolderA," inside which is "FolderB," inside which is your poem about your pet kitty cat (we'll call it "KittyCat.htm" on a Windows machine, or "KittyCat.html" on a UNIX machine). The full sequence might be written in this form on a Windows machine: "C:\FolderA\FolderB\KittyCat.htm" ... on a UNIX machine you might have it written in this form: "~/FolderA/FolderB/KittyCat.html".
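
For the curious, here is a small Python sketch of the same folder idea. It only takes the two example paths above apart; it assumes nothing about what is really on your disk, and the variable names are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The two example paths from the text, one per operating system style.
windows_poem = PureWindowsPath(r"C:\FolderA\FolderB\KittyCat.htm")
unix_poem = PurePosixPath("~/FolderA/FolderB/KittyCat.html")

# .parts lists the starting point and then each folder, outermost first.
print(windows_poem.parts)
print(unix_poem.parts)

# .name is just the file at the very end of the chain.
print(windows_poem.name, unix_poem.name)
```

Either way the structure is the same: a starting point, a chain of folders, and a file at the end.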

(The ".htm" or ".html" at the end of the file merely means it is written in a form of computer code called HTML - Hyper Text Markup Language. It's one way that people put in pictures and links into a web page. More on HTML later.)

So, back to URLs. A URL is what tells a browser where to find the poem about the pet kitty cat, and it uses exactly this directory/folder hierarchy.

It works like this: just as we have country->state->town->street->housenumber in our normal mail, the URL lists the web server name first, and then adds in the directory/folder information, and finally it specifies a particular "thing" to get.

So, suppose you had a web server on your computer called www.MyWebServerName.com, it allows people on the Internet to look at everything in Drive C, and you wanted people to find your poem KittyCat.htm. The URL for your poem might thus be: www.MyWebServerName.com/FolderA/FolderB/KittyCat.htm - now doesn't that look familiar? (Notice it uses the UNIX-style forward slash "/" instead of Windows-style backslash "\" - these days many browsers will quietly accept either, but the forward slash was the original and is the standard format.)

And please remember that capitalization often matters for folder and file names - just not the server name.
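
Here is a rough Python sketch of how a program might take the example URL apart into server name, folders, and file. The function name describe_url is made up for this page, and I have added "http://" to the front of the URL (an assumption on my part) because Python's urlsplit needs it to tell where the server name ends.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into its server name, folder list, and final file name."""
    parts = urlsplit(url)
    pieces = parts.path.split("/")
    return {
        # hostname is lower-cased for us: server names ignore capitalization
        "server": parts.hostname,
        # folder and file names keep their capitalization, because it can matter
        "folders": [p for p in pieces[:-1] if p],
        "file": pieces[-1],
    }

info = describe_url("http://www.MyWebServerName.com/FolderA/FolderB/KittyCat.htm")
print(info)
```

Note how the server name comes back in lower case while FolderA, FolderB, and KittyCat.htm are left exactly as typed.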

Index.htm(l)

What happens if someone just enters "www.MyWebServerName.com/FolderA"? The default action is for the web server to try to list the contents (all the files) of FolderA! That is, unless you have an index.html file inside FolderA - the index.html is shown by default if no other page was specified. To restate: when a server is asked to go to a folder or directory where no file name is given, if it finds an "index.html" file there, it will put that index.html page up. Otherwise, the web server will try to list the contents of the folder or directory. (Sometimes, if preset/programmed not to do this, it will just give an error instead - this is now very common on big commercial servers.)
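
The rule above can be sketched in a few lines of Python. This is only an illustration of the decision, not how any real web server is written: the function name resolve_request and its allow_listing switch are invented here, and a real server would also have to guard against sneaky paths like "..".

```python
import os
import tempfile

def resolve_request(root, url_path, allow_listing=True):
    """Return ("file", path), ("listing", names), or ("error", message)."""
    target = os.path.join(root, *[p for p in url_path.split("/") if p])
    if os.path.isfile(target):
        return ("file", target)                 # a specific page was asked for
    if os.path.isdir(target):
        index = os.path.join(target, "index.html")
        if os.path.isfile(index):
            return ("file", index)              # the default page wins
        if allow_listing:
            return ("listing", sorted(os.listdir(target)))
        return ("error", "listing not allowed")
    return ("error", "not found")

# Demo with a throwaway folder standing in for Drive C.
root = tempfile.mkdtemp()
folder_a = os.path.join(root, "FolderA")
os.mkdir(folder_a)
open(os.path.join(folder_a, "KittyCat.htm"), "w").close()

before = resolve_request(root, "/FolderA")      # no index.html yet
open(os.path.join(folder_a, "index.html"), "w").close()
after = resolve_request(root, "/FolderA")       # index.html now exists
print(before)
print(after)
```

The first request gets a folder listing; once index.html appears in FolderA, the same request gets that page instead.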

A useful thing to know is that, because index.html is put up automatically, you usually don't have to type in anything after "www.ServerName.com" or "www.ServerName.com/FolderName" - the server handles finding the appropriate default file for you. And this is why so many files are named "index.html."

(Minor trivia: technically every folder or directory should end in "/" including the web server name itself - so when you see "www.cnn.com/" or "www.mit.edu/people/rei/" this is actually correct etiquette. Hardly anyone does this, though.)

HTTP://

And you've probably seen "http://" ... well, that icky-looking "http" stands for (long icky words) "Hyper Text Transfer Protocol." All the "http" does is set the computer "language" and the type of server by which web pages are transmitted from computer to computer. In the old days, there were other common ways of getting information around the internet, including "gopher" and "ftp" (ftp stands for "File Transfer Protocol" and is still commonly used for uploading and downloading files). Both gopher and ftp use their own types of servers - gopher and ftp servers - that are not the same as web servers. So, if you see a URL that says "ftp://blah.blah.com/blah/blah" - you'll know it's not using a web server at all! You may be using a web client (aka browser), but the server giving you the file is an ftp server. (And someday something new might replace http - wouldn't that be fun.)
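
As a tiny illustration, here is a Python sketch that looks at the part of a URL before the "://" and says what kind of server is on the other end. The table SERVER_KINDS and the function server_kind are made up for this page and only cover the three types mentioned above.

```python
from urllib.parse import urlsplit

# Only the three kinds of server discussed in the text.
SERVER_KINDS = {
    "http": "web server",
    "ftp": "ftp server",
    "gopher": "gopher server",
}

def server_kind(url):
    """Guess the kind of server from the bit before '://'."""
    scheme = urlsplit(url).scheme   # urlsplit lower-cases this part
    return SERVER_KINDS.get(scheme, "unknown")

print(server_kind("ftp://blah.blah.com/blah/blah"))
print(server_kind("http://www.mit.edu/people/rei/"))
```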

To clarify between HTML and HTTP: HTML is for humans to tell computers how to display and format a file so that it looks pretty, whereas HTTP is for computers to tell other computers how to transfer information.
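
If you are wondering what the HTTP side of the conversation looks like, here is a simplified Python sketch of the text a browser sends when it asks for a page. The function name build_get_request is invented for illustration, and real browsers send quite a few more lines than this.

```python
def build_get_request(server, path):
    """Build the text of a bare-bones HTTP request for one page."""
    lines = [
        f"GET {path} HTTP/1.0",   # "please send me this file"
        f"Host: {server}",        # which server name the request is meant for
        "",                       # a blank line marks the end of the request
        "",
    ]
    return "\r\n".join(lines)

request = build_get_request("www.MyWebServerName.com", "/FolderA/FolderB/KittyCat.htm")
print(request)
```

The server's reply is where the HTML comes in: it sends back the contents of the file, and the browser then makes it look pretty.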

Playing with HTML

Below I will include some simple HTML - the language whereby one can make italics and boldface and even do links. Note: this can be useful on some Web message boards that allow HTML formatting - you can look like a real pro in your message board posts.

(If you want to be serious about HTML, there are books and more detailed websites available, such as Creative HTML Design.2 by Lynda Weinman and William Weinman.)

You can play around with HTML on a Windows machine even if you don't have a web server - because the file is sitting right there on your machine and your web browser can read it right there. People who can access your machine through a local network may also be able to view the file - and that can include people who break into your machine!

But in general, if you want a web page to be seen on the internet by other people, you need a web server to "serve it up" like that TV transmission station. However, setting up a web server is way beyond the scope of this document. Let's just stick to HTML here....

These days Microsoft Word will automatically convert things like boldface and URLs and line breaks and so on into HTML ... which can be convenient but it's not as much fun, and it also means Word won't recognize your HTML code as HTML, and their HTML code is full of random cruft that proclaims that Word wrote it. Knowing raw HTML is still useful on many internet message boards and web site editors. To avoid Word's "helpful HTML-ifying," I suggest using WordPad or NotePad instead.

So, go ahead, open a text document in WordPad or NotePad and play around, and then view the results in a web browser. Remember to make sure the file name ends in .htm (Windows) or .html (UNIX) so that your browser knows it's HTML. For Windows, once you're done making your .htm file, just save it and double click on its icon in its folder, and Explorer will automatically display it.

Some Sample HTML

Here are some simple bits of HTML to try.

Underlines, Bold, Italics, New Lines

To underline, you start the section to be underlined with <u> and then end the section with </u>. So <u>foobar</u> yields an underlined foobar. Likewise, <b>bold</b> comes out in boldface and <i>italic</i> comes out in italics. (Make sure to use the forward slash "/" in the </i>.)

To start a new line (line breaks)
you can simply use<br> ...
Like this.
The lazy person's way of putting in a line of extra space is to use <p>

Like this (note empty line below).

(Note that on some message boards or online web editing sites, just putting in an extra line by hitting the "Enter" key is enough to leave a line. Otherwise you do have to resort to using <p>.)
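
If you would rather let a program do the typing, here is a small Python sketch that writes a practice page using only the tags described above. The file name sample.htm and the function write_sample_page are made up for illustration, and the file goes into a temporary folder so nothing of yours gets overwritten.

```python
import os
import tempfile

# A practice page built from the tags above: <u>, <b>, <i>, <br>, and <p>.
SAMPLE = (
    "<u>foobar</u><br>\n"
    "<b>bold</b> and <i>italic</i>\n"
    "<p>\n"
    "Like this (note empty line above).\n"
)

def write_sample_page(folder, name="sample.htm"):
    """Save the practice page and return where it ended up."""
    path = os.path.join(folder, name)
    with open(path, "w", encoding="utf-8") as page:
        page.write(SAMPLE)
    return path

page_path = write_sample_page(tempfile.mkdtemp())
print(page_path)
```

Open the printed file location in your web browser to see the result.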

Making Links

To make a link, you use this format: <a href="URL Here">Visible Name</a> ... so to link to this page from your homepage, you might put it as:

<a href="http://www.mit.edu/people/rei/wwwintro.html">Rei's ancient WWW intro</a>

(Don't forget the http:// part, with both forward slashes! Also, browsers will generally accept a simple URL without the " " around it, but I have the old habit, and quoting is the safer choice.)
Anyway, the above HTML comes out as:

Rei's ancient WWW intro
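
If you ever need to crank out a lot of links, the same format can be produced by a small Python helper. This is just a sketch: the name make_link is invented here, and the escape calls are a precaution so that characters like & or < in the URL or the visible name don't get mistaken for HTML.

```python
from html import escape

def make_link(url, visible_name):
    """Build <a href="URL Here">Visible Name</a> from its two pieces."""
    safe_url = escape(url, quote=True)             # also protects any " in the URL
    safe_name = escape(visible_name, quote=False)  # leaves apostrophes alone
    return f'<a href="{safe_url}">{safe_name}</a>'

link = make_link("http://www.mit.edu/people/rei/wwwintro.html",
                 "Rei's ancient WWW intro")
print(link)
```

The printed line matches the hand-typed link shown earlier.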

Adding Images

You can put in pretty pictures by this method:

<img src="image URL">

So, as an example (using one of my offsite images):

<img src="http://www.art.net/Studios/Visual/Rei/DarkWoodsOwl-mini.JPG">

produces [you should see my little owl picture]:

Images and Links

And I can make the picture itself a link to somewhere else this way (this will go to a different page):

<a href="http://www.art.net/Studios/Visual/Rei/woodsowl.html"> <img src="http://www.art.net/Studios/Visual/Rei/DarkWoodsOwl-mini.JPG">My Owl Picture</a>

My Owl Picture
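
Going the other direction, here is a Python sketch that reads the HTML above and picks out where the link goes and which picture it shows, roughly what a browser has to do before it can draw anything. The class name LinkAndImageFinder is made up for illustration.

```python
from html.parser import HTMLParser

class LinkAndImageFinder(HTMLParser):
    """Collect the href of every <a> and the src of every <img>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

snippet = (
    '<a href="http://www.art.net/Studios/Visual/Rei/woodsowl.html"> '
    '<img src="http://www.art.net/Studios/Visual/Rei/DarkWoodsOwl-mini.JPG">'
    'My Owl Picture</a>'
)

finder = LinkAndImageFinder()
finder.feed(snippet)
print(finder.links)
print(finder.images)
```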

Well, That's About It...

Well, if you've come this far, and you started off clueless, I must congratulate you for wading through all these explanations. I hope this page was helpful to you.

When I first wrote this page, it was back in the days when Mosaic was the main browser, and people regularly confused Mosaic with the Web itself ("What's this Mosaic thing I hear about?"). Back then, I think this page served as a useful introduction to the Web for a surprisingly large number of people. These days, with URLs splashed on TV and radio and billboards, there's probably much less of a need for a page like this. Still, just in case, here it is.

Enjoy the World Wide Web!

P.S. <hr> produces a nice flat line:


Text copyright 1995, 1998, 2000, 2002 by Eri Izawa

rei (at) mit.edu