HTML Primer I


Basics

HTML stands for "HyperText Markup Language." Hypertext means that you can format a document so that when your reader clicks on a word or an image with his or her mouse, a new document opens. But not all words are hypertext; not all words take your reader to new documents. Some words are meant only to be bold or italic and do not link anywhere. HTML was invented to format documents for the internet: to make some words bold, others italic, others indented, others links to other documents, and so forth.
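To make the distinction concrete, here is a minimal sketch of two marked-up sentences: in the first, one word is hypertext; in the second, one word is merely bold. The link uses HTML's anchor tag, <A HREF="...">...</A>, and the file name "winter.html" is a hypothetical example. The bold tag is explained below.

```html
<!-- "cold" is hypertext: clicking it opens winter.html (hypothetical file name) -->
It sure was <A HREF="winter.html">cold</A>.
<!-- "bitter" is only bold; it does not link anywhere -->
The wind was <B>bitter</B>.
```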

Why? There are so many word-processing programs on the market, and so few of them can read each other's documents. There are Macintosh machines, UNIX machines, LINUX machines, SUN machines, O/S machines, and Windows machines, and each of these, in effect, speaks a different computer language. If one produces a document, chances are high that another cannot read it. So Tim Berners-Lee, then at CERN, set out to find a way for millions of different machines, speaking different computer languages, to read the same document. (The World Wide Web Consortium, which he later founded, took over maintaining the standard.) HTML is the result: a universal document markup language that any browser-equipped computer can read. Now any client machine (that's you) can request a document from a server (that's the source of the website) and read it.

HTML is called a markup language because you mark up a document with tags, which are simply short codes that tell your reader's machine when to make a word bold or italic, when to indent a paragraph, when to link to another page, and so forth. So no matter what machine your reader is using, whether it's a Macintosh or a PC, he or she can read your document much as you intended it to look. Let's say, for example, that you want to write a document on your new blueberry iMac and you want people who own IBM PCs to be able to read it. There are three steps: 1) you compose the document, 2) you mark it up with HTML tags, and 3) you post it to an internet server (now the document is called a page). Chances are that the server is neither an IBM nor an iMac, but a UNIX, SUN, SPARC, or Windows NT machine. More machines, more languages! How can anyone read it?

Your text will be read by a browser application. So what is a browser? Netscape is a browser. Internet Explorer is a browser. Lynx is a browser. A browser is an application (or program) that sits on your home computer and interprets data that streams in through the phone lines. When you open a web page such as this one, your home computer sends a request to the computer that houses BedeNet: "Please send the data in the directory 'res'. The file is called 'primer.html'." BedeNet's host then sends the document as data through the telephone lines, and your browser interprets it. The file "primer.html" stays on BedeNet unaffected: the data is transferred, but the document/page remains the same. As an analogy, imagine a piece of music that can be played by any number of instruments. The music, a collection of staves and notes on a sheet, makes no sound; it requires an instrument to interpret those notes. Similarly, a marked-up document displays no images, and no formatted text such as bold or italicized words. It is merely a collection of letters and tags. You need a browser, your instrument, to bring your text to life.

Like a musician reading notes, your browser reads an HTML document sitting on a distant machine. And just as each musician has his or her own characteristic way of interpreting music, so does a browser. You can configure a browser to your own specifications: you can choose your favorite display font, the background color of the web pages, which options you want enabled, and so on. The HTML documents, unaffected by your choices, sit on some distant computer where they will be read by hundreds or thousands of other browsers, and each of these browsers will have different configurations. This means that in order for an HTML document to be legible to a number of browsers, it must be simple. And HTML is nothing if not simple.

For example, to tell your reader's browser to make a word bold, you insert the code <B> into your marked-up document. A line marked like this in an HTML file:

It sure was <B>cold</B>.

will be displayed by your reader's browser like this:

It sure was cold. (with the word "cold" in bold)

The <B> in the text document tells the browser to make the word bold. You'll notice that the second B tag, </B>, has a forward slash. This tells the browser that the bold formatting is over. Remember, the browser reads the data as a single stream, from beginning to end. It sees the bold tag, then makes everything bold until it encounters a command to stop making the text bold. This is why, typically, one tag begins the formatting, and a second tag ends it. For example, to set text in italics, you insert the code <I>, then type the word, then insert the code </I>. So <I>cold</I> will be displayed as cold, in italics.

Most HTML works like that: one tag opens the formatting, and a second one closes it.
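When two formats apply to the same word, the tags nest: the tag opened last is closed first. Here is a minimal sketch reusing the example sentence about the cold.

```html
<!-- <I> is opened inside <B>, so </I> must come before </B> -->
It sure was <B><I>cold</I></B>.
```

The browser displays "cold" in bold italics. Closing the tags in the wrong order, as in <B><I>cold</B></I>, is incorrect HTML; many browsers will guess what you meant, but you can't count on every browser guessing the same way.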

Alright, enough prelude. Down to business.

Writing HTML

First, you need a text editor. You can use any text editor or word processor to write an HTML file. What's important here is not so much how you write it, but how you save it: your file must be in a format the browser can understand, which means plain text. Any simple text editor will do, but I recommend BBEdit Lite. It is a free, small, and quick application designed for writing HTML, and you can find it on virtually any freeware or shareware site. Applications like Corel WordPerfect let you write in HTML, which means that they automatically save the file as text. In fact, most writing applications let you Save As Text. This is what you want. Whatever application you choose, make sure you can Save As Text (check the manual of your favorite writing application), and give the file a name ending in .html, as in "primer.html".

Next, if you want pictures on your site, you will need a graphics editor. Any number of applications will do, and many are available free on the internet. Make sure your graphics application can save images as GIF or JPEG. These are image file formats, and a browser can display a picture only if it is saved in a format the browser understands. One of the most popular is "gif" (commonly pronounced "giff," though its creator preferred "jif"). This format was developed by CompuServe. It is great for small images with large areas of flat color (for example, the large area of red on the Canadian flag or the field of green on the Irish flag). Another format is "jpeg" (pronounced jay-peg). This format is good for compressing photographs and other detailed work, and your graphics program usually offers several levels of JPEG compression (light, medium, high). The smaller the file, the quicker a browser can display it. One rule of thumb: try to keep every page under 100K.

And finally, you can put sounds on your web page. This is slightly more complex, so I'll save it for a later installment.

So now you have a text editor and a graphics editor and you're ready to go. There are only two rules:

RULE 1. Every HTML document, no matter what it does, has a basic format. At the top of the document, you must write <HTML>. This alerts any browser that the document it is reading is to be displayed as an HTML document. And, as you learned only moments ago, most tags have a closing complement. So every HTML document ends with the closing tag </HTML>.

RULE 2. And the only other requirement is that you include <BODY> and </BODY> tags somewhere between the HTML tags. Every head must have a body. So, every HTML document looks like this right out of the box:

<HTML>
<BODY>
</BODY>
</HTML>

If a browser reads this four-line document, it will not complain. Anything less is a problem. Anything more is a challenge.

The Challenge

There are two sections of an HTML document: a) the head, and b) the body.

A) The head fits, as you might expect, just above the body. The two tags are <HEAD> and </HEAD>. With a few exceptions, the head contains information that the browser does not display in the page itself, and for the most part you will not need the head area. The major exception is the title of your page. If you look at the top of the browser window that is displaying this page, you'll see "HTML Primer." That's the title of this page. To put in a title, you use the <TITLE></TITLE> tags, which go between the head tags. So here's an empty page with the title "My Page":

<HTML>
<HEAD>
<TITLE>My Page</TITLE>
</HEAD>
<BODY>
</BODY>
</HTML>

That's a legible HTML document. You can post it to your server and anyone with a browser can read it. The head can also include information for search engines. If you want your page to be recognized by search engines, you use the <META> tag, which lets you specify the author and keywords. For example, <META NAME="author" CONTENT="Harris"> and <META NAME="keywords" CONTENT="HTML,primer,bede">. But few people bother with these. Remember, you don't need HEAD tags for your HTML document to be legible. (One note: don't worry about extra blank spaces between words or tags. Browsers collapse any run of so-called white space, such as spaces, tabs, and line breaks, into a single space. After all, it's just a data stream to a browser.)
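To see white-space collapsing in action, here is a minimal sketch of a page whose body text is spread over several lines with extra spaces.

```html
<HTML>
<BODY>
<!-- Extra spaces and line breaks below are collapsed by the browser -->
This     is
   my
Home Page.
</BODY>
</HTML>
```

A browser displays this as "This is my Home Page." with a single space between each word, exactly as if you had typed it on one line.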

Finally, JavaScript code usually goes in the HEAD area. (Java, despite the similar name, is a different technology, and Java applets are placed in a page differently.) This is not basic HTML, so I won't address it here except to say that scripts offer both increased functionality and increased infiltration of your home computer. A script can find out what type of machine and which browser you use, and some browsers have let scripts learn more, such as your email address or which sites you've visited. Unfortunately, many sites rely on these same scripts for features such as internet shopping, so turning them off has a cost. I'll discuss HTML security in a later installment.

B) The BODY area contains most of your text and tags; this is where you'll do most of your formatting. To begin, you can simply type unformatted text between the BODY tags, and it will be displayed. For example:

<HTML>
<HEAD>
<TITLE>My Page</TITLE>
</HEAD>
<BODY>
This is my Home Page.
</BODY>
</HTML>

This will produce a page with the title "My Page" that displays the sentence "This is my Home Page." in the browser window.
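Putting the pieces together, here is a sketch of the same page with two META tags in its head, written in the standard NAME/CONTENT form, and one bold word in its body. The author name and keywords are only the sample values used in the META discussion; substitute your own.

```html
<HTML>
<HEAD>
<TITLE>My Page</TITLE>
<!-- Information for search engines; not displayed in the page -->
<META NAME="author" CONTENT="Harris">
<META NAME="keywords" CONTENT="HTML,primer,bede">
</HEAD>
<BODY>
<!-- Only the body text is displayed in the window -->
This is my <B>Home</B> Page.
</BODY>
</HTML>
```

The title appears at the top of the browser window, the META tags stay out of sight, and the window shows "This is my Home Page." with "Home" in bold.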

There. We've just covered all the basics. You can now code HTML. The next installment describes how to format text.