A Tour of HTML Forms and CGI Scripts

Overview
Forms on a Web Page
A Sample Web Form
A CGI Script in Perl
The Preamble
Reading the CGI Data
Perl Variables
Building a Response Page
Summary
Installing a CGI Script
Exercise 1: Your Form, My Script
Exercise 2: Your Form, Your Script
Other Readings


This is a quick introduction to HTML Forms and CGI Scripts. It reviews some of the common form elements and then describes how a simple CGI script interacts with a web form. As a prerequisite, you should already be acquainted with HTML, since I use it without explanation here. To create your own CGI scripts, you'll also need to know some programming language. I use Perl here since it's a fairly readable language.
     There are risks associated with CGI scripts. As you'll see, you are essentially allowing anyone on the Internet to execute a program on your system, as often as they like. If you write a script with flaws, it can pose a serious security risk to your account, your files or the entire system, and can also be a massive drain on system resources. This document is a simplified introduction to the elements of web forms and CGI scripts; it's not complete and is not guaranteed to be accurate.

Overview

It's important first to understand what HTML forms and CGI scripts are. They are very different, but work closely together. A form is simply a web page with some additional markup tags to instruct a web browser how to display the various form elements, such as checkboxes, selection lists, buttons and user-editable text areas. However, the web page itself does not process the data, nor does the web server, which doesn't know what you'd like to do with the user's answers. A separate program or script must process that data, in whatever way you wish.
     HTML forms are just markup tags on a web page. CGI (Common Gateway Interface) is the language or protocol that the web server uses to hand the data from the form to your script. When the user submits her answers on a form, the browser bundles them up and sends them to the web server, which passes them on to your script for processing. A CGI script is any program which knows how to read that bundle of data. Some important points are:
  • The web page itself does not process the data entered on the form. Neither does the web server. There must be a separate script which the web page tells the server to send the data to, and which knows how to speak the language (CGI) that the server will use to send the data. You need both the web page and the script.
  • For security reasons, most web servers will not execute a file (even a script or program with the right permissions) unless it is in a designated directory, or sometimes has a designated filename extension. Even if you can put a web page on your system, you may not have write permissions in that directory. You'll have to ask your webmaster where that directory is and how to write to it. You can't write forms without this, unless you use a script that's already installed.
  • Your script processes the data however you want, and then almost always returns an acknowledgement page. So the script must build up and return the HTML source for a web page. You occasionally see a C program doing this because C programs are faster to execute, but shell and Perl scripts are easier for this kind of text manipulation and are more commonly used for CGI scripts.

     In what follows, we'll first describe the various form tags you can use in your web page and give the HTML source for a sample page using most of them. The page works so you can try it out. We'll then go through the Perl CGI script which does the processing for the page.

Forms on a Web Page

It's pretty easy to place things like radio buttons, selection and check boxes, and interactive text areas onto your web page. It's a little harder to do anything with them, but we'll get to that later. Right now, we'll talk about how to incorporate form elements into a web page.
     The form elements on a web page are usually enclosed in a single pair of FORM markup tags, like this:

    <FORM ACTION="http://www.site/your_script" 
             METHOD=POST> 
    ... 
    </FORM>  
All the form tags described below should be inside this FORM region. The opening form tag specifies an ACTION attribute, which gives the URL of the CGI script you want to process the user's form data when she submits it. This is where you supply the link between your form and your script. The form tag also specifies a protocol or method for sending the data, which can be either GET or POST. GET appends the data to the URL, where it is visible and limited in length; POST sends it separately in the body of the request, so POST is both more private and more flexible and is recommended.
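     To make the difference between the two methods concrete, here is a minimal sketch of how a script might fetch the raw form data under each one. It assumes the usual CGI convention (also used by the ReadParse function shown later) that GET data arrives in the QUERY_STRING environment variable and POST data arrives on standard input, with its length in CONTENT_LENGTH. The helper name raw_form_data is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the still-encoded form data for either method.
# $env is a reference to a hash like %ENV; $fh is a filehandle like STDIN.
sub raw_form_data {
    my ($env, $fh) = @_;
    my $method = $env->{'REQUEST_METHOD'} || '';
    if ($method eq 'GET') {
        # GET: the data travels in the URL, after the question mark.
        return defined $env->{'QUERY_STRING'} ? $env->{'QUERY_STRING'} : '';
    }
    if ($method eq 'POST') {
        # POST: the data travels in the body of the request.
        my $data = '';
        read($fh, $data, $env->{'CONTENT_LENGTH'} || 0);
        return $data;
    }
    return '';
}

# In a real CGI script you would call: raw_form_data(\%ENV, \*STDIN)
```

Either way the script ends up with the same encoded string; only the route it took differs.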
     Inside a FORM region, you can have any HTML elements you wish, including text, images or links. You can also have various form elements. Here's a list of the major form elements. Each has an example of what it looks like on a page, a template for the corresponding markup tag that's used in the source for your page, and a brief comment or description.
     For most of these markup tags, you'll specify attributes, such as type, display characteristics, and especially a name and a value. The name is essentially a variable name, which is passed to your script so it can refer to the information (the value) the user entered for that variable. The name can be anything as long as it's different for each kind of information you want the user to supply.
    
Text line:
<INPUT TYPE="text" NAME="vble_name1" SIZE=30 MAXLENGTH=50>
This yields a one line entry box where the user can enter text. SIZE is the width (in characters) of the displayed box. MAXLENGTH is the maximum number of characters the text line will accept. Whatever the user types will be passed to your script as the value of vble_name1.
    
Password:
<INPUT TYPE="password" NAME="vble_name2" SIZE=30 MAXLENGTH=50>
Same as a text line, except the text won't be visible to the user as she types it. Try it.
    
Text Area:
<TEXTAREA NAME="vble_name3" ROWS=2 COLS=30>Default text. </TEXTAREA>
This form tag is actually a region. It yields a scrollable text box, ROWS high and COLS wide. The default text is optional and can be edited by the user.
    
Check Box:
<INPUT TYPE="checkbox" NAME="vble_name4">
A simple on/off flag. If the user checks it, the variable is on or true; otherwise, it is empty, or false. In your script, you would use this like if vble_name4 is true, then ...
    
Radio Buttons:
<INPUT TYPE="radio" NAME="vble_name5" VALUE="he" checked>Hehe
<INPUT TYPE="radio" NAME="vble_name5" VALUE="ha">Haha
<INPUT TYPE="radio" NAME="vble_name5" VALUE="ho">Hoho
The user makes a single selection. Notice there is one input tag for each choice. They all have the same name but different values. In your script, this variable can take on only one of the values specified here (or perhaps no value): if vble_name5 equals he, then .... You can also "check" an initial default selection for the user. You still need to describe each button to the user with some text (Hehe). The values are often abbreviations for the descriptions.
    
Selection List:
<SELECT SIZE=2 NAME="vble_name6">
<OPTION SELECTED> ooh
<OPTION> eeh
<OPTION> aah
</SELECT>

Another form region. You can select a default for the user. Unlike radio buttons, the descriptions are also the possible values. If size is not specified, you get a popup menu instead. (If you have 100 choices, do not use a popup, since some of the choices will be off the screen.) Add a MULTIPLE attribute in the <SELECT> tag if you want to allow users to select more than one choice.
    
Submit and Reset:
<INPUT TYPE="submit" VALUE="Ok, let's go">
<INPUT TYPE="reset" VALUE="Oops, start over">

You need a SUBMIT button so the user can send the data back to the server. RESET is optional and clears all the user's answers. The VALUE attributes specify the text that will be displayed in the buttons. These don't do anything on this page.
    
Hidden:
<INPUT TYPE="hidden" NAME="vble_name7" VALUE="I am invisible.">
A form tag that won't be displayed at all to the user. (So I can't display an example of it.) But it's useful for passing information to the script that you don't want the user to see, such as preserving state information from previous pages. It's not secret though, since it's available if she views the source of the web page.
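     The comments above say how each element's value might be tested in a script. As a rough sketch in Perl, assume the answers have already been read into an associative array called %in (as the script later in this document does), and that multiple selections arrive joined by a "\0" character, as the ReadParse function below arranges. The function name summarize and the sample answers are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a few form answers into plain sentences.
# %in maps the NAME of each form element to the value the user supplied.
sub summarize {
    my (%in) = @_;
    my @lines;

    # Checkbox: present (true) if checked, absent if not.
    if ($in{'vble_name4'}) {
        push @lines, "The box was checked.";
    } else {
        push @lines, "The box was not checked.";
    }

    # Radio buttons: at most one of the VALUEs given in the form.
    if (defined $in{'vble_name5'} && $in{'vble_name5'} eq 'he') {
        push @lines, "You chose Hehe.";
    }

    # Selection list with MULTIPLE: the values are joined by "\0".
    if (defined $in{'vble_name6'}) {
        my @choices = split(/\0/, $in{'vble_name6'});
        push @lines, "You selected: " . join(", ", @choices) . ".";
    }
    return @lines;
}

print join("\n", summarize('vble_name4' => 'on',
                           'vble_name5' => 'he',
                           'vble_name6' => "ooh\0aah")), "\n";
```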


A Sample Web Form

Here's the HTML source for a simple web page with many of these form elements. You can try out the page in action.

  <html><head><title>Your Title</title></head>
  <body><h1>Your Heading</h1>
  <form 
    action="http://www.halcyon.com/sanford/cgi/perl_form.cgi"
    method=post>
  Type something here: 
    <input type="text" name="some_text" 
      size=30 maxlength=50><p>
  Here's a checkbox:
    <input type="checkbox" name="box"> <p>
  Select one of these:
  <select name="choice">
    <option selected> Ha
    <option> He
    <option> Hi
    <option> Ho
  </select> <p>
  Now some radio buttons:
    <input type="radio" name="radbut" 
      value="oop" checked> Oop
    <input type="radio" name="radbut" 
      value="eep"> Eep
    <input type="radio" name="radbut" 
      value="urp"> Urp <p>
  <hr>
  Finally, you need to submit it:
    <input type="submit" value="Send it">
  or  
    <input type="reset" value="Erase all"> <p>
  </form>
  </body></html>  


A CGI Script in Perl

When the user submits her form data, the browser bundles all her answers up in a package and sends it to the script whose URL was specified in the ACTION attribute. CGI is the language or protocol used to deliver this bundle to the script, and a script that knows how to unpack the bundle is a CGI script.
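     To give a feel for what that bundle looks like, here is a small sketch that encodes a few name/value pairs roughly the way a browser does for a form like the sample above: pairs are joined with &, names and values with =, spaces become +, and other special characters become %XX hex codes. The function names encode_part and encode_form and the sample answers are made up for illustration; a real browser does this work for you.

```perl
use strict;
use warnings;

# Encode one name or value: keep letters, digits and a few safe
# characters, turn everything else into %XX, then spaces into "+".
sub encode_part {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.* -])/sprintf("%%%02X", ord($1))/ge;
    $text =~ s/ /+/g;
    return $text;
}

# Hypothetical helper: build the bundle from a list of name, value pairs.
sub encode_form {
    my (@pairs) = @_;
    my @parts;
    while (@pairs) {
        my $name  = shift @pairs;
        my $value = shift @pairs;
        push @parts, encode_part($name) . "=" . encode_part($value);
    }
    return join("&", @parts);
}

print encode_form('some_text' => 'Hi there & bye',
                  'box'       => 'on',
                  'choice'    => 'Ha',
                  'radbut'    => 'oop'), "\n";
# prints: some_text=Hi+there+%26+bye&box=on&choice=Ha&radbut=oop
```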
     You needn't be concerned about the details of CGI since in Perl (as in shell and C) there are packages which can read this bundle for you and return each item of the user's form data in special variables which you can manipulate or process in any way you wish. The best known packages for Perl are cgi-lib.pl, written by Steven E. Brenner, and CGI.pm, by Lincoln Stein. These packages contain a number of functions which are very useful for CGI scripts.
     I'm not going to describe all those functions, or even show how to include them in your script. Including a package is a somewhat advanced feature of Perl. Also, these packages are popular and there are a lot of versions around, many of them older versions, which may not work as I shall describe. Instead, I'll give you the most useful of those functions, and show you how to copy and use it in your script.
     So here's a line by line account of a simple Perl CGI script. Remember, you can try out the page in action.

The Preamble

A Perl CGI script should always begin with the following lines:

    #!/usr/bin/perl
    # perl_form - a simple illustration of forms and Perl CGI  
The first line is mandatory for all Perl scripts. You may need to change the path to Perl for your site. The second line is a comment, giving the name of the script and what it does. Put lots of comments in all your scripts and programs. Everything to the right of # is ignored by Perl, as are blank lines.

    print "Content-type: text/html\n\n";  
This line begins some work. The web server knows it is executing a script, but has no idea what to expect in return. So the script must first tell the server what is coming, usually a web page of some sort. The print command simply writes back to whoever executed it, the web server in this case. It sends the magic words indicating a web page is about to follow. After the server sees this, it will pass the contents of further print statements back to the browser. This is how a script can return a web page.
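     As a minimal sketch of that idea, the helper below (page is a made-up name, not part of any package) glues the Content-type line, the blank line, and a small page together. The blank line, produced by the second \n, marks the end of the header, so the server can tell where the header stops and the page begins.

```perl
use strict;
use warnings;

# Hypothetical helper: return a complete CGI response as one string.
sub page {
    my ($title, $body) = @_;
    my $header = "Content-type: text/html\n\n";   # header, then a blank line
    my $html   = "<html><head><title>$title</title></head>\n"
               . "<body><h1>$title</h1>\n$body\n</body></html>\n";
    return $header . $html;
}

print page("Hello", "<p>This page was written by a script.</p>");
```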

Reading the CGI Data

To read the user's form data into your script, it's as simple as this:

    &ReadParse;  
This is the really useful function from the cgi-lib.pl package. It reads all the form data from the user and puts them into a Perl variable called %in. The Perl variable called %ENV has some good data in it as well. I'll talk about how to use these in a moment.
     In order to use a function, you must define it somewhere. Perl has special syntax for the use and definition of functions. To use a Perl function, preface it with an ampersand (&), as we did above. To define a function, use the special keyword sub, then the name of the function, then the block of code which defines the function, enclosed in braces. Something like this:

    sub ReadParse { 
        ... a lot of code ... 
    }  
Programmers will often place the definitions of their functions at the end of the script, and we will too, in the Summary section below. I'm not going to explain how this function works here (though more advanced or bold readers can view a separate tutorial devoted to reading CGI form data). It's fairly elegant code, but you really need to know a few of the gory details of Perl to understand it. Just copy and paste it onto the end of your script and use it happily somewhere near the beginning.
     Use it how? So the form data is in something called %in. What's that?

Perl Variables

There are three kinds of variables in Perl, distinguished by the character preceding the variable's name:
   
  • $  a scalar, like $in: a variable which can contain a string, an integer or a real number;
  • @  an array, like @in: a simple list of any kind of scalar data. E.g., the first one, the second one, ...;
  • %  an associative array, like %in: a list, but instead of being indexed by numerical order, you refer to the items in the list by any set of keywords of your choosing. E.g., the red one, the green one, the blue one, ....

So %in is an associative array which contains the data the user submitted on the form. The keywords used to access the elements of this array are just the variable names you specified on the web form page. If you look at the source for The Sample Web Form above, you'll recall that we used variable names of: some_text, box, choice, and radbut. Consequently, the values that the user submitted for each of the form elements are stored in $in{some_text}, $in{box}, $in{choice}, $in{radbut}.
     You'll notice that to access a particular value in an associative array, you put the keyword inside braces following the name of the array. You also use a $ in front instead of a %. Many people find this confusing, thinking you should use a % instead, but it has a certain logic when you note that the particular value you want is in fact a scalar, even though it's coming from an array. You still use % when you want to refer to the array as a whole, as we'll see.
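     To see the three kinds of variable side by side, here is a short sketch. The names and values are made up; in the real script %in is filled in by ReadParse rather than typed in by hand.

```perl
use strict;
use warnings;

my $greeting = "hello";                     # a scalar
my @colors   = ("red", "green", "blue");    # an array
my %in       = ("some_text" => "hello",     # an associative array
                "choice"    => "Ha");

print "Scalar: $greeting\n";
print "Second array element: $colors[1]\n";   # arrays count from 0
print "One value: $in{'choice'}\n";           # $ because one value is a scalar
print "All keywords: ", join(", ", sort keys %in), "\n";   # % for the whole array
```

     Note the $ in $colors[1] and $in{'choice'}: the brackets or braces tell Perl which kind of array the single value comes from.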

Building a Response Page

We can now start processing the form data in the script. In this case, we'll simply return a page reporting what the data was. To do this, we first need to start building the HTML source for a web page, to return to the web server that called the script. Recall Perl's print function does this:
    print "<title>The Response</title><h1>The Response</h1><hr>";
    print "Here is the form data:<ul>";  
We want an unordered list of the variable names and corresponding values that the user submitted. This is just the keyword and its corresponding value in %in. Rather than coding each keyword by hand, we can use Perl's built-in functions and control loops, which make this easy to do and also mean this script will work with any web form. So you can specify this script as your target ACTION in the form tag when you try building your own web form. (That's an important point -- read it again. You'll find this simple script is quite useful for debugging a web form.)
     Perl's built-in function, keys, returns a list of all keywords in an associative array. Then, Perl's foreach loop will cycle through every element in an array, and execute a block of code once each time. Here it is:

    foreach $key (keys %in) {
        print "<li>$key: $in{$key}";
    }
    print "</ul>";  
This says: set the scalar variable $key to each of the keywords in the associative array %in in turn. Then each time, print a <li> tag followed by the keyword, a colon and a space, then the item in the associative array corresponding to that keyword. This works, no matter how many keywords (named variables in your form) there are, or what they are called. Finally at the end, print one </ul> to close the unordered list.
     Note carefully the variable substitution that occurs in the print statement. You don't literally print the characters "$key" since it's a scalar variable. Perl finds the value of that variable and prints that instead. If you actually wanted to print out "$", you would need to "escape" it by using "\$" inside the print statement, so Perl knows you don't want to do variable substitution. The same is true for "@", which introduces an array. A "%" needs no escape, since Perl does not substitute a whole associative array inside a quoted string. On the other hand, Perl does print literally any characters it doesn't recognize as a variable, such as <li>, and the colon and space. Perl makes printing very easy.
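     A short sketch may make the substitution rules easier to see. The variable names are made up; the point is only the difference between a plain "$" or "@" and an escaped one inside double quotes.

```perl
use strict;
use warnings;

my $key   = "choice";
my @words = ("Ha", "He");

my $substituted = "<li>$key: @words";      # variables are replaced by their values
my $literal     = "<li>\$key: \@words";    # backslashes prevent the substitution
my $single      = '<li>$key: @words';      # single quotes never substitute

print "$substituted\n";   # <li>choice: Ha He
print "$literal\n";       # <li>$key: @words
print "$single\n";        # <li>$key: @words
```

     When an array is substituted, Perl separates its elements with single spaces.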
     I mentioned above that another associative array, called %ENV, has interesting information as well. As you might guess, these are a set of environment variables, which the web server fills in, largely from information that browsers send when they request a page. In fact, this information is sent for every web page, not just pages with forms, but you need a CGI script to read it. Are you curious about what your web browser is saying about you behind your back? Let's find out:

    print "and here are all the environment variables:<ul>";
    foreach $key (keys %ENV) {
        print "<li>$key: $ENV{$key}";
    }
    print "</ul>";  
And that concludes our CGI script. If you haven't tried out the page in action yet, you should now.
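     One caution ties back to the risks mentioned at the start. The script prints whatever the user typed straight into the page, so text containing < or & would be treated as markup by the browser. A common precaution is to escape those characters first. The helper below is only a sketch: the name escape_html is made up here and is not part of cgi-lib.pl. A script could call it on $key and $in{$key} before printing them.

```perl
use strict;
use warnings;

# Hypothetical helper: make text safe to print inside an HTML page.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must come first, so the entities below survive
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my %in = ("some_text" => "<b>bold</b> & \"quoted\"");
foreach my $key (sort keys %in) {
    print "<li>", escape_html($key), ": ", escape_html($in{$key}), "\n";
}
# prints: <li>some_text: &lt;b&gt;bold&lt;/b&gt; &amp; &quot;quoted&quot;
```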


Summary

To bring everything together in one place, here is the script again, including the definition of the ReadParse function.

    #!/usr/bin/perl
    # perl_form - a simple illustration of forms and Perl CGI 
    
    print "Content-type: text/html\n\n"; 
    
    &ReadParse;
    
    print "<title>The Response</title><h1>The Response</h1><hr>";
    print "Here is the form data:<ul>";
    
    foreach $key (keys %in) {
    	print "<li>$key: $in{$key}";
    }
    print "</ul>";
    
    print "and here are all the environment variables:<ul>";
    foreach $key (keys %ENV) {
    	print "<li>$key: $ENV{$key}";
    }
    print "</ul>";
    
    
    # Adapted from cgi-lib.pl by S.E.Brenner@bioc.cam.ac.uk 
    # Copyright 1994 Steven E. Brenner 
    sub ReadParse {
      local (*in) = @_ if @_;
      local ($i, $key, $val);
    
      if ( $ENV{'REQUEST_METHOD'} eq "GET" ) { 
    	$in = $ENV{'QUERY_STRING'}; 
      } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    	read(STDIN,$in,$ENV{'CONTENT_LENGTH'});
      } else {
    	    # Added for command line debugging
    	    # Supply name/value form data as a command line argument
    	    # Format: name1=value1\&name2=value2\&... 
    	    # (need to escape & for shell)
    	    # Find the first argument that's not a switch (-)
    	    $in = ( grep( !/^-/, @ARGV )) [0];
    	    $in =~ s/\\&/&/g;
      }
    
      @in = split(/&/,$in);
    
      foreach $i (0 .. $#in) {
    	# Convert plus's to spaces
    	$in[$i] =~ s/\+/ /g;
    
    	# Split into key and value.
    	($key, $val) = split(/=/,$in[$i],2); # splits on the first =.
    
    	# Convert %XX from hex numbers to alphanumeric
    	$key =~ s/%(..)/pack("c",hex($1))/ge;       
    	$val =~ s/%(..)/pack("c",hex($1))/ge;
    
    	# Associate key and value. \0 is the multiple separator
    	$in{$key} .= "\0" if (defined($in{$key})); 
    	$in{$key} .= $val;
      }
      return length($in);
    }  
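     For readers who want a feel for the two decoding steps in the middle of ReadParse without studying the whole function, here is a stripped-down sketch that handles a single name=value pair. It mirrors the comments in the code above (plus signs to spaces, then %XX to characters) but is not a replacement for ReadParse. The name decode_pair is made up, and chr is used in place of pack for readability.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one "name=value" piece of the form data.
sub decode_pair {
    my ($pair) = @_;
    $pair =~ s/\+/ /g;                          # plus signs become spaces
    my ($key, $val) = split(/=/, $pair, 2);     # split on the first = only
    $key = '' unless defined $key;
    $val = '' unless defined $val;
    foreach my $part ($key, $val) {
        # %XX (two hex digits) becomes the character with that code
        $part =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    }
    return ($key, $val);
}

my ($key, $val) = decode_pair("some_text=Hi+there+%26+bye");
print "$key: $val\n";    # some_text: Hi there & bye
```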


Installing a CGI Script

Unfortunately, I can't tell you precisely how to install a CGI script on your web server, or even whether it's possible. Each server is configured a little differently. You must ask your system administrator if CGI is enabled on your server (or read the documentation yourself) and if user-installed CGI scripts are permitted (some systems permit only the administrator to install CGI scripts). Then you'll typically need to find out: what's the path to Perl (needed for the first line of a Perl script), where to place the script, what to call it, what permissions to set for the script, and if your script needs to write to some files, how to set those files' permissions.
     I can briefly illustrate how Apache might be configured. Apache is a free and popular web server for Unix, but only one of many. It has three configuration files, httpd.conf, srm.conf and access.conf. On my system, all are located in /etc/httpd/conf/, though they could be anywhere, often somewhere under /usr/ or /usr/local/.
     The second of these files is for server resource management, and specifies how the server should handle requests from browsers. If a browser requests a web page, the server returns the html source for that page, that is, the contents of the file. But if a browser requests a CGI script, the server must know it's not supposed to return the contents of the file containing the script, but instead should run that script as a program, and return to the browser the results of the program. The server must be told the difference between an HTML file and a CGI script.
     There are a couple of different ways to do this in srm.conf, as the following two server directives in my configuration show:

    ScriptAlias /cgi-bin/ /home/httpd/cgi-bin/ 
    AddHandler cgi-script .cgi  
The first line tells the server that any file it finds in the directory /home/httpd/cgi-bin/ is a CGI script to be run as a program when requested by a browser. The second line tells the server that any file (anywhere under the server's document root directory, which is specified by another server directive) whose name ends in .cgi is a script to be executed. The ScriptAlias directory and this .cgi file name extension could be anything. If your server isn't configured with these directives (or they have been commented out), then you can't run CGI scripts on your server. If it has only the first directive, but not the second, and if you don't have write permission for the cgi-bin directory, then your system administrator can install CGI scripts but you can't.

Exercise 1: Your Form, My Script

You can use this script, even if you're not on my site. (Web browsers usually don't care where they send a request, and my server will accept yours.) So try it with your own form. Compose a web form on your server and specify

    http://www.halcyon.com/sanford/cgi/perl_form.cgi  
as the action attribute in your opening <form> tag. No matter what or how many form elements you use in your own web page, or what variable names you use for them, the script will report the form and environment data, just as it did above. Of course, that's all it will do. For anything more interesting, you'll need to write your own script.

Exercise 2: Your Form, Your Script

If you have write access to your web server's cgi-bin directory, you can copy this script to your server and use it there. You'll need to make your script world-readable and world-executable: chmod 755 </path/script_name> at the Unix prompt should do that. You'll also need to ask your webmaster for the directory or naming conventions and the URL of your script, which will typically be different from the path to the file. Then use that URL in the action attribute of your web page's <form> tag.
     To make things more interesting, specialize the script so it only reports form data specifically mentioned in your web form. For example, if your form tags have NAME attributes of my_first_tag and my_second_tag, you could use a print statement in your Perl script like this:

    print "<ul>";
    print "<li>my_first_tag: $in{'my_first_tag'}";
    print "<li>my_second_tag: $in{'my_second_tag'}";
    print "</ul>";  
Try reformatting them, putting them in tables, adding images. Add a link which points to the contents of the $ENV{'HTTP_REFERER'} variable. Remember also to copy and paste the definition of the ReadParse function from above into your script.
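     As one possible way to go about it, the sketch below builds a small table from the two named fields and adds a link back to the referring page. It assumes %in has already been filled by ReadParse. The function name report, the sample value and the sample URL are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the body of a response page.
# $in is a reference to the form data; $referer is the referring URL.
sub report {
    my ($in, $referer) = @_;
    my $html = "<table border=1>\n";
    foreach my $name ('my_first_tag', 'my_second_tag') {
        my $value = defined $in->{$name} ? $in->{$name} : '(not supplied)';
        $html .= "<tr><td>$name</td><td>$value</td></tr>\n";
    }
    $html .= "</table>\n";
    # Browsers do not always send a referrer, so check before linking.
    $html .= "<a href=\"$referer\">Back to the form</a>\n" if $referer;
    return $html;
}

# In a CGI script: print report(\%in, $ENV{'HTTP_REFERER'});
print report({ 'my_first_tag' => 'red' }, 'http://www.site/your_form.html');
```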


Other Readings

For the most authoritative information on CGI, see the collection of references assembled by the folks at the World Wide Web Consortium. I know of a few other CGI tutorials on the Web: Learn to Write CGI-Forms and a CGI and Perl Tutorial. I like Building-blocks for CGI Scripts in Perl; it has site-specific material but is quite good. A sample chapter from a book on CGI scripts in shell and Perl is available on the Web. There is also Carlos' Forms Tutorial, which discusses forms but not CGI. Yet Another HTCYOHP Home Page discusses CGI scripts written in C. A good FAQ on CGI is A CGI Programmer's Reference.
     You can now continue to the more advanced material in CGI/Perl Tips, Tricks and Techniques or return to the CGI Resource index.



CGI Resources. Copyright 1995-98, Sanford Morton
Last modified: Tue Mar 24 04:04:56 PDT 1998