
PHP and Lynxcgi

31 March 2010

The problem here is to make a PHP script work with lynxcgi as well as it does when it is served by an http server.

Lynxcgi is a dummy protocol which Lynx, the one true browser, can use to run CGI scripts by pretending to be a server. This ability has to be compiled into Lynx and enabled in the lynx system-wide configuration (lynx.cfg). A real server that allows PHP knows which interpreter to run on a script by the extension on the filename. That is, a server knows to use PHP on index.php. But Lynx does not know this, so lynxcgi can only run scripts which start with a shebang line, which looks something like #!/usr/local/bin/php . And unfortunately that only works on Unix-like systems (Unix, Linux, BSD, and so forth).

To use a document that contains PHP, usually you just point your browser at the document's URI, which is something like http://example.com/index.php. Many times you will only be dimly aware (on account of the .php on index.php) that you are making the server run PHP to produce the HTML document it sends you. If index.php is the default index, you may not know at all if you do not put the filename on the URI: http://example.com/. It is even fairly easy to get a server to run PHP on documents that do not have the .php extension, allowing people who want to hide the fact that they use PHP to do so.

A special, but common, case of this is using PHP on your own server on your local machine. In this case the URI will look like http://localhost/index.php if you are the web master and have put the PHP document in the server's document root, or http://localhost/~username/index.php if the document is in an ordinary user's public web directory. You may do this often if you are using a web interface to a database (like phpMyAdmin) or a blog or wiki on your local machine, or are writing and testing scripts intended for the web.

Lynxcgi lets you do this sort of thing without a server, or without bothering the server you have. Sort of. Some scripting languages work and play better with lynxcgi than others. And to be nit-picky about it, PHP is usually called a preprocessor language rather than a scripting language. Lynxcgi URIs (which are not really URIs because lynxcgi is not an official protocol) look a little different. When you go through a server you may not know, and do not have to know, where index.php is on the machine. URIs are not filepaths. But lynxcgi has to have the filepath. So http://localhost/~username/index.php might look like this to lynxcgi: lynxcgi://localhost/home/username/public_html/index.php. Lynxcgi will not work with server shortcuts such as the tilde (~).
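
The translation from a localhost URI to the filepath form can be sketched in PHP. This is only an illustration: the function name lynxcgi_uri and the default document root /var/www are made up, and it assumes home directories live under /home with user pages in public_html, as in the example above. Your system may lay things out differently.

```php
<?php
// Hypothetical helper: turn a localhost http URI into the filepath form lynxcgi wants.
// Assumes /home/<user>/public_html for user pages; $docRoot is a made-up default.
function lynxcgi_uri(string $httpUri, string $docRoot = '/var/www'): string
{
    $path = parse_url($httpUri, PHP_URL_PATH);
    if (!is_string($path)) {
        $path = '/';                       // no usable path in the URI
    }
    if (preg_match('#^/~([^/]+)(/.*)?$#', $path, $m)) {
        // Expand the tilde shortcut by hand, since lynxcgi will not do it.
        $file = '/home/' . $m[1] . '/public_html' . ($m[2] ?? '');
    } else {
        // Anything else is taken to live under the document root.
        $file = rtrim($docRoot, '/') . $path;
    }
    return 'lynxcgi://localhost' . $file;
}

echo lynxcgi_uri('http://localhost/~username/index.php'), "\n";
```

A real server can map URIs to files in many other ways (aliases, rewrites), so a helper like this only covers the two plain cases described here.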

Now one particular problem with PHP is also one of its virtues. PHP passes everything through that is not enclosed in <?php…?> brackets. This is pretty much all there is to the distinction in calling PHP a preprocessor language instead of a scripting language. So if you want something in PHP to show up in PHP output you can echo or print from inside PHP tags or you can just not put stuff in PHP tags to begin with. Scripting languages do not do this. You have to use a specific command within the script to get stuff out. This is why the shebang line is not a problem in scripting languages.

To work, the shebang line must be on the very first line of a file and must start in the very first column of that line. So you cannot put it in <?php…?> tags because then the < would be in the first column of the first line. And because it is not in PHP tags, PHP will put it out unmolested. If you do not have the shebang line, lynxcgi will not know what interpreter to use on the script.

The next problem is that when you use a script with php in the shebang line (something like #!/usr/local/bin/php, your path may vary) what you get is the command-line version (CLI) of PHP, and it is not expecting to get queries. If you use forms in your scripts PHP CLI will be none the wiser and will not populate the $_REQUEST, $_POST, or $_GET global arrays.

This is why you cannot use lynxcgi on PHP scripts that are intended to be served by a real http server. But you can write a wrapper document that lynxcgi can use to run the script.

This comes about because I am writing a MySQL table editor in PHP. I want it to do stuff similar to what phpMyAdmin does, but without frames, Javascript, cookies, and bunches of silly graphical buttons. What I am trying to do is really irrelevant, but it is why my example documents are named tabled.php and tabled_lynx.php. That's "tabled" for "table editor."

The plan is to point lynxcgi at tabled_lynx.php, which would mean opening a link in Lynx something like this: lynxcgi:/home/webworker/public_html/tabled_lynx.php (the /localhost/ thing is optional). And from that script we will run tabled.php, which could be accessed directly from any browser with http://localhost/~webworker/tabled.php or something like that if we were running a server and wanted to use it.

So the first line of the wrapper has to be the shebang line.

#!/usr/local/bin/php

Lynxcgi does not know how to run the script without it. But because this is PHP, the shebang line leaks out into the document and looks ugly. It is also invalid HTML because it is plain text that comes before the DOCTYPE. So we need some PHP to get rid of it.

<?php
ob_clean();
?>

That ob stands for output buffer. The call says to just erase it. There are a number of ob_ functions to manipulate things that you think you might have already sent but have not. It is sort of like leaving a letter out for the lettercarrier, but then thinking better of it. You can take it back if you grab it before the lettercarrier gets it. The call only has something to erase while an output buffer is active, which depends on the output_buffering setting of the PHP that runs the script.
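
Here is a minimal sketch of the take-the-letter-back idea. It assumes buffering is switched on explicitly with ob_start(), which the wrapper itself does not do, and the function name buffered_demo is made up for illustration.

```php
<?php
// Hypothetical helper: shows that buffered output can be erased before it is sent.
function buffered_demo(): string
{
    ob_start();                      // start holding output instead of sending it
    echo "#!/usr/local/bin/php\n";   // stands in for the leaked shebang line
    ob_clean();                      // erase what is in the buffer, keep buffering
    echo "<p>Hello</p>";             // now the only thing in the buffer
    return (string) ob_get_clean();  // fetch the buffer contents and stop buffering
}

echo buffered_demo(), "\n";
```

Only the paragraph comes out. The stand-in shebang line was erased while it was still sitting in the buffer.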

Now the whole point is to run tabled.php. You do this by using include. So the whole tabled_lynx.php now looks like this:

#!/usr/local/bin/php
<?php
ob_clean();
include("tabled.php");
?>

Which should work perfectly if you are not using forms and do not want to handle any form data. It also supposes that tabled.php is in the same directory as tabled_lynx.php and that the links you want to follow in tabled.php are all relative links. Lynx knows that it came in with lynxcgi and a certain path, so Lynx will stick that information on any relative link it finds. But the lynxcgi magic will stop working if you follow a link that starts with, say, http:.

If you do want form data, you have to parse it yourself. Lynxcgi provides enough of a dummy environment in its role as phony-baloney server that scripts can get their form data, but the PHP CLI called by the shebang line is asleep at the switch form-wise. So you have to populate the $_REQUEST, $_POST, and $_GET globals yourself. Here is the complete tabled_lynx.php:

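#!/usr/local/bin/php
<?php
// Throw away the shebang line, which PHP has passed through as output.
ob_clean();
// PHP CLI does not parse form data, so fill in the globals by hand.
if ($_SERVER['REQUEST_METHOD'] == 'GET') {
    // GET data arrives in the QUERY_STRING environment variable.
    parse_str($_SERVER['QUERY_STRING'], $_GET);
    $_REQUEST = $_GET;
}
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    // POST data arrives on standard input, CONTENT_LENGTH bytes long.
    parse_str(fread(STDIN, (int) $_SERVER['CONTENT_LENGTH']), $_POST);
    $_REQUEST = $_POST;
}
// Now run the real script with the globals in place.
include("tabled.php");
?>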
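
To see what parse_str does on its own, here is a small sketch run against a made-up query string of the kind a GET form would leave in QUERY_STRING. The field names and values are invented for the example.

```php
<?php
// A made-up query string; + and %5B%5D are the URL-encoded forms of a space and [].
$query = 'OK=Press&name=Lynx+user&tags%5B%5D=a&tags%5B%5D=b';

$fields = [];
parse_str($query, $fields);   // decodes the pairs into the array given as second argument

print_r($fields);
```

The pairs come back URL-decoded, and names ending in [] are collected into a nested array, which is the same treatment a server-run PHP gives form fields.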

And here is a test version of tabled.php, to show that tabled_lynx.php works:

<!DOCTYPE html 
PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" 
content="text/html; 
charset=iso-8859-1">
<title>Tabled</title>
</head>
<body>
<h1>Tabled</h1>
<form action="<?php 
echo $_SERVER['PHP_SELF']; ?>" 
method="get">

<p><input type="submit" 
name="OK" 
value="Press"></p>
</form>
<form action="<?php 
echo $_SERVER['PHP_SELF']; ?>" 
method="post">
<p><input type="submit" 
name="Check" 
value="Mash"></p>
</form>
<pre>

<?php 
echo print_r($_REQUEST,1);
echo print_r($_POST,1);
echo print_r($_GET,1);
?>
</pre>
</body>
</html>

I posed this problem on the Usenet group comp.lang.php, where Jerry Stuckle suggested the wrapper and Thomas Lahn suggested the ob_clean and provided a link that proved helpful in figuring out how to parse the POST data.