Geeky computing, mathematics, critical thinking and the occasional other topic.


2010-01-26


Automated upload to GMail via IMAP

Some time ago, there was a Python library called libgmail that could communicate with GMail using its proprietary protocol. However, proprietary protocols have a serious drawback: they can change at any time without notice, and when that happens, whoever reverse-engineered them has to figure out what changed. Furthermore, an explicit rule in Google's ToS forbids access «by any means other than through the interface that is provided by Google». Combine that with the fact that libgmail has been abandoned for a while, and the result is easy to guess: libgmail is no longer usable.

There are, however, other ways of using GMail for automated attachment uploads. The one that probably provides the most flexibility is using the IMAP protocol for the transfers. GMail explicitly provides IMAP access so that users can handle their mail with any IMAP-capable email client instead of the web interface, so it's arguable that using IMAP complies with the ToS. I am not a lawyer, though, so ask one if you want to be sure.

Unfortunately, there's currently no library similar to libgmail that uses IMAP instead of GMail's proprietary protocol. [Update 2011-03-09: I just learned that there's a new version of GMailFS which uses the IMAP protocol. However, it is a filesystem for Linux, not a cross-platform command-line utility. Apart from the one presented here, I don't yet know of a command-line tool that works across systems.] I've written a PHP program as a proof of concept for such a tool. It's a complete example that creates, in a GMail account, a message tagged 'ftp' with an attached file of MIME type 'application/octet-stream'. The program requires PHP's IMAP extension; on Debian Lenny, that means installing the php5-imap package.
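Because nothing in the script works without PHP's IMAP extension, it may help to check for it up front. A minimal sketch; the name imap_extension_available() is hypothetical, and only extension_loaded() and function_exists() are standard PHP:

```php
<?php
// Hypothetical preflight check (not part of the original script): report
// early whether PHP's IMAP extension is available. That extension provides
// imap_open(), imap_append() and the other imap_* functions used below.
function imap_extension_available(): bool
{
  return extension_loaded('imap') && function_exists('imap_append');
}

if (! imap_extension_available())
  fwrite(STDERR, "PHP's IMAP extension is missing (php5-imap on Debian Lenny)\n");
```

In the upload script itself, the warning would be followed by exiting with a non-zero status instead of carrying on.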

IMAP access must be enabled in the target GMail account before starting. The uploaded files are tagged with the label 'ftp', which must already exist, so create it before trying to upload. After uploading, you can retrieve the file via the GMail web interface.
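Instead of creating the label by hand, it could in principle be created over IMAP, since GMail exposes labels as IMAP folders. The following is a sketch under that assumption; ensure_folder() is a hypothetical helper, while imap_list() and imap_createmailbox() are real functions of PHP's IMAP extension:

```php
<?php
// Hypothetical helper (not part of the original script): make sure the
// target folder exists before uploading. $stream is a connection returned by
// imap_open(); $server is the "{host:port/flags}" prefix used to open it.
// On GMail, IMAP folders map to labels, so creating the folder 'ftp' should
// create the 'ftp' label. Non-ASCII names would also need modified UTF-7
// encoding, which this sketch does not do.
function ensure_folder($stream, string $server, string $folder): bool
{
  // A pattern without wildcards only matches that exact mailbox name
  $found = imap_list($stream, $server, $folder);
  if (is_array($found) && count($found) > 0)
    return true; // already exists, nothing to do

  return imap_createmailbox($stream, $server . $folder);
}
```

Note that the script opens $folder directly in imap_open(), which fails if the folder does not exist yet, so such a helper would need a connection that does not select that folder first (for example one opened with the OP_HALFOPEN flag).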

A drawback compared to libgmail is that, since files are Base64-encoded, the amount of data sent, and hence the upload time, is about 33% greater: Base64 turns every 3 bytes into 4 characters. In theory, files could be uploaded using Base85 encoding, which turns every 4 bytes into 5 characters and so reduces the overhead to 25%. However, retrieving the file would then require a Base85 decoder, and it would probably not be possible to simply click on the attachment to download it.
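The percentages follow from simple arithmetic on the encoded sizes. A small sketch that computes them; the function names are hypothetical, the Base85 count assumes plain Ascii85 groups (no "z" shortcut, no <~ ~> delimiters), and the line-break count assumes 76-character lines each ending in CRLF, as chunk_split() produces, which may differ slightly from what imap_binary() emits:

```php
<?php
// Encoded size of $n bytes in Base64: 4 characters per 3-byte group,
// padded up to a whole group
function base64_size(int $n): int
{
  return 4 * intdiv($n + 2, 3);
}

// Same, plus a CRLF after every line of at most 76 characters
function base64_mime_size(int $n): int
{
  $chars = base64_size($n);
  return $chars + 2 * intdiv($chars + 75, 76);
}

// Encoded size in Base85 (Ascii85): 5 characters per full 4-byte group;
// a final partial group of r bytes becomes r + 1 characters
function base85_size(int $n): int
{
  $r = $n % 4;
  return 5 * intdiv($n, 4) + ($r ? $r + 1 : 0);
}

$n = 3000; // example file size in bytes
printf("Base64:         %d chars (+%.1f%%)\n", base64_size($n), 100 * (base64_size($n) / $n - 1));
printf("Base64 + CRLFs: %d chars (+%.1f%%)\n", base64_mime_size($n), 100 * (base64_mime_size($n) / $n - 1));
printf("Base85:         %d chars (+%.1f%%)\n", base85_size($n), 100 * (base85_size($n) / $n - 1));
```

The line breaks add a few more percent on top of the raw Base64 figure, which is why the overhead is "about" 33%.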

Here's the PHP code, intended to be run with the PHP command-line interface. Usage:
php imap-upload.php <file_to_upload>

<?php

/**********************************
 *
 * Configuration section
 *
 */

// User name (full email address)
$usr = 'user@gmail.com';

// Password. Leave unset to be prompted for it (sorry, it is echoed - yuck!)
unset($pwd);

// IMAP server
$svr = 'imap.gmail.com';

// IMAP port
$prt = 993;

// Folder / Tag (must exist prior to running this)
//$folder = '[Gmail]/Drafts'; // This one would use the actual GMail drafts folder.
                              // NOTE: must be localized, e.g. in Spanish the name
                              // is '[Gmail]/Borradores'
$folder = 'ftp';  // Same tag used by gmailftpd.py

// Encoding for the filename string we are passing from the command line
$filename_enc = 'UTF-8';

/*
 *
 * End of configuration section
 *
 ***********************************/


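// Encode the subject in MIME "Q" format as per RFC 2047. The result is
// folded so that no header line exceeds 76 characters and, for UTF-8, no
// multibyte character is split between two encoded-words.
function encode_subject($title, $encoding)
{
  $prefix = "=?$encoding?Q?";
  $utf8 = (strcasecmp($encoding, 'UTF-8') == 0);
  $ret = $prefix;
  $linelength = 9 + strlen($prefix); // length of "Subject: " plus prefix

  for ($i = 0; $i < strlen($title); $i++)
  {
    $byte = ord($title[$i]);

    // At the start of each character (i.e. not on a UTF-8 continuation
    // byte), fold if its worst-case encoding ("=XX" per byte) plus the
    // closing "?=" would not fit in 76 columns
    if (! ($utf8 and ($byte & 0xC0) == 0x80))
    {
      $nbytes = 1;
      if ($utf8 and $byte >= 0xF0)     $nbytes = 4;
      elseif ($utf8 and $byte >= 0xE0) $nbytes = 3;
      elseif ($utf8 and $byte >= 0xC0) $nbytes = 2;

      if ($linelength + 3 * $nbytes + 2 > 76)
      {
        $ret .= "?=\r\n " . $prefix;
        $linelength = 1 + strlen($prefix); // initial space plus prefix
      }
    }

    if ($title[$i] >= ' ' and $title[$i] <= '~'
        and $title[$i] != '=' and $title[$i] != '?'
        and $title[$i] != '_')
    {
      if ($title[$i] == ' ')
        $ret .= '_';
      else
        $ret .= $title[$i];
      $linelength++;
    }
    else
    {
      $ret .= '=' . strtoupper(bin2hex($title[$i]));
      $linelength += 3;
    }
  }
  return $ret . '?=';
}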

// quoted-string for parameters as per RFC 822
function quoted_string($string)
{
  return '"' . strtr($string, array("\\"=>"\\\\",
                                    "\""=>"\\\"",
                                    "\r"=>"\\\r",
                                    "\n"=>"\\\n")) . '"';
}

// grab 7-11 random lowercase letters
function randword()
{
  $result = '';
  for ($i = 0; $i < 7 + mt_rand(0, 4); $i++)
    $result .= chr(mt_rand(0, 25) + 97);
  return $result;
}

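// The file to upload is the first command-line argument
if (! isset($argv[1]))
  die("No filename specified to upload\n");

$filepath = $argv[1];

if (! is_file($filepath) or ! is_readable($filepath))
  die("Cannot read file: $filepath\n");

$filename = basename($filepath);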

// 993 is the default port for IMAP over SSL, so only spell out other ports
if ($prt == 993) $prt = ''; else $prt = ':' . $prt;

if (! isset($pwd))
{
  echo "pwd: ";
  // Strip the trailing newline (\n, or \r\n on Windows)
  $pwd = rtrim(fgets(STDIN), "\r\n");
}

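// Open the imap stream. "\x7B" is a literal '{': writing "{$svr..." inside a
// double-quoted string would start PHP's {$...} interpolation syntax
$stream = imap_open("\x7B$svr$prt/ssl}$folder", $usr, $pwd);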

// Report all possible errors
print_r(imap_errors());

// Exit on trouble
if ($stream === false) die("\nStopping\n");

$rnd1 = randword();
$rnd2 = randword();
$bndr = "------------" . randword() . randword();

$result = imap_append(
            $stream
            , "\x7B$svr$prt/ssl}$folder"
            , "From: $usr"
              . "\r\nTo: $rnd1@$rnd2.com"
              . "\r\nSubject: " . encode_subject($filename, $filename_enc)
              . "\r\nMIME-Version: 1.0"
              . "\r\nContent-type: multipart/mixed;"
              . "\r\n boundary=\"$bndr\""
              . "\r\n"
              . "\r\nThis is a multi-part message in MIME format."
              . "\r\n--$bndr"
              . "\r\nContent-Type: text/plain; charset=$filename_enc; format=flowed"
              . "\r\nContent-Transfer-Encoding: quoted-printable"
              . "\r\n"
              . "\r\n" . imap_8bit($filename)
              . "\r\n"
              . "\r\n--$bndr"
              . "\r\nContent-Type: application/octet-stream;"
              . "\r\n name=" . quoted_string($filename)
              . "\r\nContent-Transfer-Encoding: base64"
              . "\r\nContent-Disposition: attachment;"
              . "\r\n filename=" . quoted_string($filename)
              . "\r\n"
              . "\r\n" . imap_binary(file_get_contents($filepath))
              . "--$bndr--"
              . "\r\n"
              . "\r\n"
            );

if ($result)
  echo "Succeeded adding file\n";
else
  print_r(imap_errors());

$check = imap_check($stream);
echo "There are now ". $check->Nmsgs . " messages in the $folder folder\n";

print_r(imap_errors());

print_r(imap_alerts());

imap_close($stream);

?>
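The message passed to imap_append() is plain RFC 822 text, so one way to debug its layout without touching the network is to build it in a function and inspect the result, for example by saving it as a .eml file and opening it in a mail client. A simplified sketch: build_message() is a hypothetical helper that mirrors the attachment part of the script above, leaves out the text/plain part, and assumes an ASCII filename and an already-encoded Subject:

```php
<?php
// Hypothetical helper mirroring the script's multipart/mixed layout
function build_message(string $from, string $to, string $subject,
                       string $filename, string $data, string $boundary): string
{
  // Escape backslashes and double quotes for a quoted-string parameter
  $quoted = '"' . addcslashes($filename, "\"\\") . '"';
  return "From: $from"
    . "\r\nTo: $to"
    . "\r\nSubject: $subject"
    . "\r\nMIME-Version: 1.0"
    . "\r\nContent-Type: multipart/mixed;"
    . "\r\n boundary=\"$boundary\""
    . "\r\n"
    . "\r\n--$boundary"
    . "\r\nContent-Type: application/octet-stream;"
    . "\r\n name=$quoted"
    . "\r\nContent-Transfer-Encoding: base64"
    . "\r\nContent-Disposition: attachment;"
    . "\r\n filename=$quoted"
    . "\r\n"
    . "\r\n" . chunk_split(base64_encode($data)) // 76-char lines, each + CRLF
    . "--$boundary--"
    . "\r\n";
}

$msg = build_message('user@gmail.com', 'someone@example.com', 'test.bin',
                     'test.bin', "\x00\x01\x02 binary data", 'boundary-1234');
echo $msg;
```

Writing $msg to disk with file_put_contents() gives a file most mail clients can open directly.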

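Here's a little bonus brainteaser. The random word generator randword() produces between 7 and 11 lowercase letters. Assuming mt_rand(0, 4) returns uniformly distributed integers from 0 to 4 inclusive, what's the probability of each length? Hint: they are not all 1/5.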
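If you want to check your answer empirically without spoiling the derivation, a Monte Carlo tally is enough. This sketch copies the loop from randword() but only counts letters; randword_length() and length_frequencies() are hypothetical names:

```php
<?php
// Same loop as randword(), counting letters instead of generating them
function randword_length(): int
{
  $len = 0;
  for ($i = 0; $i < 7 + mt_rand(0, 4); $i++)
    $len++;
  return $len;
}

// Estimated probability of each length 7..11 over $trials runs
function length_frequencies(int $trials): array
{
  $counts = array_fill(7, 5, 0); // keys 7, 8, 9, 10, 11
  for ($t = 0; $t < $trials; $t++)
    $counts[randword_length()]++;
  foreach ($counts as $len => $c)
    $counts[$len] = $c / $trials;
  return $counts;
}

mt_srand(12345); // fixed seed so runs are repeatable
foreach (length_frequencies(100000) as $len => $p)
  printf("%2d letters: %.4f\n", $len, $p);
```

With this many trials the estimates should land within about a percentage point of the exact probabilities, which you can then derive by hand.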