[tbnl-devel] File Upload Design Issues

Travis Cross travis at crosswirecorp.com
Wed Sep 1 20:52:08 UTC 2004


Jeff Caldwell wrote:
>  1) What are the relevant RFC's I need to consider?
RFC2388: http://www.faqs.org/rfcs/rfc2388.html - Seems to be the real 
starting point for this.
RFC1867: http://www.faqs.org/rfcs/rfc1867.html - This spec provides more 
examples and additional clarity as compared to the first one.
RFC2045: http://www.faqs.org/rfcs/rfc2045.html
RFC2046: http://www.faqs.org/rfcs/rfc2046.html
RFC2183: http://www.faqs.org/rfcs/rfc2183.html

>  2) What references/URLs/sample code is available for me to learn
>  about the encoding/stream issues?
RFC2047: http://www.faqs.org/rfcs/rfc2047.html
RFC2231: http://www.faqs.org/rfcs/rfc2231.html
For sample code, it might depend on what language you want to read.  PHP 
has been doing file upload handling successfully for quite some time, if 
your brain is inclined to read C code :)
Their implementation is described here: 
http://www.php.net/manual/en/features.file-upload.php
It seems that Webware in Python also supports file uploads, though I 
have not looked closely at their implementation. 
(http://www.webwareforpython.org/)

>  3) General comments on the design. In particular, uploading a file
>  may only be the first step. What is to be done with the file?
Most implementations that I've seen do something like this: they take 
the uploaded file, store it in a temporary folder, and pass the 
information (the real location of the file, the form name it was posted 
under, the provided mime type, the size of the file, the client-side 
name for the file, and any errors that occurred) to the scripting 
language interface.  The temporary file is then immediately deleted when 
the request is complete.  A significant security consideration is 
ensuring that no user can upload enough data to crash the server.
Since Lisp is far more powerful than most of the languages I've seen 
this done in (and since a persistent process is guaranteed to be 
running), there may be a more effective way to do this.  (Brainstorming here) 
Maybe the file could be stored in a temporary file and the interface 
would return a stream to that file.  Maybe the file could just be kept 
in memory and returned as a variable, which would be especially useful 
when one just wants to drop the data into a relational database.
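
To make the brainstorm a bit more concrete, here is a rough sketch in 
Python (not Lisp, and not anything TBNL provides).  It assumes the 
upload arrives as a readable byte stream, enforces a size cap while 
reading, and hands back a stream that lives in memory until it grows 
past a threshold, at which point it spills to a temporary file.  The 
names Upload, receive_upload, UploadTooLarge and MAX_UPLOAD_BYTES are 
all made up for illustration.

```python
import tempfile
from dataclasses import dataclass
from typing import BinaryIO

MAX_UPLOAD_BYTES = 1024 * 1024  # hypothetical per-file cap (1 MiB)


class UploadTooLarge(Exception):
    """Raised when a client sends more data than the cap allows."""


@dataclass
class Upload:
    field_name: str        # form field the file was posted under
    client_filename: str   # name the client claims the file has
    content_type: str      # MIME type the client provided (untrusted)
    size: int              # number of bytes actually received
    stream: BinaryIO       # readable stream positioned at byte 0


def receive_upload(source, field_name, client_filename, content_type,
                   limit=MAX_UPLOAD_BYTES, chunk_size=64 * 1024):
    # Small uploads stay in memory; larger ones roll over to a temp file
    # that is removed automatically when the stream is closed.
    spool = tempfile.SpooledTemporaryFile(max_size=64 * 1024)
    size = 0
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            if size > limit:
                raise UploadTooLarge(f"more than {limit} bytes received")
            spool.write(chunk)
    except BaseException:
        spool.close()  # do not leave a half-written spool behind
        raise
    spool.seek(0)
    return Upload(field_name, client_filename, content_type, size, spool)
```

The application would read from upload.stream (to copy it into a 
database, say) and close it when done.  Checking the size while 
reading, rather than after, is what keeps an oversized upload from 
exhausting memory or disk.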

>  3a) There are OS dependencies, as not all OS's support the same file
>  names.
True.  In the temporary file model, I think the solution is just using 
some simple and meaningless but unique string.  As long as the file name 
is passed to the application, it really doesn't matter what the file is 
physically named.
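
As a small illustration of the "meaningless but unique string" idea, 
here is a sketch in Python.  It assumes a random UUID is an acceptable 
internal name; store_upload and the keys of the returned dictionary are 
hypothetical.

```python
import os
import uuid


def store_upload(data: bytes, client_filename: str, directory: str) -> dict:
    # 32 hexadecimal characters: no path separators, spaces, or other
    # characters that some filesystems reject.
    internal_name = uuid.uuid4().hex
    path = os.path.join(directory, internal_name)
    # Mode "xb" refuses to overwrite an existing file, so a name
    # collision (however unlikely) fails loudly instead of clobbering.
    with open(path, "xb") as f:
        f.write(data)
    # The client-side name is carried along as plain data and never
    # touches the filesystem.
    return {"path": path, "client_filename": client_filename}
```

Two users uploading files with the same client-side name end up with 
two distinct files on disk, and the application still sees the name 
each client sent.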

>  3b) The file upload must generate a portable but unique internal name
>  while at the same time retaining the OS-specific name specified by
>  the sender (original name).
Correct.

>  3c) This implies a kind of catalog of files, mapping the original
>  name to the internal name and vice versa. This technique also is
>  necessary to prevent duplicate file names, for example from multiple
>  users.
Other information needs to be mapped as well, including the form field 
information so that files can be distinguished from each other if a form 
supports multiple files in a single post.
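
To show what keying on the form field might look like, here is a sketch 
in Python that leans on the standard library's MIME parser to split a 
multipart/form-data body and group the files by field name.  It reads 
the whole body into memory and is only meant to illustrate the mapping, 
not to serve as a production parser; parse_uploads and the dictionary 
keys are hypothetical names.

```python
from collections import defaultdict
from email import message_from_bytes


def parse_uploads(content_type: str, body: bytes) -> dict:
    """Group uploaded files by the form field they were posted under."""
    # The MIME parser expects headers first, so prepend the request's
    # Content-Type (which carries the multipart boundary) to the body.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("expected a multipart/form-data body")

    files = defaultdict(list)
    for part in message.get_payload():
        filename = part.get_filename()
        if filename is None:
            continue  # an ordinary form field, not a file
        field = part.get_param("name", header="content-disposition")
        files[field].append({
            "client_filename": filename,
            "content_type": part.get_content_type(),
            "data": part.get_payload(decode=True),
        })
    return dict(files)
```

Each field name maps to a list, so a form that posts several files 
under one field, or one file under each of several fields, is handled 
the same way.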

>  3d) The user should be able to associate a short and/or long
>  description, and possibly a category or two, with the file. TBNL
>  might not dictate this precisely but the catalog should be flexible
>  in what it holds.
I'm not certain what you mean by this, or what the need is.  It would 
seem that the primary index on which to find a file would be the 'name' 
or the form field as specified in the POST.

>  3e) The catalog needs to be able to associate a file with a user and
>  vice versa. Users aren't a TBNL concept at the moment and I'm not
>  sure adding them just for this is worthwhile. On the other hand,
>  uploading and storing files without the possibility of a sense of
>  ownership doesn't seem robust. Is this really a part of the TBNL
>  core? How can this be handled cleanly?
It would seem that this could be left to the user.  I.e., if the uploaded 
file is only stored temporarily, and then the application is expected to 
do something with it before the request is complete (move it somewhere 
else, for example), then the responsibility for file storage would be 
held by the user of the library, probably where it should be.  If the 
uploaded files are held temporarily in memory, then it seems that it 
would become even more of a non-issue.
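
A sketch of that division of responsibility, in Python: the library 
side owns the temporary file only for the duration of the request, and 
the application moves it somewhere permanent if it wants to keep it.  
The name request_upload is hypothetical, and the "request" is modelled 
as the body of a with block.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def request_upload(data: bytes):
    """Yield the path of a temp file that exists only during the request."""
    fd, path = tempfile.mkstemp(prefix="upload-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        yield path
    finally:
        # If the handler already moved the file, there is nothing left
        # to delete, which is fine.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```

A handler that wants to keep the file calls something like 
shutil.move(path, destination) inside the with block; a handler that 
does nothing loses the file when the block exits.  Ownership, storage 
layout and any catalog then live entirely in application code.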

>  3f) Where and how should the files be stored? ACS and OpenACS
>  (openacs.org), the last time I looked, store uploaded files in a big
>  directory tree with, IIRC, subdirectories named with single letters.
>  The destination subdirectory was chosen by the first 3 or 4 chars of
>  the internally-generated file name. The idea is that a filesystem can
>  have, or can efficiently handle, only so many files per subdirectory.
>  Given an idea of the total number of uploaded files we want to
>  support, we could calculate the depth and number of subdirectories
>  needed.
Is there really a need for any subdirectories?  Does the OpenACS 
approach add any real value to the temporary file or memory model?

>  3g) TBNL-level security is enough and it's OK to mix uploaded files
>  from several users in the same catalog and subdirectories, so long as
>  TBNL keeps things sorted out and, depending upon the application, the
>  programmer is responsible for showing/making available the right file
>  to the right person.
It would seem that TBNL should only present the information for the 
file(s) uploaded as part of the request that the particular dispatched 
thread is handling.

>  I bet I both left something out and put too much in :) I'd appreciate
>  comments on all of the above. In summary, how should the files be
>  stored and how should the catalog be structured? Once I have an idea
>  of how we think this should work, I'm happy to glue it all together.
I think the idea of a catalog might be a layer of functionality above 
what is needed here.  It seems to me that it would be better to 
implement simple file upload support first, and then, if there is a need 
for or added value in a catalog-based approach, to implement that 
separately on top of the basic file upload support.

>  Thanks for your help.
Anytime.

-- Travis



