[cffi-devel] a thought on string encodings

James Bielman jamesjb at jamesjb.com
Mon Jan 2 11:47:09 UTC 2006


On Thu, 2005-12-22 at 18:50 +0100, Hoehle, Joerg-Cyril wrote:

> I hope encoding stuff will be the next great addition to CFFI. Here's some vague idea I once had.
> I got the impression that there are (at least) two types of functions:
>  - one where the conversion depends on whatever dynamic calling context
>  - another where the conversion is fixed, i.e. depends on the function only, not on the caller (but possibly on the library).
> 
> Given CFFI's post transformers, I suspect that there's an opportunity to model both kinds of functions, i.e.
>  - some where defcfun expands to defaults of custom:*foreign-encoding* (in CLISP speak)
>  - some where the wrappers within defcfun impose a given encoding, e.g. ASCII, ISO-8859-1, UTF-8 or UTF-16.

Hi Jörg,

I've started thinking about this.  To demonstrate the new type
translator interface, I'm working on, to begin with, a UTF8-STRING type
that converts Lisp strings to and from UTF-8 on Unicode Lisps.

I want to implement this efficiently in CLISP, so I want to be sure I
use optimized C primitives as much as possible.  I think I have a fairly
efficient method for conversion to a foreign string:

#+clisp
(defmethod translate-to-foreign ((s string) (name (eql 'utf8-string)))
  ;; WITH-FOREIGN-STRING stack-allocates the UTF-8 encoding of S, so
  ;; copy it into a heap buffer that outlives this form.
  (ffi:with-foreign-string (ptr chars bytes s :encoding charset:utf-8)
    (declare (ignore chars))
    (let ((buf (foreign-alloc :unsigned-char :count bytes)))
      (memcpy buf ptr bytes)
      (values buf t))))

(where memcpy just calls the C function of the same name)
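
For reference, a wrapper like that can be defined with DEFCFUN.  This
is only a minimal sketch: it assumes CFFI is loaded, that the current
package uses it, and that size_t can be passed as :UNSIGNED-LONG on the
platform in question.  MEMCPY-DEMO is a hypothetical helper added purely
to show a call.

```lisp
;; Assumes CFFI is loaded and the current package uses it.
;; Declaring size_t as :unsigned-long is a platform assumption.
(defcfun ("memcpy" memcpy) :pointer
  (dest :pointer)
  (src :pointer)
  (size :unsigned-long))

;; Hypothetical usage: copy the five bytes of "hello" and its
;; terminating NUL into a heap buffer, then read the copy back.
(defun memcpy-demo ()
  (with-foreign-string (src "hello")
    (let ((dst (foreign-alloc :unsigned-char :count 6)))
      (memcpy dst src 6)
      (prog1 (foreign-string-to-lisp dst)
        (foreign-free dst)))))
```

The byte count is hard-coded here only because the sample string is
ASCII; real code would take it from whatever performed the encoding.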

I didn't see any interface in CLISP to convert a Lisp string to a
pointer that didn't stack-allocate, but this should still be pretty
fast.  (Does the CLISP FFI provide something like memcpy?)

However, I haven't been able to find an inverse for
FFI:WITH-FOREIGN-STRING.  I'd like to be able to convert a pointer back
to a Lisp string without looping in bytecode to create a vector of
octets from the pointer.

So, I think I need that block interface we've talked about.  I tried a
whole bunch of combinations of FFI:MEMORY-AS with FFI:C-ARRAY-PTR types
and got nothing but segfaults.  Is there something I can use to convert
the pointer to either a vector of octets (which I can pass to
EXT:CONVERT-STRING-FROM-BYTES) or to a Lisp string directly?

Thanks,
James




