[Git][cmucl/cmucl][issue-139-set-filename-encoding-to-utf8] 4 commits: Convert runtime strings to utf-16

Raymond Toy (@rtoy) gitlab at common-lisp.net
Sun Dec 11 00:26:44 UTC 2022



Raymond Toy pushed to branch issue-139-set-filename-encoding-to-utf8 at cmucl / cmucl


Commits:
689db03d by Raymond Toy at 2022-12-10T11:45:53-08:00
Convert runtime strings to utf-16

The C runtime initializes several variables from either the
environment or the command line.  These are, of course, encoded in the
locale, so we need to convert these strings into utf-16 format.  Add
`decode-runtime-strings` to do just that.

This problem showed up when installing cmucl in /tmp/αβ and trying to
run lisp.  We get an error when printing the herald for the core file
that is being used because we call truename on the path.  Since the
path is encoded, we can't find the file.  We need to convert the path
to a utf-16 string that we can use.
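
As an illustration of the problem (not the CMUCL implementation), here is
a minimal sketch in portable Common Lisp.  It assumes the runtime hands
Lisp a string with one character per encoded octet, which is what
`decode-runtime-strings` then has to decode.  The names
`octets-to-raw-string`, `decode-utf8-raw-string`, `*raw-path*`, and
`*decoded-path*` are hypothetical, and the decoder only handles
well-formed utf-8; in CMUCL the real work is done by
`stream:string-decode`.

```lisp
;; Hypothetical helper: build the undecoded form of a C string, with
;; one Lisp character per octet.
(defun octets-to-raw-string (octets)
  (map 'string #'code-char octets))

;; Hypothetical stand-in for decoding with the :utf-8 format.
;; Assumes RAW is well-formed utf-8; there is no error checking.
(defun decode-utf8-raw-string (raw)
  (with-output-to-string (out)
    (let ((i 0)
          (n (length raw)))
      (loop while (< i n) do
        (let* ((b0 (char-code (char raw i)))
               ;; Number of continuation octets that follow the lead octet.
               (extra (cond ((< b0 #x80) 0)
                            ((< b0 #xE0) 1)
                            ((< b0 #xF0) 2)
                            (t 3)))
               ;; Payload bits carried by the lead octet.
               (code (case extra
                       (0 b0)
                       (1 (logand b0 #x1F))
                       (2 (logand b0 #x0F))
                       (t (logand b0 #x07)))))
          ;; Each continuation octet contributes its low 6 bits.
          (dotimes (k extra)
            (setf code
                  (logior (ash code 6)
                          (logand (char-code (char raw (+ i k 1))) #x3F))))
          (write-char (code-char code) out)
          (incf i (1+ extra)))))))

;; The utf-8 octets of "/tmp/" followed by Greek alpha and beta.
(defparameter *raw-path*
  (octets-to-raw-string '(#x2F #x74 #x6D #x70 #x2F #xCE #xB1 #xCE #xB2)))

(defparameter *decoded-path* (decode-utf8-raw-string *raw-path*))
```

The raw string has 9 characters while the decoded one has 7, so the two
name different files as far as truename is concerned.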

- - - - -
6720015c by Raymond Toy at 2022-12-10T11:55:03-08:00
Remove lisp:: package specifier

We're in the lisp package, so we don't need the prefix.

- - - - -
234d86a7 by Raymond Toy at 2022-12-10T16:03:53-08:00
Handle decode-runtime-strings and environment more carefully.

`decode-runtime-strings` needs to be more careful in converting the
strings from the C runtime.  For the command line parameters and the
environment list, we need to use the locale when converting to Lisp
strings.  But `*cmucl-lib*`, `*cmucl-core-path*`, and `*unidata-path*`
need to use the file encoding to convert the C result to Lisp.

We also need to call `environment-init` after calling
`decode-runtime-strings` because the strings could have changed.  This
is important for the "library:" search-list which contains file names.
Without this, the search-list is mangled so we can't find anything.

Finally, call `intl::setlocale` after the `environment-init` has been
called again because `intl::setlocale` needs the paths to the pot
files, so the "library:" search-list has to contain valid pathnames.
Previously, translations could be accessed early and error out because
the paths were not correct.

This was tested by installing in "/tmp/αβ" and running lisp.  Before
these changes, we errored out printing out the path to the core file
because the path to the core file was not properly decoded.  Now it
works.

Then we tried `(set-system-external-format :euc-kr)`.  We can find the
format implementation file correctly.

We also tested with
```
LANG=ko_KR.EUC_KR bin/lisp -noinit
```
This correctly loads up the euc-kr file and sets the external format.
There are no errors.  (But of course the printed path for the core
file is wrong because euc-kr can't handle the Greek letters.)

- - - - -
f43c6517 by Raymond Toy at 2022-12-10T16:26:21-08:00
Add some comments

- - - - -


2 changed files:

- src/code/extfmts.lisp
- src/code/save.lisp


Changes:

=====================================
src/code/extfmts.lisp
=====================================
@@ -493,7 +493,7 @@
 	     ;; encoding to NIL because we don't need any special
 	     ;; encoding to open the format files.
 	     (let* ((*print-readably* nil)
-		    (unix::*filename-encoding* nil)
+		    ;;(unix::*filename-encoding* nil)
 		    (*package* (find-package "STREAM"))
 		    (lisp::*enable-package-locked-errors* nil)
 		    (s (open (format nil "ext-formats:~(~A~).lisp" name)


=====================================
src/code/save.lisp
=====================================
@@ -164,20 +164,48 @@
 		 *default-external-format*))))
   (values))
 
- 
+(defun decode-runtime-strings (locale file-locale)
+  ;; The C runtime can initialize the following strings from the
+  ;; command line or the environment.  We need to decode these into
+  ;; the utf-16 strings that Lisp uses.
+  (setf lisp-command-line-list
+	(mapcar #'(lambda (s)
+		    (stream:string-decode s locale))
+		lisp-command-line-list))
+  (setf lisp-environment-list
+	(mapcar #'(lambda (s)
+		    (stream:string-decode s locale))
+		lisp-environment-list))
+  ;; This needs more work..  *cmucl-lib* could be set from the the envvar
+  ;; "CMUCLLIB" or from the "-lib" command-line option, and thus
+  ;; should use the LOCALE to decode the string.
+  (when *cmucl-lib*
+    (setf *cmucl-lib*
+	  (stream:string-decode *cmucl-lib* file-locale)))
+  ;; This also needs more work since the core path could come from the
+  ;; "-core" command-line option and should thus use LOCALE to decode
+  ;; the string.  It could also come from the "CMUCLCORE" envvar.
+  (setf *cmucl-core-path*
+	(stream:string-decode *cmucl-core-path* file-locale))
+  ;; *unidata-path* defaults to a pathname object, but the user can
+  ;; specify a path, so we need to decode the string path if given.
+  (when (and *unidata-path* (stringp *unidata-path*))
+    (setf *unidata-path*
+	  (stream:string-decode *unidata-path* file-locale))))
+
 (defun save-lisp (core-file-name &key
-				 (purify t)
-				 (root-structures ())
-				 (environment-name "Auxiliary")
-				 (init-function #'%top-level)
-				 (load-init-file t)
-				 (site-init "library:site-init")
-				 (print-herald t)
-				 (process-command-line t)
-		                  #+:executable
-		                 (executable nil)
-				 (batch-mode nil)
-				 (quiet nil))
+				   (purify t)
+				   (root-structures ())
+				   (environment-name "Auxiliary")
+				   (init-function #'%top-level)
+				   (load-init-file t)
+				   (site-init "library:site-init")
+				   (print-herald t)
+				   (process-command-line t)
+		                   #+:executable
+		                   (executable nil)
+				   (batch-mode nil)
+				   (quiet nil))
   "Saves a CMU Common Lisp core image in the file of the specified name.  The
   following keywords are defined:
   
@@ -278,13 +306,18 @@
 	     ;; Load external format aliases now so we can aliases to
 	     ;; specify the external format.
 	     (stream::load-external-format-aliases)
-	     ;; Set the locale for lisp
-	     (intl::setlocale)
 	     ;; Set up :locale format
 	     (set-up-locale-external-format)
 	     ;; Set terminal encodings to :locale and filename encoding to :utf-8.
 	     ;; (This needs more work on Darwin.)
 	     (set-system-external-format :locale :utf-8)
+	     (decode-runtime-strings :locale :utf-8)
+	     ;; Need to reinitialize the environment again because
+	     ;; we've possibly changed the environment variables and
+	     ;; pathnames.
+	     (environment-init)
+	     ;; Set the locale for lisp
+	     (intl::setlocale)
 	     (ext::process-command-strings process-command-line)
 	     (setf *editor-lisp-p* nil)
 	     (macrolet ((find-switch (name)
@@ -340,14 +373,14 @@
 	 (unix:unix-exit
 	  (catch '%end-of-the-world
 	    (unwind-protect
-		(if *batch-mode*
-		    (handler-case
-			(%restart-lisp)
-		      (error (cond)
-			(format *error-output* (intl:gettext "Error in batch processing:~%~A~%")
-				cond)
-			(throw '%end-of-the-world 1)))
-		    (%restart-lisp))
+		 (if *batch-mode*
+		     (handler-case
+			 (%restart-lisp)
+		       (error (cond)
+			 (format *error-output* (intl:gettext "Error in batch processing:~%~A~%")
+				 cond)
+			 (throw '%end-of-the-world 1)))
+		     (%restart-lisp))
 	      (finish-standard-output-streams))))))
 
     ;; Record dump time and host
@@ -357,7 +390,7 @@
     (let ((initial-function (get-lisp-obj-address #'restart-lisp))
 	  (core-name (unix-namestring core-file-name nil)))
       (without-gcing
-	  #+:executable
+	#+:executable
 	(if executable
 	    (save-executable core-name initial-function)
 	    (save core-name initial-function #+sse2 1 #-sse2 0))



View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/354f94f5be60e66e139a09d8dd3e5209f70696fb...f43c65171c4cad7b432827f4946273ae19b44a30
