How do I open a file in Common Lisp by guessing its character encoding when the encoding is unknown?

Asked 2 years ago, Updated 2 years ago, 118 views

I'd like to open a file in Common Lisp whose character encoding I don't know, by guessing the encoding.
How do you open a file when you don't know its character encoding?

I am trying to read the file into an (unsigned-byte 8) vector and then use a library called Guess on it. It guesses the character encoding, but it does not guess the newline convention.

Guess is a port of libguess to Common Lisp and is published at https://github.com/zqwell/guess.

common-lisp

2022-09-30 17:02

1 Answer

When the character encoding is unknown, the usual approach is to read the file yourself into a (vector (unsigned-byte 8)) and convert it to a string afterwards.
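As a minimal sketch of that first step, the file can be read as raw octets using only standard Common Lisp. The function name read-file-octets is made up here for illustration and is not part of Guess or any other library.

```lisp
(defun read-file-octets (path)
  "Return the contents of the file at PATH as a (vector (unsigned-byte 8))."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((octets (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      ;; READ-SEQUENCE fills OCTETS from the binary stream; no decoding
      ;; or newline translation happens at this stage.
      (read-sequence octets in)
      octets)))
```

The resulting vector can then be passed to an encoding detector or directly to a decoding function.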

Up to that point I think your approach is fine, but for everything after that I don't think Common Lisp has anything standard.

Since the encoding is unknown, it has to be detected somehow. In my case, I used to use onjo's guess, which is the origin of the Guess library mentioned in the question.

However, almost every web page is UTF-8 these days, so instead of detecting the encoding I simply try decoding by brute force:

 ;; os is the (unsigned-byte 8) vector read from the file
 (or (ignore-errors (babel:octets-to-string os :encoding :utf-8))
     (ignore-errors (babel:octets-to-string os :encoding :eucjp))
     (ignore-errors (babel:octets-to-string os :encoding :cp932)))

I often get by with something like this.
As I understand it, libguess supports languages other than Japanese as well, so I think it is quite usable.

Incidentally, ABCL can use Java libraries, and I have used the Java version of ICU from ABCL.
(In Clojure, ICU seems to be used relatively often.)
ICU is also available for languages other than Java, but I have never tried those versions.

As for newlines, what #\Newline corresponds to in Common Lisp (LF, CR+LF, and so on) can vary with the environment.
I think newlines are usually dealt with as a separate step after the encoding has been converted.
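One possible way to handle newlines after decoding is sketched below. It assumes the decoder passes CR through unchanged as #\Return, and it relies on the semi-standard character names #\Return and #\Linefeed, which common implementations such as SBCL accept. The function name normalize-newlines is hypothetical.

```lisp
(defun normalize-newlines (string)
  "Return a copy of STRING with CR+LF and lone CR replaced by #\\Newline."
  (with-output-to-string (out)
    (loop with i = 0
          with n = (length string)
          while (< i n)
          do (let ((ch (char string i)))
               (cond ((char= ch #\Return)
                      (write-char #\Newline out)
                      ;; Skip the LF of a CR+LF pair so that it is not
                      ;; emitted as a second line break.
                      (when (and (< (1+ i) n)
                                 (char= (char string (1+ i)) #\Linefeed))
                        (incf i)))
                     (t (write-char ch out)))
               (incf i)))))
```

Doing this on the decoded string rather than on the octets avoids mistaking bytes inside a multi-byte character for line terminators.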

Corrupted byte sequences are also a problem. Common Lisp has nothing like Ruby's String#scrub, so I think you have to write your own.
In my case, on SBCL:

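 ;; Decode OCTETS, replacing each undecodable sequence with "=".
 ;; sb-impl::octet-decoding-error is an SBCL-internal symbol.
 (defun scrub (octets)
   (handler-bind ((sb-impl::octet-decoding-error
                    (lambda (c)
                      (use-value "=" c))))
     (sb-ext:octets-to-string octets)))


 ;; UTF-8 for "あいうえ" followed by a truncated character
 (defvar *broken*
   (coerce '(227 129 130 227 129 132 227 129 134 227 129 136 227 129)
           '(vector (unsigned-byte 8))))


 (scrub *broken*)
 ;=> "あいうえ="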

That is roughly what I wrote and use myself.


2022-09-30 17:02
