I'd like to open a file in Common Lisp by guessing its character encoding, since I don't know it in advance.
How do you open a file whose character encoding you don't know?
I'm trying to read the file into an (unsigned-byte 8) vector and pass it to a library called Guess. It guesses the character encoding, but it does not detect the newline convention.
Guess is a Common Lisp port of libguess, published at https://github.com/zqwell/guess.
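Since Guess does not report the newline convention, one option is to inspect the raw octets yourself. The following is a minimal sketch in portable Common Lisp; `count-newlines` and `guess-newline` are hypothetical helpers, not part of Guess. It assumes an ASCII-compatible encoding (UTF-8, EUC-JP, CP932), where bytes 10 (LF) and 13 (CR) never occur inside a multibyte character. It would not work for UTF-16.

```lisp
;;; Hypothetical helpers (not part of Guess): detect the newline convention
;;; from raw octets. Assumes an ASCII-compatible encoding such as UTF-8,
;;; EUC-JP or CP932, where LF (10) and CR (13) never appear inside a
;;; multibyte character.

(defun count-newlines (octets)
  "Return three values: the number of CR+LF pairs, lone LFs and lone CRs in OCTETS."
  (let ((crlf 0) (lf 0) (cr 0) (i 0) (len (length octets)))
    (loop while (< i len)
          do (let ((b (aref octets i)))
               (cond ((and (= b 13) (< (1+ i) len) (= (aref octets (1+ i)) 10))
                      (incf crlf)
                      (incf i 2))
                     ((= b 13) (incf cr) (incf i))
                     ((= b 10) (incf lf) (incf i))
                     (t (incf i)))))
    (values crlf lf cr)))

(defun guess-newline (octets)
  "Return :CRLF, :LF or :CR if only one convention occurs, :MIXED if several
do, or NIL if OCTETS contains no line breaks."
  (multiple-value-bind (crlf lf cr) (count-newlines octets)
    (let ((kinds (remove nil (list (and (plusp crlf) :crlf)
                                   (and (plusp lf) :lf)
                                   (and (plusp cr) :cr)))))
      (cond ((null kinds) nil)
            ((null (rest kinds)) (first kinds))
            (t :mixed)))))
```

For example, `(guess-newline (coerce '(97 13 10 98 13 10) '(vector (unsigned-byte 8))))` returns `:CRLF`. Reporting `:MIXED` rather than picking a winner is a design choice; you could instead take the largest count.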
common-lisp
When the encoding is unknown, people often read the data into a (vector (unsigned-byte 8)) themselves and convert it afterwards.
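Reading a whole file into such a vector needs only standard Common Lisp. A minimal sketch follows; the name `read-file-octets` is hypothetical.

```lisp
(defun read-file-octets (pathname)
  "Return the whole contents of the file at PATHNAME as a (vector (unsigned-byte 8))."
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let* ((octets (make-array (file-length in) :element-type '(unsigned-byte 8)))
           ;; READ-SEQUENCE returns the index of the first element it did not fill.
           (end (read-sequence octets in)))
      ;; Trim in case fewer bytes were read than FILE-LENGTH reported.
      (if (= end (length octets))
          octets
          (subseq octets 0 end)))))
```

The resulting vector can then be handed to an encoding guesser or to a decoder such as `babel:octets-to-string`.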
I think that approach is fine as far as it goes, but I don't think Common Lisp has anything standard for this yet.
Since you don't know the encoding, something has to make the decision. In my case, I used to use onjo's guess, which is the original of the Guess mentioned in the question.
However, these days almost every web page is UTF-8, so I have switched to not guessing at all and simply forcing the decode:
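;; OS is the octet vector. Try UTF-8 first, then EUC-JP, then CP932;
;; IGNORE-ERRORS turns a decoding error into NIL so OR moves on.
(or (ignore-errors (babel:octets-to-string os :encoding :utf-8))
    (ignore-errors (babel:octets-to-string os :encoding :eucjp))
    (ignore-errors (babel:octets-to-string os :encoding :cp932)))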
In many cases I stop at something like that.
As I understand it, libguess also supports languages other than Japanese, so I think it is quite easy to use.
By the way, ABCL can use Java libraries, and I have used ICU for Java (ICU4J) from ABCL.
(ICU seems to be used fairly often in Clojure as well.)
There are also non-Java versions of ICU, but I have never tried them.
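In Common Lisp, whether #\Newline is read from and written to files as LF or as CR+LF can vary depending on your environment (implementation and external-format settings).
So even after converting the encoding, I think there is still a lot left to do.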
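One such step is normalizing line endings after decoding. A minimal sketch, assuming you want every CR+LF and lone CR turned into the implementation's #\Newline; `normalize-newlines` is a hypothetical name. It uses the semi-standard character names #\Return and #\Linefeed, which the major implementations support.

```lisp
(defun normalize-newlines (string)
  "Return a copy of STRING in which each CR+LF pair and each lone CR
is replaced by #\\Newline. Other characters are copied unchanged."
  (let ((len (length string)) (i 0))
    (with-output-to-string (out)
      (loop while (< i len)
            do (let ((ch (char string i)))
                 (cond ((char= ch #\Return)
                        (write-char #\Newline out)
                        ;; Skip the LF of a CR+LF pair as well.
                        (if (and (< (1+ i) len)
                                 (char= (char string (1+ i)) #\Linefeed))
                            (incf i 2)
                            (incf i)))
                       (t
                        (write-char ch out)
                        (incf i))))))))
```

Working on the decoded string rather than the octets keeps this independent of the source encoding.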
Invalid byte sequences are another problem, but Common Lisp has nothing like Ruby's String#scrub, so I think you have to write your own.
In my case (SBCL), I did this:
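(defun scrub (octets)
  ;; Replace each undecodable sequence with "=" by invoking the USE-VALUE
  ;; restart. SB-IMPL::OCTET-DECODING-ERROR is an SBCL-internal condition.
  (handler-bind ((sb-impl::octet-decoding-error
                   (lambda (c)
                     (use-value "=" c))))
    (sb-ext:octets-to-string octets :external-format :utf-8)))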
;; UTF-8 bytes for five hiragana characters, with the last one truncated.
(defvar *broken*
  (coerce '(227 129 130 227 129 132 227 129 134 227 129 136 227 129)
          '(vector (unsigned-byte 8))))
(scrub *broken*)
;; => "あいうえ="
That is roughly what I wrote and use.
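The scrub above depends on SBCL internals. If you need something portable and only care about UTF-8, a minimal sketch is to validate the bytes yourself against the well-formed ranges in RFC 3629 and write a replacement for each ill-formed or truncated sequence. The names `utf8-lead-spec` and `utf8-scrub` are hypothetical, and the code assumes `char-code-limit` covers all of Unicode (true of SBCL, CCL, ECL and ABCL). Each replacement covers the lead byte plus whatever valid continuation bytes followed it, and decoding resumes at the byte that failed.

```lisp
;;; Portable sketch of a String#scrub-like UTF-8 decoder (no SBCL internals).
;;; Assumes CHAR-CODE-LIMIT covers Unicode code points up to U+10FFFF.

(defun utf8-lead-spec (b)
  "For lead byte B, return the sequence length and the allowed range
(LOW, HIGH) of the second byte, following the RFC 3629 table.
Return NIL if B can never start a well-formed sequence."
  (cond ((<= b #x7F) (values 1 0 0))
        ((<= #xC2 b #xDF) (values 2 #x80 #xBF))
        ((= b #xE0) (values 3 #xA0 #xBF))       ; rejects overlong forms
        ((= b #xED) (values 3 #x80 #x9F))       ; rejects surrogates
        ((<= #xE1 b #xEF) (values 3 #x80 #xBF))
        ((= b #xF0) (values 4 #x90 #xBF))       ; rejects overlong forms
        ((<= #xF1 b #xF3) (values 4 #x80 #xBF))
        ((= b #xF4) (values 4 #x80 #x8F))       ; stays below U+110000
        (t nil)))

(defun utf8-scrub (octets &optional (replacement "="))
  "Decode OCTETS as UTF-8, writing REPLACEMENT for each ill-formed
or truncated sequence instead of signalling an error."
  (let ((len (length octets)) (i 0))
    (with-output-to-string (out)
      (loop while (< i len)
            do (let ((b (aref octets i)))
                 (multiple-value-bind (n lo hi) (utf8-lead-spec b)
                   (cond ((null n)                ; invalid lead byte
                          (write-string replacement out)
                          (incf i))
                         ((= n 1)                 ; ASCII
                          (write-char (code-char b) out)
                          (incf i))
                         (t
                          (let ((j (1+ i))
                                (code (ldb (byte (- 7 n) 0) b)))
                            ;; Accumulate continuation bytes while they are in range.
                            (loop for k from 1 below n
                                  for next = (and (< j len) (aref octets j))
                                  while (and next
                                             (if (= k 1)
                                                 (<= lo next hi)
                                                 (<= #x80 next #xBF)))
                                  do (setf code (logior (ash code 6) (logand next #x3F)))
                                     (incf j))
                            (if (= j (+ i n))
                                (write-char (code-char code) out)
                                ;; Ill-formed: replace the bytes consumed so far
                                ;; and resume at the byte that failed.
                                (write-string replacement out))
                            (setf i j))))))))))
```

With the same 14 bytes as `*broken*`, `utf8-scrub` yields the four complete hiragana followed by a single `=`. Pass a different REPLACEMENT, such as a string containing U+FFFD, if you prefer the Unicode replacement character.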
© 2024 OneMinuteCode. All rights reserved.