Bug #5637
closed
warnings of shellescape
Description
\あ
Updated by znz (Kazuhiro NISHIYAMA) about 13 years ago
- ruby -v changed from ruby 2.0.0dev (2011-11-15 trunk 33753) [x86_64-linux] to -
This is Kazuhiro Nishiyama.
What I write on Redmine seems to disappear, so I am rewriting this by email.
Shellwords.shellescape emits warnings.
% ./ruby -v -r shellwords -e 'p Shellwords.shellescape("\u3042")'
ruby 2.0.0dev (2011-11-15 trunk 33753) [x86_64-linux]
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
"\あ"
I also think the escaped result is wrong.
If the escaped result should match 1.8.7, how about the following patch?
diff --git a/lib/shellwords.rb b/lib/shellwords.rb
index 5d6ba75..78331a7 100644
--- a/lib/shellwords.rb
+++ b/lib/shellwords.rb
@@ -79,11 +79,11 @@ module Shellwords
     # An empty argument will be skipped, so return empty quotes.
     return "''" if str.empty?
 
-    str = str.dup
+    str = str.dup.force_encoding("ASCII-8BIT")
 
     # Process as a single byte sequence because not all shell
     # implementations are multibyte aware.
-    str.gsub!(/([^A-Za-z0-9_\-.,:\/@\n])/n, "\\\\\\1")
+    str.gsub!(/([^A-Za-z0-9_\-.,:\/@\n])/, "\\\\\\1")
 
     # A LF cannot be escaped with a backslash because a backslash + LF
     # combo is regarded as line continuation and simply ignored.
diff --git a/test/test_shellwords.rb b/test/test_shellwords.rb
index d48a888..cbc5043 100644
--- a/test/test_shellwords.rb
+++ b/test/test_shellwords.rb
@@ -36,4 +36,8 @@ class TestShellwords < Test::Unit::TestCase
       shellwords(bad_cmd)
     end
   end
+
+  def test_shellescape_utf8_string
+    assert_equal "\\343\\201\\202", shellescape("\u3042")
+  end
 end
--
|ZnZ(ゼット エヌ ゼット)
|西山和広(Kazuhiro NISHIYAMA)
Updated by knu (Akinori MUSHA) about 13 years ago
- Assignee set to knu (Akinori MUSHA)
Updated by knu (Akinori MUSHA) almost 13 years ago
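I gave this a lot of thought, and I plan to simply drop the //n flag.
- 1.8: Treating everything uniformly as binary was an unavoidable specification, because strings carried no encoding information and $KCODE could not be relied on. (This does not apply to 1.9+.)
- 1.9: This has been the behavior all the way through 1.9.3. I will remove the warning as a bug (a missed fix for //n), but I will not change the behavior, because that would introduce an incompatibility.
- 2.0: Only the caller knows how the string will be used (for example, the locale of the shell it is passed to). In 1.9+ the caller can encode the string appropriately, including as ASCII-8BIT, so the current behavior, in which shellescape respects that encoding, is exactly what we want (even if only by coincidence) and does not need to change.
There is no sensible way to issue a warning either (a check such as "if the string is SJIS, then ..." would do harm when the caller knows what they are doing, for example by setting the shell's locale to SJIS), so I will not do anything extra and will only add a note to the documentation.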
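A minimal sketch of the behavior described above, assuming a Ruby in which the //n flag has already been dropped from shellescape. The variable names are illustrative only. It escapes the character from the report twice: once as a UTF-8 string, and once after the caller re-tags a copy as ASCII-8BIT.

```ruby
require "shellwords"

# "\u3042" is the character from the report; in UTF-8 it is 3 bytes.
utf8 = "\u3042"

# Escaped as one character: a single backslash precedes the whole character.
escaped_utf8 = Shellwords.shellescape(utf8)

# The caller opts into byte-by-byte escaping by re-tagging a copy as ASCII-8BIT.
binary = utf8.dup.force_encoding(Encoding::ASCII_8BIT)
escaped_binary = Shellwords.shellescape(binary)

p escaped_utf8.bytes    # a backslash, then the three UTF-8 bytes
p escaped_binary.bytes  # a backslash before each of the three bytes
```

The second form produces the same bytes the 1.8.7-compatible patch would have produced for every caller; here only callers that ask for it get it.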
Updated by knu (Akinori MUSHA) almost 13 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r34166.
Kazuhiro, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- lib/shellwords.rb (Shellwords#shellescape): Drop the //n flag
that only causes warnings with no real effect. [Bug #5637]
Updated by dariocravero (Darío Cravero) over 12 years ago
Hi,
Thanks for this patch!.. :)
One question though, from comment #3 it's not clear if it's safe to use it in 1.9.3. This is what Google Translator gave me:
"1.9: this behavior was all the way to 1.9.3 now. Turn off warning but does not change as a bug (missing fix of / / n), because the behavior leads to incompatibility."
However, I've applied it and, as expected, I don't see the warning anymore. Still, can you just confirm there're no side effects to this on 1.9.3?
Thanks a million!..
Updated by knu (Akinori MUSHA) over 12 years ago
As I documented, it all depends on how you use the resulting string.
If you are going to pass it to a shell that lacks support for the string's encoding, you should probably encode the original string in ASCII-8BIT before calling shellescape(). That gives a byte-by-byte escape, which makes sure the shell won't find a metacharacter inside a multibyte character.
UTF-8 multibyte characters do not contain any ASCII character by design anyway, so most people in the everything-is-UTF-8 world don't even have to care about this.
But, for example, when you have to run a program passing a Shift_JIS string via a shell under a non-Shift_JIS locale, you'd probably have to compose the command line in the ASCII-8BIT encoding so that all shell metacharacters that may appear in Shift_JIS multibyte characters are properly escaped.
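The two cases above can be sketched as follows. This is an illustration under assumptions, not part of the library documentation: the variable names are made up, and the Shift_JIS example uses the katakana "po" (bytes 0x83 0x7C), whose second byte is the same as ASCII "|".

```ruby
require "shellwords"

# UTF-8: every byte of a multibyte character is >= 0x80, so none of them
# can be mistaken for an ASCII shell metacharacter.
utf8_safe = "\u3042".bytes.all? { |b| b >= 0x80 }

# Shift_JIS: build the two-byte character directly from its bytes.
sjis = "\x83\x7C".dup.force_encoding(Encoding::Shift_JIS)

# Character-wise escape: one backslash in front of the two-byte character,
# which leaves the 0x7C byte bare for a shell that does not read Shift_JIS.
by_char = Shellwords.shellescape(sjis)

# Byte-wise escape: re-tag a copy as ASCII-8BIT so 0x7C gets its own backslash.
by_byte = Shellwords.shellescape(sjis.dup.force_encoding(Encoding::ASCII_8BIT))

p utf8_safe
p by_char.bytes
p by_byte.bytes
```

If the shell really runs under a Shift_JIS locale, the character-wise result is the appropriate one, which is why the choice is left to the caller.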