Added by znz (Kazuhiro NISHIYAMA) over 10 years ago. Updated almost 10 years ago.

Updated by znz (Kazuhiro NISHIYAMA) over 10 years ago

  • ruby -v changed from ruby 2.0.0dev (2011-11-15 trunk 33753) [x86_64-linux] to -


It seems the text gets lost when I write it on Redmine, so I am rewriting this by email.

Shellwords.shellescape emits warnings.

% ./ruby -v -r shellwords -e 'p Shellwords.shellescape("\u3042")'
ruby 2.0.0dev (2011-11-15 trunk 33753) [x86_64-linux]
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string

If the escaped result should match 1.8.7, the following patch would do it:

diff --git a/lib/shellwords.rb b/lib/shellwords.rb
index 5d6ba75..78331a7 100644
--- a/lib/shellwords.rb
+++ b/lib/shellwords.rb
@@ -79,11 +79,11 @@ module Shellwords
     # An empty argument will be skipped, so return empty quotes.
     return "''" if str.empty?

-    str = str.dup
+    str = str.dup.force_encoding("ASCII-8BIT")

     # Process as a single byte sequence because not all shell
     # implementations are multibyte aware.
-    str.gsub!(/([^A-Za-z0-9_\-.,:\/@\n])/n, "\\\\\\1")
+    str.gsub!(/([^A-Za-z0-9_\-.,:\/@\n])/, "\\\\\\1")

     # A LF cannot be escaped with a backslash because a backslash + LF
     # combo is regarded as line continuation and simply ignored.
diff --git a/test/test_shellwords.rb b/test/test_shellwords.rb
index d48a888..cbc5043 100644
--- a/test/test_shellwords.rb
+++ b/test/test_shellwords.rb
@@ -36,4 +36,8 @@ class TestShellwords < Test::Unit::TestCase

+  def test_shellescape_utf8_string
+    assert_equal "\\\343\\\201\\\202", shellescape("\u3042")
+  end

|ZnZ(ゼット エヌ ゼット)
|西山和広(Kazuhiro NISHIYAMA)

Updated by knu (Akinori MUSHA) over 10 years ago

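After giving this a lot of thought, I am going to simply drop the //n flag and change nothing else.

  • 1.8: Treating every string uniformly as binary was an unavoidable design, because strings carried no encoding information and $KCODE could not be relied on. (This constraint does not apply to 1.9+.)
  • 1.9: The behavior has been the same all along, up to and including 1.9.3. The warning will be removed as a bug (a leftover //n that should have been fixed), but the behavior itself will not change, because changing it would introduce an incompatibility.
  • 2.0: Only the caller knows how the string will be used (for example, the locale of the shell it is passed to). In 1.9+ the caller can encode the string appropriately, including as ASCII-8BIT, so the current behavior, in which shellescape respects that encoding, is the desirable one (albeit by accident) and does not need to change.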
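To make the "shellescape respects the caller's encoding" point concrete, here is a minimal sketch, assuming a Ruby where the //n flag has already been dropped. The variable names are illustrative only. It escapes the character from the report once as UTF-8 and once after the caller re-tags it as ASCII-8BIT.

```ruby
require "shellwords"

# "\u3042" is the character from the original report (three bytes in UTF-8).
utf8 = "\u3042"

# Escaped in its own encoding: the whole character is escaped as one unit.
escaped_utf8 = Shellwords.shellescape(utf8)

# Re-tagged as ASCII-8BIT by the caller: every byte is escaped separately,
# which matches the 1.8.7-style result the proposed patch aimed for.
escaped_binary = Shellwords.shellescape(utf8.dup.force_encoding("ASCII-8BIT"))

# Print the raw bytes in hex so the difference is visible on any terminal.
puts escaped_utf8.bytes.map { |b| format("%02X", b) }.join(" ")
puts escaped_binary.bytes.map { |b| format("%02X", b) }.join(" ")
```

The first line should show a single backslash (5C) followed by the three UTF-8 bytes, and the second a backslash in front of each byte. Which form is appropriate depends on the shell that will receive the string.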


Updated by knu (Akinori MUSHA) over 10 years ago

This issue was solved with changeset r34166.
Kazuhiro, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

  • lib/shellwords.rb (Shellwords#shellescape): Drop the //n flag
    that only causes warnings with no real effect. [Bug #5637]

Updated by dariocravero (Darío Cravero) almost 10 years ago


Thanks for this patch!.. :)

One question though, from comment #3 it's not clear if it's safe to use it in 1.9.3. This is what Google Translator gave me:

"1.9: this behavior was all the way to 1.9.3 now. Turn off warning but does not change as a bug (missing fix of / / n), because the behavior leads to incompatibility."

However, I've applied it and, as expected, I don't see the warning anymore. Still, can you just confirm there're no side effects to this on 1.9.3?

Thanks a million!..

Updated by knu (Akinori MUSHA) almost 10 years ago

As I documented, it all depends on how you use the resulting string.

If you are going to pass it to a shell that lacks support for the encoding of the string, then you should probably re-tag the original string as ASCII-8BIT before escaping it with shellescape(). That gives you a byte-by-byte escape, which makes sure the shell won't find a metacharacter inside a multibyte character.

By design, no byte of a UTF-8 multibyte character is in the ASCII range, so most people in the everything-is-UTF-8 world don't even have to care about this.

But, for example, when you have to run a program passing a Shift_JIS string via a shell under a non-Shift_JIS locale, you'd probably have to compose the command line in the ASCII-8BIT encoding so that all shell metacharacters that may appear in Shift_JIS multibyte characters are properly escaped.
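The Shift_JIS case can be sketched as follows. This is only an illustration with hypothetical variable names. It uses the katakana "ソ" (U+30BD), whose Shift_JIS encoding is 0x83 0x5C, so its second byte is the same as an ASCII backslash.

```ruby
require "shellwords"

# "ソ" in Shift_JIS is the byte pair 0x83 0x5C; 0x5C is also ASCII "\".
sjis = "\u30BD".encode("Shift_JIS")

# Escaped as Shift_JIS: one backslash in front of the whole character.
# A shell that is not Shift_JIS-aware would see a bare trailing 0x5C.
escaped_as_sjis = Shellwords.shellescape(sjis)

# Escaped as ASCII-8BIT: each byte gets its own backslash, so the
# trailing 0x5C is itself escaped.
escaped_as_bytes = Shellwords.shellescape(sjis.dup.force_encoding("ASCII-8BIT"))

puts escaped_as_sjis.bytes.map { |b| format("%02X", b) }.join(" ")
puts escaped_as_bytes.bytes.map { |b| format("%02X", b) }.join(" ")
```

The byte-wise form is the one to use when the shell runs under a non-Shift_JIS locale. The character-wise form is fine only when the shell itself interprets its input as Shift_JIS.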


