Bug #5637

closed

warnings of shellescape

Added by znz (Kazuhiro NISHIYAMA) about 13 years ago. Updated over 12 years ago.

Status:
Closed
Target version:
ruby -v:
-
Backport:
[ruby-dev:44878]

Description

\あ

Updated by znz (Kazuhiro NISHIYAMA) about 13 years ago

  • ruby -v changed from ruby 2.0.0dev (2011-11-15 trunk 33753) [x86_64-linux] to -

This is Kazuhiro Nishiyama.

The text seems to disappear when I write it on redmine, so I am rewriting this by email.

Shellwords.shellescape emits warnings.

% ./ruby -v -r shellwords -e 'p Shellwords.shellescape("\u3042")'
ruby 2.0.0dev (2011-11-15 trunk 33753) [x86_64-linux]
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
"\あ"

I also think the escaped result is odd.
If the escaped result should match 1.8.7, how about
the following patch?

diff --git a/lib/shellwords.rb b/lib/shellwords.rb
index 5d6ba75..78331a7 100644
--- a/lib/shellwords.rb
+++ b/lib/shellwords.rb
@@ -79,11 +79,11 @@ module Shellwords
     # An empty argument will be skipped, so return empty quotes.
     return "''" if str.empty?
 
-    str = str.dup
+    str = str.dup.force_encoding("ASCII-8BIT")
 
     # Process as a single byte sequence because not all shell
     # implementations are multibyte aware.
-    str.gsub!(/([^A-Za-z0-9_\-.,:\/@\n])/n, "\\\1")
+    str.gsub!(/([^A-Za-z0-9_\-.,:\/@\n])/, "\\\1")
 
     # A LF cannot be escaped with a backslash because a backslash + LF
     # combo is regarded as line continuation and simply ignored.

diff --git a/test/test_shellwords.rb b/test/test_shellwords.rb
index d48a888..cbc5043 100644
--- a/test/test_shellwords.rb
+++ b/test/test_shellwords.rb
@@ -36,4 +36,8 @@ class TestShellwords < Test::Unit::TestCase
       shellwords(bad_cmd)
     end
   end
+
+  def test_shellescape_utf8_string
+    assert_equal "\\343\\201\\202", shellescape("\u3042")
+  end
 end

--
|ZnZ(ゼット エヌ ゼット)
|西山和広(Kazuhiro NISHIYAMA)

Updated by knu (Akinori MUSHA) about 13 years ago

  • Assignee set to knu (Akinori MUSHA)
Actions #3

Updated by knu (Akinori MUSHA) about 13 years ago

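After thinking it over, I intend to simply drop the //n flag.

  • 1.8: Treating everything uniformly as binary was unavoidable, because strings carried no encoding information and $KCODE could not be relied on either. (This does not apply to 1.9+.)
  • 1.9: This has been the behavior all the way up to the current 1.9.3. The warning will be removed as a bug (a leftover //n that was missed in an earlier fix), but the behavior will not change, because changing it would introduce an incompatibility.
  • 2.0: Only the caller knows how the string will be used (for example, the locale of the shell it is passed to), but in 1.9+ the caller can encode the string appropriately, including as ASCII-8BIT. The current behavior, in which shellescape respects that encoding, is therefore exactly what is desirable (albeit by coincidence), and there is no need to change it.

There is no sensible way to emit a warning either (warning about, say, SJIS would be harmful when the caller knows what they are doing, for example by setting the shell's locale to SJIS), so I will not do anything extra and will only add a note to the documentation.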

Actions #4

Updated by knu (Akinori MUSHA) about 13 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r34166.
Kazuhiro, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • lib/shellwords.rb (Shellwords#shellescape): Drop the //n flag
    that only causes warnings with no real effect. [Bug #5637]

Updated by dariocravero (Darío Cravero) over 12 years ago

Hi,

Thanks for this patch!.. :)

One question though, from comment #3 it's not clear if it's safe to use it in 1.9.3. This is what Google Translator gave me:

"1.9: this behavior was all the way to 1.9.3 now. Turn off warning but does not change as a bug (missing fix of / / n), because the behavior leads to incompatibility."

However, I've applied it and, as expected, I don't see the warning anymore. Still, can you just confirm there're no side effects to this on 1.9.3?

Thanks a million!..

Updated by knu (Akinori MUSHA) over 12 years ago

As I documented, it's all up to how you use the resulting string.

If you are going to pass it to a shell that lacks support for the encoding of the string, then you should probably encode the original string in ASCII-8BIT before shell-escaping with shellescape() to get a byte-by-byte escape to make sure the shell won't find a metacharacter inside a multibyte character.
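The difference between the two escapes can be seen in a minimal sketch, assuming a Ruby that already includes the fix from this issue. The variable names are illustrative only; the bytes are printed in hex so the output does not depend on the terminal encoding.

```ruby
require "shellwords"

utf8 = "\u3042" # HIRAGANA LETTER A, three bytes (E3 81 82) in UTF-8

# Character-wise escape: shellescape respects the string's encoding,
# so the whole multibyte character gets a single backslash.
by_char = Shellwords.shellescape(utf8)

# Byte-wise escape: reinterpret the same bytes as ASCII-8BIT first,
# so every non-ASCII byte gets its own backslash.
by_byte = Shellwords.shellescape(utf8.dup.force_encoding(Encoding::ASCII_8BIT))

p by_char.bytes.map { |b| format("%02X", b) } # => ["5C", "E3", "81", "82"]
p by_byte.bytes.map { |b| format("%02X", b) } # => ["5C", "E3", "5C", "81", "5C", "82"]
```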

UTF-8 multibyte characters do not contain any ASCII character by design anyway, so most people in the everything-is-UTF-8 world don't even have to care about this.
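As a quick illustration of that UTF-8 property (a sketch with arbitrarily chosen sample characters), every byte of a multibyte character has its high bit set, so none of them can collide with an ASCII shell metacharacter:

```ruby
# Sample characters encoded as 3, 2 and 4 bytes in UTF-8 respectively.
samples = ["\u3042", "\u00E9", "\u{1F600}"]

# True when every byte of a character is outside the ASCII range.
high_only = samples.map { |ch| ch.bytes.all? { |b| b >= 0x80 } }

p samples.map(&:bytesize) # => [3, 2, 4]
p high_only               # => [true, true, true]
```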

But, for example, when you have to run a program passing a Shift_JIS string via a shell under a non-Shift_JIS locale, you'd probably have to compose the command line in the ASCII-8BIT encoding so that all shell metacharacters that may appear in Shift_JIS multibyte characters are properly escaped.
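A minimal sketch of the Shift_JIS case, assuming a Ruby that includes the fix from this issue. The sample character is chosen only for illustration: in Shift_JIS, KATAKANA LETTER SO is encoded as the bytes 0x83 0x5C, and its trailing byte is the same as an ASCII backslash.

```ruby
require "shellwords"

# Build the two-byte Shift_JIS character from raw bytes so the example
# does not depend on the source file encoding.
sjis = [0x83, 0x5C].pack("C*").force_encoding(Encoding::Shift_JIS)

# Character-wise: one backslash in front of the whole character. A shell
# that is not Shift_JIS aware sees the trailing 0x5C as a bare backslash.
by_char = Shellwords.shellescape(sjis)

# Byte-wise: the string is treated as ASCII-8BIT, so the trailing 0x5C
# byte is itself escaped.
by_byte = Shellwords.shellescape(sjis.dup.force_encoding(Encoding::ASCII_8BIT))

p by_char.bytes.map { |b| format("%02X", b) } # => ["5C", "83", "5C"]
p by_byte.bytes.map { |b| format("%02X", b) } # => ["5C", "83", "5C", "5C"]
```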
