Bug #5709 » 0001-Improve-Regexp-documentation.patch

sdaubert (Sylvain Daubert), 12/05/2011 02:26 AM

doc/re.rdoc
Specifically, <tt>/st/</tt> requires that the string contains the letter
_s_ followed by the letter _t_, so it matches _haystack_, also.

  
== <tt>=~</tt> and Regexp#match

  
Pattern matching may be achieved by using the <tt>=~</tt> operator or the
Regexp#match method.

  
=== <tt>=~</tt> operator
<tt>=~</tt> is Ruby's basic pattern-matching operator.
One operand must be a regular expression and the other must be a string (the
operator is defined by both Regexp and String). If a match is found, the
operator returns the index of the first match in the string; otherwise it
returns +nil+.

  
    /hay/ =~ 'haystack'   #=> 0
    /a/   =~ 'haystack'   #=> 1
    /u/   =~ 'haystack'   #=> nil

  
Using the <tt>=~</tt> operator sets the <tt>$~</tt> global variable. After a
successful match, <tt>$~</tt> holds a MatchData object; after a failed match
it is +nil+. Regexp.last_match is equivalent to <tt>$~</tt>.

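The following minimal sketch checks the behavior described above. It shows what <tt>=~</tt> returns with either operand order, and what <tt>$~</tt> and Regexp.last_match contain after a successful match and after a failed one. The helper +match_state+ is hypothetical and exists only for illustration. It wraps the match in a method because <tt>$~</tt> is local to the method that performed the match.

```ruby
# Hypothetical helper: performs regexp =~ string and returns the operator's
# result together with $~ and Regexp.last_match as seen right afterwards.
def match_state(regexp, string)
  index = regexp =~ string
  [index, $~, Regexp.last_match]
end

index, md, last = match_state(/st/, 'haystack')
index            #=> 3
md[0]            #=> "st"
md.equal?(last)  #=> true  (both refer to the same MatchData object)

index, md, last = match_state(/u/, 'haystack')
index            #=> nil
md               #=> nil
last             #=> nil

# The operands may be given in either order.
'haystack' =~ /st/   #=> 3
```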
  
=== Regexp#match method

  
The #match method returns a MatchData object, or +nil+ if there is no match:

  
    /st/.match('haystack')   #=> #<MatchData "st">

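Here is a slightly larger sketch of Regexp#match. The returned MatchData exposes the captured groups and the position of the match. When nothing matches, the method returns +nil+. An optional second argument gives the index at which the search starts. The pattern and strings are only illustrative.

```ruby
m = /s(t)/.match('haystack')
m[0]        #=> "st"
m[1]        #=> "t"   (text captured by the first group)
m.begin(0)  #=> 3     (index where the whole match starts)

/u/.match('haystack')   #=> nil

# Start searching at index 2, so the "a" at index 1 is skipped.
/a/.match('haystack', 2).begin(0)  #=> 5
```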
  
== Metacharacters and Escapes

  
The following are <i>metacharacters</i> <tt>(</tt>, <tt>)</tt>,
......
* <tt>/[[:print:]]/</tt> - Like [:graph:], but includes the space character
* <tt>/[[:punct:]]/</tt> - Punctuation character
* <tt>/[[:space:]]/</tt> - Whitespace character (<tt>[:blank:]</tt>, newline,
  carriage return, etc.)
* <tt>/[[:upper:]]/</tt> - Uppercase alphabetical
* <tt>/[[:xdigit:]]/</tt> - Digit allowed in a hexadecimal number (i.e.,
  0-9a-fA-F)
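To see some of the bracket expressions listed above in action, the sketch below applies them to short sample strings. The strings are chosen purely for illustration. It uses String#scan, which returns every substring matched by the pattern.

```ruby
sample = 'Hay 1!'

sample.scan(/[[:upper:]]/)   #=> ["H"]
sample.scan(/[[:punct:]]/)   #=> ["!"]
sample.scan(/[[:space:]]/)   #=> [" "]

# [[:xdigit:]] accepts 0-9, a-f and A-F, but not "x".
'cafe 0x1F'.scan(/[[:xdigit:]]/)  #=> ["c", "a", "f", "e", "0", "1", "F"]

# [[:print:]] includes the space character; [[:graph:]] does not.
/[[:print:]]/ =~ ' '   #=> 0
/[[:graph:]]/ =~ ' '   #=> nil
```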
......
Parentheses can be used for <i>capturing</i>. The text enclosed by the
<i>n</i><sup>th</sup> group of parentheses can be subsequently referred to
with <i>n</i>. Within a pattern use the <i>backreference</i>
<tt>\n</tt>; outside of the pattern use
<tt>MatchData[</tt><i>n</i><tt>]</tt>.

  
    # 'at' is captured by the first group of parentheses, then referred to
......
    /a(?i:b)c/.match('aBc') #=> #<MatchData "aBc">
    /a(?i:b)c/.match('abc') #=> #<MatchData "abc">

  
Options may also be used with <tt>Regexp.new</tt>:

    Regexp.new("abc", Regexp::IGNORECASE)                     #=> /abc/i
    Regexp.new("abc", Regexp::MULTILINE)                      #=> /abc/m
    Regexp.new("abc # Comment", Regexp::EXTENDED)             #=> /abc # Comment/x
    Regexp.new("abc", Regexp::IGNORECASE | Regexp::MULTILINE) #=> /abc/mi

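Once a Regexp has been built with <tt>Regexp.new</tt>, its options can be inspected and their effect observed. The sketch below uses Regexp#casefold? and tests individual bits of the Integer returned by Regexp#options against the Regexp constants. The patterns and test strings are only examples.

```ruby
re = Regexp.new('hay', Regexp::IGNORECASE | Regexp::MULTILINE)

re.casefold?                           #=> true
(re.options & Regexp::MULTILINE) != 0  #=> true
(re.options & Regexp::EXTENDED) != 0   #=> false

# IGNORECASE makes the pattern match regardless of letter case.
re =~ 'HAYSTACK'                       #=> 0

# MULTILINE lets "." match a newline as well.
Regexp.new('a.c', Regexp::MULTILINE) =~ "a\nc"  #=> 0
Regexp.new('a.c') =~ "a\nc"                     #=> nil
```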
  
== Free-Spacing Mode and Comments

  
As mentioned above, the <tt>x</tt> option enables <i>free-spacing</i>
......
       #=> Encoding::CompatibilityError: incompatible encoding regexp match
            (ISO-8859-1 regexp with UTF-8 string)

  
== Special global variables

  
Pattern matching sets some global variables:

* <tt>$~</tt> is equivalent to Regexp.last_match;
* <tt>$&</tt> contains the complete matched text;
* <tt>$`</tt> contains the string before the match;
* <tt>$'</tt> contains the string after the match;
* <tt>$1</tt>, <tt>$2</tt> and so on contain the text matching the first,
  second, etc. capture group;
* <tt>$+</tt> contains the last matched capture group.

  
Example:

    m = /s(\w{2}).*(c)/.match('haystack') #=> #<MatchData "stac" 1:"ta" 2:"c">
    $~                                     #=> #<MatchData "stac" 1:"ta" 2:"c">
    Regexp.last_match                      #=> #<MatchData "stac" 1:"ta" 2:"c">

    $&      #=> "stac"
            # same as m[0]
    $`      #=> "hay"
            # same as m.pre_match
    $'      #=> "k"
            # same as m.post_match
    $1      #=> "ta"
            # same as m[1]
    $2      #=> "c"
            # same as m[2]
    $3      #=> nil
            # no third group in pattern
    $+      #=> "c"
            # same as m[-1]

  
These global variables are thread-local and method-local variables.

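The thread-local and method-local behavior can be checked with a small sketch. The method names below are hypothetical and used only for illustration. A match performed inside a called method, or inside another thread, does not change the caller's <tt>$~</tt>.

```ruby
# Hypothetical helper: its match only affects $~ inside this method.
def match_in_method
  /b/ =~ 'abc'
  $~[0]
end

# The caller's $~ survives a match done in a called method.
def method_local_demo
  /a/ =~ 'abc'
  inner = match_in_method
  [inner, $~[0]]
end

# The caller's $~ also survives a match done in another thread.
def thread_local_demo
  /a/ =~ 'abc'
  in_thread = Thread.new { /c/ =~ 'abc'; $~[0] }.value
  [in_thread, $~[0]]
end

method_local_demo  #=> ["b", "a"]
thread_local_demo  #=> ["c", "a"]
```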
  
== Performance

  
Certain pathological combinations of constructs can lead to abysmally bad