Class: ORB::Tokenizer2
- Inherits:
Object
- Includes:
- Patterns
- Defined in:
- lib/orb/tokenizer2.rb
Overview
Tokenizer2 is a streaming, non-recursive tokenizer for ORB templates.
It scans the source sequentially and emits tokens as it passes over the input. During scanning, it keeps track of the current state and the list of tokens. Any consumption of the source, whether by buffering or skipping, moves the cursor. The cursor position tracks the current line and column in the virtual source document, and generated tokens are annotated with the position at which they were found.
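The line/column bookkeeping described above can be sketched with a small, self-contained StringScanner wrapper. The class and method names here are illustrative, not Tokenizer2's actual internals:

```ruby
require 'strscan'

# Sketch of cursor tracking during streaming tokenization: every
# consumption of source text advances a line/column cursor so that
# tokens can be annotated with their position.
class CursorScanner
  attr_reader :line, :column

  def initialize(source)
    @scanner = StringScanner.new(source)
    @line = 1
    @column = 1
  end

  # Consume text matching +pattern+ and move the cursor. Newlines
  # advance the line counter and reset the column to the offset
  # after the last newline in the consumed chunk.
  def consume(pattern)
    text = @scanner.scan(pattern)
    return nil unless text

    newlines = text.count("\n")
    if newlines.positive?
      @line += newlines
      @column = text.length - text.rindex("\n")
    else
      @column += text.length
    end
    text
  end
end
```

Whether the consumed text is buffered or skipped, the cursor moves the same way; only the token output differs.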
Constant Summary collapse
- IGNORED_BODY_TAGS =
Tags whose body content should be ignored
%w[script style].freeze
- MAX_BRACE_DEPTH =
Maximum allowed brace nesting depth to prevent memory exhaustion
100
- MAX_TEMPLATE_SIZE =
Maximum allowed template source size in bytes (2MB)
2 * 1024 * 1024
- VOID_ELEMENTS =
Tags that are self-closing per the HTML5 spec
%w[area base br col command embed hr img input keygen link meta param source track wbr].freeze
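A depth cap like MAX_BRACE_DEPTH is typically enforced by checking the brace stack on every push. This standalone sketch (with hypothetical helper names, not Tokenizer2's internals) shows the guard pattern:

```ruby
# Sketch of a nesting-depth guard: refuse to grow the brace stack
# past a fixed cap so that pathological input cannot exhaust memory.
MAX_BRACE_DEPTH = 100

def push_brace(braces, position)
  if braces.size >= MAX_BRACE_DEPTH
    raise "Brace nesting exceeds #{MAX_BRACE_DEPTH} (at #{position})"
  end
  braces.push(position)
end
```

Recording the opening position on the stack also lets error messages point at the brace that opened the offending nesting level.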
Constants included from Patterns
Patterns::ATTRIBUTE_ASSIGN, Patterns::ATTRIBUTE_NAME, Patterns::BLANK, Patterns::BLOCK_CLOSE, Patterns::BLOCK_CONTENT_TEXT, Patterns::BLOCK_NAME_CHARS, Patterns::BLOCK_OPEN, Patterns::BRACE_CLOSE, Patterns::BRACE_OPEN, Patterns::COMMENT_TEXT, Patterns::CONTROL_EXPRESSION_END, Patterns::CONTROL_EXPRESSION_START, Patterns::CR, Patterns::CRLF, Patterns::DOUBLE_QUOTE, Patterns::DOUBLE_QUOTED_TEXT, Patterns::END_TAG_END, Patterns::END_TAG_END_VERBATIM, Patterns::END_TAG_START, Patterns::EXPRESSION_TEXT, Patterns::INITIAL_TEXT, Patterns::NEWLINE, Patterns::OTHER, Patterns::PRINTING_EXPRESSION_END, Patterns::PRINTING_EXPRESSION_START, Patterns::PRIVATE_COMMENT_END, Patterns::PRIVATE_COMMENT_START, Patterns::PUBLIC_COMMENT_END, Patterns::PUBLIC_COMMENT_START, Patterns::SINGLE_QUOTE, Patterns::SINGLE_QUOTED_TEXT, Patterns::SPACE_CHARS, Patterns::SPLAT_ATTRIBUTE, Patterns::SPLAT_EXPRESSION_START, Patterns::START_TAG_END, Patterns::START_TAG_END_SELF_CLOSING, Patterns::START_TAG_END_VERBATIM, Patterns::START_TAG_START, Patterns::TAG_NAME, Patterns::UNQUOTED_VALUE, Patterns::UNQUOTED_VALUE_INVALID_CHARS, Patterns::VERBATIM_TEXT
Instance Attribute Summary collapse
-
#errors ⇒ Object
readonly
Returns the value of attribute errors.
-
#tokens ⇒ Object
readonly
Returns the value of attribute tokens.
Instance Method Summary collapse
-
#initialize(source, options = {}) ⇒ Tokenizer2
constructor
A new instance of Tokenizer2.
-
#tokenize ⇒ Object
(also: #tokenize!)
Main Entry.
Constructor Details
#initialize(source, options = {}) ⇒ Tokenizer2
Returns a new instance of Tokenizer2.
# File 'lib/orb/tokenizer2.rb', line 32

def initialize(source, options = {})
  @source = StringScanner.new(source)
  @raise_errors = options.fetch(:raise_errors, true)

  # Streaming Tokenizer State
  @column = 1
  @line = 1
  @errors = []
  @tokens = []
  @attributes = []
  @braces = []
  @state = :initial
  @buffer = +''
end
Instance Attribute Details
#errors ⇒ Object (readonly)
Returns the value of attribute errors.
# File 'lib/orb/tokenizer2.rb', line 30

def errors
  @errors
end
#tokens ⇒ Object (readonly)
Returns the value of attribute tokens.
# File 'lib/orb/tokenizer2.rb', line 30

def tokens
  @tokens
end
Instance Method Details
#tokenize ⇒ Object Also known as: tokenize!
Main Entry
# File 'lib/orb/tokenizer2.rb', line 48

def tokenize
  if @source.string.bytesize > MAX_TEMPLATE_SIZE
    raise ORB::SyntaxError.new("Template exceeds maximum size (#{MAX_TEMPLATE_SIZE} bytes)", 0)
  end

  next_token until @source.eos?

  # Consume remaining buffer
  buffer_to_text_token

  # Return the tokens
  @tokens
end
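The loop structure above (scan until EOS, then flush whatever remains in the buffer into a final text token) can be demonstrated with a self-contained miniature tokenizer. The token shapes below are illustrative, not ORB's actual token format:

```ruby
require 'strscan'

# Miniature streaming tokenizer following the same shape as
# Tokenizer2#tokenize: drive next_token until end-of-source, then
# flush the text buffer so no trailing text is lost.
class MiniTokenizer
  def initialize(source)
    @source = StringScanner.new(source)
    @tokens = []
    @buffer = +''
  end

  def tokenize
    next_token until @source.eos?
    buffer_to_text_token # consume remaining buffer
    @tokens
  end

  private

  # Emit an expression token for `{...}` spans; buffer everything else.
  def next_token
    if @source.scan(/\{([^}]*)\}/)
      buffer_to_text_token
      @tokens << [:expression, @source[1]]
    else
      @buffer << @source.getch
    end
  end

  # Flush accumulated plain text into a single text token.
  def buffer_to_text_token
    return if @buffer.empty?

    @tokens << [:text, @buffer]
    @buffer = +''
  end
end
```

Flushing the buffer both before emitting a structural token and once more after the loop is what keeps text tokens contiguous and complete in a streaming design.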