Package js
v1.1.0
Published: Nov 2, 2015 License: MIT Imports: 4 Imported by: 0

README

This package is a JS lexer (ECMA-262, edition 6.0) written in Go. It follows the ECMAScript Language Specification. The lexer takes an io.Reader and converts it into tokens until EOF.

Installation

Run the following command

go get github.com/tdewolff/parse/js

or add the following import and run your project with go get

import "github.com/tdewolff/parse/js"

Lexer

Usage

The following initializes a new Lexer with io.Reader r:

l := js.NewLexer(r)

To tokenize until EOF or an error occurs, use:

for {
	tt, text, n := l.Next()
	switch tt {
	case js.ErrorToken:
		// error or EOF set in l.Err()
		return
	// ...
	}
	// process text for the other token types, then release its bytes
	l.Free(n)
}

All tokens (see ECMAScript Language Specification):

ErrorToken          TokenType = iota // extra token when errors occur
UnknownToken                         // extra token when no token can be matched
WhitespaceToken                      // space \t \v \f
LineTerminatorToken                  // \r \n \r\n
CommentToken
IdentifierToken // also: null true false
PunctuatorToken /* { } ( ) [ ] . ; , < > <= >= == != === !==  + - * % ++ -- << >>
   >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>= &= |= ^= / /= => */
NumericToken
StringToken
RegexpToken
TemplateToken

Quirks

Because the ECMAScript specification depends on parser state to differentiate between a PunctuatorToken (which covers the / and /= symbols) and a RegexpToken, the lexer uses a different rule to remain modular. Whenever / is encountered and the previous token is one of (,=:[!&|?{};, it returns a RegexpToken; otherwise it returns a PunctuatorToken. This is the same rule JSLint appears to use.
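As a rough illustration of that rule, here is a minimal sketch (the input strings are arbitrary, and it uses the v1.1.0 Next/Free signatures documented below) that prints every token type for two small inputs:

package main

import (
	"bytes"
	"fmt"

	"github.com/tdewolff/parse/js"
)

// printTokens lexes src and prints the type of every token on one line.
func printTokens(src string) {
	l := js.NewLexer(bytes.NewBufferString(src))
	for {
		tt, _, n := l.Next()
		if tt == js.ErrorToken {
			fmt.Println()
			return
		}
		fmt.Print(tt, " ")
		l.Free(n)
	}
}

func main() {
	printTokens("a = /x/g;") // '/' follows '=', so /x/g is lexed as a RegexpToken
	printTokens("a / b;")    // '/' follows an identifier, so it is lexed as a PunctuatorToken
}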

Examples

package main

import (
	"fmt"
	"io"
	"os"

	"github.com/tdewolff/parse/js"
)

// Tokenize JS from stdin.
func main() {
	l := js.NewLexer(os.Stdin)
	for {
		tt, text, n := l.Next()
		switch tt {
		case js.ErrorToken:
			if l.Err() != io.EOF {
				fmt.Println("Error:", l.Err())
			}
			return
		case js.IdentifierToken:
			fmt.Println("Identifier", string(text))
		case js.NumericToken:
			fmt.Println("Numeric", string(text))
		// ...
		}
		l.Free(n)
	}
}

License

Released under the MIT license.

Documentation

Overview

Package js is an ECMAScript 5.1 lexer following the specification at http://www.ecma-international.org/ecma-262/5.1/.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Hash

type Hash uint32

Hash is a perfect hash of the JS keywords; it uses github.com/tdewolff/hasher.

const (
	Break      Hash = 0x5
	Case       Hash = 0x3404
	Catch      Hash = 0xba05
	Class      Hash = 0x505
	Const      Hash = 0x2c05
	Continue   Hash = 0x3e08
	Debugger   Hash = 0x8408
	Default    Hash = 0xab07
	Delete     Hash = 0xcd06
	Do         Hash = 0x4c02
	Else       Hash = 0x3704
	Enum       Hash = 0x3a04
	Export     Hash = 0x1806
	Extends    Hash = 0x4507
	False      Hash = 0x5a05
	Finally    Hash = 0x7a07
	For        Hash = 0xc403
	Function   Hash = 0x4e08
	If         Hash = 0x5902
	Implements Hash = 0x5f0a
	Import     Hash = 0x6906
	In         Hash = 0x4202
	Instanceof Hash = 0x710a
	Interface  Hash = 0x8c09
	Let        Hash = 0xcf03
	New        Hash = 0x1203
	Null       Hash = 0x5504
	Package    Hash = 0x9507
	Private    Hash = 0x9c07
	Protected  Hash = 0xa309
	Public     Hash = 0xb506
	Return     Hash = 0xd06
	Static     Hash = 0x2f06
	Super      Hash = 0x905
	Switch     Hash = 0x2606
	This       Hash = 0x2304
	Throw      Hash = 0x1d05
	True       Hash = 0xb104
	Try        Hash = 0x6e03
	Typeof     Hash = 0xbf06
	Var        Hash = 0xc703
	Void       Hash = 0xca04
	While      Hash = 0x1405
	With       Hash = 0x2104
	Yield      Hash = 0x8005
)

func ToHash

func ToHash(s []byte) Hash

ToHash returns the hash whose name is s. It returns zero if there is no such hash. It is case sensitive.

func (Hash) String

func (i Hash) String() string

String returns the hash's name.
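As a small sketch of how ToHash and String fit together (assuming the generated keyword table above, and that String returns the keyword text):

package main

import (
	"fmt"

	"github.com/tdewolff/parse/js"
)

func main() {
	// Map raw identifier bytes to a keyword hash; unknown strings map to zero.
	if h := js.ToHash([]byte("function")); h == js.Function {
		fmt.Println("keyword:", h) // Hash.String prints the keyword name, e.g. "function"
	}
	fmt.Println(js.ToHash([]byte("notakeyword")) == 0) // true
}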

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer is the state for the lexer.

func NewLexer

func NewLexer(r io.Reader) *Lexer

NewLexer returns a new Lexer for a given io.Reader.

Example
l := NewLexer(bytes.NewBufferString("var x = 'lorem ipsum';"))
out := ""
for {
	tt, data, n := l.Next()
	if tt == ErrorToken {
		break
	}
	out += string(data)
	l.Free(n)
}
fmt.Println(out)
Output:

var x = 'lorem ipsum';

func (*Lexer) Err

func (l *Lexer) Err() error

Err returns the error encountered during lexing; this is often io.EOF, but other errors can be returned as well.

func (*Lexer) Free added in v1.1.0

func (l *Lexer) Free(n int)

Free frees up bytes of length n from previously shifted tokens.

func (*Lexer) Next

func (l *Lexer) Next() (TokenType, []byte, int)

Next returns the next Token. It returns ErrorToken when an error was encountered. Using Err() one can retrieve the error message.
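A minimal sketch that combines Next, Free and Err (the input string is arbitrary):

package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/tdewolff/parse/js"
)

func main() {
	l := js.NewLexer(bytes.NewBufferString("var answer = 42;"))
	for {
		tt, data, n := l.Next()
		if tt == js.ErrorToken {
			// io.EOF signals a normal end of input; anything else is a real error.
			if l.Err() != io.EOF {
				fmt.Println("lex error:", l.Err())
			}
			return
		}
		fmt.Printf("%s %q\n", tt, data)
		l.Free(n) // release the token's bytes once they are no longer needed
	}
}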

type TokenType

type TokenType uint32

TokenType determines the type of token, e.g. a number or a semicolon.

const (
	ErrorToken          TokenType = iota // extra token when errors occur
	UnknownToken                         // extra token when no token can be matched
	WhitespaceToken                      // space \t \v \f
	LineTerminatorToken                  // \r \n \r\n
	CommentToken
	IdentifierToken
	PunctuatorToken /* { } ( ) [ ] . ; , < > <= >= == != === !==  + - * % ++ -- << >>
	   >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>= &= |= ^= / /= >= */
	NumericToken
	StringToken
	RegexpToken
	TemplateToken
)

TokenType values.

func (TokenType) String

func (tt TokenType) String() string

String returns the string representation of a TokenType.
