package scanner
Import Path
text/scanner (on go.dev)
Dependency Relation
imports 6 packages, and imported by one package
Involved Source Files
Package scanner provides a scanner and tokenizer for UTF-8-encoded text.
It takes an io.Reader providing the source, which then can be tokenized
through repeated calls to the Scan function. For compatibility with
existing tools, the NUL character is not allowed. If the first character
in the source is a UTF-8 encoded byte order mark (BOM), it is discarded.
By default, a [Scanner] skips white space and Go comments and recognizes all
literals as defined by the Go language specification. It may be
customized to recognize only a subset of those literals and to recognize
different identifier and white space characters.
Code Examples
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

func main() {
	const src = `
// This is scanned code.
if a > 10 {
	someParsable = text
}`
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Filename = "example"
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		fmt.Printf("%s: %s\n", s.Position, s.TokenText())
	}
}
package main

import (
	"fmt"
	"strings"
	"text/scanner"
	"unicode"
)

func main() {
	const src = "%var1 var2%"
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Filename = "default"
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		fmt.Printf("%s: %s\n", s.Position, s.TokenText())
	}
	fmt.Println()
	s.Init(strings.NewReader(src))
	s.Filename = "percent"
	// treat leading '%' as part of an identifier
	s.IsIdentRune = func(ch rune, i int) bool {
		return ch == '%' && i == 0 || unicode.IsLetter(ch) || unicode.IsDigit(ch) && i > 0
	}
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		fmt.Printf("%s: %s\n", s.Position, s.TokenText())
	}
}
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

func main() {
	const src = `
    // Comment begins at column 5.
This line should not be included in the output.
/*
This multiline comment
should be extracted in
its entirety.
*/
`
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Filename = "comments"
	s.Mode ^= scanner.SkipComments // don't skip comments
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		txt := s.TokenText()
		if strings.HasPrefix(txt, "//") || strings.HasPrefix(txt, "/*") {
			fmt.Printf("%s: %s\n", s.Position, txt)
		}
	}
}
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

func main() {
	// tab-separated values; the separators are written as explicit
	// \t and \n escapes so they stay visible
	const src = "aa\tab\tac\tad\n" +
		"ba\tbb\tbc\tbd\n" +
		"ca\tcb\tcc\tcd\n" +
		"da\tdb\tdc\tdd"
	var (
		col, row int
		s        scanner.Scanner
		tsv      [4][4]string // large enough for example above
	)
	s.Init(strings.NewReader(src))
	s.Whitespace ^= 1<<'\t' | 1<<'\n' // don't skip tabs and new lines
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		switch tok {
		case '\n':
			row++
			col = 0
		case '\t':
			col++
		default:
			tsv[row][col] = s.TokenText()
		}
	}
	fmt.Print(tsv)
}
Package-Level Type Names (total 2)
Position is a value that represents a source position.
A position is valid if Line > 0.
Column int // column number, starting at 1 (character count per line)
Filename string // filename, if any
Line int // line number, starting at 1
Offset int // byte offset, starting at 0
(*Position) IsValid() bool
IsValid reports whether the position is valid.
( Position) String() string
*Position : database/sql/driver.Validator
Position : expvar.Var
Position : fmt.Stringer
func (*Scanner).Pos() (pos Position)
A Scanner implements reading of Unicode characters and tokens from an [io.Reader].
Error is called for each error encountered. If no Error
function is set, the error is reported to os.Stderr.
ErrorCount is incremented by one for each error encountered.
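The Error and ErrorCount fields can be used together to collect diagnostics instead of letting them go to os.Stderr. This is a sketch under the assumption that an unterminated string literal is a convenient way to provoke a scanner error; the helper name `scanCollectingErrors` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// scanCollectingErrors (a hypothetical helper) scans src to the end and
// returns the messages passed to the Error callback along with the
// final ErrorCount.
func scanCollectingErrors(src string) ([]string, int) {
	var s scanner.Scanner
	var msgs []string
	s.Init(strings.NewReader(src))
	// Set Error after Init, because Init resets the field to nil.
	s.Error = func(s *scanner.Scanner, msg string) {
		msgs = append(msgs, msg)
	}
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		// Tokens are ignored here; only errors are of interest.
	}
	return msgs, s.ErrorCount
}

func main() {
	// The string literal below is never closed.
	msgs, count := scanCollectingErrors(`x = "unterminated`)
	fmt.Println(count, msgs)
}
```

Checking ErrorCount after the loop is a cheap way to decide whether the input was scanned cleanly, even when no Error function is installed.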
IsIdentRune is a predicate controlling the characters accepted
as the ith rune in an identifier. The set of valid characters
must not intersect with the set of white space characters.
If no IsIdentRune function is set, regular Go identifiers are
accepted instead. The field may be changed at any time.
The Mode field controls which tokens are recognized. For instance,
to recognize Ints, set the ScanInts bit in Mode. The field may be
changed at any time.
Start position of most recently scanned token; set by Scan.
Calling Init or Next invalidates the position (Line == 0).
The Filename field is always left untouched by the Scanner.
If an error is reported (via Error) and Position is invalid,
the scanner is not inside a token. Call Pos to obtain an error
position in that case, or to obtain the position immediately
after the most recently scanned token.
Position.Column int // column number, starting at 1 (character count per line)
Position.Filename string // filename, if any
Position.Line int // line number, starting at 1
Position.Offset int // byte offset, starting at 0
The Whitespace field controls which characters are recognized
as white space. To recognize a character ch <= ' ' as white space,
set the ch'th bit in Whitespace (the Scanner's behavior is undefined
for values ch > ' '). The field may be changed at any time.
Init initializes a [Scanner] with a new source and returns s.
[Scanner.Error] is set to nil, [Scanner.ErrorCount] is set to 0, [Scanner.Mode] is set to [GoTokens],
and [Scanner.Whitespace] is set to [GoWhitespace].
IsValid reports whether the position is valid.
Next reads and returns the next Unicode character.
It returns [EOF] at the end of the source. It reports
a read error by calling s.Error, if not nil; otherwise
it prints an error message to [os.Stderr]. Next does not
update the [Scanner.Position] field; use [Scanner.Pos]() to
get the current position.
Peek returns the next Unicode character in the source without advancing
the scanner. It returns [EOF] if the scanner's position is at the last
character of the source.
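For character-level reading, Next and Peek can be combined as in the sketch below, which walks the source one Unicode character at a time and checks that Peek previews what Next then returns. The helper name `readRunes` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// readRunes (a hypothetical helper) reads src one character at a time
// with Next. It also reports whether every Peek matched the character
// that the following Next returned.
func readRunes(src string) ([]rune, bool) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	var out []rune
	agree := true
	for {
		peeked := s.Peek() // does not advance the scanner
		ch := s.Next()     // advances by one character
		if peeked != ch {
			agree = false
		}
		if ch == scanner.EOF {
			break
		}
		out = append(out, ch)
	}
	return out, agree
}

func main() {
	runes, agree := readRunes("héllo")
	fmt.Println(string(runes), len(runes), agree)
}
```

Note that the count is in characters, not bytes: the accented letter occupies two bytes in UTF-8 but is returned as a single rune.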
Pos returns the position of the character immediately after
the character or token returned by the last call to [Scanner.Next] or [Scanner.Scan].
Use the [Scanner.Position] field for the start position of the most
recently scanned token.
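The difference between the Position field (start of the token) and Pos (just past the token) can be illustrated with a short sketch. The helper name `firstSpan` is hypothetical; it returns both positions for the first token of the input.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// firstSpan (a hypothetical helper) scans the first token of src and
// returns its start position (the Position field) and the position
// immediately after it (the result of Pos).
func firstSpan(src string) (start, end scanner.Position) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Scan()
	return s.Position, s.Pos()
}

func main() {
	const src = "abc def"
	start, end := firstSpan(src)
	fmt.Println(start.Offset, start.Column)
	fmt.Println(end.Offset, end.Column)
	// The two byte offsets delimit the token within the source.
	fmt.Println(src[start.Offset:end.Offset])
}
```

Slicing the source between the two offsets is one way to recover a token's text when the original input is still available as a string.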
Scan reads the next token or Unicode character from source and returns it.
It only recognizes tokens t for which the respective [Scanner.Mode] bit (1<<-t) is set.
It returns [EOF] at the end of the source. It reports scanner errors (read and
token errors) by calling s.Error, if not nil; otherwise it prints an error
message to [os.Stderr].
( Scanner) String() string
TokenText returns the string corresponding to the most recently scanned token.
Valid after calling [Scanner.Scan] and in calls of [Scanner.Error].
*Scanner : database/sql/driver.Validator
Scanner : expvar.Var
Scanner : fmt.Stringer
func (*Scanner).Init(src io.Reader) *Scanner
Package-Level Functions (only one)
TokenString returns a printable string for a token or Unicode character.
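A minimal sketch of TokenString in use: it labels each token returned by Scan, which is handy when debugging a scanner configuration. The helper name `tokenNames` and the sample input are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// tokenNames (a hypothetical helper) scans src with the default
// configuration and returns TokenString for every token.
func tokenNames(src string) []string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	var names []string
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		names = append(names, scanner.TokenString(tok))
	}
	return names
}

func main() {
	// Predefined tokens print as their names; other characters such
	// as '=' print as quoted strings.
	for _, name := range tokenNames(`x = 3.14 "hi" 'c' 7`) {
		fmt.Println(name)
	}
}
```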
Package-Level Constants (total 18)
The result of Scan is one of these tokens or a Unicode character.
Predefined mode bits to control recognition of tokens. For instance,
to configure a [Scanner] such that it only recognizes (Go) identifiers,
integers, and skips comments, set the Scanner's Mode field to:
ScanIdents | ScanInts | SkipComments
With the exception of comments, which are skipped if SkipComments is
set, unrecognized tokens are not ignored. Instead, the scanner simply
returns the respective individual characters (or possibly sub-tokens).
For instance, if the mode is ScanIdents (not ScanStrings), the string
"foo" is scanned as the token sequence '"' [Ident] '"'.
Use GoTokens to configure the Scanner such that it accepts all Go
literal tokens including Go identifiers. Comments will be skipped.
GoWhitespace is the default value for the [Scanner]'s Whitespace field.
Its value selects Go's white space characters.