String literals

String literals are described by the following lexical definitions:

stringliteral ::= [[stringprefix][1]] ([shortstring][2] | [longstring][3])

stringprefix ::= "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"

shortstring ::= "'" [shortstringitem][4]* "'" | '"' [shortstringitem][4]* '"'

longstring ::= "'''" [longstringitem][5]* "'''" | '"""' [longstringitem][5]* '"""'

shortstringitem ::= [shortstringchar][6] | [escapeseq][7]

longstringitem ::= [longstringchar][8] | [escapeseq][7]

shortstringchar ::= <any source character except "\" or newline or the quote>

longstringchar ::= <any source character except "\">

escapeseq ::= "\" <any ASCII character>

One syntactic restriction not indicated by these productions is that whitespace is not allowed between the [stringprefix][1] and the rest of the string literal. The source character set is defined by the encoding declaration; it is ASCII if no encoding declaration is given in the source file; see section [2.1.4][10].
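The whitespace restriction can be checked with a small sketch like the one below. The helper name `is_valid_literal` is hypothetical and used only for illustration; it relies on the built-in `compile()` raising `SyntaxError` for source text that does not parse.

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a single Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# The prefix is written directly against the opening quote: accepted.
adjacent = is_valid_literal('r"a\\nb"')

# A space between the prefix and the quote: the parser sees a name
# followed by a separate string literal, which is a syntax error.
separated = is_valid_literal('r "a\\nb"')
```

With the space present, `r` is no longer read as a string prefix at all, which is why the restriction does not need its own grammar production.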

In plain English: String literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter “r” or “R”; such strings are called raw strings and use different rules for interpreting backslash escape sequences. A prefix of “u” or “U” makes the string a Unicode string. Unicode strings use the Unicode character set as defined by the Unicode Consortium and ISO 10646. Some additional escape sequences, described below, are available in Unicode strings. The two prefix characters may be combined; in this case, “u” must appear before “r”.
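As a rough illustration of the quoting styles and prefixes described above, the following sketch uses only literal forms that later Python versions also accept. The combined `ur` prefix is deliberately left out because Python 3 rejects it; the variable names are illustrative only.

```python
# Single and double quotes delimit the same kind of string.
single = 'spam'
double = "spam"
same_value = (single == double)

# In an ordinary literal the backslash escape is interpreted...
cooked = "a\nb"   # three characters: 'a', newline, 'b'

# ...while a raw literal keeps the backslash as written.
raw = r"a\nb"     # four characters: 'a', backslash, 'n', 'b'

# Triple quotes allow both quote characters to appear unescaped.
triple = '''it's "quoted"'''

# The u prefix marks a Unicode string; \u escapes name a code point.
uni = u"caf\u00e9"
```

Comparing `len(cooked)` with `len(raw)` is a quick way to see that the raw prefix changes how the backslash is interpreted, not which characters may appear in the source.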

In triple-quoted strings, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the string. (A "quote" is the character used to open the string, i.e. either ' or ".)