Parsing CSS with ANTLR

Posted by Ben Poweski Fri, 23 May 2008 18:28:00 GMT

For a project I'm working on, we've used CSS syntax to describe the styling of application objects. To accomplish this, we wrote a parser with ANTLR. Here is our implementation of the CSS core syntax. Unicode support (non-ASCII characters and escapes) is left out because our use case did not need it, but it should be fairly easy to add.

grammar CssCore;
options { language=Java; }

/*

Grammar taken from
http://www.w3.org/TR/REC-CSS2/syndata.html#tokenization

*/

//stylesheet  : [ CDO | CDC | S | statement ]*;
stylesheet
    :   (CDO|CDC|statement)*
    ;


//statement   : ruleset | at-rule;
statement
    :   ruleset
    |   atRule
    ;

//at-rule     : ATKEYWORD S* any* [ block | ';' S* ];
atRule  :   ATKEYWORD any* (block | SEMICOLON)
    ;

//block       : '{' S* [ any | block | ATKEYWORD S* | ';' ]* '}' S*;
block   :   LBRACE (any|block|ATKEYWORD|SEMICOLON)* RBRACE 
    ;

//selector    : any+;
// Narrowed here to '*', element names, classes, ids,
// and the '>' and '+' combinators.
selector:   '*'
    |   '*'? (IDENT|'>'|'+'|CLASS|HASH)+
    ;

//ruleset     : selector? '{' S* declaration? [ ';' S* declaration? ]* '}' S*;
ruleset :   selector? LBRACE declaration? ( SEMICOLON declaration? )* RBRACE
    ;

//declaration : property ':' S* value;
declaration
    :   property COLON value
    ;

//property    : IDENT S*;
property:   IDENT
     ;

//value       : [ any | block | ATKEYWORD S* ]+;
value   :   (any|block|ATKEYWORD)+
    ;

//any         : [ IDENT | NUMBER | PERCENTAGE | DIMENSION | STRING
//              | DELIM | URI | HASH | UNICODE-RANGE | INCLUDES
//              | FUNCTION | DASHMATCH | '(' any* ')' | '[' any* ']' ] S*;
any :   (   IDENT|NUMBER|PERCENTAGE|DIMENSION|STRING|
            HASH|INCLUDES|
            FUNCTION|DASHMATCH
            // TODO: UNICODE_RANGE, DELIM, URI, '(' any* ')', '[' any* ']'
        )
    ;


/* Tokens */

//IDENT     {ident}
IDENT   :   F_IDENT
    ;

//ATKEYWORD     @{ident}
ATKEYWORD
    :   '@' F_IDENT
    ;

//STRING    {string}
STRING  :   F_STRING
    ;

//HASH  #{name}
HASH    :   '#' F_NAME
    ;

//NUMBER    {num}
NUMBER  :   F_NUM
    ;

//PERCENTAGE    {num}%
PERCENTAGE
    :   F_NUM '%'
    ;

//DIMENSION     {num}{ident}
DIMENSION
    :   F_NUM F_IDENT
    ;

//URI   url\({w}{string}{w}\)
//|url\({w}([!#$%&*-~]|{nonascii}|{escape})*{w}\)
//UNICODE-RANGE     U\+[0-9A-F?]{1,6}(-[0-9A-F]{1,6})?


//CDO   <!--
CDO :   '<!--'
    ;

//CDC   -->
CDC :   '-->'
    ;


//;     ;
SEMICOLON
    :   ';'
    ;   

COLON   :   ':'
    ;   


//{     \{
LBRACE  :   '{'
    ;

//}     \}
RBRACE  :   '}'
    ;

//(     \(
LPAREN  :   '('
    ;

//)     \)
RPAREN  :   ')'
    ;

//[     \[
LBRACKET:   '['
    ;

//]     \]
RBRACKET:   ']'
    ;

//S     [ \t\r\n\f]+
S   :   (' '|'\t'|'\r'|'\n'|'\f')+
        { $channel=HIDDEN; }
    ;

//COMMENT   \/\*[^*]*\*+([^/][^*]*\*+)*\/
COMMENT :   '/*' (options {greedy=false;} : .)*   '*/'
        { $channel=HIDDEN; }
    ;

//FUNCTION  {ident}\(
FUNCTION:   F_IDENT '('
    ;

//INCLUDES  ~=
INCLUDES:   '~='
    ;

//DASHMATCH     |=
DASHMATCH
    :   '|='
    ;

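//DELIM     any other character not matched by the above rules
// TODO DELIM is not implemented

// CLASS is not in the W3C core token list; it is added here so that
// the selector rule can match class selectors such as .title
CLASS   :   '.' F_IDENT
    ;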


//ident     {nmstart}{nmchar}*
fragment
F_IDENT :   F_NMSTART F_NMCHAR*
    ;

//name  {nmchar}+
fragment
F_NAME  :   F_NMCHAR+
    ;

//nmstart   [a-zA-Z]|{nonascii}|{escape}
fragment
F_NMSTART
    :   (F_LETTER)
// TODO add nonascii, escaped
    ;

//nonascii  [^\0-\177]
//unicode   \\[0-9a-f]{1,6}[ \n\r\t\f]?
//escape    {unicode}|\\[ -~\200-\4177777]

//nmchar    [a-z0-9-]|{nonascii}|{escape}
fragment
F_NMCHAR:   (F_LETTER|F_DIGIT|'-')
// TODO add nonascii, escaped
    ;

//num   [0-9]+|[0-9]*\.[0-9]+
fragment
F_NUM   :   ('0'..'9')+
    |   ('0'..'9')* '.' ('0'..'9')+
    ;


//string    {string1}|{string2}
fragment
F_STRING:   F_STRING1
    |   F_STRING2
    ;

//string1   \"([\t !#$%&(-~]|\\{nl}|\'|{nonascii}|{escape})*\"
fragment
F_STRING1
    :   '"' ('\t'|' '|'!'|'#'|'$'|'%'|'&'|'\''|'.'|F_LETTER|F_DIGIT)* '"' 
    ;

//string2   \'([\t !#$%&(-~]|\\{nl}|\"|{nonascii}|{escape})*\'
fragment
F_STRING2
    :   '\'' ('\t'|' '|'!'|'#'|'$'|'%'|'&'|'"'|'.'|F_LETTER|F_DIGIT)* '\''
    ;

//nl    \n|\r\n|\r|\f
fragment
F_NL    :   '\n'
    |   '\r\n'
    |   '\r'
    |   '\f'
    ;

fragment
F_LETTER:   'a'..'z'
    |   'A'..'Z'
    ;

fragment
F_DIGIT :   '0'..'9'
    ;

//w     [ \t\r\n\f]*
fragment
F_W :   (' '|'\t'|'\r'|'\n'|'\f')*
    ;
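
To get a feel for what the token rules above accept, here is a rough sketch in plain Java that mirrors a few of them as regular expressions. This is not code that ANTLR generates, and it is not how the ANTLR lexer makes its decisions: it only classifies one lexeme at a time. The names CssTokenSketch and classify are made up for this illustration, and the patterns are ASCII-only, just like the grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the grammar: mirrors a few lexer rules as regexes. */
final class CssTokenSketch {
    // Regex equivalents of the fragment rules F_NUM, F_IDENT and F_NAME (ASCII only).
    private static final String NUM = "(?:[0-9]+|[0-9]*\\.[0-9]+)";
    private static final String IDENT = "[a-zA-Z][a-zA-Z0-9-]*";
    private static final String NAME = "[a-zA-Z0-9-]+";

    // Token name -> pattern, built from the same pieces the grammar rules use.
    private static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("PERCENTAGE", Pattern.compile(NUM + "%"));
        TOKENS.put("DIMENSION", Pattern.compile(NUM + IDENT));
        TOKENS.put("NUMBER", Pattern.compile(NUM));
        TOKENS.put("FUNCTION", Pattern.compile(IDENT + "\\("));
        TOKENS.put("IDENT", Pattern.compile(IDENT));
        TOKENS.put("ATKEYWORD", Pattern.compile("@" + IDENT));
        TOKENS.put("HASH", Pattern.compile("#" + NAME));
        TOKENS.put("CLASS", Pattern.compile("\\." + IDENT));
    }

    /** Returns the name of the rule whose pattern matches the whole lexeme, or "UNKNOWN". */
    static String classify(String lexeme) {
        for (Map.Entry<String, Pattern> e : TOKENS.entrySet()) {
            if (e.getValue().matcher(lexeme).matches()) {
                return e.getKey();
            }
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        String[] samples = {"12px", "50%", ".5", "#fff", "@media", "rgb(", ".title", "color", "-moz-box"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

The last sample, -moz-box, comes back as UNKNOWN because F_NMSTART only allows a letter as the first character of an identifier. Non-ASCII names fail for the same reason, which is the Unicode gap mentioned at the top of the post.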

Comments

  10. carlos about 1 year later:

    It's so tedious to manage the S* everywhere. I think the parsing process should build the tree representation of the CSS syntax, and the S* should not be there. In my opinion the CSS syntax isn't good enough.