#include <tokenizer.h>
Public Member Functions | |
| void | reset () |
| Resets token iteration. | |
| boolc | operator! () const |
| Can the stream be read? | |
| string & | operator* () |
| Modify the current token. | |
| stringc & | operator() () const |
| Access the current token in the stream. | |
| void | operator++ () |
| Increment the stream's index. | |
| void | atomize (stringc &atom) |
| Splits strings. | |
| void | subtract (stringc &atom) |
| Splits strings removing the atom. | |
| void | tokenize () |
delimiter set. | |
| void | stripcomment (stringc &comment) |
| For each line delete right of and including the comment if it exists in the line. | |
| void | remove (stringc &token) |
| Remove matching tokens. | |
| void | remove_if () |
| Remove invalid lines as interpreted by myspacer. | |
| template<typename SPACER > | |
| void | remove_if (SPACER spacer) |
| Remove invalid lines. | |
| template<typename X > | |
| void | apply (X x) |
| Apply the functional object x to each element. | |
| void | trim () |
| Trim the leading and trailing space of each token. | |
| void | trim_and_prune () |
| Remove surrounding white space and delete empty strings. | |
| void | extractfromcurrent (vector< string > &v, stringc &atom) const |
| The current line is read and parsed, split by the atom and each element is written to the vector v. | |
| ostream & | print (ostream &os) const |
| Print seq. | |
| void | read (stringc &data) |
| Read the data in. | |
| void | readaslines (stringc &data) |
| Read the string as lines. | |
| void | readaslinesgeneral (stringc &data) |
| Because special characters in strings can interfere with reading strings, this routine is a hack to remove white spaces, commas and empty lines. | |
| tokenizer () | |
| Construct in uninitialized state. | |
| tokenizer (stringc &data) | |
| Construct with an initial string. | |
| boolc | operator== (tokenizer &t2) |
| Compare tokenizers by comparing each token. | |
| boolc | find (string::size_type &k, stringc &atom) |
| Non destructive text search. | |
| boolc | find (string::size_type &k, stringc &atom, string::size_type const k0) |
| Search from k0 in the current string. | |
| boolc | atomize_next (stringc &atom) |
| Search for the first atom and atomize it. | |
| boolc | atomize_next (stringc &atom, liststringi &iend_) |
| boolc | atomize_next_tag (liststringi &i1, liststringi &i2, stringc &tag, liststringi &iend_) |
| If tag=cat, then i1 points to <cat> and i2 to </cat>. | |
| boolc | atomize_next_tag (liststringi &i1, liststringi &i2, stringc &tag) |
| boolc | atomize_next (liststringi &i1, liststringi &i2, stringc &atom1, stringc &atom2) |
| operator stringc () | |
| Output the tokenizer back as a string. | |
Public Attributes | |
| liststringi | current |
| The current state. | |
| list< string > | seq |
| Data representation. | |
| string | printdelimiter |
| Each element printed is separated by the delimiter. | |
Generally concerned with global scope.
This class has two roles. First, it is a primitive parser. Second, it holds state, so another class can use it to point to the current token.
Here, a stream means the list of string tokens together with a current position in that list.
Use: read in the file and call atomize to separate the sequence into individual elements that can be interpreted by an interpreter.
The data can also be operated on directly through the list class (seq).
Example 1. Remove comments and delete any empty strings.
tokenizer ss;
...
ss.stripcomment("//");
ss.seq.remove("");
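The same clean-up can be sketched on a plain std::list<std::string>, without tokenizer.h. This is only an illustration under stated assumptions: strip_comment and clean_lines are hypothetical names that do not belong to the class, and the loop mirrors the stripcomment listing given later on this page.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for tokenizer::stripcomment: for each line, erase
// everything from the first occurrence of `comment` to the end of the line.
void strip_comment(std::list<std::string>& seq, const std::string& comment)
{
    assert(!comment.empty());  // an empty marker would match at 0 and erase every line
    for (std::string& token : seq) {
        const std::string::size_type i = token.find(comment);
        if (i != std::string::npos)
            token.erase(i);
    }
}

// Mirrors Example 1: strip "//" comments, then delete the empty strings.
std::list<std::string> clean_lines(std::list<std::string> seq)
{
    strip_comment(seq, "//");
    seq.remove("");
    return seq;
}
```

A line that holds only a comment becomes empty and is then removed; text to the left of the comment keeps its trailing space, which is why trim is usually called as well.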
See vrmlshape where I wrote a VRML parser using this class to hold the current state.
Definition at line 66 of file tokenizer.h.
| tokenizer::tokenizer | ( | ) |
Construct in uninitialized state.
Definition at line 435 of file tokenizer.cpp.
: printdelimiter("\n")
{
    current=seq.end();
}
| tokenizer::tokenizer | ( | stringc & | data | ) |
| void tokenizer::apply | ( | X | x | ) | [inline] |
Apply the functional object x to each element.
Definition at line 117 of file tokenizer.h.
References seq.
Referenced by trim().
{
    liststringi i = seq.begin();
    liststringic imax = seq.end();
    for ( ; i!=imax; ++i )
    { x(*i); }
};
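As an illustration of what a functional object passed to apply might look like, here is a minimal sketch. trim_spaces is a hypothetical functor standing in for spacertrim<>, whose definition is not shown on this page, and apply_each is a free-function copy of the loop above so that the example compiles on its own.

```cpp
#include <list>
#include <string>

// Hypothetical functor (not the library's spacertrim<>): strips leading and
// trailing white space from the string it is given.
struct trim_spaces
{
    void operator()(std::string& s) const
    {
        const char* const ws = " \t\r\n";
        const std::string::size_type first = s.find_first_not_of(ws);
        if (first == std::string::npos) {  // the string is all white space
            s.clear();
            return;
        }
        const std::string::size_type last = s.find_last_not_of(ws);
        s = s.substr(first, last - first + 1);
    }
};

// Free-function version of the apply loop: call x on every element of seq.
template <typename X>
void apply_each(std::list<std::string>& seq, X x)
{
    for (std::list<std::string>::iterator i = seq.begin(); i != seq.end(); ++i)
        x(*i);
}
```

Because x is taken by value, any state the functor accumulates is lost to the caller; a functor that needs to report back would have to hold a pointer or reference.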
| void tokenizer::atomize | ( | stringc & | atom | ) |
Splits strings.
Definition at line 297 of file tokenizer.cpp.
References seq.
{
    liststringi i = seq.begin();

    for (;i!=seq.end(); ++i)
        atomize(i,atom);
}
| boolc tokenizer::atomize_next | ( | liststringi & | i1, | |
| liststringi & | i2, | |||
| stringc & | atom1, | |||
| stringc & | atom2 | |||
| ) |
Definition at line 229 of file tokenizer.cpp.
{
    bool res;

    res=atomize_next(atom1);
    if (res==false)
        return false;
    i1=current;

    res=atomize_next(atom2);
    if (res==false)
        return false;
    i2=current;

    // Default to reset iterator to first tag.
    current=i1;

    return true;
}
| boolc tokenizer::atomize_next | ( | stringc & | atom, | |
| liststringi & | iend_ | |||
| ) |
Definition at line 186 of file tokenizer.cpp.
{
    liststringi i = current;

    string::size_type k;

    string::size_type const atomlen = atom.length();

    for (;i!=iend_; ++i)
    {
        k = i->find(atom.c_str());

        if (k==string::npos)
            continue;

        atomize(i,atom,k);
        if (i->length()==atomlen)
        {
            if (atom==*i)
            {
                current=i;
                return true;
            }
        }

        ++i;
        atomize(i,atom,0);
        assert(i->length()==atomlen);
        current=i;
        return true;
    }

    return false;
}
| boolc tokenizer::atomize_next | ( | stringc & | atom | ) |
Search for the first atom and atomize it.
On success, current points to the atom. Essentially this is a find function.
Definition at line 142 of file tokenizer.cpp.
Referenced by misclib_testcode::tokenizertest::test10().
{
    liststringi i = current;

    string::size_type k;

    string::size_type const atomlen = atom.length();

    for (;i!=seq.end(); ++i)
    {
        k = i->find(atom.c_str());

        if (k==string::npos)
            continue;

        atomize(i,atom,k);
        if (i->length()==atomlen)
        {
            if (atom==*i)
            {
                current=i;
                return true;
            }
        }

        ++i;
        atomize(i,atom,0);
        assert(i->length()==atomlen);
        current=i;
        return true;
    }

    return false;
}
| boolc tokenizer::atomize_next_tag | ( | liststringi & | i1, | |
| liststringi & | i2, | |||
| stringc & | tag | |||
| ) |
Definition at line 255 of file tokenizer.cpp.
{
    liststringi iend_=seq.end();
    return atomize_next_tag(i1,i2,tag,iend_);
}
| boolc tokenizer::atomize_next_tag | ( | liststringi & | i1, | |
| liststringi & | i2, | |||
| stringc & | tag, | |||
| liststringi & | iend_ | |||
| ) |
If tag=cat, then i1 points to <cat> and i2 to </cat>.
Definition at line 266 of file tokenizer.cpp.
Referenced by tokenizerlocal::scope(), and misclib_testcode::tokenizertest::unittest01().
{
    //cout << "atomize_next_tag ";
    string tag1="<"+tag+">";
    //cout << SHOW(tag1) << " ";
    bool res;

    res=atomize_next(tag1,iend_);
    if (res==false)
        return false;
    i1=current;

    string tag2="</"+tag+">";
    //cout << SHOW(tag2) << " ";
    res=atomize_next(tag2,iend_);
    if (res==false)
        return false;
    i2=current;

    // Default to reset iterator to first tag.
    current=i1;
    // cout << SHOW(*current) << " 1" << endl;

    return true;
}
| void tokenizer::extractfromcurrent | ( | vector< string > & | v, | |
| stringc & | atom | |||
| ) | const |
The current line is read and parsed, split by the atom and each element is written to the vector v.
Definition at line 306 of file tokenizer.cpp.
{
    v.clear();

    string s(*current);

    if (s.empty())
        return;

    string::size_type k;

    string::size_type const atomlen = atom.length();

    k = s.find(atom.c_str());
    for ( ; k!=string::npos; k=s.find(atom.c_str()) )
    {
        if (k==0)
        {
            s.erase(k,atomlen);

            continue;
        }

        v.push_back(s.substr(0,k));
        s.erase(0,k);
    }

    if (s.empty()==false)
        v.push_back(s);
}
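The splitting rule in this listing (pieces between atoms are kept, the atoms themselves and empty pieces are dropped) can be tried in isolation with the sketch below. split_by_atom is a hypothetical free function, not part of the class, and it scans with an offset instead of erasing from a copy. Note that the listing above does not appear to terminate for an empty atom, since find would keep returning 0; the sketch asserts against that case.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical free function mirroring extractfromcurrent: split `line` on
// every occurrence of `atom` and return the non-empty pieces in order.
std::vector<std::string> split_by_atom(const std::string& line, const std::string& atom)
{
    assert(!atom.empty());  // an empty atom would never advance the search
    std::vector<std::string> v;
    std::string::size_type start = 0;
    while (start < line.size()) {
        const std::string::size_type k = line.find(atom, start);
        const std::string::size_type stop = (k == std::string::npos) ? line.size() : k;
        if (stop > start)  // skip the empty piece between adjacent atoms
            v.push_back(line.substr(start, stop - start));
        if (k == std::string::npos)
            break;
        start = k + atom.size();
    }
    return v;
}
```

For the input "a,,b," with atom "," this yields the two elements "a" and "b", which matches a hand trace of the listing.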
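| boolc tokenizer::find | ( | string::size_type & | k, | |
| stringc & | atom, | |||
| string::size_type const | k0 | |||
| ) |
Search from k0 in the current string.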
Definition at line 503 of file tokenizer.cpp.
References assertreturnfalse, and find().
{
    string::size_type atomsize=atom.size();
    if (atomsize==0)
    {
        current=seq.end();
        return false;
    }

    liststringi i = current;
    assertreturnfalse(i!=seq.end());

    if (k0+atomsize-1<i->size())
    {
        k = i->find(atom.c_str(),k0);
        if (k!=string::npos)
            return true;
    }

    // failed to find in current string.
    ++current;

    return tokenizer::find(k,atom);
}
| boolc tokenizer::find | ( | string::size_type & | k, | |
| stringc & | atom | |||
| ) |
Non destructive text search.
Definition at line 533 of file tokenizer.cpp.
Referenced by find(), tokenizerfind::operator++(), tokenizerfind::reset(), and misclib_testcode::tokenizertest::unittest01().
{
    liststringi i = current;

    for (;i!=seq.end(); ++i)
    {
        k = i->find(atom.c_str());

        if (k==string::npos)
            continue;

        current=i;
        return true;
    }

    return false;
}
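The search can be exercised outside the class with the following sketch. find_atom and token_iter are hypothetical names, and the list and the current position are passed in explicitly because the sketch has no tokenizer object. One deliberate difference from the listing: k is written only on success, whereas the listing overwrites k on every string it inspects.

```cpp
#include <list>
#include <string>

typedef std::list<std::string>::iterator token_iter;

// Hypothetical free-function version of the non-destructive search: starting
// at `current`, find the first string that contains `atom`. On success,
// `current` points to that string and `k` is the offset of the atom in it.
// On failure both are left unchanged.
bool find_atom(std::list<std::string>& seq, token_iter& current,
               std::string::size_type& k, const std::string& atom)
{
    for (token_iter i = current; i != seq.end(); ++i) {
        const std::string::size_type pos = i->find(atom);
        if (pos == std::string::npos)
            continue;
        current = i;
        k = pos;
        return true;
    }
    return false;
}
```

The search never moves backwards: a string that lies before current is not found, even if it contains the atom.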
| tokenizer::operator stringc | ( | ) |
Output the tokenizer back as a string.
Definition at line 480 of file tokenizer.cpp.
References reset().
{
    string s;
    tokenizer& tk(*this);
    for ( tk.reset(); !tk; ++tk)
    { s += tk(); };

    return s;
}
| boolc tokenizer::operator! | ( | ) | const |
| stringc & tokenizer::operator() | ( | ) | const |
| string & tokenizer::operator* | ( | ) |
| void tokenizer::operator++ | ( | ) |
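| boolc tokenizer::operator== | ( | tokenizer & | t2 | ) |
Compare tokenizers by comparing each token.
The comparison resets the iteration state of both tokenizers.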
Definition at line 452 of file tokenizer.cpp.
References reset().
{
    reset();
    t2.reset();
    for ( ;!t2; ++t2 )
    {
        if (!(*this)==false)
            return false;

        if ( (*this)() != t2() )
            return false;

        ++(*this);
    }

    if (!(*this))
        return false;

    return true;
}
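| ostream & tokenizer::print | ( | ostream & | os | ) | const |
Print seq.
Elements are separated by printdelimiter, which can be set to any string.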
Definition at line 407 of file tokenizer.cpp.
References printdelimiter, and seq.
Referenced by operator<<().
{
    liststringic i = seq.begin();
    liststringic iend2 = seq.end();
    // Guard the first increment so that an empty seq is never advanced past end().
    if (i!=iend2)
    {
        os << *i;
        ++i;
    }
    for ( ; i!=iend2; ++i )
    {
        os << printdelimiter << *i;
    }

    return os;
}
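The delimiter handling (a delimiter between elements, none before the first or after the last) can be sketched independently of the class as follows. print_joined and join are hypothetical helper names, and the sketch returns early for an empty list so that the iterator is never advanced past end().

```cpp
#include <list>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical free-function version of print: write the elements of seq to
// os, separated by delimiter. Nothing is written for an empty list.
std::ostream& print_joined(std::ostream& os, const std::list<std::string>& seq,
                           const std::string& delimiter)
{
    std::list<std::string>::const_iterator i = seq.begin();
    if (i == seq.end())
        return os;
    os << *i;
    for (++i; i != seq.end(); ++i)
        os << delimiter << *i;
    return os;
}

// Convenience wrapper that collects the output in a string.
std::string join(const std::list<std::string>& seq, const std::string& delimiter)
{
    std::ostringstream os;
    print_joined(os, seq, delimiter);
    return os.str();
}
```

With the default printdelimiter of "\n" set by the constructor, this corresponds to printing one token per line without a trailing newline.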
| void tokenizer::read | ( | stringc & | data | ) |
Read the data in.
Definition at line 46 of file tokenizer.cpp.
References seq.
Referenced by modulereport::reset().
{
    seq.push_back(data);
}
| void tokenizer::readaslines | ( | stringc & | data | ) |
Read the string as lines.
Each line is a string.
Definition at line 51 of file tokenizer.cpp.
References seq, and subtract().
Referenced by modulelist::buildlist(), projunittest::eval(), html::insert(), readaslinesgeneral(), misclib_testcode::tokenizertest::test05(), misclib_testcode::tokenizertest::test06(), and misclib_testcode::tokenizertest::test08().
| void tokenizer::readaslinesgeneral | ( | stringc & | data | ) |
Because special characters in strings can interfere with reading strings, this routine is a hack to remove white spaces, commas and empty lines.
Definition at line 392 of file tokenizer.cpp.
References readaslines(), remove_if(), subtract(), and trim().
Referenced by projunittests::eval(), simplexD2tessindexed< PT, PD, INDX >::serializeInverse(), simplexD1tessindexed< PT, PD, INDX >::serializeInverse(), and simplexD1listlinked< VI, INDX >::serializeInverse().
{
    readaslines(data);
    subtract(",");
    subtract(" ");
    trim();
    remove_if();
}
| void tokenizer::remove | ( | stringc & | token | ) |
Remove matching tokens.
Definition at line 25 of file tokenizer.cpp.
References seq.
Referenced by misclib_testcode::tokenizertest::test04().
{
    seq.remove(token);
}
| void tokenizer::remove_if | ( | SPACER | spacer | ) | [inline] |
Remove invalid lines.
Definition at line 112 of file tokenizer.h.
References seq.
{ seq.remove_if(spacer); }
| void tokenizer::remove_if | ( | ) |
Remove invalid lines as interpreted by myspacer.
Definition at line 30 of file tokenizer.cpp.
References seq.
Referenced by tokenizermisc::comparewithoutspace(), readaslinesgeneral(), misclib_testcode::tokenizertest::test06(), misclib_testcode::tokenizertest::test08(), misclib_testcode::tokenizertest::test09(), tokenize(), and trim_and_prune().
{
    seq.remove_if(spacerdelete<>());
}
| void tokenizer::reset | ( | ) |
Resets token iteration.
Definition at line 422 of file tokenizer.cpp.
Referenced by menusystem::addfont10paragraphs(), modulelist::buildlist(), modulereport::compile_exitstatus(), projunittests::eval(), projunittest::eval(), vispoint3::handlecommand(), vispoint2::handlecommand(), visline::handlecommand(), html::insert(), operator stringc(), operator==(), tokenizerlocal::reset(), simplexD2tessindexed< PT, PD, INDX >::serializeInverse(), simplexD1tessindexed< PT, PD, INDX >::serializeInverse(), simplexD1listlinked< VI, INDX >::serializeInverse(), gobjglColor3f::serializeInverse(), gobjglVertex3f::serializeInverse(), makestate< CFC >::standardbuild(), misclib_testcode::tokenizertest::test09(), misclib_testcode::tokenizertest::test10(), tokenizer(), misclib_testcode::tokenizertest::unittest01(), misclib_testcode::tokenizertest::unittest02(), misclib_testcode::tokenizertest::unittest03(), and modulereport::unittests_exitstatus().
| void tokenizer::stripcomment | ( | stringc & | comment | ) |
For each line delete right of and including the comment if it exists in the line.
Definition at line 123 of file tokenizer.cpp.
References seq.
Referenced by misclib_testcode::tokenizertest::test04(), misclib_testcode::tokenizertest::test06(), and misclib_testcode::tokenizertest::test08().
{
    liststringi k = seq.begin();
    string::size_type i;
    for (;k!=seq.end(); ++k)
    {
        string & token(*k);
        i=0;
        i = token.find(comment,i);
        if (i==string::npos)
            continue;

        token.erase(i);
    }
}
| void tokenizer::subtract | ( | stringc & | atom | ) |
Splits strings removing the atom.
Definition at line 401 of file tokenizer.cpp.
References seq.
Referenced by menusystem::addfont10blockstart(), menusystem::addfont10paragraphs(), vispoint3::handlecommand(), vispoint2::handlecommand(), visline::handlecommand(), readaslines(), readaslinesgeneral(), makestate< CFC >::standardbuild(), misclib_testcode::tokenizertest::test00(), misclib_testcode::tokenizertest::test04(), misclib_testcode::tokenizertest::test06(), misclib_testcode::tokenizertest::test08(), and tokenize().
{
    atomize(atom);
    seq.remove(atom);
}
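The two-step idea (first make every atom its own element, then remove those elements) can be sketched on a plain list as below. The per-element atomize(i,atom) overload is not shown on this page, so atomize_all only assumes its behaviour; both function names are hypothetical, and dropping an empty remainder is a choice made for this sketch rather than something the source confirms.

```cpp
#include <cassert>
#include <list>
#include <string>

// Assumed behaviour of atomize: split every string around each occurrence of
// atom so that the atom ends up as a separate list element.
void atomize_all(std::list<std::string>& seq, const std::string& atom)
{
    assert(!atom.empty());  // an empty atom would match forever
    std::list<std::string>::iterator i = seq.begin();
    while (i != seq.end()) {
        const std::string::size_type k = i->find(atom);
        if (k == std::string::npos) {
            ++i;
            continue;
        }
        if (k > 0)
            seq.insert(i, i->substr(0, k));  // text before the atom
        seq.insert(i, atom);                 // the atom as its own element
        i->erase(0, k + atom.size());        // *i keeps the remainder, rescanned next
        if (i->empty())
            i = seq.erase(i);                // sketch choice: drop an empty remainder
    }
}

// Mirrors subtract: atomize, then remove every element equal to the atom.
void subtract_atom(std::list<std::string>& seq, const std::string& atom)
{
    atomize_all(seq, atom);
    seq.remove(atom);
}
```

Under these assumptions the single element "a,b,,c" becomes the three elements "a", "b" and "c" after subtract_atom(seq, ",").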
| void tokenizer::tokenize | ( | ) |
delimiter set.
Definition at line 473 of file tokenizer.cpp.
References remove_if(), subtract(), and trim().
Referenced by gobjglColor3f::serializeInverse(), and gobjglVertex3f::serializeInverse().
| void tokenizer::trim | ( | ) |
Trim the leading and trailing space of each token.
Definition at line 35 of file tokenizer.cpp.
References apply().
Referenced by tokenizermisc::comparewithoutspace(), readaslinesgeneral(), misclib_testcode::tokenizertest::test07(), misclib_testcode::tokenizertest::test08(), misclib_testcode::tokenizertest::test09(), tokenize(), and trim_and_prune().
{
    apply(spacertrim<>());
}
| void tokenizer::trim_and_prune | ( | ) |
Remove surrounding white space and delete empty strings.
Definition at line 40 of file tokenizer.cpp.
References remove_if(), and trim().
Referenced by projunittest::eval(), misclib_testcode::tokenizertest::test09(), and misclib_testcode::tokenizertest::unittest01().
{
    trim();
    remove_if(spacerdelete<>());
}
| liststringi tokenizer::current |
The current state.
Definition at line 73 of file tokenizer.h.
Referenced by tokenizerlocal::erasetag(), html::insert(), tokenizerlocal::operator!(), operator!(), tokenizerlocal::operator()(), operator()(), operator*(), tokenizerlocal::operator++(), operator++(), tokenizerlocal::reset(), reset(), tokenizerlocal::scopesearch(), tokenizer(), tokenizerlocal::tokenizerlocal(), misclib_testcode::tokenizertest::unittest01(), and tokenizerlocal::writetag().
| string tokenizer::printdelimiter |
Each element printed is separated by the delimiter.
Definition at line 147 of file tokenizer.h.
Referenced by print(), misclib_testcode::tokenizertest::test04(), misclib_testcode::tokenizertest::test06(), misclib_testcode::tokenizertest::test07(), and misclib_testcode::tokenizertest::test08().
| list<string> tokenizer::seq |
Data representation.
Definition at line 76 of file tokenizer.h.
Referenced by menusystem::addfont10blockstart(), apply(), atomize(), tokenizerlocal::endpointreset(), tokenizerlocal::erasetag(), vispoint3::handlecommand(), vispoint2::handlecommand(), visline::handlecommand(), html::insert(), operator!(), operator()(), operator*(), operator++(), print(), read(), readaslines(), remove(), remove_if(), reset(), modulereport::reset(), gobjglColor3f::serializeInverse(), gobjglVertex3f::serializeInverse(), stripcomment(), subtract(), misclib_testcode::tokenizertest::test00(), misclib_testcode::tokenizertest::test03(), misclib_testcode::tokenizertest::test04(), misclib_testcode::tokenizertest::test07(), misclib_testcode::tokenizertest::test10(), tokenizer(), tokenizerlocal::tokenizerlocal(), misclib_testcode::tokenizertest::unittest01(), misclib_testcode::tokenizertest::unittest02(), and tokenizerlocal::writetag().