java.lang.Object
  it.sauronsoftware.grab4j.html.search.Criteria
public class Criteria
This class represents a search criterion used to find elements within an HTML document representation.
A criterion is parsed from a string using a few simple rules. A grab4j search criterion closely resembles an XPath query.
You can launch a search from any element in a document. The search is performed over the sub-elements of that element. The following examples are meant to be executed over a whole document, but this is never required: you can start a query from any element, using search paths relative to it.
The string representation of a search criterion is split into several parts, separated by slash characters:
token1/token2/token3
Each token is used to recognize a tag or a set of tags. The general model is the following:
tagNamePattern[index](attribute1=valuePattern1)(attribute2=valuePattern2)(...)
The first element in the token model is the tag name pattern, which selects the wanted tag(s). It is a wildcard pattern: the star character matches any sequence of characters.
A first simple example:
html/body/div
This criterion finds all the "div" elements whose parent is the "body" tag, which in turn is inside an "html" tag.
A wildcard example:
html/body/*
This criterion finds all the elements whose parent is the "body" tag, within the "html" one.
Another one:
html/body/h*
This criterion finds all the elements whose parent is the "body" tag and whose name starts with the letter "h", such as "h1", "h2", "h3" and so on.
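The star wildcard described above can be illustrated with a small standalone helper. This is only a sketch: the class `WildcardSketch` and its methods are hypothetical and not part of grab4j, and it assumes case-sensitive matching, which this page does not specify.

```java
import java.util.regex.Pattern;

/** Illustrative helper, not part of grab4j: star-wildcard matching. */
final class WildcardSketch {

    /** Converts a star pattern such as "h*" into an equivalent regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        // The -1 limit keeps trailing empty pieces, so "h*" yields ["h", ""].
        String[] literals = wildcard.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each star matches any character sequence
            }
            regex.append(Pattern.quote(literals[i])); // literal text, taken verbatim
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns true if the whole value matches the star pattern. */
    static boolean matches(String wildcard, String value) {
        return toRegex(wildcard).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("h*", "h1"));   // true
        System.out.println(matches("h*", "div"));  // false
        System.out.println(matches("*", "table")); // true
    }
}
```

Quoting the literal pieces means that characters with a special meaning in regular expressions are compared as plain text, so only the star acts as a wildcard.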
Using the index selector:
html/body/div[1]
This criterion returns the second "div" element whose parent is the "body" tag. Note that indexes start at 0, just like in arrays.
html/body/h*[2]
This criterion returns the third element whose parent is the "body" tag and whose name starts with the letter "h".
Using attribute selector(s):
html/body/div(id=d1)
This one searches for divs with an attribute called "id", whose value is exactly "d1".
The star wildcard is allowed in the value part of the selector:
html/body/div(id=*)
This one searches for divs with an attribute called "id", regardless of its value.
html/body/div(id=d*)
This one searches for divs with an attribute called "id", whose value starts with the "d" letter.
Multiple attribute selectors can be combined:
html/body/div(id=d*)(align=left)
An index selector and two attribute selectors in this example:
html/body/div[1](id=d*)(align=left)
This will search for the second "div" tag, inside the "html"-"body" sequence, whose attribute "id" has a value starting with "d" and whose attribute "align" is exactly "left".
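To make the token model concrete, here is a minimal sketch of how a criterion string could be split on slashes and each token broken into its name pattern, optional index and attribute selectors. The class `TokenSketch` is hypothetical and is not the grab4j parser; it ignores the escaping rules and does not treat any token specially.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for one token; not the grab4j implementation. */
final class TokenSketch {
    // tagNamePattern, then optional [index], then zero or more (name=value) groups
    private static final Pattern TOKEN = Pattern.compile(
        "([^\\[\\]()=/]+)(?:\\[(\\d+)\\])?((?:\\([^=()]+=[^()]*\\))*)");
    private static final Pattern ATTRIBUTE =
        Pattern.compile("\\(([^=()]+)=([^()]*)\\)");

    final String tagNamePattern;
    final int index; // -1 when the token has no [index] selector
    final Map<String, String> attributes = new LinkedHashMap<>();

    TokenSketch(String token) {
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid token: " + token);
        }
        tagNamePattern = m.group(1);
        index = m.group(2) == null ? -1 : Integer.parseInt(m.group(2));
        // Walk through the (name=valuePattern) groups in order.
        Matcher a = ATTRIBUTE.matcher(m.group(3));
        while (a.find()) {
            attributes.put(a.group(1), a.group(2));
        }
    }

    /** Splits a criterion string on slashes and parses each token. */
    static List<TokenSketch> parseAll(String criteria) {
        List<TokenSketch> tokens = new ArrayList<>();
        for (String part : criteria.split("/")) {
            tokens.add(new TokenSketch(part));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (TokenSketch t : parseAll("html/body/div[1](id=d*)(align=left)")) {
            System.out.println(t.tagNamePattern + " index=" + t.index
                + " attributes=" + t.attributes);
        }
    }
}
```

Using -1 for a missing index is a choice made for this sketch only; the real class exposes its parsed conditions through getConditionCount() and getCondition(int).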
Search criteria admit a special token, called the "recursive deep token", which is written as a sequence of three dots.
html/body/.../table
This criterion will search for tables inside the body of the document, regardless of whether they are placed directly under the "body" tag. It is a recursive search within the sub-elements of the body. The criterion will return all the tables like the following
<html><body><table>...
but it will also return all the ones like
<html><body><div><div><table>...
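The behaviour of the recursive deep token can be sketched over a toy element tree. Everything here (`DeepSearchSketch`, `Node`, `search`) is hypothetical and only mirrors the description above; for brevity it compares tag names exactly and ignores wildcards, indexes and attribute selectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative element tree and search; an assumption, not grab4j code. */
final class DeepSearchSketch {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** Collects the elements reached from start by following tokens[pos..]. */
    static void search(Node start, String[] tokens, int pos, List<Node> found) {
        if (pos == tokens.length) {
            if (!found.contains(start)) {
                found.add(start); // every token consumed: start is a result
            }
            return;
        }
        if (tokens[pos].equals("...")) {
            // Zero levels skipped: go on with the next token from here.
            search(start, tokens, pos + 1, found);
            // One or more levels skipped: retry the same token from each child.
            for (Node child : start.children) {
                search(child, tokens, pos, found);
            }
        } else {
            for (Node child : start.children) {
                if (child.name.equals(tokens[pos])) {
                    search(child, tokens, pos + 1, found);
                }
            }
        }
    }

    public static void main(String[] args) {
        Node document = new Node("#document",
            new Node("html",
                new Node("body",
                    new Node("table"),
                    new Node("div", new Node("div", new Node("table"))))));
        List<Node> found = new ArrayList<>();
        search(document, "html/body/.../table".split("/"), 0, found);
        System.out.println(found.size()); // 2
    }
}
```

Both the table directly under "body" and the one nested inside two "div" elements are collected, matching the two cases shown above.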
Reserved characters can be escaped with the sequence <xx>, where xx is the hexadecimal code of the escaped character.
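As an illustration of the escape sequence, the following sketch decodes and produces <xx> sequences. The class `EscapeSketch` is hypothetical and not part of grab4j. It assumes exactly two hexadecimal digits and accepts either letter case; this page does not say whether the library accepts other forms.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative handling of <xx> escapes; an assumption, not grab4j code. */
final class EscapeSketch {
    private static final Pattern ESCAPE = Pattern.compile("<([0-9A-Fa-f]{2})>");

    /** Replaces every <xx> sequence with the character whose hex code is xx. */
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Builds the escape sequence for one character with a code below 0x100. */
    static String escape(char c) {
        return String.format("<%02x>", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(escape('/'));          // <2f>
        System.out.println(unescape("a<2f>b"));   // a/b
    }
}
```

With such an escape, a slash inside an attribute value pattern would be written as <2f>, so it is not mistaken for a token separator.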
Constructor Summary | |
---|---|
Criteria(java.lang.String criteria) | Parses the given string and builds the criterion. |
Method Summary | |
---|---|
Condition | getCondition(int index): Returns a condition of the criterion. |
int | getConditionCount(): Returns the number of conditions in the criterion. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Criteria(java.lang.String criteria) throws InvalidCriteriaException
Parameters:
criteria - The criterion as a string.
Throws:
InvalidCriteriaException - If the given string is not a valid criterion.
Method Detail |
---|
public int getConditionCount()
public Condition getCondition(int index)
Parameters:
index - The index of the wanted condition, from 0 to getConditionCount() - 1.