Search::Xapian - Perl XS frontend to the Xapian C++ search library.
use Search::Xapian;
# Open an existing database read-only.
my $db = Search::Xapian::Database->new( '[DATABASE DIR]' );
# Build an Enquire object for a single-term query.
my $enq = $db->enquire( '[QUERY TERM]' );
printf "Running query '%s'\n", $enq->get_query()->get_description();
# Fetch up to 10 matches, starting from the first.
my @matches = $enq->matches(0, 10);
print scalar(@matches) . " results found\n";
foreach my $match ( @matches ) {
    my $doc = $match->get_document();
    printf "ID %d %d%% [ %s ]\n", $match->get_docid(), $match->get_percent(), $doc->get_data();
}
This module wraps most methods of most Xapian classes. The missing classes
and methods should be added in the future. It also provides a simplified,
more 'perlish' interface to some common operations, as demonstrated above.
There are some gaps in the POD documentation for wrapped classes, but you
can read the Xapian C++ API documentation at
https://xapian.org/docs/apidoc/html/annotated.html for details of
these. Alternatively, take a look at the code in the examples and tests.
If you want to use Search::Xapian and the threads module together, make
sure you're using Search::Xapian >= 1.0.4.0 and Perl >= 5.8.7. As of 1.0.4.0,
Search::Xapian uses CLONE_SKIP to make sure that the Perl wrapper objects
aren't copied to new threads. Without this, the underlying C++ objects can get
destroyed more than once.
If you encounter problems, or have any comments, suggestions, patches, etc.,
please email the Xapian-discuss mailing list (details of which can be found at
https://xapian.org/lists).
Nothing is exported by default.
- DB_OPEN
  Open a database, fail if database doesn't exist.
- DB_CREATE
  Create a new database, fail if database exists.
- DB_CREATE_OR_OPEN
  Open an existing database, without destroying data, or create a new
  database if one doesn't already exist.
- DB_CREATE_OR_OVERWRITE
  Overwrite database if it exists.
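As a minimal sketch of how these constants are typically passed, the following creates (or reopens) a database and adds one document. It assumes the constants can be imported with the :db export tag, and it uses a temporary directory from File::Temp purely for illustration; the path name and the document contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Search::Xapian qw(:db);    # assumption: :db exports the DB_* constants

# Hypothetical location: a throwaway directory so the sketch is self-contained.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/example-db";

# DB_CREATE_OR_OPEN creates the database on first use and reopens it later
# without destroying existing data.
my $wdb = Search::Xapian::WritableDatabase->new( $path, DB_CREATE_OR_OPEN );

my $doc = Search::Xapian::Document->new();
$doc->set_data('hello world');
$doc->add_term($_) for qw(hello world);
$wdb->add_document($doc);
$wdb->flush();    # write pending changes to disk

my $doc_count = $wdb->get_doccount();
print "Database now holds $doc_count document(s)\n";
```

Swapping in DB_CREATE would make a second run against the same path fail, while DB_CREATE_OR_OVERWRITE would discard the existing contents.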
- OP_AND
  Match if both subqueries are satisfied.
- OP_OR
  Match if either subquery is satisfied.
- OP_AND_NOT
  Match if left but not right subquery is satisfied.
- OP_XOR
  Match if the left or right subquery is satisfied, but not both.
- OP_AND_MAYBE
  Match if left is satisfied, but use weights from both.
- OP_FILTER
  Like OP_AND, but only weight using the left query.
- OP_NEAR
  Match if the words are near each other. The window should be specified as
  a parameter to Search::Xapian::Query::Query, but it defaults to the
  number of terms in the list.
- OP_PHRASE
  Match as a phrase (all words in order).
- OP_ELITE_SET
  Select an elite set from the subqueries, and perform a query with these combined as an OR query.
- OP_VALUE_RANGE
  Filter by a range test on a document value.
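To make the operator semantics concrete, here is a small sketch that builds three queries over a toy in-memory database and counts the matches for each. It assumes the operators are importable via the :ops export tag; the documents, terms, and labels are invented for illustration.

```perl
use strict;
use warnings;
use Search::Xapian qw(:ops);    # assumption: :ops exports the OP_* constants

# Calling new() with no arguments gives an in-memory database.
my $db = Search::Xapian::WritableDatabase->new();
for my $text ( 'red apple', 'green apple', 'red car' ) {
    my $doc = Search::Xapian::Document->new();
    $doc->set_data($text);
    $doc->add_term($_) for split ' ', $text;    # index each word as a term
    $db->add_document($doc);
}

# One query per operator, all over the same two terms.
my %query = (
    'red AND apple'     => Search::Xapian::Query->new( OP_AND,     'red', 'apple' ),
    'red OR apple'      => Search::Xapian::Query->new( OP_OR,      'red', 'apple' ),
    'red AND_NOT apple' => Search::Xapian::Query->new( OP_AND_NOT, 'red', 'apple' ),
);

my %count;
for my $label ( sort keys %query ) {
    my @matches = $db->enquire( $query{$label} )->matches( 0, 10 );
    $count{$label} = scalar @matches;
    printf "%-18s %d match(es)\n", $label, $count{$label};
}
```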
- FLAG_DEFAULT
  This gives the QueryParser default flag settings, allowing you to easily add
  flags to the default ones.
- FLAG_BOOLEAN
  Support AND, OR, etc. and bracketed subexpressions.
- FLAG_LOVEHATE
  Support + and -.
- FLAG_PHRASE
  Support quoted phrases.
- FLAG_BOOLEAN_ANY_CASE
  Support AND, OR, etc. even if they aren't in ALLCAPS.
- FLAG_WILDCARD
  Support right truncation (e.g. Xap*).
- FLAG_PURE_NOT
  Allow queries such as 'NOT apples'.
  These require the use of a list of all documents in the database
  which is potentially expensive, so this feature isn't enabled by
  default.
- FLAG_PARTIAL
  Enable partial matching.
  Partial matching causes the parser to treat the query as a
  "partially entered" search. This will automatically treat the
  final word as a wildcarded match, unless it is followed by
  whitespace, to produce more stable results from interactive
  searches.
- FLAG_SPELLING_CORRECTION
- FLAG_SYNONYM
- FLAG_AUTO_SYNONYMS
- FLAG_AUTO_MULTIWORD_SYNONYMS
- STEM_ALL
  Stem all terms.
- STEM_NONE
  Don't stem any terms.
- STEM_SOME
  Stem some terms, in a manner compatible with Omega (capitalised words and those
  in phrases aren't stemmed).
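The parser flags and stemming strategies listed above are passed to a QueryParser object. The sketch below parses the same input twice, once with FLAG_DEFAULT alone and once with FLAG_BOOLEAN_ANY_CASE added, so the effect of the extra flag shows up in the query descriptions. It assumes the :qpflags and :qpstem export tags; the input string is invented, and the exact description text varies between Xapian versions.

```perl
use strict;
use warnings;
use Search::Xapian qw(:qpflags :qpstem);    # assumption: tags export FLAG_* and STEM_*

my $qp = Search::Xapian::QueryParser->new();
$qp->set_stemmer( Search::Xapian::Stem->new('english') );
$qp->set_stemming_strategy(STEM_SOME);

my $input = 'apple and pear';

# Default flags only: a lower-case "and" is parsed as an ordinary word.
my $plain = $qp->parse_query( $input, FLAG_DEFAULT );

# Default flags plus FLAG_BOOLEAN_ANY_CASE: "and" now acts as an operator.
my $any_case = $qp->parse_query( $input, FLAG_DEFAULT | FLAG_BOOLEAN_ANY_CASE );

my $plain_desc    = $plain->get_description();
my $any_case_desc = $any_case->get_description();
print "default flags: $plain_desc\n";
print "with any-case: $any_case_desc\n";
```

OR-ing extra flags onto FLAG_DEFAULT keeps the parser's standard behaviour and only adds the requested feature.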
- ENQ_ASCENDING
  Docids sort in ascending order (default).
- ENQ_DESCENDING
  Docids sort in descending order.
- ENQ_DONT_CARE
  Docids sort in whatever order is most efficient for the backend.
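A short sketch of the docid ordering constants in use follows. It applies pure boolean weighting so that every match has the same weight and the docid order alone decides the ranking. The constant is called by its fully qualified name to avoid assuming an export tag; the documents and the 'fruit' term are invented.

```perl
use strict;
use warnings;
use Search::Xapian;

# In-memory database with three documents sharing one term.
my $db = Search::Xapian::WritableDatabase->new();
for my $n ( 1 .. 3 ) {
    my $doc = Search::Xapian::Document->new();
    $doc->set_data("document $n");
    $doc->add_term('fruit');
    $db->add_document($doc);
}

my $enq = $db->enquire('fruit');

# With boolean weighting all matches tie, so docid order determines the result order.
$enq->set_weighting_scheme( Search::Xapian::BoolWeight->new() );
$enq->set_docid_order( Search::Xapian::ENQ_DESCENDING() );

my @docids = map { $_->get_docid() } $enq->matches( 0, 10 );
print "Docids returned: @docids\n";
```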
The standard export tag combines db + ops + qpflags + qpstem.
- major_version
  Returns the major version of the Xapian C++ library being used. E.g. for
  Xapian 1.0.9 this would return 1.
- minor_version
  Returns the minor version of the Xapian C++ library being used. E.g. for
  Xapian 1.0.9 this would return 0.
- revision
  Returns the revision of the Xapian C++ library being used. E.g. for
  Xapian 1.0.9 this would return 9. In a stable release series, Xapian libraries
  with the same minor and major versions are usually ABI compatible, so this
  often won't match the third component of $Search::Xapian::VERSION (which is the
  version of the Search::Xapian XS wrappers).
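For instance, the three functions can be combined to report the library version next to the wrapper version. This is only a sketch; the variable name is arbitrary and the printed values depend on what is installed.

```perl
use strict;
use warnings;
use Search::Xapian;

# Assemble "major.minor.revision" for the underlying C++ library.
my $library_version = join '.',
    Search::Xapian::major_version(),
    Search::Xapian::minor_version(),
    Search::Xapian::revision();

print "Xapian C++ library:      $library_version\n";
print "Search::Xapian wrappers: $Search::Xapian::VERSION\n";
```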
- sortable_serialise NUMBER
  Convert a floating point number to a string, preserving sort order.
  This method converts a floating point number to a string, suitable for
  using as a value for numeric range restriction, or for use as a sort
  key.
  The conversion is platform independent.
  The conversion attempts to ensure that, for any pair of values supplied
  to the conversion algorithm, the result of comparing the original
  values (with a numeric comparison operator) will be the same as the
  result of comparing the resulting values (with a string comparison
  operator). On platforms which represent doubles with the precisions
  specified by IEEE 754, this will be the case. If the representation of
  doubles is more precise, it is possible that two very close doubles
  will be mapped to the same string, so will compare equal.
  Note also that both zero and -zero will be converted to the same
  representation: since these compare equal, this satisfies the
  comparison constraint, but it's worth knowing this if you wish to use
  the encoding in some situation where this distinction matters.
  Handling of NaN isn't (currently) guaranteed to be sensible.
- sortable_unserialise SERIALISED_NUMBER
  Convert a string encoded using sortable_serialise back to a floating
  point number.
  This expects the input to be a string produced by sortable_serialise().
  If the input is not such a string, the value returned is undefined (but
  no error will be thrown).
  The result of the conversion will be exactly the value which was
  supplied to sortable_serialise() when making the string on platforms
  which represent doubles with the precisions specified by IEEE 754, but
  may be a different (nearby) value on other platforms.
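The ordering guarantee described above can be checked with a small sketch: serialise a handful of numbers, sort by the serialised strings, and compare against a numeric sort. The sample values are arbitrary, and the functions are called by their fully qualified names.

```perl
use strict;
use warnings;
use Search::Xapian;

my @numbers = ( 10, -3.5, 0, 2.25, 1000 );

# Pair each number with its serialised (string) form.
my @pairs = map { [ $_, Search::Xapian::sortable_serialise($_) ] } @numbers;

# Sorting the strings with cmp should give the same order as sorting
# the original numbers with <=>.
my @by_string = map { $_->[0] } sort { $a->[1] cmp $b->[1] } @pairs;
my @by_number = sort { $a <=> $b } @numbers;

# Round trip one value through both functions.
my $round_trip = Search::Xapian::sortable_unserialise(
    Search::Xapian::sortable_serialise(2.25) );

print "String order:  @by_string\n";
print "Numeric order: @by_number\n";
print "Round trip of 2.25: $round_trip\n";
```

The serialised strings are what you would store in a document value slot when you want to sort or range-filter on a number.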
- Error Handling
  Error handling for all methods liable to generate errors.
- Documentation
  Add POD documentation for all classes, where possible just adapted from Xapian
  docs.
- Unwrapped classes
  The following Xapian classes are not yet wrapped:
  ErrorHandler, standard ExpandDecider subclasses
  (user-defined ones work),
  user-defined weight classes.
- Unwrapped methods
  The following methods are not yet wrapped:
  Enquire::get_eset(...) with more than two arguments,
  Query ctor optional "parameter" parameter,
  Remote::open(...),
  static Stem::get_available_languages().
  We wrap MSet::swap() and MSet::operator[](), but not ESet::swap(),
  ESet::operator[](). Is swap actually useful? Should we instead tie MSet
  and ESet to allow them to just be used as lists?
Thanks to Tye McQueen <tye@metronet.com> for explaining the
finer points of how best to write XS frontends to C++ libraries, James
Aylett <james@tartarus.org> for clarifying the less obvious
aspects of the Xapian API, Tim Brody for patches wrapping ::QueryParser and
::Stopper, and especially Olly Betts <olly@survex.com> for contributing
advice, bugfixes, and wrapper code for the more obscure classes.
Alex Bowley <kilinrax@cpan.org>
Please report any bugs/suggestions to <xapian-discuss@lists.xapian.org>
or use the Xapian bug tracker https://xapian.org/bugs. Please do
NOT use the CPAN bug tracker or mail any of the authors individually.
This program is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
the Search::Xapian::BM25Weight manpage,
the Search::Xapian::BoolWeight manpage,
the Search::Xapian::Database manpage,
the Search::Xapian::Document manpage,
the Search::Xapian::Enquire manpage,
the Search::Xapian::MatchSpy manpage,
the Search::Xapian::MultiValueSorter manpage,
the Search::Xapian::PositionIterator manpage,
the Search::Xapian::PostingIterator manpage,
the Search::Xapian::QueryParser manpage,
the Search::Xapian::Stem manpage,
the Search::Xapian::TermGenerator manpage,
the Search::Xapian::TermIterator manpage,
the Search::Xapian::TradWeight manpage,
the Search::Xapian::ValueIterator manpage,
the Search::Xapian::Weight manpage,
the Search::Xapian::WritableDatabase manpage,
and
https://xapian.org/.