LiveSearch Manual

Short Intro

Live Search is intended for small and medium-sized websites and does not need a database. The website is crawled starting from the defined baseurl; the links are collected and the content is cached, so future searches are faster. Search results that have already been found are also stored in session files to increase search speed.
The indexed content is saved inside the cache directory (one text file for each URL).
In the search results, the text around each found search string is cropped so that only the part containing the search string is displayed (similar to Google).
Various options let you configure Live Search. A Search Word Cloud is also available.
Since V2.0, keywords, descriptions and images can be searched too.
Since V3.0 you can choose the caching method (curl or allow_url_fopen) and search within PDF files.
Since V3.3 you can add external hosts that are linked on your website if you want to include them in your search results.
Since V3.4 you can define the logical combination (AND/OR) used for searching multiple words inside the config.
Since V3.7 there is a dashboard, and some improvements were made.
Since V3.9 you can add an XML sitemap (protocol based on sitemaps.org), even one located on an external domain, as long as that domain is listed in the $additionalHosts array.
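The cropping of search results mentioned above can be pictured with a small standalone function. This is a minimal sketch, not the code of livesearch.class.php: cropSnippet() and its $radius parameter are hypothetical, and the highlight class is borrowed from the sample output in the Search section.

```php
<?php
// Hypothetical helper (not from livesearch.class.php): return the text around the
// first case-insensitive match of $needle and wrap the match in a highlight tag.
// Works on bytes; multibyte (UTF-8) text would need the mb_* functions instead.
function cropSnippet($text, $needle, $radius = 40)
{
    if ($needle === '') {
        return '';                                   // nothing to search for
    }
    $pos = stripos($text, $needle);
    if ($pos === false) {
        return '';                                   // no match, no snippet
    }
    $len   = strlen($needle);
    $start = max(0, $pos - $radius);                 // beginning of the cropped window
    $end   = min(strlen($text), $pos + $len + $radius);

    return ($start > 0 ? '... ' : '')
        . htmlspecialchars(substr($text, $start, $pos - $start))
        . '<strong class="highlight">' . htmlspecialchars(substr($text, $pos, $len)) . '</strong>'
        . htmlspecialchars(substr($text, $pos + $len, $end - $pos - $len))
        . ($end < strlen($text) ? ' ...' : '');
}

echo cropSnippet('LiveSearch caches every page so that future searches are faster.', 'future', 10), PHP_EOL;
```

The window size is a trade-off: a larger $radius gives more context per result but makes the result list longer.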

The livesearch files only need to be included on the search results page.

Features

  • Easy to include and set up
  • Should work with any small or medium-sized website
  • No database needed
  • Pagination function
  • Caching of search results and content
  • Include unlinked files (new since 1.2)
  • Include external hosts/domains (new since 3.3)
  • Define the logical combination of single words: AND, OR or as is (new since 3.4)
  • Exclude paths and files from being searched for links (new since 1.2)
  • Exclude blocks of your site from indexing by using comment tags (new since 1.3)
  • Exclude <script> blocks (new since 2.0)
  • Optional performance switch (new since 2.0)
  • Search for META descriptions (new since 2.0)
  • Search for META keywords (new since 2.0)
  • Search for images (new since 2.0)
  • Option to hide images from being indexed (new since 2.0)
  • On-the-fly thumbnail generation (new since 2.0)
  • Choice of website grabbing method: curl or url_fopen (new since 3.0)
  • Three different auto pagination styles
  • Search within PDF files (new since 3.0)
  • Add an XML sitemap URL of any domain, following the sitemaps.org protocol as described in the class file (new since 3.9)
  • ...

Requirements

  • PHP 5.x
  • allow_url_fopen activated or cURL enabled; the web server must be allowed to access your website
  • optional: GD library for thumbnail generation
  • optional, for PDF indexing and searching: an installed and available pdftotext binary, and the site has to be hosted on a Linux server

Installation & Usage

Searchform

Submit any form containing your search field and use your search results page as the form action, e.g. search.php

Upload

Upload the ls folder to your web project. The ls folder contains livesearch.class.php, livesearch.css, icon-pdf.png and the cache directory.

Including and initializing

Include livesearch.class.php and initialize the class on every page on which you want to use the Live Search functions (usually just the search results page):
<?php
  include("ls/livesearch.class.php");
  $LiveSearch = new LiveSearch();
?>

Settings
The settings are defined in livesearch.class.php; below are sample settings and values.

Read this first

livesearch.class.php, where you define your settings, contains more detailed comments on each variable (and more variables) than these pages; it is more complete and in some cases more up to date than this manual.
Here is a small excerpt from the class file to show you what to expect:
//Baselink to search with trailing slash
// var $baseurl = "http://www.yoursite.com/";
// var $baseurl = "http://www.yoursite.com/aSubDirectory/";
var $baseurl = "http://ls.envato.homac.at/";
//Serverpath of your baseurl - including trailing slash (something like /users/mac/www/envato.homac.at/htdocs/demos/LiveSearch/) ... ONLY necessary for thumbnail creation
// var $basepath = "/users/envato/www/yoursite.com/htdocs/";
var $basepath = "/users/envato/www/livesearch.envato.homac.at/htdocs/";
//disable indexing of paths above the baseurl path
//only makes sense when $baseurl is a subdirectory below the root of your website and you only want to stay there
var $dontIndexUpperDirsOfBaseUrl = false;   //baseurl parent paths will be followed
// var $dontIndexUpperDirsOfBaseUrl = true;   //baseurl parent paths won't be followed
//absolute URL to your ls folder with trailing slash
// var $lsurl = "http://www.yoursite.com/ls/";
var $lsurl = "http://dev.ls.envato.homac.at/LS-3.9/ls/";
//search results page (within $baseurl), to prevent heavy load and long waiting times
var $searchresultspage = "search.php";
//excluding files and directories from indexing under $basedir beginning with:
//this means the links on these pages won't be followed,
//works with pdf files too
//regular expressions can be used
// var $excl = array(".*-dates/",
//                   "secret.html",
//                   "hide",
//                   );
var $excl = false;
//including files and directories under $basedir with:
//don't forget trailing slashes for directories
//you can set it to false too
//these pages will be indexed, the links on it and its content
//works with pdf files too
// var $incl = array("hallo3.txt",
//                   );
var $incl = false;

The base URL from which the grabbing should start (with trailing slash)
var $baseurl = "http://www.homac.at/";

The absolute directory path of your $baseurl on the web server (with trailing slash), ONLY needed for GD thumbnail generation
var $basepath = "/users/mac/www/www.homac.at/htdocs/";

Don't follow links to parent paths above the $baseurl path. This only makes sense if $baseurl is a subdirectory below the root of your website and you only want to stay there.
var $dontIndexUpperDirsOfBaseUrl = false;   //baseurl parent paths will be followed
// var $dontIndexUpperDirsOfBaseUrl = true;   //baseurl parent paths won't be followed

The absolute URL of your ls directory (with trailing slash), ONLY needed to display the GD-generated thumbnails
var $lsurl = "http://www.homac.at/ls/";

The name of your search results page (within the $basedir path), to prevent endless loops through the search and pagination links
var $searchresultspage = "search.php";

Exclude paths or individual files under $basedir from being checked for links (works with PDF files too; regular expressions can be used)
var $excl = array("dont_index",
                  "hideme.html",
                  "private/",
                  "docs/invoice.pdf",
                  );
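The class comments describe each entry as matching paths "beginning with" it, with regular expressions allowed. A minimal sketch of such a check, assuming paths relative to $baseurl; isExcluded() is hypothetical and uses # as the regex delimiter, so entries must not contain an unescaped #.

```php
<?php
// Hypothetical helper: does a path (relative to $baseurl) start with one of the $excl entries?
// Each entry is treated as a regular expression anchored at the start of the path.
function isExcluded($relativePath, $excl)
{
    if ($excl === false) {
        return false;                                 // $excl = false: nothing is excluded
    }
    foreach ($excl as $pattern) {
        if (preg_match('#^' . $pattern . '#', $relativePath)) {
            return true;
        }
    }
    return false;
}

$excl = array("dont_index", "hideme.html", "private/", "docs/invoice.pdf", ".*-dates/");
var_dump(isExcluded("private/notes.html", $excl));    // bool(true)
var_dump(isExcluded("public/private/x.html", $excl)); // bool(false): entries match from the start
```

Because the entries are regular expressions, a dot matches any character; escape it (\.) if you need an exact file name.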

Include individual files under $basedir that aren't linked anywhere on your site (works with PDF files too).
For example, hallo.txt isn't linked anywhere on the demo page, but its content can be found.
var $incl = array("hallo.txt",
                  "data.pdf",
                  );

List of external hosts/domains: array() or array("List","of","domains")
The domains/hosts of externally linked pages or embedded external images.
Enter just the host, without URL path or protocol.
Example:
var $additionalHosts = array("www.anywhereelse.com","flickr.com");
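Since the entries are bare host names, a link can be checked by comparing its host with the $baseurl host and this list. A minimal sketch with the hypothetical helper isAllowedHost(); it assumes an exact host comparison, so www.flickr.com would need its own entry.

```php
<?php
// Hypothetical helper: is a link's host the $baseurl host or listed in $additionalHosts?
// Hosts are compared exactly (case-insensitively); "www.flickr.com" and "flickr.com"
// are treated as different hosts in this sketch.
function isAllowedHost($url, $baseurl, $additionalHosts)
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    if ($host === '') {
        return true;                                  // relative link: stays on the current host
    }
    $allowed = array(strtolower((string) parse_url($baseurl, PHP_URL_HOST)));
    foreach ($additionalHosts as $additional) {
        $allowed[] = strtolower($additional);         // entries are bare hosts, no protocol
    }
    return in_array($host, $allowed, true);
}

$additionalHosts = array("www.anywhereelse.com", "flickr.com");
var_dump(isAllowedHost("http://flickr.com/photos/123", "http://www.homac.at/", $additionalHosts)); // bool(true)
```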

XML sitemap: if you have an XML/RSS sitemap, you can set its URL here. If no protocol (http(s)) is given, LiveSearch tries to build the URL relative to $baseurl.
The sitemap has to be written in a valid format with absolute URLs, containing at least
...
<loc>http://your.website.com/anyFile.html</loc>
...
OR
...
<link>http://your.website.com/anyFile.html</link>
...
Only URLs on your $baseurl host OR on hosts in the $additionalHosts array will be followed.
Examples (set only one of them):
var $xmlSitemap = "http://an.other.host.com/sitemap.xml";
var $xmlSitemap = "sitemap.xml";
var $xmlSitemap = "/sitemap.xml";
var $xmlSitemap = false;
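The <loc>/<link> requirement can be illustrated with a small extraction sketch. extractSitemapUrls() is hypothetical and uses a regular expression for brevity (a real XML parser is more robust); it keeps only absolute http(s) URLs and leaves out the host check against $baseurl and $additionalHosts.

```php
<?php
// Hypothetical helper: collect the absolute URLs from <loc> (sitemaps.org) or <link> (RSS) elements.
function extractSitemapUrls($xml)
{
    // \1 makes sure the closing tag matches the opening one (loc or link)
    preg_match_all('#<(loc|link)>\s*(.*?)\s*</\1>#is', $xml, $matches);
    $urls = array();
    foreach ($matches[2] as $url) {
        $url = html_entity_decode($url, ENT_QUOTES, 'UTF-8'); // sitemaps escape & as &amp;
        if (preg_match('#^https?://#i', $url)) {             // only absolute URLs are kept
            $urls[] = $url;
        }
    }
    return array_values(array_unique($urls));
}

$sitemap = '<urlset><url><loc>http://your.website.com/anyFile.html</loc></url></urlset>';
print_r(extractSitemapUrls($sitemap));
```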

Method of site grabbing
auto, curl or url_fopen; with auto, curl is tried first and url_fopen is used as the fallback
var $method = "auto"; # auto, curl or url_fopen
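To see which method auto would end up with on your server, you can run a check like the following. It is a hypothetical sketch that mirrors the documented order (curl first, then url_fopen); function_exists('curl_init') and ini_get('allow_url_fopen') are standard PHP checks, but the class may test availability differently.

```php
<?php
// Hypothetical helper: which grabbing method would be usable for a given $method setting?
// Availability is passed in, so the decision logic can be tested without cURL installed.
function chooseGrabMethod($method, $curlAvailable, $fopenAllowed)
{
    if ($method === 'curl') {
        return $curlAvailable ? 'curl' : false;
    }
    if ($method === 'url_fopen') {
        return $fopenAllowed ? 'url_fopen' : false;
    }
    // "auto": curl is tried before url_fopen
    if ($curlAvailable) {
        return 'curl';
    }
    return $fopenAllowed ? 'url_fopen' : false;       // false: neither method is usable
}

var_dump(chooseGrabMethod('auto', function_exists('curl_init'), (bool) ini_get('allow_url_fopen')));
```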

File extensions of pages that are grabbed and checked for links
var $checkext = array("htm","html","php","txt");

For debugging purposes, save the URL of each cached file into a separate file next to the content file (1.txt → 1.url, 2.txt → 2.url) (true/false)
var $saveURLfiles = false;

To search within PDF files, set this variable to true (mind the requirements)
var $collect_pdfs = true;

The extensions of PDF files (usually it's just pdf :) )
var $pdfext = array("pdf");

Hours between autocaching processes
-1 ... cache on every search (could be okay for smaller sites with dynamic content)
0 ... disable autocaching
X ... autocache at least every X hours; X can be a float (0.25 = 15 minutes) or an integer (2 = 2 hours)
var $cachetime = 12;    //-1 for cache every time when searching, 0 when caching should not happen
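The three kinds of values can be summarised in a small function. This hypothetical needsRecache() only models the documented $cachetime values; the case where nothing is cached yet, which triggers caching as well, is left out.

```php
<?php
// Hypothetical helper illustrating the documented $cachetime values.
// $cacheAgeSeconds: age of the existing cache in seconds.
function needsRecache($cachetime, $cacheAgeSeconds)
{
    if ($cachetime == -1) {
        return true;                                  // re-cache on every search
    }
    if ($cachetime == 0) {
        return false;                                 // autocaching disabled
    }
    return $cacheAgeSeconds > $cachetime * 3600;      // hours to seconds; floats allowed
}

var_dump(needsRecache(0.25, 20 * 60)); // bool(true): 20 minutes are more than 15
```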

Results per page; if more results are found, you can use the pagination function
var $srch_res_per_page = 15;
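The page arithmetic behind this setting can be sketched as follows. paginate() is a hypothetical helper that assumes 1-based page numbers; its current and total keys mirror the array returned by $LiveSearch->pager().

```php
<?php
// Hypothetical helper: split a result list into pages of $perPage items.
// $page is 1-based; out-of-range pages are clamped to the first/last page.
function paginate($results, $page, $perPage)
{
    $total = max(1, (int) ceil(count($results) / $perPage));
    $page  = min(max(1, (int) $page), $total);
    return array(
        'current' => $page,
        'total'   => $total,
        'items'   => array_slice($results, ($page - 1) * $perPage, $perPage),
    );
}

$page3 = paginate(range(1, 40), 3, 15);
echo $page3['current'] . '/' . $page3['total'], PHP_EOL; // 3/3
```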

Logical combination when searching for multiple words at once
OR ... splits to words and shows results containing ANY of the words
AND ... splits to words and shows results containing ALL of the words
false ... doesn't split and shows results for the WHOLE string as it is
var $srch_logic = "OR";
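The three modes can be illustrated with a small standalone function. This is a sketch, not the class's matching code: matchesQuery() is hypothetical, and it uses case-insensitive substring matching, so "image" also matches "images".

```php
<?php
// Hypothetical helper illustrating the three $srch_logic modes on a plain-text document.
function matchesQuery($text, $query, $logic)
{
    if ($logic === false) {
        return stripos($text, $query) !== false;            // the WHOLE string as it is
    }
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    $found = 0;
    foreach ($words as $word) {
        if (stripos($text, $word) !== false) {
            $found++;
        }
    }
    if ($logic === 'AND') {
        return count($words) > 0 && $found === count($words); // ALL of the words
    }
    return $found > 0;                                       // OR: ANY of the words
}

$text = 'Search for images and keywords';
var_dump(matchesQuery($text, 'images pdf', 'OR'));  // bool(true)
var_dump(matchesQuery($text, 'images pdf', 'AND')); // bool(false)
```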

Pagination style (including Bootstrap 2.x and Bootstrap 3.x styles)
Currently the following styles are available: default, boxed, bootstrap and bootstrap3. The style is only used by the draw... methods.
bootstrap uses Bootstrap 2.3.2 CSS styles
bootstrap3 uses Bootstrap 3.1.1 CSS styles
var $pagerstyle = "boxed";

Minimum and maximum font size for the Search Cloud (px)
var $cloud_min = 10;
var $cloud_max = 45;

Maximum number of items in the Search Cloud
var $maxCloudItems = 50;
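These settings suggest how the cloud is built: the most frequent stored search strings are shown, with font sizes between $cloud_min and $cloud_max. The sketch below shows one common way to do that, a linear mapping of the search counts; it is an assumption, not necessarily the scaling the class uses, and buildCloud() is a hypothetical name.

```php
<?php
// Hypothetical sketch: $counts maps each stored search string to how often it was searched.
// Keeps the $maxItems most frequent strings and maps their counts linearly onto
// the font size range $cloudMin..$cloudMax (px).
function buildCloud($counts, $cloudMin, $cloudMax, $maxItems)
{
    arsort($counts);                                      // most searched first
    $counts = array_slice($counts, 0, $maxItems, true);   // true: keep the search strings as keys
    if (count($counts) === 0) {
        return array();
    }
    $lo = min($counts);
    $hi = max($counts);
    $sizes = array();
    foreach ($counts as $word => $n) {
        // if all counts are equal, every item gets the largest size
        $ratio = ($hi == $lo) ? 1 : ($n - $lo) / ($hi - $lo);
        $sizes[$word] = (int) round($cloudMin + $ratio * ($cloudMax - $cloudMin));
    }
    return $sizes;
}

print_r(buildCloud(array('avatar' => 10, 'search' => 1, 'image' => 5), 10, 45, 50));
```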

Error messages for a query string that is too short or for no results (only used if you use the drawSearchresults method)
var $errorToShort = '<div class="alert alert-error">You have to enter at least %1$s characters.</div>';
var $errorNothingFound = '<div class="alert alert-info">No search results for %1$s.</div>';
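Both messages are format strings with the positional placeholder %1$s. The sketch below shows how such strings are filled with PHP's sprintf(); that the class fills them exactly this way, and the minimum length of 3, are assumptions for illustration. Escaping the user's query before it ends up in HTML is a precaution worth checking in your own output.

```php
<?php
// The messages are sprintf()-style format strings: %1$s is the first argument.
// Keep them in single quotes: inside double quotes PHP would treat $s as a variable.
$errorToShort      = '<div class="alert alert-error">You have to enter at least %1$s characters.</div>';
$errorNothingFound = '<div class="alert alert-info">No search results for %1$s.</div>';

$minLength = 3;                 // hypothetical minimum length, for illustration only
$query     = '<b>avatar</b>';   // user input: escape it before putting it into HTML

echo sprintf($errorToShort, $minLength), PHP_EOL;
echo sprintf($errorNothingFound, htmlspecialchars($query, ENT_QUOTES, 'UTF-8')), PHP_EOL;
```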

If you run into performance problems on larger websites (timeouts during caching, memory exhaustion, ...), set this value to true; otherwise leave it false
var $performance_fix = false;

To search for images too (file name, alt attribute, title attribute), set this variable to true
var $collect_images = true;

The headline for image search results with several images
var $img_results_headline = '%1$s Images for %2$s';  //Number of images, Searchstring

The headline for an image search result with exactly one image
var $img_result_headline = '1 Image for %1$s';  //Searchstring

The height of the GD-generated thumbnails in pixels. !!! livesearch.css also contains height definitions for the CSS thumbnails !!!
var $thumb_height = 70;

If true, the thumbnails are generated automatically with the GD library; otherwise the images are sized by CSS
var $create_thumbs = true;

UTF-8 decoding for searching, if needed (true/false); enabled by default
var $utf8DecodeResults = true;

Cache directory, which has to be writable. In the example below, the path to the cache directory is calculated automatically relative to livesearch.class.php
$this->cachedir = realpath(dirname(__FILE__)) . "/cache";
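Because caching fails if the directory isn't writable, a quick setup check can help. This is a hypothetical standalone script (cacheDirProblem() is not part of the class); it computes the path the same way as the line above, so place it next to livesearch.class.php.

```php
<?php
// Setup check (not part of the class): is the cache directory present and
// writable for the PHP process? Returns an error message or null.
function cacheDirProblem($dir)
{
    if (!is_dir($dir)) {
        return "Cache directory missing: $dir";
    }
    if (!is_writable($dir)) {
        return "Cache directory not writable: $dir";
    }
    return null;                                      // everything fine
}

$cachedir = realpath(dirname(__FILE__)) . "/cache";   // same calculation as in the class file
echo cacheDirProblem($cachedir) ?: "Cache directory OK", PHP_EOL;
```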

Functions & Methods

Cache/Index Files
Caches the files without searching. This can take a while; it is also called automatically during the search process if no files are cached yet or if the cached files are older than the defined $cachetime.
$LiveSearch->cacheFiles();
Search
Starts the search. If no files are cached yet or the cached files are older than the defined $cachetime, the search function calls the cacheFiles function too.
$LiveSearch->search($_REQUEST["q"],$_REQUEST["p"]);

or, if you want to design the results yourself (you will get an array):
$searchresults = $LiveSearch->search($_REQUEST["q"],$_REQUEST["p"]);

$_REQUEST["q"]
The search string
$_REQUEST["p"]
The current page, needed for the pagination
When you assign the return value of the function to a variable (last code snippet), you receive an array with the following keys for each result:

url
The absolute URL, including the protocol http(s)
host
The host name of the URL without any leading www, useful when indexing different sites
title
The page title
content
The snippet in which the search string is embedded
isPDF
0/1, indicates whether the result is a PDF (1) or not (0)
If image search is enabled, you also receive an entry for the images: its url is set to #, its title to the $img_results_headline, and its content contains the img tags of the found images. An example:
Array
(
  [0] => Array
    (
      [0] => Array
        (
          [src] => http://ls.envato.homac.at/images/gravatar.jpg
          [title] => 
          [alt] => image
          [parenturl] => http://ls.envato.homac.at/index.php
          [GDThumb] => d9b023be3750db3cfbdcc72f0e71cc65.jpg
        )
      [title] => 1 Images for avatar
      [url] => #
      [content] => <a href="http://ls.envato.homac.at/images/gravatar.jpg"><img src="http://ls.envato.homac.at/ls/cache/thumbs/d9b023be3750db3cfbdcc72f0e71cc65.jpg" alt="image" title=""></a>
    )
  [1] => Array
    (
      [url] => http://ls.envato.homac.at/help.php
      [title] => LiveSearch - How it works
      [content] => ... Links (i.e. &amp;action=search) - don't forget the leading &amp; Current cloud for this website: <strong class="highlight">avatar</strong> search easter image super firefox keywords duper <strong class="highlight">avatar</strong>search wise Excluding blocks from being indexed since V 1.3 you're able to exclude/hide blocks...
    )
)
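If you design the results yourself, a minimal loop over the returned array could look like the sketch below. It relies only on the keys documented above (url, title, content, isPDF); the function name renderResults(), the ls-result class and the [PDF] marker are hypothetical choices, not part of the class.

```php
<?php
// Hypothetical renderer for the array returned by $LiveSearch->search().
// content is already HTML (highlighted snippet or image tags), so only url and title are escaped.
function renderResults($results)
{
    $html = '';
    foreach ($results as $r) {
        $title = htmlspecialchars($r['title'], ENT_QUOTES, 'UTF-8');
        if (!empty($r['isPDF'])) {
            $title .= ' [PDF]';                               // mark PDF results
        }
        $html .= '<div class="ls-result">';
        if ($r['url'] === '#') {
            $html .= '<h3>' . $title . '</h3>';               // image entry: no page to link to
        } else {
            $url = htmlspecialchars($r['url'], ENT_QUOTES, 'UTF-8');
            $html .= '<h3><a href="' . $url . '">' . $title . '</a></h3>';
        }
        $html .= '<p>' . $r['content'] . '</p></div>' . "\n";
    }
    return $html;
}
```

On the search results page this could be used as echo renderResults($LiveSearch->search($_REQUEST["q"], $_REQUEST["p"]));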
Available variables
After a successful search you have access to the following variables:
$LiveSearch->searchcount ... Total number of search results
$LiveSearch->p ... Current Page
$LiveSearch->pages ... Total pages
...
Clear Cache
Deletes all cached files. Note: cached files are also deleted automatically when they are too old ($cachetime exceeded) or on every caching process ($cachetime = -1).
$LiveSearch->clearCache();
Clear Search Results
Removes the stored search results
$LiveSearch->clearSrch();
Clear Stored Search Counts
Removes the stored search strings (used by the Search Word Cloud)
$LiveSearch->clearSrchStr();
Pager
Returns paging information after a successful search if there are more results than the defined $srch_res_per_page. With this information you can build your own pagination.
$LiveSearch->pager();

Returns an array with the following keys:

current
Current page
total
Total pages
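With current and total you can render your own page links. A minimal sketch, assuming 1-based page numbers; buildPager() is hypothetical, and its parameters only mirror the names of drawPagination() described below, where Add2Query is appended to each link.

```php
<?php
// Hypothetical pagination built from the array returned by $LiveSearch->pager()
// (keys "current" and "total").
function buildPager($pager, $query, $pageVar = 'p', $queryVar = 'q', $add2query = '')
{
    if ($pager['total'] <= 1) {
        return '';                                        // nothing to paginate
    }
    $links = array();
    for ($i = 1; $i <= $pager['total']; $i++) {
        if ($i == $pager['current']) {
            $links[] = '<strong>' . $i . '</strong>';     // current page, not a link
            continue;
        }
        $href = '?' . $queryVar . '=' . urlencode($query) . '&amp;' . $pageVar . '=' . $i . $add2query;
        $links[] = '<a href="' . $href . '">' . $i . '</a>';
    }
    return implode(' ', $links);
}

echo buildPager(array('current' => 2, 'total' => 3), 'live search', 'p', 'q', '&amp;action=search'), PHP_EOL;
```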
Pagination Example

Returns example pagination output if there are more results than the defined $srch_res_per_page. It is also called by the $LiveSearch->drawSearchresults() method.

$LiveSearch->drawPagination();
or
$LiveSearch->drawPagination("p","q");
or
$LiveSearch->drawPagination("p","q","&amp;action=search");
or
$LiveSearch->drawPagination("p","q","&amp;action=search","boxed");

Syntax:
$LiveSearch->drawPagination([PageVarName], [SearchStringVarName], [Add2Query], [PagerStyle]);
The Parameters
[PageVarName]
The name of the page variable, default p
[SearchStringVarName]
The name of the search string, default q
[Add2Query]
If needed, you can add variables to the pagination links (e.g. &action=search); don't forget the leading &. Default: false
[PagerStyle]
To change the display style of the pagination, use default or boxed. Default: default
Search Results Example
Example output for the search results, including the pagination described above
$LiveSearch->drawSearchresults();
or
$LiveSearch->drawSearchresults("p","q");
or
$LiveSearch->drawSearchresults("p","q","&amp;action=search");

Syntax:
$LiveSearch->drawSearchresults([PageVarName], [SearchStringVarName], [Add2Query]);
The Parameters
[PageVarName]
The name of the page variable, default p
[SearchStringVarName]
The name of the search string, default q
[Add2Query]
If needed, you can add variables to the pagination links (e.g. &action=search); don't forget the leading &
Show collected URLs
Shows the collected and cached URLs
$LiveSearch->showUrls();

An example:
Array
	(
	    [0] => http://livesearch.dev.homac.at/LS-3.9/index.php
	    [1] => http://livesearch.dev.homac.at/LS-3.9/howitworks.php
	    [2] => http://livesearch.dev.homac.at/LS-3.9/contact.php
	    [3] => http://livesearch.dev.homac.at/LS-3.9/
	    [4] => http://livesearch.dev.homac.at/LS-3.9/gallery.php
	    [5] => http://livesearch.dev.homac.at/LS-3.9/downloads.php
	    [6] => http://livesearch.dev.homac.at/LS-3.9/sample1.php
	    [7] => http://livesearch.dev.homac.at/LS-3.9/sample2.php
	    [8] => http://livesearch.dev.homac.at/LS-3.9/sample3.php
	    [9] => http://codecanyon.net/item/live-search-searchengine-for-your-website/86875
	    [10] => http://livesearch.dev.homac.at/LS-3.8/
	    [11] => http://livesearch.dev.homac.at/LS-3.9/envato.pdf
	    [12] => http://livesearch.dev.homac.at/LS-3.9/hallo.txt
	    [13] => http://www.homac.at/impressum.html
	    [14] => http://livesearch.dev.homac.at/LS-3.8/index.php
	    [15] => http://livesearch.dev.homac.at/LS-3.8/howitworks.php
	    [16] => http://livesearch.dev.homac.at/LS-3.8/contact.php
	    [17] => http://livesearch.dev.homac.at/LS-3.8/search.php
	    [18] => http://livesearch.dev.homac.at/LS-3.8/gallery.php
	    [19] => http://livesearch.dev.homac.at/LS-3.8/downloads.php
	    [20] => http://livesearch.dev.homac.at/LS-3.8/sample1.php
	    [21] => http://livesearch.dev.homac.at/LS-3.8/sample2.php
	    [22] => http://livesearch.dev.homac.at/LS-3.8/sample3.php
	    [23] => http://livesearch.dev.homac.at/LS-3.8/envato.pdf
	    [24] => http://livesearch.dev.homac.at/LS-3.8/hallo.txt
	)
													
Search Cloud
Shows the Search Word Cloud
$LiveSearch->printSrchCloud();
or
$LiveSearch->printSrchCloud("q");
or
$LiveSearch->printSrchCloud("q","&amp;action=search");

Syntax
$LiveSearch->printSrchCloud([SearchStringVarName], [Add2Query]);
The Parameters
[SearchStringVarName]
The name of the search string, default q
[Add2Query]
If needed, you can add variables to the cloud links (e.g. &action=search); don't forget the leading &

Exclude Blocks & Images

Excluding blocks from being indexed
Since LS V1.3 you can exclude/hide blocks of your website from LiveSearch by setting simple comment tags. This is useful for menus, footers, advertisements, search clouds, ... that appear on every page

Start hiding
<!--LSHIDE-->
Stop hiding
<!--/LSHIDE-->

Example #1:
Some words, can be found but <!--LSHIDE-->this combination can't be<!--/LSHIDE--> found

Example #2:
blabla
  <!--LSHIDE-->
  Mainmenue #1
  Mainmenue #2 Mainmenue #3
  <!--/LSHIDE-->
  some text ...
  <!--LSHIDE-->
  Submenue #1
  Submenue #2
  <!--/LSHIDE-->
blabla..
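Conceptually, the indexer drops everything between the two markers. A minimal sketch with a non-greedy regular expression; stripHiddenBlocks() is hypothetical and not necessarily how the class implements it.

```php
<?php
// Hypothetical helper: remove everything between <!--LSHIDE--> and <!--/LSHIDE-->.
// Non-greedy (.*?) so several blocks on one page are removed separately;
// the s modifier lets a block span several lines.
function stripHiddenBlocks($html)
{
    return preg_replace('#<!--LSHIDE-->.*?<!--/LSHIDE-->#s', '', $html);
}

echo stripHiddenBlocks("Some words, can be found but <!--LSHIDE-->this combination can't be<!--/LSHIDE--> found"), PHP_EOL;
```

Text between two separate blocks (like "some text ..." in Example #2) stays indexable because the pattern stops at the first closing marker.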

Excluding images such as icons, logos ... from being indexed
In addition to the LSHIDE blocks, since LS V2.0 you can hide images from being indexed by adding the class LSHIDE to them (lowercase lshide works too, as the second example shows).
Some sample usage:
<img src='images/icons/contact.gif' alt='contact' class='icon LSHIDE' /> <!--won't be indexed-->
<img src='images/space.png' alt='' class='lshide' /> <!--won't be indexed-->
<img src="images/portfolio/homac.jpg" alt="homac" title="Homac e.U." class="float-left p5" /> <!--will be indexed-->
<img src="images/portfolio/envato.jpg" alt="" /> <!--will be indexed-->
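The rule above can be sketched as a check on the class attribute of an <img> tag. isHiddenImage() is hypothetical; it compares class names case-insensitively because the second example uses lowercase lshide.

```php
<?php
// Hypothetical check: is LSHIDE one of the classes of this <img> tag?
function isHiddenImage($imgTag)
{
    // class='...' or class="..."; \1 matches the same quote character that opened the value
    if (!preg_match('/\bclass\s*=\s*([\'"])(.*?)\1/i', $imgTag, $m)) {
        return false;                                     // no class attribute at all
    }
    foreach (preg_split('/\s+/', trim($m[2])) as $class) {
        if (strcasecmp($class, 'LSHIDE') === 0) {
            return true;
        }
    }
    return false;
}

var_dump(isHiddenImage("<img src='images/space.png' alt='' class='lshide' />")); // bool(true)
```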

LiveSearch Manager

Since version 3.7 the script comes with a small administration interface that shows some status information and lets you use some of the methods directly inside the interface.
To access the LiveSearch Manager, point your browser to lsMngr.php in your ls directory on the web server,
for example: http://www.mysite.com/ls/lsMngr.php
The user credentials have to be set directly in the class file:
var $mngrUser = "admin";
var $mngrPass = ""; //please choose your password

Examples

Just the Form

<form method="post" action="search.php">
  <input type="text" name="q">
  <input type="submit">
</form>

The Search Results

<?php
  include("ls/livesearch.class.php");
  $LiveSearch = new LiveSearch();
?>
  ...
  <?php
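  // isset() avoids "undefined index" notices, e.g. when no page parameter is sent yet
  $q = isset($_REQUEST["q"]) ? $_REQUEST["q"] : null;
  $p = isset($_REQUEST["p"]) ? $_REQUEST["p"] : null;
  $LiveSearch->search($q, $p);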
  echo "<p>" . $LiveSearch->drawSearchresults() . "</p>";
?>
..
or
<?php
  include("ls/livesearch.class.php");
  $LiveSearch = new LiveSearch();
?>
  ...
  <?php
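  $q = isset($_REQUEST["q"]) ? $_REQUEST["q"] : null;
  $p = isset($_REQUEST["p"]) ? $_REQUEST["p"] : null;
  $search_results = $LiveSearch->search($q, $p);
  echo "Found: " . $LiveSearch->searchcount . "<br>";
  echo "Pages: " . $LiveSearch->pages . "<br>";
  echo "Current Page: " . $LiveSearch->p . "<br>";
  // escape the dump so that the HTML inside "content" is shown as text
  echo "<pre><b>Search Results</b>\n<code>" .
  htmlspecialchars(print_r($search_results, true)) . "</code></pre>";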
?>
..