The main enhancements of this release are: IPv6 routing of peer-to-peer network elements and overall IPv6 enhancements, self-stats logging (showing the number of peers in the network over time), an enhanced embedded HTTP server (fewer browser reloads thanks to Expires dates), document date detection, document snapshots (an optional document archive created by the crawler), partial updates to Solr during postprocessing and further postprocessing enhancements, new UPnP classes, and updates to many external libraries such as the PDF parser and Jetty. Furthermore, there is now a release notes web page which shows all the details at http://yacy.net/release_notes/
Commit | Description |
---|---|
Mon Jan 19 03:30:35 CET 2015 by reger | Refactored the opensearch heuristic: introduced FederateSearchManager, which handles search heuristics to external systems via specific FederateSearchConnectors. These provide the query() functionality, the translation to the YaCy schema (.toYaCySchema()) and the search() routine to deliver results to searchevents, which is generally implemented in the abstract connector. The manager now enforces a minimum delay of 15s between calls to external systems. Besides the OpensearchConnector, a SolrFederateSearchConnector is available. It uses an additional config file for fieldname translation. default heuristicopensearch.conf: - openbdb.com removed - seems to no longer deliver results - config via solrconnector to datacite.org added (large technical library archive) Changed Files: defaults/federatecfg/datacite.solr.schema, defaults/heuristicopensearch.conf, htroot/ConfigHeuristics_p.java, htroot/ConfigNetwork_p.java, htroot/yacysearch.java, source/net/yacy/cora/federate/AbstractFederateSearchConnector.java, source/net/yacy/cora/federate/FederateSearchConnector.java, source/net/yacy/cora/federate/FederateSearchManager.java, source/net/yacy/cora/federate/SolrFederateSearchConnector.java, source/net/yacy/cora/federate/opensearch/OpenSearchConnector.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java, source/net/yacy/search/Switchboard.java |
Sun Jan 04 18:47:47 CET 2015 by sixcooler | bump to Solr-/Lucene-4.10.3 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, defaults/solr/solrconfig.xml, lib/lucene-analyzers-common-4.10.3.jar, lib/lucene-analyzers-phonetic-4.10.3.jar, lib/lucene-classification-4.10.3.jar, lib/lucene-codecs-4.10.3.jar, lib/lucene-core-4.10.3.jar, lib/lucene-facet-4.10.3.jar, lib/lucene-grouping-4.10.3.jar, lib/lucene-highlighter-4.10.3.jar, lib/lucene-join-4.10.3.jar, lib/lucene-memory-4.10.3.jar, lib/lucene-misc-4.10.3.jar, lib/lucene-queries-4.10.3.jar, lib/lucene-queryparser-4.10.3.jar, lib/lucene-spatial-4.10.3.jar, lib/lucene-suggest-4.10.3.jar, lib/solr-core-4.10.3.jar, lib/solr-solrj-4.10.3.jar, nbproject/project.xml, pom.xml |
Sun Jan 04 11:10:45 CET 2015 by reger | Added a "don't store remote search results" option. This is intended for peers who want to participate in the P2P network but don't wish to fill up their index with the metadata of every received search result. The DHT transfer is not affected by this option (and will work as usual, so that a peer disabling the new store-to-index switch still receives and holds the metadata according to DHT rules). The downside for the local peer is that search speed will not improve if search terms are only available remotely or by quick hits in the local index. To be able to improve the local index, a Click-Servlet option was added as well. If switched on, all search result links point to this servlet, which forwards the user's browser (by html header) to the desired page and feeds the page to the fulltext index. The servlet accepts a parameter defining the action to perform (see defaults/web.xml: index, crawl, crawllinks). The option check-boxes are placed in ConfigPortal.html Changed Files: defaults/web.xml, defaults/yacy.init, htroot/ConfigPortal.html, htroot/ConfigPortal.java, htroot/yacysearchitem.java, source/net/yacy/http/servlets/ClickServlet.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/search/query/SearchEventCache.java |
Sun Dec 21 18:10:15 CET 2014 by Michael Peter Christen | added experimental pdf splitting which enables YaCy to split pdfs during parsing into individual pages and add them all using different URLs. These constructed urls are generated from the source url with a page=<pagenumber> attribute appended to the url get/post properties. This distinguishes the different page entries. The search result list then replaces the post parameter with a url anchor # mark, so that the original url is presented in the search result. These URLs can be opened directly on the correct page using pdf.js, which is now built into firefox. That means: if you find a search hit on page 5 and click on the search result, firefox will open the pdf viewer and show page 5. Changed Files: source/net/yacy/crawler/retrieval/SitemapImporter.java, source/net/yacy/document/parser/pdfParser.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/snippet/ResultEntry.java |
Fri Dec 19 21:54:17 CET 2014 by reger | update to Jetty 9.2.6 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/jetty-9.2.6.v20141205.License, lib/jetty-client-9.2.6.v20141205.jar, lib/jetty-continuation-9.2.6.v20141205.jar, lib/jetty-deploy-9.2.6.v20141205.jar, lib/jetty-http-9.2.6.v20141205.jar, lib/jetty-io-9.2.6.v20141205.jar, lib/jetty-jmx-9.2.6.v20141205.jar, lib/jetty-proxy-9.2.6.v20141205.jar, lib/jetty-security-9.2.6.v20141205.jar, lib/jetty-server-9.2.6.v20141205.jar, lib/jetty-servlet-9.2.6.v20141205.jar, lib/jetty-servlets-9.2.6.v20141205.jar, lib/jetty-util-9.2.6.v20141205.jar, lib/jetty-webapp-9.2.6.v20141205.jar, lib/jetty-xml-9.2.6.v20141205.jar, nbproject/project.xml |
Fri Dec 19 02:54:38 CET 2014 by reger | update to PDFBox 1.8.8 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/fontbox-1.8.8.License, lib/fontbox-1.8.8.jar, lib/jempbox-1.8.8.License, lib/jempbox-1.8.8.jar, lib/pdfbox-1.8.8.License, lib/pdfbox-1.8.8.jar, nbproject/project.xml, pom.xml |
Mon Dec 15 23:32:46 CET 2014 by Michael Peter Christen | Added a transaction interface to the snapshots: all documents in the snapshots can now be processed with transactions using commit and rollback commands. Furthermore, a large number of monitoring methods have been added to check the success of transactions. The transactions for snapshots have two main components: an rss search API to get information about latest/oldest entries and a commit/rollback API to move entries away from the rss results. This is done by using two storage locations for the snapshots, INVENTORY and ARCHIVE. New snapshots are placed in INVENTORY, committed snapshots move to ARCHIVE, rolled-back snapshots move to INVENTORY again. Normal Workflow: Besides all the options below, it is usually sufficient to process data like this: - call http://localhost:8090/api/snapshot.rss?state=INVENTORY&order=LATESTFIRST - process the rss result and use the <guid> value as <urlhash> (see next command) - for each processed result call http://localhost:8090/api/snapshot.json?command=commit&urlhash=<urlhash> - then you can call the rss feed again and the committed urls are omitted from the next set of items. These are the commands to control this: The rss feed: http://localhost:8090/api/snapshot.rss?state=INVENTORY&order=LATESTFIRST http://localhost:8090/api/snapshot.rss?state=INVENTORY&order=OLDESTFIRST http://localhost:8090/api/snapshot.rss?state=INVENTORY&order=ANY http://localhost:8090/api/snapshot.rss?state=ARCHIVE&order=LATESTFIRST http://localhost:8090/api/snapshot.rss?state=ARCHIVE&order=OLDESTFIRST http://localhost:8090/api/snapshot.rss?state=ARCHIVE&order=LATESTFIRST The feed will return a <urlhash> in the <guid> field of the rss. 
This must be used for commit/rollback: Commit/Rollback: http://localhost:8090/api/snapshot.json?command=commit&urlhash=<urlhash> http://localhost:8090/api/snapshot.json?command=rollback&urlhash=<urlhash> The json will return a property list containing the property "result" with possible values "success" or "fail", according to the result. If a "fail" occurs, please look into the log for further info. Monitoring: http://localhost:8090/api/snapshot.json?command=status This shows the total number of entries in the INVENTORY and the ARCHIVE http://localhost:8090/api/snapshot.json?command=list This will return a list of all hosts which have snapshots and the number of entries for the hosts. Counts for INVENTORY and ARCHIVE are listed in the properties "count.INVENTORY" and "count.ARCHIVE" http://localhost:8090/api/snapshot.json?command=list&depth=2 The list can be restricted to entries which have a specific depth. The list then contains the same host names, but the count values change because only documents at that specific crawl depth are listed http://localhost:8090/api/snapshot.json?command=list&host=yacy.net.80 This lists all urlhashes for the given host, not only an accumulated list of the number of entries http://localhost:8090/api/snapshot.json?command=list&host=yacy.net.80&depth=0 This restricts the list of urlhashes for that host to the given depth http://localhost:8090/api/snapshot.json?command=list&state=INVENTORY http://localhost:8090/api/snapshot.json?command=list&state=ARCHIVE This selects either the INVENTORY or ARCHIVE for all list commands; the default is ALL, which means that the host information is collected from both snapshot directories and combined. You can use the state option for all the commands listed above Detailed Information: http://localhost:8090/api/snapshot.json?command=metadata&urlhash=upiFJ7Fh1hyQ This collects metadata information for the given urlhash. 
This can also be restricted with state=INVENTORY and state=ARCHIVE to test whether the document is in one of these snapshot directories. If a urlhash is not found, an empty result is returned. If an entry was found and the state was not restricted, then the result contains a state property containing the name of the location where the document is, either INVENTORY or ARCHIVE. Hint: If a very large number of documents is inside INVENTORY, then it could be better to call the rss feed with http://localhost:8090/api/snapshot.rss?state=INVENTORY&order=ANY because that is very efficient. Changed Files: htroot/api/snapshot.java, source/net/yacy/crawler/data/Snapshots.java, source/net/yacy/crawler/data/Transactions.java |
Mon Dec 15 20:45:05 CET 2014 by reger | update to metadata-extractor-2.7.0.jar add 2 simple JUnit test cases for jpeg and tif parsing Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/metadata-extractor-2.7.0.License, lib/metadata-extractor-2.7.0.jar, lib/xmpcore-5.1.2.jar, nbproject/project.xml, pom.xml, source/net/yacy/document/parser/images/genericImageParser.java, source/net/yacy/document/parser/images/metadataImageParser.java, test/net/yacy/document/parser/images/genericImageParserTest.java, test/net/yacy/document/parser/images/metadataImageParserTest.java, test/parsertest/YaCyLogo_120ppi.jpg, test/parsertest/YaCyLogo_120ppi.tif |
Sun Dec 14 13:40:45 CET 2014 by Michael Peter Christen | Added and integrated a new date detection class which can identify date notions within the fulltext of a document. This class also attempts to identify dates that are abbreviated, lack a year or are described with names for special days, like 'Halloween'. If a date has no year given, the current year and following years are considered. This process is therefore able to identify a large set of dates for a document, either because there are several dates given in the document or because the date is ambiguous. Four new Solr fields are used to store the parsing result: dates_in_content_sxt: if date expressions can be found in the content, these dates are listed here in order of appearance dates_in_content_count_i: the number of entries in dates_in_content_sxt date_in_content_min_dt: if dates_in_content_sxt is filled, this contains the oldest date from the list of available dates date_in_content_max_dt: if dates_in_content_sxt is filled, this contains the youngest date from the list of available dates, which may also be in the future These fields are deactivated by default because the evaluation of regular expressions to detect the date is still too CPU intensive. Future enhancements may allow this to be switched on by default. The purpose of these fields is the creation of calendar-like search facets, to be implemented next. 
Changed Files: defaults/solr.collection.schema, source/net/yacy/cora/date/GenericFormatter.java, source/net/yacy/crawler/data/Transactions.java, source/net/yacy/data/ymark/YMarkAutoTagger.java, source/net/yacy/document/Condenser.java, source/net/yacy/document/content/SurrogateReader.java, source/net/yacy/document/parser/torrentParser.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/DocumentIndex.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java |
Tue Dec 09 16:20:34 CET 2014 by Michael Peter Christen | enhanced the snapshot functionality: - snapshots can now also be xml files which are extracted from the solr index and stored as individual xml files in the snapshot directory alongside the pdf and jpg images - a transaction layer was placed above the snapshot directory to separate snapshots into 'inventory' and 'archive'. This may be used to do transactions of index fragments using archived solr search results between peers. This is currently unfinished, we need a protocol to move snapshots from inventory to archive - the SNAPSHOT directory was renamed to snapshot and now contains two snapshot subdirectories: inventory and archive - snapshots may now be generated by everyone, not only by peers running on a server with wkhtmltopdf installed. The expert crawl start provides the option for snapshots to everyone. PDF snapshots are now optional and the option is only shown if wkhtmltopdf is installed. - the snapshot api now provides the request for historised xml files, i.e. call: http://localhost:8090/api/snapshot.xml?urlhash=Q3dQopFh1hyQ The content of such xml files is identical to a solr search result with only one hit. The pdf generation has been moved from the http loading process to the solr document storage process. This may slow down the process a lot and a different version of the process may be needed. 
Changed Files: htroot/CrawlStartExpert.html, htroot/CrawlStartExpert.java, htroot/Crawler_p.java, htroot/QuickCrawlLink_p.java, htroot/api/snapshot.java, source/net/yacy/cora/federate/solr/responsewriter/EnhancedXMLResponseWriter.java, source/net/yacy/cora/util/Html2Image.java, source/net/yacy/crawler/CrawlSwitchboard.java, source/net/yacy/crawler/data/CrawlProfile.java, source/net/yacy/crawler/data/Snapshots.java, source/net/yacy/crawler/data/Transactions.java, source/net/yacy/data/ymark/YMarkCrawlStart.java, source/net/yacy/document/parser/html/TransformerWriter.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/DocumentIndex.java, source/net/yacy/search/index/Segment.java |
Fri Dec 05 01:13:37 CET 2014 by reger | ViewFile servlet: update index if newer, so viewed text and metadata (stored) info is similar - to achieve this, use request with profile to allow indexing (defaultglobaltext) and update index (the resource is loaded and parsed anyway, so it's not an expensive operation) Request: remove 2 unused init parameters - number of anchors of the parent - forkfactor sum of anchors of all ancestors Changed Files: htroot/HostBrowser.java, htroot/QuickCrawlLink_p.java, htroot/ViewFile.java, htroot/api/push_p.java, htroot/rct_p.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/retrieval/Request.java, source/net/yacy/crawler/retrieval/SitemapImporter.java, source/net/yacy/data/ymark/YMarkCrawlStart.java, source/net/yacy/http/ProxyCacheHandler.java, source/net/yacy/http/ProxyHandler.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/Switchboard.java, source/net/yacy/server/http/HTTPDProxyHandler.java |
Wed Dec 03 11:45:48 CET 2014 by Michael Peter Christen | added a servlet which can create preview images, preview thumbnails and preview pdfs from web pages, i.e.: http://localhost:8090/api/snapshot.png?url=http://yacy.net/en/&width=128&height=128 http://localhost:8090/api/snapshot.jpg?url=http://yacy.net/en/&width=128&height=128 http://localhost:8090/api/snapshot.pdf?url=http://yacy.net/en/ This also supports on-the-fly generation of the preview documents if the user is an administrator. Otherwise, the servlet fails. To enable this, you must add wkhtmltopdf, imagemagick and (on headless servers) xvfb to your operating system. For detailed instructions, see https://gitorious.org/yacy/rc1/commit/97f6089a41a4ed40aef84f692690e30f50585f5d Changed Files: htroot/api/snapshot.java, source/net/yacy/cora/util/Html2Image.java, source/net/yacy/crawler/data/Snapshots.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java |
Tue Dec 02 16:26:07 CET 2014 by Michael Peter Christen | Replaced all fixed thread pools with cached thread pools. The cached thread pools will flush their cached (dead) threads after 60 seconds. As a result, YaCy now runs constantly with about 50 threads, about 100 at peak times. Previously, about 400 threads had been cached and kept in a hibernation state, which caused the numproc counter in /proc/user_beancounters (exists only in VM-hosted linux) to be as high as the cached number of threads. VM supervisors then terminated whole VM sessions if a limit was reached. Many VM providers have limits of numproc=96, which made it virtually impossible to run YaCy on such machines. With this change, it will be possible to run many YaCy instances even on VM hosts. Changed Files: source/net/yacy/cora/protocol/Domains.java, source/net/yacy/cora/protocol/http/HTTPClient.java, source/net/yacy/document/importer/MediawikiImporter.java, source/net/yacy/kelondro/blob/ArrayStack.java |
Mon Dec 01 15:03:09 CET 2014 by Michael Peter Christen | YaCy can now create web page snapshots as pdf documents which can later be transcoded into jpg for image previews. To create such pdfs, first add wkhtmltopdf and imagemagick to your OS: On a Mac download wkhtmltox-0.12.1_osx-cocoa-x86-64.pkg from http://wkhtmltopdf.org/downloads.html and download http://cactuslab.com/imagemagick/assets/ImageMagick-6.8.9-9.pkg.zip In Debian do "apt-get install wkhtmltopdf imagemagick" Then check in /Settings_p.html?page=ProxyAccess: "Transparent Proxy" and "Always Fresh" - this is used by wkhtmltopdf to fetch web pages using the YaCy proxy. Using "Always Fresh" it is possible to get all pages from the proxy cache. Finally, you will see a new option when starting an expert web crawl. You can set a maximum crawl depth up to which pdfs are generated. The resulting pdfs are then available in DATA/HTCACHE/SNAPSHOTS/<host>.<port>/<depth>/<shard>/<urlhash>.<date>.pdf Changed Files: htroot/CrawlStartExpert.html, htroot/CrawlStartExpert.java, htroot/Crawler_p.java, htroot/QuickCrawlLink_p.java, source/net/yacy/cora/util/Html2Image.java, source/net/yacy/crawler/CrawlSwitchboard.java, source/net/yacy/crawler/data/CrawlProfile.java, source/net/yacy/crawler/data/Snapshots.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/data/ymark/YMarkCrawlStart.java, source/net/yacy/http/ProxyCacheHandler.java, source/net/yacy/search/Switchboard.java |
Sat Nov 29 11:56:32 CET 2014 by Michael Peter Christen | added new web page snapshot infrastructure which will lead to the ability to have web page previews in the search results. (This is a stub, no function available with this yet...) Changed Files: htroot/Crawler_p.java, htroot/QuickCrawlLink_p.java, source/net/yacy/crawler/CrawlSwitchboard.java, source/net/yacy/crawler/data/CrawlProfile.java, source/net/yacy/crawler/data/Snapshots.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/data/ymark/YMarkCrawlStart.java, source/net/yacy/kelondro/index/BufferedObjectIndex.java, source/net/yacy/kelondro/util/OS.java, source/net/yacy/search/Switchboard.java |
Fri Nov 28 20:24:39 CET 2014 by reger | update to Jetty 9.2.4 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/jetty-9.2.4.v20141103.License, lib/jetty-client-9.2.4.v20141103.jar, lib/jetty-continuation-9.2.4.v20141103.jar, lib/jetty-deploy-9.2.4.v20141103.jar, lib/jetty-http-9.2.4.v20141103.jar, lib/jetty-io-9.2.4.v20141103.jar, lib/jetty-jmx-9.2.4.v20141103.jar, lib/jetty-proxy-9.2.4.v20141103.jar, lib/jetty-security-9.2.4.v20141103.jar, lib/jetty-server-9.2.4.v20141103.jar, lib/jetty-servlet-9.2.4.v20141103.jar, lib/jetty-servlets-9.2.4.v20141103.jar, lib/jetty-util-9.2.4.v20141103.jar, lib/jetty-webapp-9.2.4.v20141103.jar, lib/jetty-xml-9.2.4.v20141103.jar, nbproject/project.xml, pom.xml |
Mon Nov 24 20:28:52 CET 2014 by Michael Peter Christen | added an option to make the YaCy proxy act as if the cache is never stale. If set to 'Always Fresh', the cache is always used if the entry exists in the cache. This is a good way to archive web content and access it without going online again in case the documents exist. To do so, open /Settings_p.html?page=ProxyAccess and check the "Always Fresh" checkbox. This is set to false by default, which behaves as before. If you set this to true, then you have your web archive in DATA/HTCACHE. Copy this to carry around your private copy of the internet! Changed Files: defaults/yacy.init, htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_ProxyAccess.inc, htroot/Settings_p.java, source/net/yacy/crawler/retrieval/Response.java |
Wed Nov 19 17:36:56 CET 2014 by Michael Peter Christen | added loading of the synonyms file from addon/synonyms into the knowledge loader Changed Files: htroot/DictionaryLoader_p.html, htroot/DictionaryLoader_p.java, source/net/yacy/cora/language/synonyms/SynonymLibrary.java, source/net/yacy/data/ymark/YMarkAutoTagger.java, source/net/yacy/document/Condenser.java, source/net/yacy/document/LibraryProvider.java, source/net/yacy/document/parser/torrentParser.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/DocumentIndex.java, source/net/yacy/search/index/Segment.java |
Thu Nov 13 00:58:58 CET 2014 by Michael Peter Christen | added new 'firstSeen' database table and necessary data structures which hold a date for each URL to record when a url was first seen. This is then used to overwrite the modification date for urls upon recrawl in case that the first-seen date is before the latest document date. This behaviour is necessary due to the common behaviour of content management systems which attach always the current date to all documents. Using the firstSeen database it is possible to approximate a real first document creation date in case that the crawler starts frequently for the same domain. As a result the search results ordered by date have a much better quality and the usage of YaCy as search agent for latest news has a better quality. Changed Files: htroot/IndexControlURLs_p.html, htroot/IndexControlURLs_p.java, htroot/ViewFile.html, htroot/ViewFile.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Sun Nov 09 23:06:36 CET 2014 by sixcooler | update to httpclient-4.3.6 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/dependencies.txt, lib/httpclient-4.3.6.License, lib/httpclient-4.3.6.jar, lib/httpmime-4.3.6.License, lib/httpmime-4.3.6.jar, nbproject/project.xml, pom.xml |
Fri Nov 07 18:51:31 CET 2014 by sixcooler | update to solr-/lucene-4.10.2 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, defaults/solr/solrconfig.xml, lib/commons-codec-1.9.License, lib/commons-codec-1.9.jar, lib/lucene-analyzers-common-4.10.2.jar, lib/lucene-analyzers-phonetic-4.10.2.jar, lib/lucene-classification-4.10.2.jar, lib/lucene-codecs-4.10.2.jar, lib/lucene-core-4.10.2.jar, lib/lucene-facet-4.10.2.jar, lib/lucene-grouping-4.10.2.jar, lib/lucene-highlighter-4.10.2.jar, lib/lucene-join-4.10.2.jar, lib/lucene-memory-4.10.2.jar, lib/lucene-misc-4.10.2.jar, lib/lucene-queries-4.10.2.jar, lib/lucene-queryparser-4.10.2.jar, lib/lucene-spatial-4.10.2.jar, lib/lucene-suggest-4.10.2.jar, lib/lucene.License, lib/solr-core-4.10.2.jar, lib/solr-solrj-4.10.2.jar, lib/solr.License, pom.xml, source/net/yacy/search/index/Fulltext.java |
Fri Oct 24 15:04:40 CEST 2014 by Michael Peter Christen | fix for exact_signature_unique_b, exact_signature_copycount_i, fuzzy_signature_unique_b and fuzzy_signature_copycount_i: apply same criteria for 'valid document' as for title and description uniqueness test. Changed Files: source/net/yacy/cora/federate/solr/logic/AbstractOperations.java, source/net/yacy/cora/federate/solr/logic/BooleanLiteral.java, source/net/yacy/cora/federate/solr/logic/CatchallLiteral.java, source/net/yacy/cora/federate/solr/logic/Conjunction.java, source/net/yacy/cora/federate/solr/logic/Disjunction.java, source/net/yacy/cora/federate/solr/logic/Literal.java, source/net/yacy/cora/federate/solr/logic/LongLiteral.java, source/net/yacy/cora/federate/solr/logic/StringLiteral.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Tue Oct 14 12:19:59 CEST 2014 by Michael Peter Christen | added partial updates to solr during postprocessing: during postprocessing the solr documents are no longer retrieved completely. Instead, only the fields needed for the postprocessing are extracted. When Solr documents are written, this is done using partial updates. This increases postprocessing speed by about 50% for embedded Solr configurations. For external Solr configurations the enhancement should be much higher, because postprocessing with a remote Solr is very slow and partial updates to a remote Solr are expected to perform much better than before. Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Tue Oct 07 17:51:07 CEST 2014 by Michael Peter Christen | Added an experimental audio feedback system. This is the first element of a new 'decoration' component which may hold switches for different external appearance parameters. The first switch in that context is decoration.audio (as usual in yacy.init). This value is set to false by default, that means the audio feedback element is switched off by default. To switch it on, set decoration.audio = true (using /ConfigProperties_p.html). You will then hear sounds for the following events: - remote searches - incoming dht transmissions - new documents from the crawler Sound clips are stored in htroot/env/soundclips/ which is done so because a future implementation will read these files using the http client and with configurable urls which will make it very easy for the user to replace the given sounds with own sounds. Changed Files: defaults/yacy.init, htroot/env/soundclips/atmocrawling.wav, htroot/env/soundclips/atmomonitor.wav, htroot/env/soundclips/dhtin.wav, htroot/env/soundclips/newdoc.wav, htroot/env/soundclips/remotesearch.wav, htroot/env/soundclips/sources.txt, htroot/yacy/search.java, htroot/yacy/transferURL.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java |
Tue Oct 07 13:10:06 CEST 2014 by Marc Nause | Finished implementation of UPNP: *) will try other ports if YaCy standard ports are not available *) distinguish between internal and external port (not sure if this works 100%) Still to add: property in config to enter own external port (in case of manually configured NAT) Changed Files: htroot/ConfigBasic.java, htroot/ConfigPortal.java, htroot/ConfigSearchBox.java, htroot/CrawlStartScanner_p.java, htroot/Load_MediawikiWiki.java, htroot/Load_PHPBB3.java, htroot/SettingsAck_p.java, htroot/Settings_p.java, htroot/Table_API_p.java, htroot/api/push_p.java, htroot/opensearchdescription.java, htroot/yacysearch.java, htroot/yacysearch_location.java, source/net/yacy/gui/Tray.java, source/net/yacy/http/Jetty9HttpServerImpl.java, source/net/yacy/peers/Seed.java, source/net/yacy/peers/SeedDB.java, source/net/yacy/search/Switchboard.java, source/net/yacy/server/http/HTTPDemon.java, source/net/yacy/server/serverSwitch.java, source/net/yacy/utils/upnp/UPnP.java, source/net/yacy/yacy.java |
Wed Oct 01 03:10:39 CEST 2014 by Michael Peter Christen | IPv6-enhanced Network monitoring page Changed Files: htroot/Network.html, htroot/Network.java, htroot/Network.xml, htroot/env/grafics/NodeDisqualified.gif, htroot/env/grafics/NodeDisqualifiedIPv4.gif, htroot/env/grafics/NodeDisqualifiedIPv6.gif, htroot/env/grafics/NodeQualified.gif, htroot/env/grafics/NodeQualifiedIPv4.gif, htroot/env/grafics/NodeQualifiedIPv6.gif, source/net/yacy/peers/Network.java, source/net/yacy/peers/Protocol.java |
Tue Sep 30 14:53:52 CEST 2014 by Michael Peter Christen | large IPv6 redesign of peer ping methods! removed preferred IPv4 in start options and added a new field IP6 in peer seeds which will contain one or more IPv6 addresses. Now every peer has one or more IP addresses assigned, even several IPv6 addresses are possible. The peer-ping process must check all given and possible IP addresses for a backping and return the one IP which was successful when pinging the peer. The ping-ing peer must be able to recognize which of the given IPs are available for outside access of the peer and store this accordingly. If only one IPv6 address is available and no IPv4, then the IPv6 is stored in the old IP field of the seed DNA. Many methods in Seed.java are now marked as @deprecated because they had been used for a single IP only. There is still a large construction site left in YaCy now where all these deprecated methods must be replaced with new method calls. The 'extra'-IPs, used by cluster assignment had been removed since that can be replaced with IPv6 usage in p2p clusters. All clusters must now use IPv6 if they want an intranet-routing. 
Changed Files: addon/YaCy.app/Contents/Info.plist, addon/yacyInit.m4, htroot/Blog.java, htroot/BlogComments.java, htroot/ConfigNetwork_p.java, htroot/CrawlStartScanner_p.java, htroot/MessageSend_p.java, htroot/Messages_p.java, htroot/Network.html, htroot/Network.java, htroot/SettingsAck_p.java, htroot/Status.java, htroot/Surftips.java, htroot/ViewProfile.java, htroot/Wiki.java, htroot/goto_p.java, htroot/mediawiki_p.java, htroot/yacy/hello.java, htroot/yacy/query.java, installYaCyWindowsService.bat, source/net/yacy/cora/protocol/Domains.java, source/net/yacy/cora/protocol/RequestHeader.java, source/net/yacy/cora/protocol/http/HTTPClient.java, source/net/yacy/cora/util/CommonPattern.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/http/YacyDomainHandler.java, source/net/yacy/migration.java, source/net/yacy/peers/DHTSelection.java, source/net/yacy/peers/Network.java, source/net/yacy/peers/PeerActions.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/RemoteSearch.java, source/net/yacy/peers/Seed.java, source/net/yacy/peers/SeedDB.java, source/net/yacy/peers/Transmission.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/search/query/SearchEventCache.java, source/net/yacy/search/snippet/TextSnippet.java, source/net/yacy/server/http/AlternativeDomainNames.java, source/net/yacy/server/serverAccessTracker.java, source/net/yacy/server/serverSwitch.java, source/net/yacy/yacy.java, startYACY.sh, startYACY_debug.bat |
Sun Sep 28 03:18:18 CEST 2014 by reger | update to PDFBox 1.8.7 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/fontbox-1.8.7.License, lib/fontbox-1.8.7.jar, lib/jempbox-1.8.7.License, lib/jempbox-1.8.7.jar, lib/pdfbox-1.8.7.License, lib/pdfbox-1.8.7.jar, pom.xml |
Sat Sep 27 23:27:05 CEST 2014 by reger | update to Jetty 9.2.3 Changed Files: addon/YaCy.app/Contents/Info.plist, build.xml, htroot/Settings_ServerAccess.inc, lib/jetty-9.2.3.v20140905.License, lib/jetty-client-9.2.3.v20140905.jar, lib/jetty-continuation-9.2.3.v20140905.jar, lib/jetty-deploy-9.2.3.v20140905.jar, lib/jetty-http-9.2.3.v20140905.jar, lib/jetty-io-9.2.3.v20140905.jar, lib/jetty-jmx-9.2.3.v20140905.jar, lib/jetty-proxy-9.2.3.v20140905.jar, lib/jetty-security-9.2.3.v20140905.jar, lib/jetty-server-9.2.3.v20140905.jar, lib/jetty-servlet-9.2.3.v20140905.jar, lib/jetty-servlets-9.2.3.v20140905.jar, lib/jetty-util-9.2.3.v20140905.jar, lib/jetty-webapp-9.2.3.v20140905.jar, lib/jetty-xml-9.2.3.v20140905.jar, pom.xml |
Wed Sep 17 13:58:55 CEST 2014 by Michael Peter Christen | changed the concurrent enumeration of query results in such a way that it is now possible to get the results in two steps: - first retrieve all IDs as given for a query - then retrieve each document individually This was necessary for very large result sets where a query may run for hours and is possibly terminated by a solr-internal timeout. This occurs regularly during postprocessing and therefore this commit may fix unwanted postprocessing terminations. Changed Files: htroot/HostBrowser.java, htroot/IndexDeletion_p.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/HyperlinkGraph.java |
Wed Sep 17 12:54:50 CEST 2014 by Michael Peter Christen | added a 'stats' table which records some peer statistics twice every hour. The table can be shown with http://localhost:8090/Tables_p.html?table=stats The entries have the following meaning: aM: activeLastMonth aW: activeLastWeek aD: activeLastDay aH: activeLastHour cC: countConnected (Active Senior) cD: countDisconnected (Passive Senior) cP: countPotential (Junior) cR: count of the RWI entries cI: size of the index (number of documents) The entry keys are abbreviated to reduce the space in the table as the name is written again for every row. This is the beginning of a 'yacystats' micro-alternative as a built-in function in YaCy. Graphics may follow after some time if enough test data is available. Changed Files: source/net/yacy/cora/date/GenericFormatter.java, source/net/yacy/data/WorkTables.java, source/net/yacy/peers/SeedDB.java, source/net/yacy/search/Switchboard.java |
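
The abbreviated keys of the 'stats' table described in the last entry above can be expanded back into their long names when a row is read. The following is only a minimal sketch of that idea: the class `StatsKeys` and its method `expand` are hypothetical names and not part of YaCy, and a row is assumed here to be a plain map from key to a numeric value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of YaCy): expands the abbreviated 'stats' keys. */
final class StatsKeys {
    // Short key -> long name, exactly as listed in the commit description.
    private static final Map<String, String> LONG_NAMES = new LinkedHashMap<>();
    static {
        LONG_NAMES.put("aM", "activeLastMonth");
        LONG_NAMES.put("aW", "activeLastWeek");
        LONG_NAMES.put("aD", "activeLastDay");
        LONG_NAMES.put("aH", "activeLastHour");
        LONG_NAMES.put("cC", "countConnected (Active Senior)");
        LONG_NAMES.put("cD", "countDisconnected (Passive Senior)");
        LONG_NAMES.put("cP", "countPotential (Junior)");
        LONG_NAMES.put("cR", "count of the RWI entries");
        LONG_NAMES.put("cI", "size of the index (number of documents)");
    }

    private StatsKeys() {}

    /** Returns a copy of the row in which every known short key is replaced by its long name. */
    static Map<String, Long> expand(Map<String, Long> row) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : row.entrySet()) {
            // Unknown keys are passed through unchanged.
            out.put(LONG_NAMES.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return out;
    }
}
```

Calling `StatsKeys.expand(row)` on a row that contains `aH` and `cI` yields a map keyed by `activeLastHour` and `size of the index (number of documents)`; the stored table itself keeps the short keys to save space.
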
Commit | Description |
---|---|
Tue Jan 20 17:14:14 CET 2015 by Michael Peter Christen | fixed font size and print page generation in pdf snapshots Changed Files: htroot/api/snapshot.java, source/net/yacy/cora/protocol/ClientIdentification.java, source/net/yacy/cora/util/Html2Image.java, source/net/yacy/crawler/data/Transactions.java, source/net/yacy/search/index/Segment.java |
Mon Jan 12 00:35:47 CET 2015 by Michael Peter Christen | fix for mediawiki import Changed Files: htroot/IndexImportMediawiki_p.html, htroot/IndexImportMediawiki_p.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Segment.java |
Fri Jan 09 02:52:18 CET 2015 by reger | remove debug limit from commit before Changed Files: source/net/yacy/search/AutoSearch.java |
Tue Jan 06 14:21:20 CET 2015 by Michael Peter Christen | fix for division by zero (rare cases) Changed Files: htroot/NetworkHistory.java, source/net/yacy/visualization/ChartPlotter.java |
Fri Dec 26 18:23:26 CET 2014 by reger | fix "null" title in response writer for documents with multivalued title Changed Files: source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java |
Tue Dec 23 00:30:34 CET 2014 by Michael Peter Christen | NPE fix Changed Files: htroot/ConfigPortal.java, htroot/ViewLog_p.java |
Mon Dec 22 14:32:09 CET 2014 by Michael Peter Christen | fix for pdf sub-page result preparation Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java |
Mon Dec 22 14:24:09 CET 2014 by Michael Peter Christen | removed debug code Changed Files: source/net/yacy/document/parser/pdfParser.java |
Mon Dec 22 02:01:55 CET 2014 by Michael Peter Christen | fix to wkhtmltopdf usage Changed Files: source/net/yacy/cora/util/Html2Image.java |
Sun Dec 21 20:11:39 CET 2014 by Michael Peter Christen | fixes to wkhtmltopdf call Changed Files: source/net/yacy/cora/util/Html2Image.java |
Sun Dec 21 19:02:36 CET 2014 by Michael Peter Christen | prevent NPE during initialization of very large vocabularies Changed Files: htroot/yacysearch.java, source/net/yacy/document/LibraryProvider.java |
Sun Dec 21 17:53:06 CET 2014 by Michael Peter Christen | removed debug lines Changed Files: htroot/ConfigParser.html, htroot/ConfigParser.java, htroot/yacy/hello.java |
Sat Dec 20 15:11:06 CET 2014 by Michael Peter Christen | fix for division by zero Changed Files: htroot/yacy/hello.java |
Tue Dec 16 12:10:15 CET 2014 by Michael Peter Christen | fix for image parser (there is a class missing!) Changed Files: source/net/yacy/document/parser/images/genericImageParser.java |
Tue Dec 16 11:33:30 CET 2014 by Michael Peter Christen | fix for a count issue in snapshot api Changed Files: htroot/api/snapshot.java, source/net/yacy/crawler/data/Snapshots.java, source/net/yacy/crawler/data/Transactions.java |
Sun Dec 14 04:03:20 CET 2014 by Michael Peter Christen | fixes on wkhtmltopdf Changed Files: htroot/api/snapshot.java, source/net/yacy/cora/util/Html2Image.java |
Wed Dec 10 14:09:34 CET 2014 by Michael Peter Christen | fix for vocabulary import (double term detection) Changed Files: htroot/Vocabulary_p.java |
Wed Dec 10 13:14:39 CET 2014 by Michael Peter Christen | fix for Is Facet checkbox Changed Files: htroot/Vocabulary_p.java |
Mon Dec 08 01:35:37 CET 2014 by reger | prevent NPE on host link for too short HeuristicCfg.OpenSearchURL Changed Files: htroot/ConfigHeuristics_p.java |
Sat Dec 06 02:25:24 CET 2014 by reger | fix startup stop on missing HTCACHE/SNAPSHOT directory Changed Files: source/net/yacy/search/Switchboard.java |
Sat Dec 06 00:43:12 CET 2014 by Michael Peter Christen | npe fix Changed Files: htroot/api/snapshot.java |
Fri Nov 28 22:44:33 CET 2014 by reger | fix (enable) error msg on empty query Changed Files: htroot/yacysearch.html, htroot/yacysearch.java |
Fri Nov 28 01:19:31 CET 2014 by Michael Peter Christen | show vocabularies in search result (in case of debugging) Changed Files: htroot/yacysearchitem.html, htroot/yacysearchitem.java |
Fri Nov 28 01:19:01 CET 2014 by Michael Peter Christen | security bugfix Changed Files: defaults/yacy.init, source/net/yacy/search/Switchboard.java |
Mon Nov 24 20:53:40 CET 2014 by Michael Peter Christen | toString fix Changed Files: source/net/yacy/http/ProxyHandler.java |
Tue Nov 18 12:11:18 CET 2014 by Michael Peter Christen | fix field counter for multi-fields in html writer for the solr servlet Changed Files: source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java |
Thu Nov 13 00:59:30 CET 2014 by Michael Peter Christen | fix for wildcard patch in search queries Changed Files: source/net/yacy/search/query/QueryModifier.java |
Wed Nov 12 22:48:33 CET 2014 by Michael Peter Christen | fix for catchall handling in search Changed Files: htroot/yacysearch.java |
Mon Nov 10 18:52:01 CET 2014 by Michael Peter Christen | fix for bad table iteration Changed Files: htroot/NetworkHistory.java, htroot/Table_YMark_p.java, htroot/Tables_p.java, htroot/api/table_p.java, source/net/yacy/kelondro/blob/BEncodedHeap.java, source/net/yacy/kelondro/blob/Tables.java, source/net/yacy/kelondro/index/RowSet.java |
Mon Nov 10 02:18:44 CET 2014 by Michael Peter Christen | html fix Changed Files: htroot/Tables_p.html |
Fri Nov 07 18:12:09 CET 2014 by Michael Peter Christen | added long variables to debug output in index browser Changed Files: htroot/NetworkHistory.java |
Fri Nov 07 18:11:49 CET 2014 by Michael Peter Christen | fix of bad query generation for search facets Changed Files: source/net/yacy/search/query/QueryModifier.java |
Fri Nov 07 18:11:23 CET 2014 by Michael Peter Christen | fix for bad query generation in doublecheck in postprocessing Changed Files: htroot/HostBrowser.java, source/net/yacy/cora/federate/solr/logic/Negation.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Sun Nov 02 13:28:10 CET 2014 by Michael Peter Christen | emergency bugfix for 100% CPU in image drawing Changed Files: source/net/yacy/visualization/ChartPlotter.java |
Fri Oct 31 23:17:56 CET 2014 by Michael Peter Christen | fix for bad timing computation in postprocessing Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java |
Thu Oct 30 20:53:57 CET 2014 by orbiter | more fixes and enhancements to postprocessing Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java |
Thu Oct 30 15:47:44 CET 2014 by orbiter | enhanced debug code in host browser Changed Files: htroot/HostBrowser.java |
Thu Oct 30 15:20:35 CET 2014 by orbiter | fix for npe (in rare cases) Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java |
Thu Oct 30 15:01:27 CET 2014 by orbiter | fix for literal computation Changed Files: source/net/yacy/cora/federate/solr/logic/LongLiteral.java |
Thu Oct 30 12:41:04 CET 2014 by Michael Peter Christen | fix for broken protocol navigation Changed Files: htroot/yacysearchtrailer.java |
Wed Oct 29 16:52:58 CET 2014 by orbiter | more IPv6 fixes Changed Files: readme.txt, source/net/yacy/cora/protocol/Domains.java, source/net/yacy/cora/protocol/ftp/FTPClient.java |
Tue Oct 28 15:36:13 CET 2014 by Michael Peter Christen | IPv6 fix Changed Files: source/net/yacy/cora/protocol/Domains.java, source/net/yacy/peers/Seed.java, source/net/yacy/server/serverSwitch.java |
Wed Oct 22 11:25:07 CEST 2014 by sixcooler | fix for ConnectionInfo.cleanup of server-connections Changed Files: source/net/yacy/cora/protocol/ConnectionInfo.java |
Fri Oct 17 21:32:07 CEST 2014 by orbiter | fix for bad json Changed Files: htroot/yacy/seedlist.java |
Tue Oct 14 12:48:15 CEST 2014 by Michael Peter Christen | npe fix Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java |
Mon Oct 13 16:53:00 CEST 2014 by Michael Peter Christen | fix for api icon in yacysearch_location.html Changed Files: htroot/yacysearch_location.html |
Mon Oct 13 14:28:11 CEST 2014 by Michael Peter Christen | fixed location search Changed Files: source/net/yacy/cora/document/feed/RSSMessage.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Sat Oct 11 00:34:07 CEST 2014 by Michael Peter Christen | more ipv6 fixes Changed Files: htroot/MessageSend_p.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/SeedDB.java |
Wed Oct 08 15:48:45 CEST 2014 by Michael Peter Christen | fixed appearance of RSS icon on search result page Changed Files: htroot/env/base.css, htroot/yacysearch.html, skins/27c3.css, skins/28c3.css, skins/generic_pd.css, skins/pdblue.css, skins/pdbootstrap.css |
Wed Oct 08 15:22:29 CEST 2014 by Michael Peter Christen | misc bugfixes (concurrency, memory protection) Changed Files: source/net/yacy/cora/protocol/ConnectionInfo.java, source/net/yacy/gui/Audio.java, source/net/yacy/kelondro/table/Table.java |
Wed Oct 08 15:21:49 CEST 2014 by Michael Peter Christen | more ipv6 bugfixes Changed Files: htroot/HostBrowser.java, source/net/yacy/cora/document/id/DigestURL.java, source/net/yacy/cora/document/id/MultiProtocolURL.java, source/net/yacy/crawler/robots/RobotsTxt.java |
Wed Oct 08 13:44:03 CEST 2014 by Michael Peter Christen | fix for local search Changed Files: source/net/yacy/peers/RemoteSearch.java |
Wed Oct 08 12:38:56 CEST 2014 by Michael Peter Christen | more ipv6 bugfixes Changed Files: htroot/Blog.java, htroot/BlogComments.java, htroot/Bookmarks.java, htroot/ConfigPortal.java, htroot/ConfigRobotsTxt_p.java, htroot/ConfigSearchBox.java, htroot/Load_MediawikiWiki.java, htroot/Load_PHPBB3.java, htroot/Messages_p.java, htroot/News.java, htroot/Status.java, htroot/Table_YMark_p.java, htroot/Tables_p.java, htroot/ViewProfile.java, htroot/Wiki.java, htroot/api/bookmarks/get_bookmarks.java, htroot/api/bookmarks/get_folders.java, htroot/api/ynetSearch.java, htroot/goto_p.java, htroot/mediawiki_p.java, htroot/yacy/hello.java, htroot/yacy/seedlist.java, htroot/yacysearch_location.java, source/net/yacy/peers/Network.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/Seed.java, source/net/yacy/peers/SeedDB.java, source/net/yacy/search/snippet/ResultEntry.java |
Tue Oct 07 23:30:32 CEST 2014 by Michael Peter Christen | fix for remote search process Changed Files: source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/RemoteSearch.java, source/net/yacy/server/serverAccessTracker.java |
Tue Oct 07 22:16:18 CEST 2014 by Michael Peter Christen | fix for bad node flag setting with IPv6 Changed Files: source/net/yacy/cora/protocol/Domains.java, source/net/yacy/http/ProxyHandler.java, source/net/yacy/peers/Protocol.java |
Tue Oct 07 21:57:41 CEST 2014 by Michael Peter Christen | ipv6 fixes for Network.html front page Changed Files: htroot/Network.html, htroot/Network.java |
Tue Oct 07 20:09:48 CEST 2014 by orbiter | more ipv6 fixes Changed Files: source/net/yacy/peers/Protocol.java, source/net/yacy/peers/SeedDB.java |
Tue Oct 07 18:53:23 CEST 2014 by Michael Peter Christen | more ipv6 fixes Changed Files: source/net/yacy/cora/protocol/Domains.java, source/net/yacy/peers/Seed.java |
Tue Oct 07 17:52:13 CEST 2014 by Michael Peter Christen | fix for latest UPnP update Changed Files: htroot/yacysearch_location.java |
Mon Oct 06 17:44:27 CEST 2014 by Michael Peter Christen | more IPv6 bugfixes Changed Files: htroot/MessageSend_p.java, htroot/Network.java, source/net/yacy/cora/document/feed/RSSReader.java, source/net/yacy/cora/protocol/Domains.java, source/net/yacy/crawler/data/Cache.java, source/net/yacy/peers/Network.java, source/net/yacy/peers/PeerActions.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/Seed.java, source/net/yacy/peers/SeedDB.java, source/net/yacy/peers/Transmission.java, source/net/yacy/server/http/HTTPDemon.java |
Sun Oct 05 11:03:57 CEST 2014 by Michael Peter Christen | toString fixes Changed Files: source/net/yacy/data/WorkTables.java |
Fri Oct 03 08:51:23 CEST 2014 by reger | fix char encoding parameter in UrlProxy Changed Files: source/net/yacy/http/servlets/UrlProxyServlet.java |
Wed Oct 01 15:34:43 CEST 2014 by Michael Peter Christen | unresolved pattern fix Changed Files: htroot/CrawlProfileEditor_p.xml |
Wed Oct 01 15:32:10 CEST 2014 by Michael Peter Christen | ipv6 fixes Changed Files: htroot/Network.html, htroot/Network.java, htroot/env/templates/submenuComputation.template, source/net/yacy/cora/protocol/Domains.java, source/net/yacy/http/AbstractRemoteHandler.java, source/net/yacy/peers/Protocol.java |
Wed Oct 01 12:22:55 CEST 2014 by Michael Peter Christen | fix for xss bugs found by CTF365 Changed Files: htroot/yacyinteractive.java, htroot/yacysearch.java |
Wed Oct 01 10:21:03 CEST 2014 by Michael Peter Christen | IPv6 host parsing bugfixes Changed Files: htroot/IndexCreateQueues_p.java, htroot/Settings_p.java, source/net/yacy/cora/document/id/MultiProtocolURL.java, source/net/yacy/cora/protocol/Domains.java, source/net/yacy/cora/protocol/HeaderFramework.java, source/net/yacy/http/AbstractRemoteHandler.java, source/net/yacy/http/YacyDomainHandler.java, source/net/yacy/peers/Protocol.java, source/net/yacy/server/http/HTTPDProxyHandler.java, source/net/yacy/server/http/HTTPDemon.java |
Fri Sep 26 22:34:11 CEST 2014 by reger | update German translation http://mantis.tokeek.de/view.php?id=447 Changed Files: locales/de.lng |
Sat Sep 20 13:06:46 CEST 2014 by Michael Peter Christen | fix for crawl limit for number of pages fail Changed Files: source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/crawler/HostQueue.java |
Tue Sep 16 17:46:07 CEST 2014 by Michael Peter Christen | fix for Mac.app config - but it still does not run. looking for build bug. Changed Files: addon/YaCy.app/Contents/Info.plist |
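
Several entries above are IPv6 host parsing fixes. The commit messages do not say what exactly was wrong; a common pitfall in this area is that an IPv6 literal contains colons itself, so splitting `host:port` at the first or last colon gives wrong results unless the literal is written in brackets. The following is a minimal sketch of bracket-aware parsing under that assumption. The class `HostPort` is a hypothetical name and is not YaCy's implementation.

```java
/** Hypothetical helper (not YaCy code): splits "host:port" and "[ipv6]:port" strings. */
final class HostPort {
    final String host;
    final int port; // -1 when the input carries no port

    HostPort(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static HostPort parse(String s) {
        if (s.startsWith("[")) {
            // Bracketed IPv6 literal, optionally followed by ":port".
            int end = s.indexOf(']');
            if (end < 0) throw new IllegalArgumentException("missing ']' in " + s);
            String host = s.substring(1, end);
            String rest = s.substring(end + 1);
            if (rest.isEmpty()) return new HostPort(host, -1);
            if (!rest.startsWith(":")) throw new IllegalArgumentException("unexpected text after ']' in " + s);
            return new HostPort(host, Integer.parseInt(rest.substring(1)));
        }
        int first = s.indexOf(':');
        int last = s.lastIndexOf(':');
        // No colon: plain host name or IPv4 address without a port.
        if (first < 0) return new HostPort(s, -1);
        // More than one colon without brackets: treat the input as a bare IPv6 literal without a port.
        if (first != last) return new HostPort(s, -1);
        // Exactly one colon: host name or IPv4 address followed by a port.
        return new HostPort(s.substring(0, last), Integer.parseInt(s.substring(last + 1)));
    }
}
```

With this sketch, `HostPort.parse("[2001:db8::1]:8090")` separates the address `2001:db8::1` from port 8090, while an unbracketed `::1` is kept whole as a host without a port.
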
Commit | Description |
---|---|
Wed Jan 21 12:45:55 CET 2015 by Michael Peter Christen | Release 1.82 Changed Files: build.properties |
Tue Jan 20 18:18:12 CET 2015 by Michael Peter Christen | reverted 'do not show all options' strategy. This is actually confusing new users. Will be activated maybe again if there is an optional tutorial mode which can be switched on for this special purpose of running a tutorial. Changed Files: source/net/yacy/http/servlets/YaCyDefaultServlet.java |
Fri Jan 09 16:45:43 CET 2015 by Michael Peter Christen | a test with http://validator.w3.org/feed/#validate_by_input shows that the time format was wrong; we must use RFC-822 Changed Files: source/net/yacy/cora/document/feed/RSSMessage.java, source/net/yacy/cora/protocol/HeaderFramework.java |
Fri Jan 09 02:06:30 CET 2015 by reger | Add option for extended search (Autosearch) to Bookmark.html asking all connected peers for the searchterm added as description to the bookmark created by the bookmark icon. Intended for searches/research projects with not sufficient results from local and DHT selected remote target peers. Function: the process checks newly created bookmarks for description starting with "query=..." and takes this to ask every peer for 20 search results and adds it to the local index in a background job. link to start/stop the process added to /Bookmarks.html Changed Files: htroot/Bookmarks.html, htroot/Bookmarks.java, source/net/yacy/data/BookmarksDB.java, source/net/yacy/search/AutoSearch.java |
Fri Jan 09 01:33:45 CET 2015 by reger | Add title import for bookmark icon if avail in index Changed Files: htroot/yacysearch.java |
Fri Jan 09 01:31:57 CET 2015 by reger | - add javadoc to busythread with hint about the init parameter usage - remove obsolete 10_httpd config parameter Changed Files: htroot/PerformanceQueues_p.java, source/net/yacy/kelondro/workflow/AbstractBusyThread.java |
Tue Jan 06 15:22:59 CET 2015 by Michael Peter Christen | documents pushed over the api/push_p.html interface will have their unique flag set by default Changed Files: source/net/yacy/crawler/CrawlSwitchboard.java, source/net/yacy/crawler/data/CrawlProfile.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Tue Jan 06 14:22:43 CET 2015 by Michael Peter Christen | better scale Changed Files: htroot/NetworkHistory.java |
Tue Jan 06 14:14:25 CET 2015 by Michael Peter Christen | do not write frame links to webgraph Changed Files: source/net/yacy/document/Document.java, source/net/yacy/document/parser/html/ContentScraper.java |
Mon Jan 05 09:10:20 CET 2015 by reger | revert clickservlet (default was indeed a mistake) Changed Files: defaults/web.xml, defaults/yacy.init, htroot/ConfigPortal.html, htroot/ConfigPortal.java, htroot/yacysearchitem.java, source/net/yacy/search/SwitchboardConstants.java |
Mon Jan 05 08:21:51 CET 2015 by Michael Peter Christen | do not use the clickservlet by default. From my personal view, this technique should not be used at all! This project is about privacy, the existence of a click servlet is one example why people should NOT use a search portal if such exists. Changed Files: defaults/yacy.init |
Mon Jan 05 08:18:19 CET 2015 by Michael Peter Christen | please commit new files under your own name, this file was not created by me. Changed Files: source/net/yacy/http/servlets/ClickServlet.java |
Mon Jan 05 06:55:53 CET 2015 by reger | added url to bookmark icon link; the url is needed anyway, saves an index lookup and works w/o committed url. Removed unused order parameter Changed Files: htroot/yacysearch.java, htroot/yacysearchitem.java |
Sun Jan 04 09:12:30 CET 2015 by reger | fix NPE in viewimage Caused by: java.lang.NullPointerException at net.yacy.peers.graphics.EncodedImage.<init>(EncodedImage.java:73) at ViewImage.respond(ViewImage.java:156) Changed Files: htroot/ViewImage.java |
Sun Jan 04 06:57:13 CET 2015 by reger | fix ConfigPortal jumps to iframe focus add focus parameter to yacysearch.html too Changed Files: htroot/yacysearch.html, htroot/yacysearch.java |
Sun Jan 04 02:59:21 CET 2015 by reger | add info text to metadata page (htmlresponsewriter) on no documents found Changed Files: source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java |
Fri Jan 02 04:20:02 CET 2015 by reger | improve TexParser.mimeOf( fileextension ) by returning 1st defined in supported list. This prevents unusual mapping of supported fileextension -> mimetype (like htm=application/x-tex) Changed Files: source/net/yacy/document/AbstractParser.java |
Fri Jan 02 02:44:03 CET 2015 by Michael Peter Christen | do not write iframe and embed links into webgraph, but use them anyway for crawling Changed Files: source/net/yacy/document/Document.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Fri Jan 02 00:11:32 CET 2015 by Ryszard Go? | Fix for progress table background not resizing when the post-processing started/ended. Changed Files: htroot/Crawler_p.html |
Thu Jan 01 02:41:20 CET 2015 by reger | adjustments for Bookmark icon to act on BookmarkDB, it acts on YMarks but YMark interface seems not maintained, for future features (e.g. query memory) BookmarkDB is the likely choice to expand, besides the crawlstart bookmark also the result bookmark icon now adds to BookmarkDB. The YMark related code is (for now) left untouched so both tables are updated. Changed Files: htroot/yacysearch.java, htroot/yacysearchitem.java |
Mon Dec 29 03:50:00 CET 2014 by reger | remove obsolete config footer option (ConfigPortal user.login) no footer or footer-option in use remove unused yacy.init item allowUnlimitedReceiveIndexFrom Changed Files: defaults/yacy.init, htroot/ConfigPortal.html, htroot/ConfigPortal.java |
Sun Dec 28 15:52:43 CET 2014 by Michael Peter Christen | reactivated clear stacks code for termination of all crawls because this did not work without that part of the code Changed Files: htroot/Crawler_p.java |
Sun Dec 28 15:48:37 CET 2014 by Michael Peter Christen | do not flush non-errors to stdout because this is a concurrency issue. the flush-call appeared very often in thread dumps with high load, so this hopefully improves performance Changed Files: source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java |
Sun Dec 28 14:53:55 CET 2014 by Michael Peter Christen | do not translate gif images into png images for thumbnails. Instead, stream the original to the search result thumb viewer. This has two reasons: - animated gifs cause 100% cpu and deadlocks in the jvm gif parser; a known bug which is obviously not yet fixed - animated gifs now appear in the search result also as animation Changed Files: htroot/ViewImage.java, htroot/yacysearchitem.java, source/net/yacy/search/query/QueryModifier.java |
Sun Dec 28 14:36:43 CET 2014 by Michael Peter Christen | automatically set the Q flag for smb/ftp start urls (split pdf support) Changed Files: htroot/js/IndexCreate.js |
Sun Dec 28 14:27:42 CET 2014 by Michael Peter Christen | automatically switch on query option in case intranet protocols (smb/ftp) are used. This supports the new split-pdf option. Changed Files: htroot/Crawler_p.java |
Sat Dec 27 09:52:34 CET 2014 by arucard21 | Applied URL-decoding prior to HTML-encoding. This removes percent-encoding from text shown in HTML Changed Files: source/net/yacy/server/serverObjects.java |
Sat Dec 27 03:02:18 CET 2014 by Ryszard Go? | Postprocessing progress bar fix (Make it work as [probably] actually intended) Changed Files: htroot/Crawler_p.html, htroot/js/Crawler.js |
Sat Dec 27 00:10:14 CET 2014 by reger | Init Jetty using setDefaultDescriptor (web.xml) to defaults/web.xml so web.xml in defaults dir is applied first and optional DATA/SETTINGS/web.xml loaded on top. By using this Jetty feature (default web.xml) we assure that changes to the default are applied to existing installations and individual addition/changes are still respected. Changed Files: defaults/web.xml, source/net/yacy/http/Jetty9HttpServerImpl.java |
Fri Dec 26 18:21:35 CET 2014 by reger | adjust fieldtype and description of field httpstatus_redirect_s in CollectionSchema - the field is not used (delete candidate) Changed Files: source/net/yacy/search/schema/CollectionSchema.java |
Thu Dec 25 02:21:45 CET 2014 by reger | fix NPE related 500 (Bad Request) response of UrlProxy on blacklisted urls, by adding parameter HTTPDeamon and removing unused hostAddress lookup code in sendRespondError Changed Files: source/net/yacy/http/servlets/UrlProxyServlet.java, source/net/yacy/server/http/HTTPDemon.java |
Thu Dec 25 02:16:19 CET 2014 by reger | improve yacysearchitem, prevent allocation of String (modifyURL) if feature not used Changed Files: htroot/yacysearchitem.java |
Thu Dec 25 02:13:44 CET 2014 by reger | add xmpcore as direct dependency to pom (otherwise it's looked up at pdfbox archive path and not found there) Changed Files: pom.xml |
Wed Dec 24 12:23:59 CET 2014 by Michael Peter Christen | crawling of multi-page pdfs with artificial post part on smb or ftp shares is not possible with the disabled setting; this is now temporarily disabled until a better solution is at hand. Changed Files: htroot/js/IndexCreate.js |
Wed Dec 24 00:04:35 CET 2014 by reger | fix div by 0 in hello Caused by: java.lang.ArithmeticException: / by zero at hello.respond(hello.java:159) Changed Files: htroot/yacy/hello.java |
Tue Dec 23 19:11:21 CET 2014 by reger | update to SLF4J 1.7.9 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/jcl-over-slf4j-1.7.9.jar, lib/log4j-over-slf4j-1.7.9.jar, lib/slf4j-api-1.7.9.jar, lib/slf4j-jdk14-1.7.9.jar, nbproject/project.xml, pom.xml |
Tue Dec 23 02:01:03 CET 2014 by reger | fix proxy redirect (http status 302) response fixes http://mantis.tokeek.de/view.php?id=517 The url given in bug report uses a gzip input stream which causes the HTTPClient.writeto() throw an IOException due to incomplete input stream. This in turn prevents the 302 response to the client browser. By limiting to serve target content just on httpstatus=200 will proxy the header response and client browsers redirect settings can be honored. Changed Files: source/net/yacy/http/ProxyHandler.java, source/net/yacy/server/http/HTTPDProxyHandler.java |
Tue Dec 23 00:37:51 CET 2014 by Michael Peter Christen | enhanced initialization of autotagging Changed Files: source/net/yacy/cora/lod/vocabulary/Tagging.java |
Mon Dec 22 20:36:29 CET 2014 by reger | add hint to Heuristics Config on "Greedy Learning Mode" in portal config, to point to a option to make this setting permanent. Changed Files: htroot/ConfigHeuristics_p.html, htroot/ConfigPortal.html |
Mon Dec 22 20:34:13 CET 2014 by reger | update to commons-fileupload-1.3.1.jar (includes a security fix) Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/commons-fileupload-1.3.1.License, lib/commons-fileupload-1.3.1.jar, nbproject/project.xml, pom.xml |
Sun Dec 21 19:17:06 CET 2014 by Michael Peter Christen | changed prefer strategy for http unique in such a way that http is preferred over https. While this is a bad idea from the standpoint of security it is more commonly applicable for environments where http and https mix and for some domains https is not available. Then the double-check is possible even if no postprocessing is performed. Changed Files: defaults/yacy.init, source/net/yacy/search/Switchboard.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Sun Dec 21 19:08:28 CET 2014 by Michael Peter Christen | fix to prevent assertion error in ranking servlet if no vocabularies are present that could be evaluated Changed Files: htroot/RankingSolr_p.java |
Sun Dec 21 17:31:51 CET 2014 by Michael Peter Christen | the miss cache does not seem to work, it sometimes contains urlhashes from documents which actually are inside the index. This can be reproduced using the crawl result table at http://localhost:8090/CrawlResults.html?process=5 The cache is temporarily disabled to remove the bad behaviour, however a later reactivation of that feature may be possible. Changed Files: defaults/yacy.init, source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java |
Sun Dec 21 14:02:06 CET 2014 by reger | fix refactored size() -> filesize() in YMarkMetadata Changed Files: source/net/yacy/data/ymark/YMarkMetadata.java |
Sun Dec 21 06:05:35 CET 2014 by reger | refactor size() -> filesize() of URIMetadataNode (harmonize with ResultEntry and to not get confused with Collection.size()) Changed Files: htroot/ViewFile.java, htroot/api/yacydoc.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java, source/net/yacy/search/snippet/ResultEntry.java |
Sun Dec 21 03:45:54 CET 2014 by reger | remove redundant caching of urlhash in URIMetadataNode (is already cached in underlying DigestURL .url) upd pom keyword for maven-antrun-plugin Changed Files: pom.xml, source/net/yacy/kelondro/data/meta/URIMetadataNode.java |
Sat Dec 20 01:59:00 CET 2014 by reger | use peeraddress for link in remote crawl list to make link work without enabled proxy upd pom for Jetty (missing in last commit) Changed Files: htroot/RemoteCrawl_p.html, htroot/RemoteCrawl_p.java, pom.xml |
Fri Dec 19 17:41:38 CET 2014 by Michael Peter Christen | preventing the use of no-cache and expires in case that images are generated dynamically which will stay static in the future. This applies mainly to the search result favicon in front of search hits. These icons will now be generated once, but then cached in the browser. There is also a YaCy-internal cache for these icons which had prevented the re-generation of the icons in YaCy, but this cache is now superfluous since the browser should not call the servlet ViewImage again. Changed Files: htroot/api/snapshot.java, htroot/osm.java, htroot/yacysearchitem.html, htroot/yacysearchitem.java, source/net/yacy/peers/graphics/EncodedImage.java |
Fri Dec 19 17:38:58 CET 2014 by Michael Peter Christen | fixes for searches when initialization of large autotagging libraries have not been finished Changed Files: htroot/ViewImage.java, htroot/yacysearch.java, source/net/yacy/search/query/QueryParams.java |
Fri Dec 19 17:37:58 CET 2014 by Michael Peter Christen | fixes to usage of no-cache: use and recognize also the no-store directive Changed Files: htroot/NetworkPicture.java, source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/http/servlets/SolrSelectServlet.java, source/net/yacy/http/servlets/YaCyDefaultServlet.java, source/net/yacy/peers/SeedDB.java, source/net/yacy/search/Switchboard.java, source/net/yacy/server/http/HTTPDemon.java |
Fri Dec 19 11:51:14 CET 2014 by Michael Peter Christen | reduction of http requests to YaCy using the correct cache-control, expires and last-modified headers in http response. Changed Files: source/net/yacy/http/servlets/YaCyDefaultServlet.java |
Fri Dec 19 01:58:37 CET 2014 by reger | fix missing AppPath upd Maven plugin versionid Changed Files: pom.xml, source/net/yacy/search/Switchboard.java |
Tue Dec 16 21:12:37 CET 2014 by reger | include xmpcore.jar in classpath used by metadata-extractor Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/xmpcore-5.1.2.license, nbproject/project.xml |
Tue Dec 16 21:10:53 CET 2014 by malykhin.dmitry | Update russian translation Changed Files: locales/ru.lng |
Tue Dec 16 13:53:12 CET 2014 by Michael Peter Christen | added query modifier 'on'. This makes it possible to search for date occurrences within the (web) page documents (not the document last-modified!). This works only if the solr field dates_in_content_sxt is enabled. A search request may then have the form "term on:<date>", like "gift on:24.12.2014", "gift on:2014/12/24" or "* on:2014/12/31". For the date format you may use any kind of human-readable date representation(!yes!) - the on:<date> parser tries to identify language and also knows event names, like: bunny on:eastern .. as long as the date term has no spaces inside (use a dot). Further enhancements will be made to also accept strings enclosed in quotes. Changed Files: source/net/yacy/document/DateDetection.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/query/QueryParams.java |
Tue Dec 16 13:18:49 CET 2014 by Michael Peter Christen | added (very experimental) Solr response writer for snapshot image results Changed Files: source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/SnapshotImagesReponseWriter.java, source/net/yacy/http/servlets/SolrSelectServlet.java |
Tue Dec 16 12:39:10 CET 2014 by Michael Peter Christen | added url, date, time and page number on pdf snapshot footer Changed Files: source/net/yacy/cora/util/Html2Image.java |
Tue Dec 16 12:09:57 CET 2014 by Michael Peter Christen | reactivated on-demand snapshot loading Changed Files: htroot/api/snapshot.java, source/net/yacy/crawler/data/Transactions.java, source/net/yacy/search/index/Segment.java |
Mon Dec 15 22:54:49 CET 2014 by reger | add final SolrQueryRequest.close to SolrServlet Changed Files: source/net/yacy/http/servlets/SolrServlet.java |
Mon Dec 15 05:56:12 CET 2014 by Michael Peter Christen | added a note that the servlet is linked using web.xml Changed Files: source/net/yacy/http/servlets/UrlProxyServlet.java |
Sun Dec 14 21:27:45 CET 2014 by reger | - fix path to default heuristic.cfg - deprecate unused ProxyServlet Changed Files: htroot/ConfigHeuristics_p.java, source/net/yacy/http/servlets/YaCyProxyServlet.java |
Sun Dec 14 19:17:13 CET 2014 by reger | add chardet.jar to Maven dependencies Changed Files: nbproject/project.xml, pom.xml |
Sun Dec 14 19:12:18 CET 2014 by reger | fix yacy.init comment http://mantis.tokeek.de/view.php?id=513 Changed Files: defaults/yacy.init |
Sun Dec 14 13:43:30 CET 2014 by Michael Peter Christen | add the actual DateDetection class... (missed in latest commit) Changed Files: source/net/yacy/document/DateDetection.java |
Sun Dec 14 04:02:13 CET 2014 by Michael Peter Christen | enable sku as anchor in html response writer Changed Files: defaults/solr.collection.schema, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java |
Sat Dec 13 09:54:41 CET 2014 by Michael Peter Christen | enhanced tagging preparation speed which reduces initialization time for very large vocabularies Changed Files: source/net/yacy/cora/lod/vocabulary/Tagging.java |
Thu Dec 11 23:37:41 CET 2014 by Michael Peter Christen | refactoring date -> lastModified Changed Files: source/net/yacy/document/Document.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Wed Dec 10 14:10:05 CET 2014 by Michael Peter Christen | added concurrent generation of snapshot pdfs Changed Files: source/net/yacy/crawler/data/Snapshots.java, source/net/yacy/crawler/data/Transactions.java |
Wed Dec 10 13:11:51 CET 2014 by Michael Peter Christen | added charset detection to vocabulary reader Changed Files: htroot/Vocabulary_p.java |
Wed Dec 10 13:08:29 CET 2014 by Michael Peter Christen | added character set detection library from http://www-archive.mozilla.org/projects/intl/chardet.html Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/chardet.License, lib/chardet.jar, source/net/yacy/kelondro/util/FileUtils.java |
Wed Dec 10 12:20:27 CET 2014 by Michael Peter Christen | added new options to vocabulary editor: - new switch 'isFacet' which enables or disables the usage of the vocabulary for search facets. This shall be used for large vocabularies since searches in solr are extremely slow if facets for a large set of alternative terms are generated - new option to disable auto-enrichment from synonyms - new option to add synonyms from another column when importing from csv - automatically recognize double-occurrences in synonyms and bundling terms for such synonyms Changed Files: htroot/Vocabulary_p.html, htroot/Vocabulary_p.java, source/net/yacy/cora/lod/vocabulary/Tagging.java, source/net/yacy/search/query/QueryParams.java |
Tue Dec 09 00:58:08 CET 2014 by reger | remove redundant null check in ResponseHeader.lastModified; added a JUnit testcase for ResponseHeader dates (using age()), adjusted age() to pass all tests Changed Files: source/net/yacy/cora/protocol/ResponseHeader.java, test/net/yacy/cora/protocol/ResponseHeaderTest.java |
Mon Dec 08 11:41:28 CET 2014 by Michael Peter Christen | added confirmation dialogs for row deletion Changed Files: htroot/Table_API_p.html |
Mon Dec 08 11:35:40 CET 2014 by Michael Peter Christen | more robustness for broken table data in Table_API_p.html -- see bug report http://mantis.tokeek.de/view.php?id=495 Changed Files: htroot/Table_API_p.java |
Sun Dec 07 23:43:38 CET 2014 by Michael Peter Christen | enhancement for clearing the crawl queue Changed Files: htroot/Crawler_p.java |
Sun Dec 07 04:31:09 CET 2014 by reger | modified FieldReIndex to reindex queries with low number of documents first by internally using a score map with number of documents as score and working through the list from low to high. Changed Files: htroot/IndexReIndexMonitor_p.html, htroot/IndexReIndexMonitor_p.java, source/net/yacy/search/index/ReindexSolrBusyThread.java |
Sat Dec 06 22:32:24 CET 2014 by reger | update to commons-logging-1.2 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/commons-logging-1.2.License, lib/commons-logging-1.2.jar, nbproject/project.xml, pom.xml |
Sat Dec 06 01:44:03 CET 2014 by reger | Merge origin/master Changed Files: htroot/api/snapshot.java, source/net/yacy/cora/document/feed/RSSFeed.java, source/net/yacy/cora/document/feed/RSSMessage.java, source/net/yacy/crawler/data/Snapshots.java |
Sat Dec 06 01:42:24 CET 2014 by reger | coding fixes suggested in http://mantis.tokeek.de/view.php?id=509 http://mantis.tokeek.de/view.php?id=510 Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/server/http/HTTPDProxyHandler.java |
Sat Dec 06 00:25:05 CET 2014 by Michael Peter Christen | added rss feed output to snapshot servlet which can be used to get a list of latest/oldest entries in the snapshot database. This is an example: http://localhost:8090/api/snapshot.rss?depth=2&order=LATESTFIRST&host=yacy.net&maxcount=100 The properties depth, order, host and maxcount can be omitted. The meanings of the fields are: host: select only urls from this host or all, if not given; depth: select only urls at that crawl depth or all, if not given; maxcount: select at most the given number of urls or 10, if not given; order: either LATESTFIRST to select the youngest entries, OLDESTFIRST to select the first entries or ANY to select any. The rss feed needs administration rights to work; a call to this servlet with rss extension must attach login credentials. Changed Files: htroot/api/snapshot.java, source/net/yacy/crawler/data/Snapshots.java |
Sat Dec 06 00:18:14 CET 2014 by Michael Peter Christen | added toString() methods to feed classes which makes it possible to export full rss feed files out of the RSSFeed class Changed Files: source/net/yacy/cora/document/feed/RSSFeed.java, source/net/yacy/cora/document/feed/RSSMessage.java |
Fri Dec 05 03:03:28 CET 2014 by reger | remove the unused Request variable (fix of prev. commit) Changed Files: source/net/yacy/crawler/retrieval/Request.java |
Fri Dec 05 01:15:41 CET 2014 by reger | Merge origin/master Changed Files: htroot/CrawlStartExpert.java, source/net/yacy/cora/util/Html2Image.java, source/net/yacy/kelondro/util/OS.java |
Thu Dec 04 01:21:24 CET 2014 by Michael Peter Christen | added Image Events as another option to generate images with a mac if no Ghostscript is available or does not work... Changed Files: source/net/yacy/cora/util/Html2Image.java, source/net/yacy/kelondro/util/OS.java |
Wed Dec 03 18:07:05 CET 2014 by Michael Peter Christen | added another path for the convert command because on older Macs ImageMagick has a different installation location Changed Files: htroot/CrawlStartExpert.java, source/net/yacy/cora/util/Html2Image.java |
Tue Dec 02 21:03:00 CET 2014 by reger | skip creation of unused Bluelist contenttransformer Changed Files: source/net/yacy/document/parser/html/AbstractTransformer.java, source/net/yacy/document/parser/html/ContentTransformer.java, source/net/yacy/document/parser/html/TransformerWriter.java, source/net/yacy/server/http/HTTPDProxyHandler.java |
Tue Dec 02 16:21:06 CET 2014 by Michael Peter Christen | showing list of all threads in threaddump using the ThreadMXBean counter (this obviously shows more threads than before?) Changed Files: htroot/Threaddump_p.java |
Tue Dec 02 16:05:00 CET 2014 by Michael Peter Christen | set Busy- and Blocking-Threads to daemon mode (they will now not prevent YaCy from termination if still running) Changed Files: source/net/yacy/kelondro/workflow/AbstractBlockingThread.java, source/net/yacy/kelondro/workflow/AbstractBusyThread.java, source/net/yacy/kelondro/workflow/AbstractThread.java, source/net/yacy/kelondro/workflow/InstantBlockingThread.java |
Tue Dec 02 16:04:11 CET 2014 by Michael Peter Christen | show number of threads on status page Changed Files: htroot/Status.java, htroot/Status_p.inc |
Tue Dec 02 13:35:19 CET 2014 by Michael Peter Christen | in case that loading from the cache fails, load from wkhtmltopdf without cache using the user agent string given in the crawl profile Changed Files: source/net/yacy/cora/util/Html2Image.java, source/net/yacy/crawler/data/Snapshots.java, source/net/yacy/repository/LoaderDispatcher.java |
Tue Dec 02 12:52:36 CET 2014 by Michael Peter Christen | recognize more html file types for snapshots Changed Files: source/net/yacy/repository/LoaderDispatcher.java |
Tue Dec 02 12:52:05 CET 2014 by Michael Peter Christen | get cloned crawl start parameter for snapshots Changed Files: htroot/CrawlStartExpert.html, htroot/CrawlStartExpert.java |
Tue Dec 02 12:10:44 CET 2014 by Michael Peter Christen | recognize more html file extensions Changed Files: source/net/yacy/document/parser/htmlParser.java |
Tue Dec 02 11:51:12 CET 2014 by Michael Peter Christen | fix to xvfb-run usage (quotes did not parse in xvfb-run, default values are appropriate) Changed Files: source/net/yacy/cora/util/Html2Image.java |
Mon Dec 01 18:21:52 CET 2014 by Michael Peter Christen | added fail-over for missing http proxy service (i.e. overload) and quiet mode Changed Files: source/net/yacy/cora/util/Html2Image.java |
Mon Dec 01 17:37:25 CET 2014 by Michael Peter Christen | moved snapshot generation out of the html handler to prevent that existing cache entries cause that the handler is not executed Changed Files: source/net/yacy/cora/util/Html2Image.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/repository/LoaderDispatcher.java |
Mon Dec 01 16:50:37 CET 2014 by Michael Peter Christen | more logging Changed Files: source/net/yacy/cora/util/Html2Image.java |
Mon Dec 01 16:38:07 CET 2014 by Michael Peter Christen | grr Changed Files: source/net/yacy/cora/util/Html2Image.java |
Mon Dec 01 16:26:28 CET 2014 by Michael Peter Christen | wrap wkhtmltopdf with xvfb if necessary Changed Files: source/net/yacy/cora/util/Html2Image.java |
Mon Dec 01 16:00:45 CET 2014 by Michael Peter Christen | more logging when failing to create pdf snapshot Changed Files: source/net/yacy/cora/util/Html2Image.java |
Mon Dec 01 15:20:10 CET 2014 by Michael Peter Christen | added the property timeoutrequests to configuration to disable TimeoutRequests. The purpose is to test if YaCy runs better on VMs where there is a limitation of concurrent processes; see /proc/user_beancounters in row numproc; this value is limited and should be low. Try to set timeoutrequests to keep this low. (works only after restart) Changed Files: defaults/yacy.init, source/net/yacy/cora/protocol/TimeoutRequest.java, source/net/yacy/search/Switchboard.java |
Mon Dec 01 01:12:51 CET 2014 by Michael Peter Christen | moved network configuration to Use Case submenu; this is necessary because the definition of portal peers within the YaCy freeworld network is otherwise split into two different main menus. Changed Files: htroot/ConfigNetwork_p.html, htroot/env/templates/submenuConfig.template, htroot/env/templates/submenuUseCaseAccount.template |
Mon Dec 01 00:21:30 CET 2014 by reger | replace deprecated Solr DateField.formatExternal with recommended TrieDateField.formatExternal Changed Files: source/net/yacy/cora/federate/solr/responsewriter/EnhancedXMLResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java |
Sun Nov 30 19:43:53 CET 2014 by reger | update to guava.18.0.jar and jsch.0.1.51.jar Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/guava-18.0.jar, lib/jsch-0.1.51.License, lib/jsch-0.1.51.jar, nbproject/project.xml, pom.xml |
Sun Nov 30 19:42:33 CET 2014 by reger | skip unused call parameter for hashSentence() Changed Files: source/net/yacy/document/SnippetExtractor.java, source/net/yacy/document/WordTokenizer.java, source/net/yacy/search/snippet/MediaSnippet.java, source/net/yacy/search/snippet/TextSnippet.java |
Sun Nov 30 01:58:14 CET 2014 by reger | position api icon (ViewFile.html) Changed Files: htroot/ViewFile.html |
Sat Nov 29 22:36:02 CET 2014 by reger | update to poi-3.10.1.jar Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/poi-3.10.1.License, lib/poi-3.10.1.jar, lib/poi-scratchpad-3.10.1.License, lib/poi-scratchpad-3.10.1.jar, nbproject/project.xml, pom.xml |
Sat Nov 29 22:13:24 CET 2014 by reger | including small junit test case for WordTokenizer Changed Files: test/net/yacy/document/WordTokenizerTest.java |
Sat Nov 29 17:16:05 CET 2014 by reger | do not tokenize punctuation as words in WordTokenizer; remove unused variables in condenser related to Tokenizer Changed Files: source/net/yacy/document/Condenser.java, source/net/yacy/document/WordTokenizer.java |
Sat Nov 29 15:27:16 CET 2014 by reger | add. use host port parameter in YaCyApp Changed Files: source/net/yacy/gui/InfoPage.java, source/net/yacy/gui/YaCyApp.java, source/net/yacy/gui/framework/Switchboard.java |
Sat Nov 29 03:09:55 CET 2014 by reger | adjust translation text of error msg on empty query (ru: needs correction) Changed Files: locales/de.lng, locales/ru.lng |
Fri Nov 28 01:40:46 CET 2014 by reger | remove obsolete alternate link; fix api link Changed Files: htroot/Table_API_p.html |
Fri Nov 28 01:25:52 CET 2014 by Michael Peter Christen | added image screenshot generator Changed Files: source/net/yacy/cora/util/Html2Image.java |
Thu Nov 27 20:50:55 CET 2014 by Michael Peter Christen | ignore url errors during search Changed Files: source/net/yacy/search/query/SearchEvent.java |
Thu Nov 27 12:13:20 CET 2014 by Michael Peter Christen | disabled postprocessing by default. If you read this: please disable postprocessing in your peer as well: open /IndexSchema_p.html, then deselect field process_sxt Changed Files: defaults/solr.collection.schema |
Thu Nov 27 12:11:54 CET 2014 by Michael Peter Christen | larger boost fields for ranking Changed Files: htroot/RankingSolr_p.html |
Thu Nov 27 08:08:05 CET 2014 by Michael Peter Christen | bold words in snippets should not be coloured black in the base style because there are styles with dark backgrounds which make the bold word invisible Changed Files: htroot/env/base.css |
Thu Nov 27 07:44:41 CET 2014 by Michael Peter Christen | changed vocabulary navigator object type to TreeMap to get a specific order into the vocabularies. This is now lexicographic, which is less random than a hashed order Changed Files: source/net/yacy/search/query/SearchEvent.java |
Wed Nov 26 18:01:35 CET 2014 by Michael Peter Christen | added option to change the navbar-default, i.e. usable for dark skins Changed Files: defaults/yacy.init, htroot/env/templates/simpleheader.template, source/net/yacy/http/servlets/YaCyDefaultServlet.java |
Tue Nov 25 23:11:42 CET 2014 by Michael Peter Christen | trying facet.method fc instead of fcs to handle large facets Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java |
Mon Nov 24 20:53:19 CET 2014 by Michael Peter Christen | prevent that a local Solr search and a local RWI search are running concurrently. When a RWI search result is flushed into the result set, it does Solr Queries (which replaced the old-style Metadata Queries) and they are possibly running concurrently to a previously started Solr search. Both methods may block each other with IO. To enhance the speed, they are now serialized. Because the Solr search may deliver better results using the more advanced and configurable Ranking methods, this result is preferred over the RWI search result. However, remote RWI search results are still fed concurrently into the search result as well. Changed Files: source/net/yacy/search/query/SearchEvent.java |
Sun Nov 23 23:29:20 CET 2014 by reger | fix path lookup to ./defaults/yacy.badwords (fix of commit https://gitorious.org/yacy/rc1/commit/ee277b9b3e033e8261a97b8334b79059220f113a) Changed Files: source/net/yacy/search/Switchboard.java |
Sun Nov 23 23:12:01 CET 2014 by reger | fix empty text facet entry (noticed on Author facet) Changed Files: source/net/yacy/peers/Protocol.java |
Sun Nov 23 20:11:23 CET 2014 by Michael Peter Christen | more stacks shall be considered for on-demand loading, not only deep-depth stacks to prevent "too many open files" problem Changed Files: source/net/yacy/crawler/HostQueue.java |
Sun Nov 23 20:09:32 CET 2014 by Michael Peter Christen | reduce number of calls to queue.size() because that may be a bottleneck during crawling Changed Files: htroot/yacy/urls.java, source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/data/NoticedURL.java |
Sun Nov 23 20:07:32 CET 2014 by Michael Peter Christen | optimize usage of size() cache Changed Files: source/net/yacy/kelondro/index/OnDemandOpenFileIndex.java |
Sun Nov 23 05:22:23 CET 2014 by reger | allow for local yacy.stopwords and yacy.badwords list (in DATA/SETTINGS/): if the file exists in DATA/SETTINGS it is loaded, otherwise the file in ./defaults is loaded (if locale ./defaults/stopwords.xx doesn't exist take solr/lang/stopwords_xx.txt as default); move yacy.stopwords, yacy.stopwords.de and yacy.badwords.example out of root directory to ./defaults directory Changed Files: defaults/yacy.badwords.example, defaults/yacy.stopwords, defaults/yacy.stopwords.de, source/net/yacy/search/Switchboard.java |
Sat Nov 22 22:49:23 CET 2014 by reger | remove redundant toLower for topwords Changed Files: source/net/yacy/search/query/SearchEvent.java |
Sat Nov 22 12:09:07 CET 2014 by Michael Peter Christen | better delete all files in path when removing host crawl stack Changed Files: source/net/yacy/crawler/HostBalancer.java |
Sat Nov 22 12:04:04 CET 2014 by Michael Peter Christen | if we have many hosts, use on-demand earlier Changed Files: source/net/yacy/crawler/HostQueue.java |
Sat Nov 22 12:01:00 CET 2014 by Michael Peter Christen | prevent division by zero Changed Files: source/net/yacy/visualization/ChartPlotter.java |
Fri Nov 21 14:38:54 CET 2014 by Michael Peter Christen | disabled crazy sleep loop Changed Files: source/net/yacy/kelondro/util/FileUtils.java |
Fri Nov 21 12:42:29 CET 2014 by Michael Peter Christen | when importing vocabulary csv files, accept also files without semicolon and truncate quotes from literals Changed Files: htroot/Vocabulary_p.java |
Thu Nov 20 18:46:06 CET 2014 by Michael Peter Christen | added hints to ranking to make ranking boosts using vocabularies easier Changed Files: htroot/RankingSolr_p.html, htroot/RankingSolr_p.java |
Thu Nov 20 18:45:27 CET 2014 by Michael Peter Christen | do not cache search requests to Solr if the result is used for doublechecking. If a double-check comes from cached results the doublecheck fails. Changed Files: htroot/IndexDeletion_p.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Thu Nov 20 18:44:29 CET 2014 by Michael Peter Christen | use a LinkedHashMap for facets to maintain facet order as given by solr Changed Files: htroot/HostBrowser.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/CachedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java |
Thu Nov 20 02:04:43 CET 2014 by reger | include domtype in searcheventcache id to differentiate between local / global events for reuse of cached events; fix for http://mantis.tokeek.de/view.php?id=493 Changed Files: source/net/yacy/search/query/QueryParams.java |
Wed Nov 19 18:12:43 CET 2014 by Michael Peter Christen | added option to enrich vocabularies with synonyms from synonym database Changed Files: htroot/Vocabulary_p.html, htroot/Vocabulary_p.java, source/net/yacy/cora/language/synonyms/SynonymLibrary.java |
Tue Nov 18 15:02:34 CET 2014 by Michael Peter Christen | added new solr schema fields which record the occurrences of vocabulary matchings. These matches can be used for result boosting, i.e. if a document contains words from a specific vocabulary, boost it. Changed Files: defaults/solr.collection.schema, source/net/yacy/migration.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java |
Mon Nov 17 14:23:21 CET 2014 by Michael Peter Christen | fix for 2-day network stats table: showing 48 instead of 24 hours from peer history Changed Files: htroot/Network.html |
Mon Nov 17 14:22:40 CET 2014 by Michael Peter Christen | added option in vocabulary editor to import CSV files with different encodings (preselected windows-type character encoding which is typical for CSV files). Fixed also other problems with character encoding in dictionary files. Automatically generated vocabularies are now also noted in the API steering. Changed Files: htroot/Vocabulary_p.html, htroot/Vocabulary_p.java, source/net/yacy/cora/lod/vocabulary/Tagging.java |
Mon Nov 17 01:24:30 CET 2014 by reger | adjust tag cloud font size calculation to limit max font size to ~ TOPWORDS_MAXSIZE Changed Files: htroot/yacysearchtrailer.java |
Sun Nov 16 01:26:07 CET 2014 by reger | add a check of java version string >=1.7 to startup class stopping start with error msg on version < 1.7 Changed Files: source/net/yacy/yacy.java |
Fri Nov 14 16:34:55 CET 2014 by Michael Peter Christen | added fix to postprocessing: avoid caching of postprocessing collection to always get fresh lists of documents. This is necessary since the postprocessing changes the same documents which the postprocessing-collection query selects. Changed Files: htroot/Vocabulary_p.html, htroot/api/status_p.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Fri Nov 14 10:02:50 CET 2014 by Michael Peter Christen | added high-precision scheduler for API processes. This also allows execution to depend on available RAM or CPU load. The default value for CPU load is 4.0 and the check runs once a minute. Changed Files: defaults/yacy.init, htroot/Table_API_p.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java |
Thu Nov 13 01:30:12 CET 2014 by Michael Peter Christen | added missing class for latest changes Changed Files: source/net/yacy/kelondro/table/IndexTable.java |
Thu Nov 13 01:15:31 CET 2014 by Michael Peter Christen | fix in key enumeration methods for cases where the enumeration is done in reverse order. Changed Files: source/net/yacy/kelondro/index/BufferedObjectIndex.java, source/net/yacy/kelondro/index/RAMIndex.java |
Wed Nov 12 21:32:34 CET 2014 by sixcooler | added an input field for setting 'fileHost'. Set this to avoid error messages like 'proxy use not allowed / granted' on accessing your Peer by its hostname. Changed Files: htroot/SettingsAck_p.java, htroot/Settings_ServerAccess.inc, htroot/Settings_p.java |
Tue Nov 11 13:57:04 CET 2014 by Michael Peter Christen | another fix to ordering of table indexes; fixes also network stats graphics Changed Files: source/net/yacy/kelondro/index/RAMIndex.java, source/net/yacy/kelondro/index/RAMIndexCluster.java, source/net/yacy/kelondro/table/SplitTable.java, source/net/yacy/kelondro/util/StackIterator.java |
Sun Nov 09 22:06:00 CET 2014 by reger | remove not used accordion javascript call for facet navs Changed Files: htroot/yacysearchtrailer.html, htroot/yacysearchtrailer.java |
Sun Nov 09 04:17:14 CET 2014 by reger | skip creation of local var in proxyhandler.storetocache Changed Files: source/net/yacy/http/ProxyHandler.java |
Sat Nov 08 21:10:10 CET 2014 by reger | upd NB project.xml to codec-1.9 Changed Files: nbproject/project.xml |
Fri Nov 07 22:43:50 CET 2014 by sixcooler | fix assertion failure in version string for Solr-4.10.2 by changing the assert - hope that is ok + add forgotten NB project changes Changed Files: nbproject/project.xml, source/net/yacy/search/index/Fulltext.java |
Sun Nov 02 21:16:51 CET 2014 by orbiter | fix for search in case where local peer has no local seed address in portal mode Changed Files: source/net/yacy/peers/RemoteSearch.java |
Sun Nov 02 20:30:49 CET 2014 by orbiter | added reverse button to tables, by default on now (to see latest entries first) Changed Files: htroot/Tables_p.html, htroot/Tables_p.java |
Sun Nov 02 20:10:32 CET 2014 by orbiter | added (missing) Tables_p.xml for table xml api Changed Files: htroot/Tables_p.xml |
Sun Nov 02 20:08:49 CET 2014 by orbiter | removed unused options from BusyThreads Changed Files: source/net/yacy/kelondro/workflow/AbstractBusyThread.java, source/net/yacy/kelondro/workflow/InstantBusyThread.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/ReindexSolrBusyThread.java |
Sun Nov 02 12:52:23 CET 2014 by Michael Peter Christen | more enhancements to postprocessing speed Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Fri Oct 31 17:44:45 CET 2014 by Michael Peter Christen | another fix for postprocessing (the query for "" on numeric field did not work in external solr) Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java |
Fri Oct 31 17:30:24 CET 2014 by Michael Peter Christen | more fixes in postprocessing: partitioning of the complete queue to enable smaller queries Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Thu Oct 30 21:52:52 CET 2014 by orbiter | more concurrency for postprocessing Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java |
Thu Oct 30 18:05:48 CET 2014 by orbiter | enhanced postprocessing by usage of a field-list generation to prevent lazy initialization of the documents. This is useful because the documents must be read completely anyway. Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Wed Oct 29 21:41:41 CET 2014 by Michael Peter Christen | better scaling of network statistic graphs Changed Files: htroot/NetworkHistory.java |
Wed Oct 29 17:23:58 CET 2014 by orbiter | replaced old /api/table_p.xml servlet with /Tables_p.xml to avoid double code Changed Files: htroot/Tables_p.html, htroot/Tables_p.java |
Wed Oct 29 13:37:44 CET 2014 by Michael Peter Christen | added new index size history image in /Status.html page Changed Files: htroot/NetworkHistory.java, htroot/Status.html, source/net/yacy/visualization/ChartPlotter.java |
Wed Oct 29 13:21:35 CET 2014 by Michael Peter Christen | added network history in /Network.html?page=5 Changed Files: htroot/Network.html, htroot/Network.java, htroot/NetworkHistory.java, htroot/Tables_p.java |
Wed Oct 29 10:50:08 CET 2014 by Michael Peter Christen | added debug code for statistics about document attributes related to domains Changed Files: defaults/yacy.init, htroot/HostBrowser.html, htroot/HostBrowser.java |
Sun Oct 26 23:33:21 CET 2014 by reger | RankingSolr: display only available or configured boost fields Changed Files: htroot/RankingSolr_p.html, htroot/RankingSolr_p.java |
Fri Oct 24 12:57:37 CEST 2014 by Michael Peter Christen | replaced input text field with text field for index deletion with query and replaced GET with POST method. This should make it possible to submit very large queries for deletion here. Changed Files: .gitignore, htroot/IndexDeletion_p.html |
Fri Oct 24 12:32:44 CEST 2014 by sixcooler | bump to httpcore-4.3.3 Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/dependencies.txt, lib/httpcore-4.3.3.License, lib/httpcore-4.3.3.jar, nbproject/project.xml, pom.xml |
Mon Oct 20 18:05:37 CEST 2014 by orbiter | removed spaces in seedlist.xml to reduce data Changed Files: htroot/yacy/seedlist.xml |
Fri Oct 17 14:17:49 CEST 2014 by Michael Peter Christen | added field postprocessing.partialUpdate to settings which can be used to switch on or off partial updates. Both options should cause the same result. Default is on. Changed Files: defaults/yacy.init, source/net/yacy/search/Switchboard.java, source/net/yacy/search/schema/CollectionConfiguration.java |
Fri Oct 17 13:25:17 CEST 2014 by Michael Peter Christen | fix for an ssl bug that appears only in java 7. The bug was reported in http://forum.yacy-websuche.de/viewtopic.php?f=23&t=5407&p=30956#p30956 and a solution was described in http://teknosrc.com/javax-net-ssl-sslprotocolexception-handshake-alert-unrecognized_name-solved/ which worked for the example given in the yacy forum Changed Files: source/net/yacy/yacy.java |
Fri Oct 17 12:45:26 CEST 2014 by Michael Peter Christen | concurrently initialize the error cache; extended also the cache by factor 10 up to 1000 entries. This error cache is only used to catch up paused crawls between shutdown+startup Changed Files: source/net/yacy/search/index/ErrorCache.java |
Fri Oct 17 12:44:28 CEST 2014 by Michael Peter Christen | ipv6 fix for api /yacy/seedlist.[json|xml], multiple IPs are now attached to the seed info. API clients must be adapted. Documentation will be fixed in http://www.yacy-websuche.de/wiki/index.php/Dev:APIseedlist Also added a new retrieval option for seeds, they can now be retrieved by their name with the get parameter name=<name> Changed Files: htroot/yacy/seedlist.java, htroot/yacy/seedlist.json |
Thu Oct 16 20:36:12 CEST 2014 by sixcooler | added a timeout on Jetty connectors Changed Files: source/net/yacy/http/Jetty9HttpServerImpl.java |
Wed Oct 15 18:13:54 CEST 2014 by sixcooler | do not overwrite yacy.conf in case of an exception may be a fix for http://mantis.tokeek.de/view.php?id=180 Changed Files: source/net/yacy/kelondro/util/FileUtils.java |
Wed Oct 15 11:19:25 CEST 2014 by Michael Peter Christen | removed warnings Changed Files: htroot/Table_API_p.java, htroot/Table_YMark_p.java, htroot/Tables_p.java, htroot/api/table_p.java, source/net/yacy/document/parser/augment/AugmentParser.java, source/net/yacy/search/Switchboard.java |
Wed Oct 15 11:07:08 CEST 2014 by Michael Peter Christen | automatically zoom to location/POI Changed Files: htroot/yacysearch_location.html |
Wed Oct 15 10:31:24 CEST 2014 by orbiter | enhanced graphics computation (avoiding long string parsing for colours) Changed Files: htroot/NetworkHistory.java, htroot/NetworkPicture.java, htroot/imagetest.java, source/net/yacy/dbtest.java, source/net/yacy/peers/graphics/NetworkGraph.java, source/net/yacy/peers/graphics/ProfilingGraph.java, source/net/yacy/visualization/ChartPlotter.java |
Wed Oct 15 09:13:23 CEST 2014 by orbiter | added proper copyright notice to OSM tiles presented at the search result page Changed Files: htroot/osm.java, source/net/yacy/peers/graphics/OSMTile.java |
Wed Oct 15 00:55:57 CEST 2014 by Michael Peter Christen | enhanced location search Changed Files: htroot/env/grafics/earthsearch.png, htroot/yacysearch_location.html, htroot/yacysearch_location.java, htroot/yacysearchtrailer.html |
Wed Oct 15 00:55:42 CEST 2014 by Michael Peter Christen | better profiling of solr queries Changed Files: source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java |
Mon Oct 13 23:51:19 CEST 2014 by Michael Peter Christen | added new solr field url_paths_count_i which can be used to enhance the index browser and maybe also for ranking; possibly also for SEO-with-YaCy applications. Changed Files: defaults/solr.collection.schema, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java |
Mon Oct 13 18:33:39 CEST 2014 by Michael Peter Christen | make browsing of file://z: - paths in index browser easier - this will now show the root paths on a shared drive Changed Files: htroot/HostBrowser.java |
Mon Oct 13 16:51:27 CEST 2014 by Michael Peter Christen | fix-fix for https://gitorious.org/yacy/rc1/commit/30d4402cd1bbd5629d23562178a049ef7c3b25e9 Changed Files: source/net/yacy/cora/document/feed/RSSMessage.java |
Sun Oct 12 06:32:13 CEST 2014 by reger | add filter to citation page and an on/off button to display only sentences with citations, while maintaining the sentence number. Make the filtered list the default in search result citation link Changed Files: htroot/api/citation.html, htroot/api/citation.java, htroot/yacysearchitem.html |
Sat Oct 11 09:02:12 CEST 2014 by Michael Peter Christen | explain crawl denial when not switched to intranet mode Changed Files: source/net/yacy/crawler/CrawlStacker.java |
Fri Oct 10 14:40:31 CEST 2014 by Michael Peter Christen | slightly enhanced Network table computation by using a lazy initialized bitfield for peer flags Changed Files: source/net/yacy/peers/Seed.java |
Fri Oct 10 14:32:21 CEST 2014 by Michael Peter Christen | refactoring (class name should start with uppercase letter) Changed Files: htroot/NetworkHistory.java, source/net/yacy/peers/Seed.java, source/net/yacy/utils/Bitfield.java |
Fri Oct 10 14:16:16 CEST 2014 by Michael Peter Christen | added also the NetworkHistory servlet... Changed Files: htroot/NetworkHistory.java |
Fri Oct 10 14:06:47 CEST 2014 by Michael Peter Christen | added network history graph image /NetworkHistory.png which can show many different statistics about the history of the peer. Changed Files: source/net/yacy/dbtest.java, source/net/yacy/kelondro/blob/BEncodedHeap.java, source/net/yacy/kelondro/blob/Tables.java, source/net/yacy/peers/graphics/ProfilingGraph.java, source/net/yacy/visualization/ChartPlotter.java |
Thu Oct 09 13:31:36 CEST 2014 by Marc Nause | Minor changes: *) reduced visibility of a method *) updated comments Changed Files: source/net/yacy/utils/upnp/UPnP.java |
Thu Oct 09 13:27:20 CEST 2014 by Michael Peter Christen | fix for values in CrawlProfileEditor table and xml; now the full profile is available in the xml. Changed Files: htroot/CrawlProfileEditor_p.html, htroot/CrawlProfileEditor_p.xml, source/net/yacy/crawler/data/CrawlProfile.java, source/net/yacy/server/serverObjects.java |
Wed Oct 08 18:48:57 CEST 2014 by Michael Peter Christen | fixed crawl profile xml result which did not show the correct crawl status. Changed Files: htroot/CrawlProfileEditor_p.xml, source/net/yacy/crawler/data/CrawlProfile.java |
Wed Oct 08 17:12:35 CEST 2014 by Michael Peter Christen | added another decoration flag to switch off network graphics in crawler monitor and index browser: decoration.grafics.linkstructure Please set this to false to remove the graphics from the interface. Changed Files: defaults/yacy.init, htroot/Crawler_p.java, htroot/HostBrowser.java, source/net/yacy/search/SwitchboardConstants.java |
Wed Oct 08 15:20:43 CEST 2014 by Michael Peter Christen | added a high cpu cycle monitor to PerformanceQueues Changed Files: htroot/PerformanceQueues_p.html, htroot/PerformanceQueues_p.java, htroot/PerformanceQueues_p.xml, source/net/yacy/kelondro/workflow/AbstractBusyThread.java, source/net/yacy/kelondro/workflow/BusyThread.java |
Wed Oct 08 15:04:35 CEST 2014 by Michael Peter Christen | less volume for effect sounds Changed Files: htroot/yacy/search.java, htroot/yacy/transferURL.java, source/net/yacy/gui/Audio.java, source/net/yacy/search/Switchboard.java |
Wed Oct 08 14:27:38 CEST 2014 by Michael Peter Christen | less load and more ram prerequisite for crawl steps Changed Files: defaults/yacy.init |
Tue Oct 07 23:42:41 CEST 2014 by Michael Peter Christen | removed the atmo sound clips because they had been too large Changed Files: htroot/env/soundclips/sources.txt, source/net/yacy/gui/Audio.java |
Tue Oct 07 22:36:01 CEST 2014 by Michael Peter Christen | ipv6 fix: prevent the shrunk own IP set from being overwritten with the (non-valid) set of local IPs Changed Files: source/net/yacy/search/Switchboard.java |
Tue Oct 07 18:32:39 CEST 2014 by Michael Peter Christen | argh.. adding missing java class for latest audio feature Changed Files: source/net/yacy/gui/Audio.java |
Mon Oct 06 04:51:31 CEST 2014 by reger | add link extraction to pdfParser: this extracts clickable links in a pdf and adds them to the list of links; includes a test case for this function. This is the corrected comment for commit: https://gitorious.org/yacy/rc1/commit/aa2e15d846cdee90b70ea882747148b14f257c49 Changed Files: source/net/yacy/document/parser/pdfParser.java |
Sun Oct 05 20:05:03 CEST 2014 by reger | allow url parameter in worktable apicall allow url=wwwl?param=a&param=b (with ?, & encoded) fix: http://mantis.tokeek.de/view.php?id=100 fix double adding of '&' in MultiProtocolURL.escape() Changed Files: source/net/yacy/document/parser/pdfParser.java, test/net/yacy/document/parser/pdfParserTest.java, test/parsertest/umlaute_linux.pdf |
Sun Oct 05 14:50:22 CEST 2014 by orbiter | lazy handling of process_sxt field (part of postprocessing) Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java |
Sat Oct 04 04:11:48 CEST 2014 by reger | allow url parameter in worktable apicall allow url=wwwl?param=a&param=b (with ?, & encoded) fix: http://mantis.tokeek.de/view.php?id=100 fix double adding of '&' in MultiProtocolURL.escape() Changed Files: htroot/Load_RSS_p.java, source/net/yacy/cora/document/id/MultiProtocolURL.java, source/net/yacy/data/WorkTables.java |
Fri Oct 03 22:08:07 CEST 2014 by reger | preserve content_type (mime) if supplied, in preference to constructing it from the file type (this eventually can benefit image search by using mime only). Reduce redundant field assignment for SolrDocuments created from URIMetadataNode (URIMetadataNode = SolrDocument with partially assigned fields) Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java |
Fri Oct 03 20:54:45 CEST 2014 by reger | upd to jsoup-1.8.1.jar Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/jsoup-1.8.1.jar, nbproject/project.xml, pom.xml |
Fri Oct 03 20:49:40 CEST 2014 by reger | open rejected urls in new browser Changed Files: htroot/IndexCreateParserErrors_p.html |
Fri Oct 03 01:43:05 CEST 2014 by reger | fix image search expand box, cut-off of 2nd capture line height tested with IE11 and Firefox 32 (change worked for both to show 2nd line without cutting off height) +fix charset parameter in metadataImageParser +update start errMsgTxt to "java 1.7" Changed Files: htroot/js/highslide/highslide.js, source/net/yacy/document/parser/images/metadataImageParser.java, source/net/yacy/yacy.java |
Thu Oct 02 09:38:06 CEST 2014 by Michael Peter Christen | typo in javadoc Changed Files: source/net/yacy/cora/protocol/HeaderFramework.java |
Wed Oct 01 23:53:41 CEST 2014 by reger | add html5 autofocus to query input field (leave onload untouched = redundant, for IE9 http://www.w3schools.com/tags/att_input_autofocus.asp) adjust Peer-to-Peer/ Privacy switch label to display "Peer-to-Peer" as 2nd switch option in active stealth mode Changed Files: htroot/index.html, htroot/yacysearch.html, htroot/yacysearch_location.html, htroot/yacysearchtrailer.html |
Wed Oct 01 04:35:34 CEST 2014 by reger | handle noarchive tag, skip writing page to cache http://mantis.tokeek.de/view.php?id=44 Changed Files: source/net/yacy/crawler/data/Cache.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java |
Wed Oct 01 03:47:57 CEST 2014 by Michael Peter Christen | when pinging other peers, be able to select the right IP option Changed Files: htroot/Network.java, source/net/yacy/peers/Network.java, source/net/yacy/peers/Protocol.java |
Tue Sep 30 22:22:13 CEST 2014 by reger | search result showPicture update search parameter used parameter &cat=image is obsolete and returns no results - remove &cat=image and &cat=href references - remove &tenant= references (unused) Use contentdom=image and inurl: parameter to make showPicture link display something (open in new window because of used inurl modifier changes original query) Changed Files: htroot/index.java, htroot/yacysearch.html, htroot/yacysearch.java, htroot/yacysearchitem.html, htroot/yacysearchtrailer.java |
Tue Sep 30 05:04:47 CEST 2014 by reger | added metadataImageParser for tif and psd (Photoshop) images. This is a modified genericImageParser adding tif (and psd) support even if the java ImageIO plugin for tif is not installed in the JDK. Adds just tif and psd to the available parsers. Uses the same library to extract metadata, so it could eventually be merged with genericImageParser. All detected metadata are added to the parsed document (potentially some more than with genericImageParser) Changed Files: source/net/yacy/document/TextParser.java, source/net/yacy/document/parser/images/genericImageParser.java, source/net/yacy/document/parser/images/metadataImageParser.java |
Mon Sep 29 07:42:51 CEST 2014 by reger | use javax ImageIO getReader to add supported image extension/mime. genericImageParser uses javax ImageIO; supported images depend on available plugins for the ImageIO package (this is JDK installation specific). Jpeg, png and gif are available by default. Tif and others only with an available plugin (in classpath). Add supported image types dynamically on startup. Changed Files: source/net/yacy/document/parser/images/genericImageParser.java |
Mon Sep 29 02:24:29 CEST 2014 by reger | remove unused variable timeout Changed Files: source/net/yacy/search/query/QueryParams.java |
Fri Sep 26 23:49:10 CEST 2014 by reger | skip loader wait cycle on concurrent access in nocache configuration. In nocache config resource is loaded online, leaving no benefit to wait for a faster cache hit. Changed Files: source/net/yacy/repository/LoaderDispatcher.java |
Wed Sep 24 13:32:58 CEST 2014 by Michael Peter Christen | activated the new apk parser which was already ready but not included in the parser initialization. To make the apk parser usable, the handling of application type links had to be modified. Now all documents which have no parser attached are placed in the noload-queue while all other documents are parsed using the associated parser class. This may have side effects on other parsers and the display of different file classes (images, apps, videos). Changed Files: source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/crawler/retrieval/Request.java, source/net/yacy/document/TextParser.java, source/net/yacy/document/parser/apkParser.java |
Mon Sep 22 15:28:54 CEST 2014 by orbiter | added a hack to forward solr search results from an external attached solr to the YaCy built-in solr search servlet. It's not complete and not fully correct (there is still a utf8 encoding problem) but it is an easy way to get requests forwarded through YaCy to an external Solr. Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/responsewriter/EnhancedXMLResponseWriter.java, source/net/yacy/http/servlets/SolrSelectServlet.java, source/net/yacy/search/index/Fulltext.java |
Sun Sep 21 22:35:03 CEST 2014 by reger | add link to thread pool settings in status panel Changed Files: htroot/PerformanceQueues_p.html, htroot/Status_p.inc |
Sun Sep 21 03:48:54 CEST 2014 by reger | fix NPE in ViewFile - show snippet on document not in index Changed Files: htroot/ViewFile.html, htroot/ViewFile.java |
Sun Sep 21 00:10:20 CEST 2014 by reger | adjust link to peer in Network list (www path obsolete) Changed Files: htroot/Network.html |
Sun Sep 21 00:04:54 CEST 2014 by reger | upd Maven pom (to current dev version) Changed Files: pom.xml |
Thu Sep 18 14:36:57 CEST 2014 by Michael Peter Christen | added warning for not well-formed postprocessing queries Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java |
Thu Sep 18 14:26:45 CEST 2014 by Michael Peter Christen | added internal api for partial updates to Solr Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java |
Thu Sep 18 11:11:09 CEST 2014 by orbiter | added option to reverse-sort YaCy tables (internal API change only) Changed Files: htroot/Table_YMark_p.java, htroot/Tables_p.java, htroot/api/table_p.java, source/net/yacy/crawler/CrawlSwitchboard.java, source/net/yacy/kelondro/blob/Tables.java |
Wed Sep 17 00:22:23 CEST 2014 by Michael Peter Christen | next development cycle. Please be careful with the usage of next commits, maybe new and unstable things will come... Changed Files: build.properties |
Tue Sep 16 23:14:13 CEST 2014 by reger | catch TimeoutException during ping and do not delete yacy.conf during prereadconfigfile. Found a situation after a crash (reboot) with an existing running semaphore but YaCy not running. The ping generated an exception which finally deleted the conf file (during the pre-read procedure) - change to ping (catching the exception solved it) - additionally removed the deletion of the yacy.conf file (if needed we need to make a backup) Changed Files: source/net/yacy/cora/protocol/Scanner.java, source/net/yacy/cora/protocol/TimeoutRequest.java, source/net/yacy/migration.java, source/net/yacy/search/Switchboard.java, source/net/yacy/yacy.java |
Tue Sep 16 16:43:17 CEST 2014 by reger | better fix for NPE in image search replace https://gitorious.org/yacy/rc1/commit/8931e14514deff5ab66e7504c45bb78046e5f696 Changed Files: source/net/yacy/search/query/SearchEvent.java |