YaCy Release 1.3

Major Changes   

Commit | Description
Thu Dec 27 04:16:31 CET 2012
by Michael Peter Christen
update to pdf parser
Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/apache-solr-core-4.0.0.License, lib/apache-solr-solrj-4.0.0.License, lib/fontbox-1.7.1.License, lib/fontbox-1.7.1.jar, lib/jempbox-1.7.1.License, lib/jempbox-1.7.1.jar, lib/pdfbox-1.7.1.License, lib/pdfbox-1.7.1.jar, source/net/yacy/document/parser/pdfParser.java
Sat Dec 22 16:27:14 CET 2012
by Michael Peter Christen
extended the Scheduler: introduced scheduled events
- an event type (once, regular) can be selected
- for this event type, a fixed time can be selected. This may be either
directly after startup or at one of the full hours of a day (25 options
in total)
The main point of this feature is the opportunity to start an action
directly after startup. That makes it possible to create YaCy
distributions which, when started for the first time, begin indexing
parts of the intranet/internet by themselves.
Changed Files: htroot/Table_API_p.html, htroot/Table_API_p.java, source/net/yacy/cora/date/GenericFormatter.java, source/net/yacy/data/WorkTables.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java, source/net/yacy/kelondro/workflow/AbstractBusyThread.java, source/net/yacy/kelondro/workflow/BusyThread.java, source/net/yacy/search/Switchboard.java
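The 25 start-time options mentioned in this entry (directly after startup, plus the 24 full hours of a day) could be enumerated roughly as in the following sketch. All class, enum and method names here are hypothetical and are not taken from Table_API_p.java or WorkTables.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; names are illustrative and not YaCy's actual API.
final class ScheduledEventOptions {

    // The two event types named in the commit message.
    enum EventKind { ONCE, REGULAR }

    // Builds the list of selectable start times: one "startup" entry
    // followed by the 24 full hours of a day.
    static List<String> startTimeOptions() {
        List<String> options = new ArrayList<>();
        options.add("startup");
        for (int hour = 0; hour < 24; hour++) {
            options.add(String.format(Locale.ROOT, "%02d:00", hour));
        }
        return options;
    }

    public static void main(String[] args) {
        List<String> options = startTimeOptions();
        System.out.println(options.size() + " options: " + options);
    }
}
```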
Wed Dec 19 01:56:33 CET 2012
by Michael Peter Christen
using the author field as a solr-native facet. This makes it necessary to
introduce a copy-field for the author field to be copied to a string
field. This field is then used to generate facets. Without this field,
the facet would consist only of the words of the author names, not of
the full author string.
Changed Files: defaults/solr/schema.xml, htroot/yacy/search.java, htroot/yacysearch.java, source/net/yacy/cora/federate/solr/YaCySchema.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/RemoteSearch.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/RankingProcess.java, source/net/yacy/search/query/SearchEvent.java
Tue Dec 18 14:42:35 CET 2012
by Michael Peter Christen
- added a new solr field references_i which stores the number of
INCOMING links to the corresponding web page. This information is taken
from the reverse link index (a 'little sister' of the RWI index).
- this field can be used to enhance the ranking because a web page
with more incoming links can be more important than others. But
this is not true for typical link pages like menus. Therefore the
number of outgoing links is needed.
- added a new solr attribute 'bf' to solr queries which is a boost
function extension. This field can contain a formula which computes the
boost according to given field values. After some experiments the
following formula is now the default:
div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4
This takes the number of references and the inbound links. Further
experiments are needed to enhance that formula.
Changed Files: defaults/solr.keys.list, htroot/gsa/searchresult.java, source/net/yacy/cora/federate/solr/Boost.java, source/net/yacy/cora/federate/solr/YaCySchema.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/QueryParams.java
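To get a feeling for how the default formula behaves, it can be evaluated outside of Solr. The sketch below is not YaCy's Boost class; the class and method names are hypothetical, and it assumes that the trailing ^0.4 acts as a weight multiplied onto the function value, as it does for Solr's dismax 'bf' parameter.

```java
// Hypothetical sketch of the default boost function; not YaCy's Boost class.
final class BoostFunctionSketch {

    // Evaluates div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6)).
    static double functionValue(int references, int inboundLinksCount) {
        return (1.0 + references) / Math.pow(1.0 + inboundLinksCount, 1.6);
    }

    // Assumption: the trailing ^0.4 is a weight applied to the function value.
    static double boost(int references, int inboundLinksCount) {
        return 0.4 * functionValue(references, inboundLinksCount);
    }

    public static void main(String[] args) {
        // Same number of references, few links on the page vs. a link-heavy page.
        System.out.println(boost(10, 5));
        System.out.println(boost(10, 200));
    }
}
```

With the number of references held constant, a page carrying many links (a typical menu or link list) receives a smaller boost than a page with few links.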
Tue Dec 18 02:29:03 CET 2012
by Michael Peter Christen
removed dependency of vocabulary navigation from Jena and its
triplestore; the vocabulary search is now done using generic solr fields
which are created on-the-fly during runtime.
Changed Files: htroot/yacysearchtrailer.java, source/net/yacy/cora/federate/solr/YaCySchema.java, source/net/yacy/cora/lod/JenaTripleStore.java, source/net/yacy/cora/lod/vocabulary/Tagging.java, source/net/yacy/document/Condenser.java, source/net/yacy/document/Document.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/index/SolrConfiguration.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/RankingProcess.java, source/net/yacy/search/query/SearchEvent.java
Sat Dec 15 00:05:46 CET 2012
by Michael Peter Christen
distinguishing modified query string and original query string
Changed Files: htroot/AccessTracker_p.java, htroot/IndexControlRWIs_p.java, htroot/api/timeline.java, htroot/gsa/searchresult.java, htroot/yacysearch.java, htroot/yacysearchitem.java, htroot/yacysearchtrailer.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/query/AccessTracker.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/RankingProcess.java
Sun Dec 02 16:54:29 CET 2012
by Michael Peter Christen
added a Boost class which stores solr query boost values. The class can
be configured using the yacy.init file. The boost information is taken
from the configuration each time a query is sent to solr.
Changed Files: defaults/yacy.init, htroot/Ranking_p.java, htroot/gsa/searchresult.java, htroot/solr/select.java, source/net/yacy/cora/document/analysis/EnhancedTextProfileSignature.java, source/net/yacy/cora/federate/solr/Boost.java, source/net/yacy/cora/federate/solr/YaCySchema.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/document/Condenser.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java, source/net/yacy/search/index/SolrConfiguration.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEventCache.java, source/net/yacy/server/serverSwitch.java
Mon Nov 26 13:11:55 CET 2012
by Michael Peter Christen
using more pre-compiled patterns for split methods
Changed Files: htroot/api/bookmarks/get_folders.java, htroot/api/ymarks/get_metadata.java, htroot/api/ymarks/get_treeview.java, source/net/yacy/cora/document/MultiProtocolURI.java, source/net/yacy/cora/document/RSSMessage.java, source/net/yacy/cora/document/analysis/Classification.java, source/net/yacy/cora/federate/solr/responsewriter/GSAResponseWriter.java, source/net/yacy/cora/language/synonyms/SynonymLibrary.java, source/net/yacy/cora/protocol/Domains.java, source/net/yacy/cora/protocol/HeaderFramework.java, source/net/yacy/cora/util/CommonPattern.java, source/net/yacy/kelondro/data/meta/DigestURI.java, source/net/yacy/search/index/SolrConfiguration.java, source/net/yacy/search/query/RankingProcess.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/server/http/HTTPDFileHandler.java, source/net/yacy/server/http/RobotsTxtConfig.java, source/net/yacy/utils/cryptbig.java
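The idea behind this change can be sketched as follows: a regular expression used for splitting is compiled once into a static Pattern and reused, instead of being passed as a string to String.split on every call. The class name and the concrete pattern below are hypothetical and are not copied from CommonPattern.java.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; YaCy's CommonPattern class may look different.
final class SplitPatternSketch {

    // Compiled once when the class is loaded and reused for every split.
    static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits a comma-separated line using the pre-compiled pattern.
    static String[] splitFields(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        for (String field : splitFields("http , https,ftp")) {
            System.out.println(field);
        }
    }
}
```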
Sat Nov 24 22:30:05 CET 2012
by Michael Peter Christen
- added a field cache for solr queries which request only a single
value
- fixed a version conflict exception within a solr add request
Changed Files: htroot/PerformanceMemory_p.java, htroot/js/Crawler.js, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MultipleSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/SolrConfiguration.java
Fri Nov 23 13:58:39 CET 2012
by Michael Peter Christen
- added another enumeration method in kelondro data structure to get a
more random access to data for the balancer
- added random access inside the balancer
Changed Files: source/net/yacy/crawler/Balancer.java, source/net/yacy/kelondro/index/BufferedObjectIndex.java, source/net/yacy/kelondro/index/Cache.java, source/net/yacy/kelondro/index/Index.java, source/net/yacy/kelondro/index/RAMIndex.java, source/net/yacy/kelondro/index/RAMIndexCluster.java, source/net/yacy/kelondro/index/RowCollection.java, source/net/yacy/kelondro/table/SQLTable.java, source/net/yacy/kelondro/table/SplitTable.java, source/net/yacy/kelondro/table/Table.java
Fri Nov 23 01:35:28 CET 2012
by Michael Peter Christen
removed overhead by preventing generation of full search results when
only the url is requested
Changed Files: htroot/IndexControlRWIs_p.java, htroot/IndexControlURLs_p.java, htroot/api/ymarks/add_ymark.java, htroot/gsa/searchresult.java, htroot/yacysearch.java, source/net/yacy/data/ymark/YMarkMetadata.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/RankingProcess.java
Wed Nov 21 18:46:49 CET 2012
by Michael Peter Christen
added a feature to find similarities in documents.
This uses an enhanced version of the Nutch/Solr TextProfileSignature.
As a result, a signature of the document is written to the solr search
index. Additionally, each time a signature is written, it is
checked whether the signature already exists in the index. If the signature
does not exist, the document is marked as unique. The unique attribute
can now be used to sort document lists and bring duplicates to the end
of a result list.
To enable this, a large portion of the search api to Solr had to be
changed. This mainly affected the caching of 'exists' searches, so that
the check for existing signatures can be done without actually issuing a
solr query.
Because this is the first time a long number is used as a value in the Solr
store, the value naming in YaCySchema also had to be adapted and
normalized, which required changes to many files.
Changed Files: debian/changelog, defaults/solr.keys.list, htroot/PerformanceMemory_p.java, htroot/Ranking_p.java, htroot/index.java, htroot/yacy/search.java, htroot/yacysearch.java, htroot/yacysearchitem.java, source/net/yacy/cora/document/MultiProtocolURI.java, source/net/yacy/cora/document/analysis/Classification.java, source/net/yacy/cora/document/analysis/EnhancedTextProfileSignature.java, source/net/yacy/cora/federate/solr/SolrType.java, source/net/yacy/cora/federate/solr/YaCySchema.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MultipleSolrConnector.java, source/net/yacy/cora/federate/solr/connector/RetrySolrConnector.java, source/net/yacy/cora/federate/solr/connector/ShardSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/cora/federate/solr/responsewriter/EnhancedXMLResponseWriter.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/crawler/retrieval/FileLoader.java, source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/crawler/retrieval/SMBLoader.java, source/net/yacy/document/Condenser.java, source/net/yacy/document/Document.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/index/SolrConfiguration.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/RankingProcess.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/search/query/SnippetWorker.java, source/net/yacy/search/ranking/RankingProfile.java, source/net/yacy/search/snippet/MediaSnippet.java, source/net/yacy/server/http/HTTPDFileHandler.java, source/net/yacy/server/serverObjects.java
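The signature-and-unique-flag mechanism described in this entry can be illustrated with a much simplified stand-in. The sketch below does not reproduce EnhancedTextProfileSignature: it normalizes the text, hashes it with MD5 and keeps the first eight bytes as a long, and it uses an in-memory set in place of the cached 'exists' lookup against the index. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch; a simplified stand-in for a text profile signature.
final class SignatureSketch {

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Stand-in for the index: signatures that have already been written.
    private final Set<Long> knownSignatures = new HashSet<>();

    // Lower-cases the text, collapses whitespace and hashes the result
    // into a long value (the first 8 bytes of the MD5 digest).
    static long signature(String text) {
        String normalized = WHITESPACE.matcher(text.toLowerCase(Locale.ROOT)).replaceAll(" ").trim();
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(normalized.getBytes(StandardCharsets.UTF_8));
            long value = 0L;
            for (int i = 0; i < 8; i++) {
                value = (value << 8) | (digest[i] & 0xFF);
            }
            return value;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Records the signature and returns true if it was not known before,
    // i.e. if the document would be marked as unique.
    boolean addAndCheckUnique(String text) {
        return knownSignatures.add(signature(text));
    }

    public static void main(String[] args) {
        SignatureSketch index = new SignatureSketch();
        System.out.println(index.addAndCheckUnique("Hello   World"));
        System.out.println(index.addAndCheckUnique("hello world"));
    }
}
```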
Mon Nov 19 17:24:34 CET 2012
by Michael Peter Christen
- added field options to all solr queries. This can be used to restrict
the actual data which is fetched from solr.
- used the new field options to reduce generic options like getting the
load date or the count of search results. should increase overall speed
- used the new field options to reduce overhead in the host browser
during acquisition of links.
- used the field options to make checking of links in crawler faster
- if the crawler is paused, the crawl queue is not cleaned
Changed Files: htroot/HostBrowser.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MultipleSolrConnector.java, source/net/yacy/cora/federate/solr/connector/RetrySolrConnector.java, source/net/yacy/cora/federate/solr/connector/ShardSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/data/DidYouMean.java, source/net/yacy/document/parser/augment/AugmentParser.java, source/net/yacy/kelondro/data/meta/URIMetadataRow.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Fulltext.java
Sun Nov 18 22:11:04 CET 2012
by Michael Peter Christen
Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890'
Changed Files: htroot/ContentControl_p.html, htroot/ContentControl_p.java, source/net/yacy/contentcontrol/ContentControlFilterUpdateThread.java, source/net/yacy/contentcontrol/SMWListImporter.java, source/net/yacy/contentcontrol/SMWListImporterFormatObsolete.java, source/net/yacy/contentcontrol/SMWListRow.java, source/net/yacy/contentcontrol/SMWListSyncThread.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/query/SearchEvent.java
Sun Nov 18 01:22:41 CET 2012
by orbiter
redesign of the QueryParams class: introduced QueryGoal which holds the
query string parser. This shall be used to create a proper full-string
matching which is handled then by QueryGoal.
Changed Files: htroot/AccessTracker_p.java, htroot/api/timeline.java, htroot/gsa/searchresult.java, htroot/yacy/search.java, htroot/yacysearch.java, htroot/yacysearchitem.java, htroot/yacysearchtrailer.java, source/net/yacy/document/parser/html/AbstractScraper.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/RemoteSearch.java, source/net/yacy/peers/graphics/NetworkGraph.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/AccessTracker.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/RankingProcess.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/search/query/SecondarySearchSuperviser.java, source/net/yacy/search/query/SnippetWorker.java, source/net/yacy/search/snippet/TextSnippet.java
Tue Nov 13 16:54:28 CET 2012
by Michael Peter Christen
added deletion of hosts during crawl start if deleteold option was given
Changed Files: htroot/CrawlResults.java, htroot/Crawler_p.java, htroot/IndexControlURLs_p.java, htroot/api/timeline.java, htroot/gsa/searchresult.java, source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MultipleSolrConnector.java, source/net/yacy/cora/federate/solr/connector/RetrySolrConnector.java, source/net/yacy/cora/federate/solr/connector/ShardSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/Segment.java
Tue Nov 13 11:45:56 CET 2012
by Michael Peter Christen
because we have the inurl:<term> search modifier, we don't actually
need regular expressions as search attributes. They have now been removed
from the advanced search page while they are still created internally.
The filter is then expressed against solr as a regular expression filter
query. If the expression selects a specific protocol, host or filetype,
this is then translated into a faceted query.
Changed Files: htroot/api/timeline.java, htroot/gsa/searchresult.java, htroot/index.html, htroot/index.java, htroot/yacysearch.html, htroot/yacysearch.java, htroot/yacysearchitem.java, htroot/yacysearchlatestinfo.java, htroot/yacysearchtrailer.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/query/QueryParams.java


Bugfixes   

Commit | Description
Thu Dec 27 03:15:50 CET 2012
by Michael Peter Christen
update to search tests (use yacy interface and a bugfix)
Changed Files: bin/search.sh, bin/searchtest.sh
Wed Dec 26 19:15:11 CET 2012
by Michael Peter Christen
fix for smb crawl situation (lost too many urls)
Changed Files: source/net/yacy/crawler/Balancer.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/robots/RobotsTxt.java
Mon Dec 24 04:13:38 CET 2012
by reger
fix SeedUpload setting property name for include template file
Changed Files: htroot/Settings_p.java
Sun Dec 23 01:30:52 CET 2012
by reger
- apply fix for localhost handling (from yacy2solr) also to metadata2solr 
Changed Files: source/net/yacy/search/index/SolrConfiguration.java
Sat Dec 22 23:03:39 CET 2012
by reger
fix: exception if default work files don't exist
Changed Files: source/net/yacy/search/Switchboard.java
Sat Dec 22 21:16:22 CET 2012
by Michael Peter Christen
fix for event starter: delete start time when event is removed
Changed Files: htroot/Table_API_p.java, source/net/yacy/data/WorkTables.java, source/net/yacy/search/Switchboard.java
Sat Dec 22 20:52:52 CET 2012
by Michael Peter Christen
fix for config basic: do not accept empty peer names
Changed Files: htroot/ConfigBasic.java
Sun Dec 16 20:53:45 CET 2012
by reger
fix: set default language to "en"
Changed Files: source/net/yacy/kelondro/data/word/WordReferenceVars.java
Mon Dec 10 21:08:04 CET 2012
by reger
quickfix for translated link containing word "browse" in ru & uk, see http://bugs.yacy.net/view.php?id=213
Changed Files: locales/ru.lng, locales/uk.lng
Fri Dec 07 01:35:02 CET 2012
by Michael Peter Christen
fix for bad xml in gsa result when doing a query with quotes
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/GSAResponseWriter.java
Fri Dec 07 00:31:10 CET 2012
by Michael Peter Christen
fix for wrong display of error urls in HostBrowser
Changed Files: htroot/HostBrowser.html, htroot/HostBrowser.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/crawler/HarvestProcess.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/crawler/retrieval/SitemapImporter.java, source/net/yacy/search/Switchboard.java
Thu Dec 06 17:40:52 CET 2012
by Michael Peter Christen
fix for waitingtime computation for intranet configuration
Changed Files: source/net/yacy/crawler/data/Latency.java
Wed Dec 05 22:05:49 CET 2012
by Michael Peter Christen
patch for funny symbols in url paths (like tilde)
Changed Files: htroot/HostBrowser.java
Sat Dec 01 22:41:21 CET 2012
by reger
fix: prevent regex pattern compile error for blacklist import for path '*' (extend it to '.*')
Changed Files: source/net/yacy/repository/Blacklist.java
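The reason for this fix is that a bare '*' is not a valid Java regular expression, so compiling it raises a PatternSyntaxException. A minimal sketch of the extension to '.*' follows; the class and method names are hypothetical and the actual code in Blacklist.java may differ.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch; not the actual code in Blacklist.java.
final class BlacklistPathSketch {

    // Extends a bare "*" path to ".*" so that it compiles as a regex;
    // any other path is returned unchanged.
    static String normalizePath(String path) {
        return "*".equals(path) ? ".*" : path;
    }

    // Returns true if the given expression compiles as a Java regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("*"));
        System.out.println(compiles(normalizePath("*")));
    }
}
```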
Sat Dec 01 01:14:29 CET 2012
by reger
fix: respect config setting of "show Nav Top-Menu" in HostBrowser.html for public users (as hostbrowser is now available in search results)
Changed Files: htroot/HostBrowser.html, htroot/HostBrowser.java
Sun Nov 25 15:43:42 CET 2012
by Michael Peter Christen
added debug code to crawler monitor
Changed Files: htroot/Crawler_p.html, htroot/Crawler_p.java, source/net/yacy/crawler/CrawlSwitchboard.java
Sat Nov 24 10:27:29 CET 2012
by orbiter
fixes for filesystem indexing
Changed Files: source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/crawler/data/CrawlProfile.java, source/net/yacy/crawler/robots/RobotsTxt.java
Sun Nov 11 21:19:18 CET 2012
by reger
fix: remove fixed individual testing IP (85.25.151.30 = server4you.de) from default/yacy.network.freeworld.unit
Changed Files: defaults/yacy.network.freeworld.unit


Other Changes   

Commit | Description
Thu Dec 27 05:11:11 CET 2012
by Michael Peter Christen
Release 1.3
Changed Files: build.properties
Thu Dec 27 04:37:21 CET 2012
by Michael Peter Christen
updated slf4j and log4j
Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/jcl-over-slf4j-1.7.2.jar, lib/log4j-1.2.17.License, lib/log4j-1.2.17.jar, lib/log4j-over-slf4j-1.7.2.jar, lib/slf4j-api-1.7.2.jar, lib/slf4j-jdk14-1.7.2.jar
Thu Dec 27 03:19:21 CET 2012
by Michael Peter Christen
use the search configuration to default the cacheStrategy to the value
as given in the search configuration
Changed Files: htroot/yacysearch.java, source/net/yacy/cora/federate/yacy/CacheStrategy.java, source/net/yacy/search/query/QueryGoal.java
Thu Dec 27 03:17:45 CET 2012
by Michael Peter Christen
use solr boost configuration to select search fields. At this time it is
possible to enter a negative boost value to switch that value off. This
might be different in the future with a better input interface.
Changed Files: defaults/yacy.init, source/net/yacy/cora/federate/solr/Boost.java, source/net/yacy/search/SwitchboardConstants.java
Wed Dec 26 21:25:27 CET 2012
by Michael Peter Christen
- made image search in interactive search using the ViewImage servlet -
that enables viewing of images for intranet SMB servers.
- added a filter search for protocol, tld and ext again; otherwise p2p
search produces a lot of rubbish
Changed Files: htroot/js/yacyinteractive.js, source/net/yacy/search/query/QueryParams.java
Mon Dec 24 23:29:02 CET 2012
by reger
SeedUpload url: added a check to saveSeedList that rejects localhost urls (same check as in, and copied from, Seed.isProper()), to prevent an identity change on next startup (due to a rejected seeduploadurl).
Changed Files: source/net/yacy/peers/Network.java
Sat Dec 22 20:54:05 CET 2012
by Michael Peter Christen
copy work tables from defaults/data/work if they exist there and not in
DATA/WORK
This can be used to create start-up behavior work scripts in the
api.bheap table
Changed Files: source/net/yacy/data/WorkTables.java, source/net/yacy/search/Switchboard.java
Wed Dec 19 12:45:40 CET 2012
by Michael Peter Christen
removed protocol, tld, ext from the urlmask and created specific
navigation field for these
Changed Files: htroot/yacy/search.java, htroot/yacysearch.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEvent.java
Wed Dec 19 10:41:22 CET 2012
by Michael Peter Christen
search process enhancements
Changed Files: htroot/yacysearchtrailer.java, source/net/yacy/cora/protocol/ftp/FTPClient.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/RankingProcess.java, source/net/yacy/search/query/SearchEvent.java
Wed Dec 19 02:38:05 CET 2012
by Michael Peter Christen
- removed all extension types from extension navigation which are not
proper/known
- automatically show the protocol navigation if there is more than http
and https
- automatically show the extension navigation if there is some media
content
Changed Files: htroot/yacysearchtrailer.java, source/net/yacy/cora/document/analysis/Classification.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEvent.java
Wed Dec 19 01:54:35 CET 2012
by Michael Peter Christen
using the publisher information for the author field if no author is
given. This applies to cases where only the copyright field in the html
header is filled but not the author field
Changed Files: source/net/yacy/search/index/SolrConfiguration.java
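The fallback described in this entry amounts to a simple selection rule, sketched below. The class and method names are hypothetical and do not come from SolrConfiguration.java.

```java
// Hypothetical sketch of the author/publisher fallback.
final class AuthorFieldSketch {

    // Returns the author if one is given; otherwise falls back to the
    // publisher, and to an empty string if neither is available.
    static String authorOrPublisher(String author, String publisher) {
        if (author != null && !author.trim().isEmpty()) {
            return author;
        }
        return publisher == null ? "" : publisher;
    }

    public static void main(String[] args) {
        System.out.println(authorOrPublisher("", "Example Publishing"));
        System.out.println(authorOrPublisher("Jane Doe", "Example Publishing"));
    }
}
```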
Wed Dec 19 01:00:57 CET 2012
by Michael Peter Christen
- using a filter query for facet restriction
- calculating the whole search result in at most two sub-queries from
solr
Changed Files: source/net/yacy/peers/RemoteSearch.java, source/net/yacy/search/query/QueryParams.java
Wed Dec 19 00:59:40 CET 2012
by Michael Peter Christen
using the solr facets as navigation in yacyinteractive.html instead of
counting locally result types
Changed Files: htroot/js/yacyinteractive.js
Tue Dec 18 17:20:42 CET 2012
by Michael Peter Christen
added another solr field clickdepth_i which reflects the number of
clicks which are necessary to get from the portal of a host to a
specific document. At this time, only the start document is flagged with
clickdepth '0', all others with '-1'. To get the actual clickdepth, a
process must use crawled information to collect the actual number of
clicks. This will be added in another/next step.
Changed Files: defaults/solr.keys.list, source/net/yacy/cora/federate/solr/YaCySchema.java, source/net/yacy/search/index/SolrConfiguration.java
Tue Dec 18 12:52:20 CET 2012
by Michael Peter Christen
- fix for localhost detection
- added IPv6 patterns for localhost detection
Changed Files: source/net/yacy/cora/protocol/Domains.java, source/net/yacy/crawler/data/CrawlQueues.java
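A localhost check that also covers IPv6 notations might look like the sketch below. The regular expression is an assumption made for illustration and is not the pattern set used in Domains.java; it covers the name 'localhost', the IPv4 loopback range 127.x.x.x, and the IPv6 loopback address in short and long form, optionally in brackets.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch; the patterns are assumptions, not those of Domains.java.
final class LocalhostSketch {

    private static final Pattern LOCALHOST = Pattern.compile(
            "localhost"
            + "|127\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"   // IPv4 loopback range
            + "|\\[?(0{1,4}:){7}0{0,3}1\\]?"            // long IPv6 form, e.g. 0:0:0:0:0:0:0:1
            + "|\\[?::1\\]?");                          // short IPv6 form, e.g. ::1 or [::1]

    // Returns true if the whole host string denotes the local machine.
    static boolean isLocalhost(String host) {
        return host != null && LOCALHOST.matcher(host.toLowerCase(Locale.ROOT)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLocalhost("[::1]"));
        System.out.println(isLocalhost("example.org"));
    }
}
```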
Sun Dec 16 21:01:13 CET 2012
by reger
PerformanceQueues: disable input for hardcoded httpd performance values
Changed Files: htroot/PerformanceQueues_p.html, htroot/PerformanceQueues_p.java
Sat Dec 15 09:14:49 CET 2012
by Michael Peter Christen
- fixes for host navigation
- fixes for filetype navigation
- removed unused code
Changed Files: htroot/yacy/search.html, htroot/yacy/search.java, htroot/yacysearch.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/query/RankingProcess.java
Tue Dec 11 13:38:28 CET 2012
by Michael Peter Christen
- fixed 'delete from subpath' during crawl start which deleted nothing;
now works;
- changed some crawl start html design details
Changed Files: htroot/CrawlStartExpert_p.html, htroot/Crawler_p.java, source/net/yacy/search/index/Fulltext.java
Tue Dec 11 10:44:25 CET 2012
by Aleksej
fixes in the Russian translation, chmod a-x cn.lng
Changed Files: locales/cn.lng, locales/ru.lng
Mon Dec 10 21:17:45 CET 2012
by orbiter
if maxFileSize < 0 then there is no file size limit.
Changed Files: source/net/yacy/cora/protocol/http/HTTPClient.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/kelondro/blob/ArrayStack.java
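The convention can be expressed as a small guard, sketched here with hypothetical names; the actual checks in HTTPClient.java and HTTPLoader.java may be structured differently.

```java
// Hypothetical sketch of the "negative means unlimited" convention.
final class FileSizeLimitSketch {

    // Returns true if the content is larger than the configured limit.
    // A negative maxFileSize disables the limit entirely.
    static boolean exceedsLimit(long contentLength, long maxFileSize) {
        if (maxFileSize < 0) {
            return false;
        }
        return contentLength > maxFileSize;
    }

    public static void main(String[] args) {
        System.out.println(exceedsLimit(10_000_000L, -1L));
        System.out.println(exceedsLimit(10_000_000L, 1_000_000L));
    }
}
```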
Mon Dec 10 21:01:14 CET 2012
by orbiter
more search command tools
Changed Files: bin/search.sh, bin/search1.sh, bin/searchall.sh, bin/searchall1.sh, bin/up.sh, bin/up1.sh
Mon Dec 10 21:00:30 CET 2012
by orbiter
you can now search for '*' to get just ALL entries in the search index
as result list. This makes sense if you intend to search just by using
the navigation tools to cut the data set into navigation 'slices'.
Changed Files: htroot/yacysearch.java
Mon Dec 10 20:59:43 CET 2012
by orbiter
allow larger no-proxy expressions
Changed Files: htroot/Settings_Proxy.inc
Mon Dec 10 20:55:11 CET 2012
by orbiter
you can now search for '*' to get just ALL entries in the search index
as result list. This makes sense if you intend to search just by using
the navigation tools to cut the data set into navigation 'slices'.
Changed Files:
Mon Dec 10 20:44:29 CET 2012
by orbiter
re-integrating useForHost method (lost sometime?) to get the noProxy
pattern working again. Without this method, all remote urls
including the localhost were accessed through the configured proxy
Changed Files: source/net/yacy/cora/protocol/http/HTTPClient.java, source/net/yacy/cora/protocol/http/ProxySettings.java, source/net/yacy/search/Switchboard.java, source/net/yacy/server/http/HTTPDProxyHandler.java
Mon Dec 10 20:02:35 CET 2012
by reger
fix Servlet template on conditional file include with use of conditional template pattern in included template file (example IndexCreateQueues_p.html)
see bug http://bugs.yacy.net/view.php?id=215
Changed Files: source/net/yacy/server/http/TemplateEngine.java
Mon Dec 10 07:22:42 CET 2012
by orbiter
- fix for bad url conversion in bookmarks when using smb urls
- fix for localhost hosts in solr schema host handling
Changed Files: source/net/yacy/data/BookmarksDB.java, source/net/yacy/search/index/SolrConfiguration.java
Sat Dec 08 06:34:48 CET 2012
by reger
- making blacklist path part case insensitive (solving http://bugs.yacy.net/view.php?id=171)
- blacklist test: added explicit response text "not blocked" if no blacklist match
Changed Files: htroot/BlacklistTest_p.html, htroot/BlacklistTest_p.java, source/net/yacy/repository/Blacklist.java
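Case-insensitive matching of the path part, together with an explicit "not blocked" answer, can be sketched as follows. The names are hypothetical, and the sketch assumes that the blacklist path is a regular expression matched against the whole url path; Blacklist.java may handle this differently.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual code in Blacklist.java.
final class CaseInsensitivePathSketch {

    // Matches the url path against the blacklist path expression,
    // ignoring upper/lower case.
    static boolean pathBlocked(String blacklistPathRegex, String urlPath) {
        return Pattern.compile(blacklistPathRegex, Pattern.CASE_INSENSITIVE)
                .matcher(urlPath)
                .matches();
    }

    // Produces an explicit answer for both outcomes of a blacklist test.
    static String testResult(String blacklistPathRegex, String urlPath) {
        return pathBlocked(blacklistPathRegex, urlPath) ? "blocked" : "not blocked";
    }

    public static void main(String[] args) {
        System.out.println(testResult("ads/.*", "ADS/banner.gif"));
        System.out.println(testResult("ads/.*", "news/index.html"));
    }
}
```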
Sat Dec 08 00:19:20 CET 2012
by reger
remove NOT NEEDED reference to solr.YaCySchema from ConfigurationSet to be able to use ConfigurationSet for other conf files (than solr.keys.default.list).
Changed Files: source/net/yacy/cora/federate/yacy/ConfigurationSet.java
Fri Dec 07 15:49:23 CET 2012
by Michael Peter Christen
introduced a better place to update the lastacc time value in latency
Changed Files: source/net/yacy/crawler/data/Latency.java, source/net/yacy/crawler/retrieval/FTPLoader.java, source/net/yacy/crawler/retrieval/HTTPLoader.java
Fri Dec 07 15:35:44 CET 2012
by Michael Peter Christen
removed Latency update after URL selection because that causes
a completely wrong behaviour when cache fresh cases appear. Makes
re-crawling MUCH faster!
Changed Files: source/net/yacy/crawler/data/Latency.java
Fri Dec 07 14:56:34 CET 2012
by Michael Peter Christen
- clear the search cache when altering the solr boosts
- better positions for submit buttons
Changed Files: htroot/RankingSolr_p.html, htroot/RankingSolr_p.java
Fri Dec 07 14:54:49 CET 2012
by Michael Peter Christen
using a filter query for the site parameter in GSA api
Changed Files: htroot/gsa/searchresult.java
Fri Dec 07 02:00:12 CET 2012
by Michael Peter Christen
latency fix: only set last-visit time if access was actually by the
robot
Changed Files: source/net/yacy/crawler/retrieval/HTTPLoader.java
Fri Dec 07 01:27:24 CET 2012
by Michael Peter Christen
added another blacklist-cleaner into balancer
Changed Files: source/net/yacy/crawler/Balancer.java
Thu Dec 06 00:12:16 CET 2012
by Michael Peter Christen
- check blacklist (again) when taking urls from the crawl stack because
the blacklist may get extended during crawling
- removed debug output
Changed Files: source/net/yacy/crawler/Balancer.java
Wed Dec 05 18:20:43 CET 2012
by Michael Peter Christen
more robustness during shutdown
Changed Files: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java
Wed Dec 05 18:16:06 CET 2012
by Michael Peter Christen
Brute-force attempt to start solr in case of a memory problem.
I don't actually know if this is correct. It is a desperate try to get
YaCy running on production servers which must come up even with
strange hacks like this. This is also related to a forum posting in
http://forum.yacy-websuche.de/viewtopic.php?t=4528&p=27135#p27135
Changed Files: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/RemoteSolrConnector.java
Wed Dec 05 12:26:42 CET 2012
by Michael Peter Christen
update to Solr Boost handling
Changed Files: htroot/RankingSolr_p.java, htroot/gsa/searchresult.java, source/net/yacy/cora/federate/solr/Boost.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/SearchEventCache.java
Mon Dec 03 17:01:19 CET 2012
by Michael Peter Christen
Added a new servlet to configure the solr ranking using field boosts
Changed Files: htroot/RankingRWI_p.html, htroot/RankingSolr_p.html, htroot/RankingSolr_p.java, htroot/env/templates/submenuSearchConfiguration.template, source/net/yacy/cora/federate/solr/Boost.java
Mon Dec 03 00:01:41 CET 2012
by Michael Peter Christen
renamed Ranking_p.html to RankingRWI_p.html
because another Ranking servlet will be added next
Changed Files: htroot/RankingRWI_p.html, htroot/RankingRWI_p.java, htroot/env/templates/submenuSearchConfiguration.template
Sun Dec 02 17:29:37 CET 2012
by Michael Peter Christen
enhanced exists()-method for solr; should reduce a lot of IO during DHT
target selection
Changed Files: source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java
Sun Dec 02 16:53:02 CET 2012
by Michael Peter Christen
added number of characters in url to default index to be able to use
this field for ranking
Changed Files: defaults/solr.keys.list
Sun Dec 02 16:52:12 CET 2012
by Michael Peter Christen
added more logging to get info which url causes performance problems
Changed Files: source/net/yacy/document/TextParser.java
Wed Nov 28 00:09:53 CET 2012
by reger
prevent Solr "version conflict" on update by setting the Solr "_version_" field to 0 (= no version check)
Changed Files: source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java
Mon Nov 26 15:18:51 CET 2012
by Michael Peter Christen
improvements in GSA result writer
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/GSAResponseWriter.java
Mon Nov 26 13:40:53 CET 2012
by Michael Peter Christen
replaced more split and replaceAll calls that lacked pattern
pre-compilation with pre-compiled patterns
Changed Files: htroot/api/bookmarks/get_folders.java, htroot/gsa/searchresult.java, source/net/yacy/YaCySearchClient.java, source/net/yacy/cora/document/MultiProtocolURI.java, source/net/yacy/cora/document/analysis/Classification.java, source/net/yacy/cora/lod/vocabulary/Tagging.java, source/net/yacy/cora/util/CommonPattern.java, source/net/yacy/crawler/data/CrawlProfile.java, source/net/yacy/data/wiki/WikiBoard.java
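A minimal sketch of the technique, assuming nothing about the actual contents of CommonPattern.java: String.split and String.replaceAll may compile their regular expression on every call, while a Pattern held in a static final field is compiled only once. The class and constant names below are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of pattern pre-compilation; names are illustrative
// and not taken from YaCy's CommonPattern class.
public class PrecompiledPatternSketch {
    // Compiled once when the class is loaded, then reused by every call.
    private static final Pattern COMMA = Pattern.compile(",");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Equivalent to s.split(","), without passing the regex string on each call.
    public static String[] splitTags(String s) {
        return COMMA.split(s);
    }

    // Equivalent to s.replaceAll("\\s+", " "), reusing the compiled pattern.
    public static String collapseWhitespace(String s) {
        return WHITESPACE.matcher(s).replaceAll(" ");
    }
}
```

The results are the same as with the String methods; only the repeated compilation cost is avoided, which matters in code paths that run once per document or per URL.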
Mon Nov 26 12:24:35 CET 2012
by Michael Peter Christen
enhanced search result processing behavior
- query less at one time; query more often
- in between the small queries, evaluate results
- remove fields from search results which are not needed
Changed Files: htroot/gsa/searchresult.java, source/net/yacy/cora/federate/solr/responsewriter/GSAResponseWriter.java, source/net/yacy/crawler/CrawlSwitchboard.java, source/net/yacy/peers/Protocol.java, source/net/yacy/peers/RemoteSearch.java, source/net/yacy/search/query/QueryParams.java, source/net/yacy/search/query/SearchEvent.java
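The "query less at one time; query more often" behavior can be sketched as a paging loop that evaluates each small page before asking for the next one. This is an illustration under assumptions, not the code in SearchEvent.java: the PageSource interface, the collect method, and the acceptance predicate are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: fetch small pages repeatedly and evaluate the hits
// between the queries, instead of issuing one large query up front.
public class IncrementalSearchSketch {
    // Stand-in for an index query that returns at most 'count' hits from 'offset'.
    public interface PageSource {
        List<String> fetch(int offset, int count);
    }

    public static List<String> collect(PageSource source, int pageSize,
                                       int wanted, Predicate<String> accept) {
        List<String> results = new ArrayList<>();
        int offset = 0;
        while (results.size() < wanted) {
            List<String> page = source.fetch(offset, pageSize);
            if (page.isEmpty()) break; // the source has no more hits
            // Evaluate this page before deciding whether another query is needed.
            for (String hit : page) {
                if (results.size() < wanted && accept.test(hit)) results.add(hit);
            }
            offset += page.size();
        }
        return results;
    }
}
```

The loop stops as soon as enough accepted hits are available, so a search that is satisfied early never pays for the later queries.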
Sun Nov 25 22:49:26 CET 2012
by reger
fix: display and calculate the authors and namespace search navigators only if configured (otherwise skip the overhead)
(the hosts and topics navigators, and the filetype and protocol navigators that are not included in ConfigPortal, are left untouched)
Changed Files: source/net/yacy/search/query/SearchEvent.java
Sun Nov 25 12:20:41 CET 2012
by orbiter
added link to
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
to the /RegexTest.html servlet
Changed Files: htroot/RegexTest.html
Sun Nov 25 11:58:57 CET 2012
by orbiter
start the local search only if this peer is doing a remote search or
when it is doing a local search and the peer is old
Changed Files: source/net/yacy/search/query/SearchEvent.java
Sun Nov 25 01:34:39 CET 2012
by Michael Peter Christen
- removed multi-add of documents (not used)
- inserted specialized code for size request
Changed Files: source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MultipleSolrConnector.java, source/net/yacy/cora/federate/solr/connector/RetrySolrConnector.java, source/net/yacy/cora/federate/solr/connector/ShardSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java
Fri Nov 23 14:09:48 CET 2012
by Michael Peter Christen
introduced more structure in HostBrowser, table view, better counting,
distinguishing of error cases (fail/excluded)
Changed Files: htroot/HostBrowser.html, htroot/HostBrowser.java
Fri Nov 23 14:00:30 CET 2012
by Michael Peter Christen
added a new fail type attribute for the index to distinguish two
separate fail types: network fail and forced exclusion (i.e. by robots
or forwarding rules).
Changed Files: defaults/solr.keys.list, source/net/yacy/cora/federate/solr/FailType.java, source/net/yacy/cora/federate/solr/YaCySchema.java, source/net/yacy/crawler/data/ZURL.java, source/net/yacy/search/index/SolrConfiguration.java
Thu Nov 22 13:03:33 CET 2012
by Michael Peter Christen
- using edismax in gsa interface
- generating less field data for gsa search results
- using a boost query in gsa interface to move double content to the end
of the result list
Changed Files: htroot/gsa/searchresult.java, source/net/yacy/cora/federate/solr/responsewriter/GSAResponseWriter.java
Sun Nov 18 22:04:34 CET 2012
by Michael Peter Christen
Merge remote-tracking branch 'reger/master'
Changed Files: defaults/yacy.network.freeworld.unit
Sun Nov 18 22:04:11 CET 2012
by Michael Peter Christen
Merge remote-tracking branch 'regerdev/master'
Changed Files: nbproject/project.xml, source/net/yacy/kelondro/data/meta/URIMetadataRow.java
Sun Nov 18 16:03:34 CET 2012
by Michael Peter Christen
FINALLY YaCy can now search for full strings using double- or
single-quoted strings in the search query line!!!
Changed Files: htroot/api/timeline.java, htroot/gsa/searchresult.java, htroot/yacysearch.java, htroot/yacysearchitem.java, source/net/yacy/search/query/QueryGoal.java, source/net/yacy/search/query/QueryParams.java
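How a query line with quoted strings might be split into search terms can be sketched as follows. This is not the parser in QueryGoal.java; the class QuotedQuerySketch, the tokenize method, and the regular expression are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: split a query line into terms, keeping double- or
// single-quoted strings together as one full-string term.
public class QuotedQuerySketch {
    // group 1: double-quoted string, group 2: single-quoted string, group 3: bare word
    private static final Pattern TOKEN =
            Pattern.compile("\"([^\"]*)\"|'([^']*)'|(\\S+)");

    public static List<String> tokenize(String query) {
        List<String> terms = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String term = m.group(1) != null ? m.group(1)
                        : m.group(2) != null ? m.group(2)
                        : m.group(3);
            if (!term.isEmpty()) terms.add(term); // ignore empty quotes
        }
        return terms;
    }
}
```

For example, the query line `yacy "peer to peer" 'web search'` yields the three terms `yacy`, `peer to peer` and `web search` in this sketch.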
Tue Nov 13 17:32:19 CET 2012
by cominch
content control: use up-to-date definitions
Changed Files: source/net/yacy/search/Switchboard.java
Tue Nov 13 10:54:21 CET 2012
by orbiter
- redesign of crawl start servlet
- for domain-limited crawls, the domain is deleted now by default before
the crawl is started
Changed Files: htroot/CrawlStartExpert_p.html, htroot/CrawlStartSite_p.html, htroot/Crawler_p.java
Mon Nov 12 11:19:39 CET 2012
by orbiter
- removed scheduled crawling options in crawl start because it is
superfluous there; it can be changed in the scheduler servlet. It's also
confusing in the presence of the delete-option, which will be
implemented next.
- removed unused crawl start servlet
- some refactoring to make the time parser reusable
Changed Files: htroot/CrawlStartExpert_p.html, htroot/CrawlStartExpert_p.java, htroot/CrawlStartSite_p.html, htroot/Crawler_p.java
Mon Nov 12 11:17:50 CET 2012
by cominch
SMW Import: replaced JSON import routines with stable ones
Changed Files: source/net/yacy/contentcontrol/SMWListImporterFormatObsolete.java, source/net/yacy/contentcontrol/SMWListSyncThread.java
Fri Nov 09 16:25:24 CET 2012
by Michael Peter Christen
removed highlighting of search results within collections in GSA
interface
Changed Files: htroot/gsa/searchresult.java, htroot/solr/select.java
Fri Nov 09 16:24:56 CET 2012
by Michael Peter Christen
added icons and a selection for hosts with URLs pending for the crawler or
with errors
Changed Files: htroot/HostBrowser.html, htroot/HostBrowser.java, htroot/env/grafics/burn-e.gif, htroot/env/grafics/construction.gif
Fri Nov 09 16:22:24 CET 2012
by cominch
refactor package
Changed Files: source/net/yacy/contentcontrol/ContentControlFilterUpdateThread.java, source/net/yacy/contentcontrol/SMWListImporter.java, source/net/yacy/contentcontrol/SMWListRow.java, source/net/yacy/contentcontrol/SMWListSyncThread.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/query/SearchEvent.java
Fri Nov 09 15:44:59 CET 2012
by cominch
remove old SMW importer which was part of the ymarks package
Changed Files: source/net/yacy/interaction/contentcontrol/SMWListSyncThread.java
Fri Nov 09 13:48:40 CET 2012
by cominch
update and generalization of the SMW import and content control routines
Changed Files: htroot/ContentControl_p.html, htroot/ContentControl_p.java, source/net/yacy/interaction/contentcontrol/ContentControlFilterUpdateThread.java, source/net/yacy/interaction/contentcontrol/SMWListImporter.java, source/net/yacy/interaction/contentcontrol/SMWListRow.java, source/net/yacy/interaction/contentcontrol/SMWListSyncThread.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/query/SearchEvent.java