File2 — provides access to file systems
The File component provides access to file systems, allowing files to be processed by any other Apache Camel components or messages from other components to be saved to disk.
The URI format for a file endpoint is one of:
file:directoryName[?options]
file://directoryName[?options]
Where directoryName represents the underlying file directory.
You can append query options to the URI in the following format:
?option=value&option=value&...
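As a minimal sketch of the URI format above, the following JDK-only helper assembles a file endpoint URI from a directory name and a map of options. The class `FileUriSketch` and its `build` method are hypothetical names introduced for illustration; they are not part of Apache Camel, and the sketch does not URL-encode option values.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not a Camel API): builds file://directoryName[?options]. */
final class FileUriSketch {
    private FileUriSketch() {
    }

    static String build(String directoryName, Map<String, String> options) {
        String base = "file://" + directoryName;
        if (options.isEmpty()) {
            return base; // no query part when there are no options
        }
        // Join the options as ?option=value&option=value&...
        StringJoiner query = new StringJoiner("&", "?", "");
        for (Map.Entry<String, String> option : options.entrySet()) {
            query.add(option.getKey() + "=" + option.getValue());
        }
        return base + query;
    }
}
```

Passing a `LinkedHashMap` keeps the options in insertion order, which makes the resulting URI predictable.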
Note |
---|
Apache Camel supports only endpoints configured with a starting directory, so
directoryName must be a directory. If you want
to consume a single file only, use the fileName
option; for example, by setting |
Warning |
---|
The JDK File IO API is limited in detecting whether another application is
currently writing/copying a file, and the implementation can differ depending on OS
platform. This could lead Apache Camel to assume that the file is not locked by another
process and to start consuming it. Therefore you must investigate what will suit
your environment. To help you, Apache Camel provides different
|
Table 8, “Common file options” lists the options that can be set on any file endpoint.
Table 8. Common file options
Name | Default | Description |
---|---|---|
autoCreate
|
true
| Specifies whether to automatically create missing directories in the file's pathname. For the file consumer, it creates the starting directory. For the file producer, it creates the directory where the files should be written. |
bufferSize
| 128kb | Specifies the size, in bytes, of the Write buffer. |
fileName
|
null
| Use Expression such as File Language to dynamically set the filename. For consumers, it's used as a filename filter. For producers, it's used to evaluate the filename to write. If an expression is set, it takes precedence over the The expression options support both For the consumer, you can use it to filter filenames, enabling you to consume, for example,
today's file using the File
Language syntax:
From Camel 2.11 onwards, producers support the
|
flatten
|
false
| Specifies whether to flatten the file name path, stripping away any leading paths to leave just the file name. Flattening allows you to consume recursively into subdirectories, but when writing them to another directory, the files are written to a single directory. Setting this option to |
charset
|
null
| Camel 2.5: Specifies the encoding of the file. Camel sets the Exchange.CHARSET_NAME exchange property to the value of this option. |
copyAndDeleteOnRenameFail
|
true
| Camel 2.9: Whether to fall back to copying and then deleting the file if it cannot be renamed directly. This option is not available for the FTP component. |
Table 9, “File consumer options” lists the options that can be set on a file consuming endpoint.
Table 9. File consumer options
Name | Default | Description |
---|---|---|
initialDelay
|
1000
| Milliseconds before polling the file/directory starts. |
delay
|
500
| Milliseconds before the next poll of the file/directory. |
useFixedDelay
|
true
| Set to true to use a fixed delay between polls; otherwise a fixed rate is used. See ScheduledExecutorService in JDK for details. |
runLoggingLevel
|
TRACE
| Camel 2.8: The consumer logs a start/complete log line when it polls. This option allows you to configure the logging level for that. |
recursive
|
false
| If a directory, will look for files in all the sub-directories as well. |
delete
|
false
| If true , the file will be deleted after it is processed. |
noop
|
false
| If true , the file is not moved or deleted in any way. This
option is good for readonly data, or for ETL type
requirements. If noop=true , Apache Camel will set
idempotent=true as well, to avoid consuming the same
files over and over again. |
preMove
|
null
| Use Expression such as File Language to dynamically set the filename
when moving it before processing. For example
to move in-progress files into the order directory set this
value to order . |
move
|
.camel
| Use Expression such as File Language to dynamically set the filename
when moving it after processing. To move files
into a .done subdirectory just enter
.done . |
moveFailed
|
null
| Use Expression such as File Language to dynamically set the filename when moving failed files after processing. To move files into an error subdirectory just enter error . Note: When a failed file is moved to another location, Apache Camel treats the error as handled and does not pick up the file again. |
include
|
null
| Is used to include files, if filename matches the regex pattern. |
exclude
|
null
| Is used to exclude files, if filename matches the regex pattern. |
antInclude
|
null
| Camel 2.10: Ant style filter inclusion, for example antInclude=**/*.txt . Multiple inclusions may be specified in comma-delimited format. See Filtering using ANT path matcher for more details about ant path filters. |
antExclude
|
null
| Camel 2.10: Ant style filter exclusion. If both
antInclude and antExclude are used,
antExclude takes precedence over
antInclude . Multiple exclusions may be specified in
comma-delimited format. See Filtering using ANT path matcher
for more details about ant path filters. |
antFilterCaseSensitive
|
true
| Camel 2.11: Ant style filter which is case sensitive or not. |
idempotent
|
false
| Option to use the Idempotent Consumer EIP pattern to let Apache Camel skip already processed files. By default, uses a memory-based LRUCache that holds 1000 entries. If noop=true , idempotent will be enabled to avoid consuming the same files over and over again. |
idempotentKey
|
Expression
| Camel 2.11: To use a custom idempotent key. By default, the absolute path of the file is used. You can use the File Language; for example, to use the file name and file size as the key, set idempotentKey=${file:name}-${file:size} . |
idempotentRepository
|
null
| Pluggable repository as a org.apache.camel.processor.idempotent.MessageIdRepository class.
Will by default use MemoryMessageIdRepository if none is
specified and idempotent is true . |
inProgressRepository
|
memory
| Pluggable in-progress repository as a org.apache.camel.processor.idempotent.MessageIdRepository class. The in-progress repository is used to keep track of the files that are currently being consumed. By default a memory based repository is used. |
filter
|
null
| Pluggable filter as a
org.apache.camel.component.file.GenericFileFilter class.
Will skip files if filter returns false in its
accept() method. Apache Camel also ships with an ANT path matcher filter in the
camel-spring component. See Filtering using ANT path matcher
for more details.
|
sorter
|
null
| Pluggable sorter as a java.util.Comparator<org.apache.camel.component.file.GenericFile> class. |
sortBy
|
null
| Built-in sort using the File Language. Supports nested sorts, so you can have a sort by file name and as a 2nd group sort by modified date. See sorting section below for details. |
readLock
|
markerFile
|
Used by consumer, to only poll the files if it has exclusive read-lock on the file (i.e. the file is not in-progress or being written). Apache Camel will wait until the file lock is granted. The
|
readLockTimeout
| 0 (for FTP, 2000 ) | Optional timeout in milliseconds for the read-lock, if supported by the read-lock. If the read-lock could not be granted before the timeout triggered, Apache Camel skips the file. At the next poll, Apache Camel tries the file again, and this time the read-lock may be granted. Currently fileLock , changed and rename support the timeout. |
readLockCheckInterval
|
1000 (for FTP, 5000 ) |
Camel 2.6: Interval in millis for the
read-lock, if supported by the read lock. This interval is used for sleeping
between attempts to acquire the read lock. For example when using the
changed read lock, you can set a higher interval period
to cater for slow writes. The default of 1 sec. may be
too fast if the producer is very slow writing the file.
|
readLockMinLength
|
1
| Camel 2.10.1: This option applies only to readLock=changed . It allows you to configure a minimum file length. By default Camel expects the file to contain data, and thus the default value is 1. You can set this option to zero to allow consuming zero-length files. |
readLockLoggingLevel
|
WARN
| Camel 2.12: Logging level used when a read lock could not be acquired. By default a WARN is logged. You can change this level, for example to OFF to not have any logging. This option is only applicable for readLock of types: changed, fileLock, rename. |
exclusiveReadLockStrategy
|
null
| Pluggable read-lock as a
org.apache.camel.component.file.GenericFileExclusiveReadLockStrategy
implementation. |
minDepth
| 0 | Camel 2.8: The minimum depth to start
processing when recursively processing a directory. Using
minDepth=1 means the base directory. Using
minDepth=2 means the first sub directory. This option is
not supported by FTP2 consumer. |
maxDepth
|
Integer.MAX_VALUE
| Camel 2.8: The maximum depth to traverse when recursively processing a directory. This option is not supported by FTP2 consumer. |
doneFileName
|
null
| Camel 2.6: If provided, Camel will only consume files if a done file exists. This option configures what file name to use. Either you can specify a fixed name. Or you can use dynamic placeholders. The done file is always expected in the same folder as the original file. See using done file and writing done file sections for examples. |
processStrategy
|
null
| A pluggable
org.apache.camel.component.file.GenericFileProcessStrategy
allowing you to implement your own readLock option or
similar. Can also be used when special conditions must be met before a file can
be consumed, such as a special ready file exists. If this
option is set then the readLock option does not apply. |
maxMessagesPerPoll
|
0
| An integer that defines the maximum number of messages to gather per poll. By default, no maximum is set. Can be used to set a limit of e.g. 1000 to avoid having the server read thousands of files as it starts up. Set a value of 0 or negative to disable it. |
startingDirectoryMustExist
|
false
| Whether the starting directory must exist. Mind that the autoCreate
option is default enabled, which means the starting directory is normally
auto-created if it doesn't exist. You can disable autoCreate and
enable this to ensure the starting directory must exist. Will throw an
exception, if the directory doesn't exist. |
directoryMustExist
|
false
| Similar to startingDirectoryMustExist but this applies during
polling recursive sub-directories. |
scheduler
|
null
| Camel 2.12: To use a custom scheduler to trigger the consumer to run. See more details at Poll Enrich. For example, there is a Quartz2 scheduler and a Spring-based scheduler, both of which support CRON expressions. |
backoffMultiplier
|
0
| Camel 2.12: To let the scheduled polling
consumer backoff if there has been a number of subsequent idles/errors
in a row. The multiplier is then the number of polls that will be
skipped before the next actual attempt is happening again. When this
option is in use then backoffIdleThreshold and/or
backoffErrorThreshold must also be configured.
See more details at Poll Enrich. |
backoffIdleThreshold
|
0
| Camel 2.12: The number of subsequent idle polls that should happen before the backoffMultiplier should kick in. |
backoffErrorThreshold
|
0
| Camel 2.12: The number of subsequent error polls (failed due to some error) that should happen before the backoffMultiplier should kick in. |
The default behavior for file consumer is:
The file is locked for the duration of the processing.
After the route has completed, files are moved into the .camel subdirectory, so they appear to have been deleted.
The File Consumer always skips any file whose name starts with a dot, such as . , .camel , .m2 or .groovy .
Only files (not directories) are matched for valid filename, when options such as includeNamePrefix , includeNamePostfix , excludeNamePrefix , excludeNamePostfix , regexPattern are used.
Table 10, “File producer options” lists the options that can be set on a file producing endpoint.
Table 10. File producer options
Name | Default | Description |
---|---|---|
fileExist
|
Override
| Specifies what to do if a file with the same name already exists. The following values can be specified:
|
tempPrefix
|
null
| This option is used to write the file using a temporary name and then, after the write is complete, rename it to the real name. Can be used to identify files being written and also avoid consumers (not using exclusive read locks) reading in progress files. Is often used by FTP2 when uploading big files. |
tempFileName
|
null
| Camel 2.1: The same as tempPrefix option but offering a more
fine grained control on the naming of the temporary filename as it uses the
File Language. |
keepLastModified
|
false
| Camel 2.2: Will keep the last modified timestamp from the source file (if any). Will use the Exchange.FILE_LAST_MODIFIED header to locate the timestamp. This header can contain either a java.util.Date or long with the timestamp. If the timestamp exists and the option is enabled it will set this timestamp on the written file. Note: This option only applies to the file producer. You cannot use this option with any of the ftp producers. |
eagerDeleteTargetFile
|
true
| Camel 2.3: Whether or not to eagerly delete any existing target file. This option only applies when you use fileExist=Override together with the tempFileName option. You can set it to false to disable deleting the target file before the temp file is written. For example, you may write big files and want the target file to exist while the temp file is being written. This ensures that the target file is deleted only at the very last moment, just before the temp file is renamed to the target filename. |
doneFileName
|
null
| Camel 2.6: If provided, then Camel will write a 2nd done file when the original file has been written. The done file will be empty. This option configures what file name to use. Either you can specify a fixed name. Or you can use dynamic placeholders. The done file will always be written in the same folder as the original file. See writing done file section for examples. |
allowNullBody
|
false
| Camel 2.10.1: Specifies whether a null body is allowed during file writing. When set to true, an empty file will be created. When set to false, attempting to send a null body to the file component throws a GenericFileWriteException with the message 'Cannot write null body to file.' If the `fileExist` option is set to 'Override', then the file will be truncated, and if set to `append` the file will remain unchanged. |
forceWrites
|
true
| Camel 2.10.5/2.11: Specifies whether to force syncing writes to the file system. Disable this option if you do not need this level of guarantee; for example, when writing to logs/audit logs to increase performance. |
Any move or delete operations are executed after the routing has completed (post command), so during processing of the Exchange the file is still located in the inbox folder.
Let's illustrate this with an example:
from("file://inbox?move=.done").to("bean:handleOrder");
When a file is dropped in the inbox folder, the file consumer notices this and creates a new FileExchange that is routed to the handleOrder bean. The bean then processes the File object. At this point in time the file is still located in the inbox folder. After the bean completes, and thus the route is completed, the file consumer will perform the move operation and move the file to the .done sub-folder.
The move and preMove options are treated as a directory name, which can be either relative or absolute. If relative, the directory is created as a sub-folder within the folder where the file was consumed. However, if you use an expression such as File Language or Simple, the result of the expression evaluation is the file name to use. For example, if you set
move=../backup/copy-of-${file:name}
then the File Language expression returns the file name to be used.
By default, Apache Camel will move consumed files to the .camel sub-folder relative to the directory where the file was consumed.
If you want to delete the file after processing, the route should be:
from("file://inbox?delete=true").to("bean:handleOrder");
We have introduced a pre move operation to move files before they are processed. This allows you to mark which files have been scanned as they are moved to this sub folder before being processed.
from("file://inbox?preMove=inprogress").to("bean:handleOrder");
You can combine the pre move and the regular move:
from("file://inbox?preMove=inprogress&move=.done").to("bean:handleOrder");
So in this situation, the file is in the inprogress folder when being processed and after it's processed, it's moved to the .done folder.
The move and preMove options are Expression-based, so we have the full power of the File Language to do advanced configuration of the directory and name pattern. Apache Camel will, in fact, internally convert the directory name you enter into a File Language expression. So when we enter move=.done Apache Camel will convert this into: ${file:parent}/.done/${file:onlyname} . This is only done if Apache Camel detects that you have not provided a ${ } in the option value yourself. So when you enter an expression containing ${ } , the expression is interpreted as a File Language expression.
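The conversion rule described above can be sketched in a few lines of plain Java. This is only an illustration of the rule as stated in the text, not Camel's internal code; the class `MoveExpressionSketch` and its method are hypothetical names.

```java
/** Hypothetical illustration (not Camel source) of how a move value is expanded. */
final class MoveExpressionSketch {
    private MoveExpressionSketch() {
    }

    static String toFileLanguage(String optionValue) {
        // A value that already contains ${ } is taken as a File Language expression as-is.
        if (optionValue.contains("${")) {
            return optionValue;
        }
        // A plain directory name is wrapped so the file keeps its own name
        // inside that directory, relative to the folder it was consumed from.
        return "${file:parent}/" + optionValue + "/${file:onlyname}";
    }
}
```

For example, `toFileLanguage(".done")` yields the expanded expression quoted above, while a value such as `backup/${file:name}` is returned unchanged.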
So if we want to move the file into a backup folder with today's date as the pattern, we can do:
move=backup/${date:now:yyyyMMdd}/${file:name}
The moveFailed option allows you to move files that could not be processed successfully to another location, such as an error folder of your choice. For example, to move the files to an error folder with a timestamp you can use
moveFailed=/error/${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:name.ext}
See more examples at File Language.
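To make the moveFailed example more concrete, the following JDK-only sketch computes the kind of path that expression would produce for a given file name and timestamp. It is an approximation under stated assumptions: the file name has no leading path and a single extension, and the helper `MoveFailedNameSketch.failedPath` is a hypothetical name, not a Camel API.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical illustration of /error/<name.noext>-<timestamp>.<name.ext>. */
final class MoveFailedNameSketch {
    // Same date pattern as in the moveFailed example.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    private MoveFailedNameSketch() {
    }

    static String failedPath(String fileName, LocalDateTime now) {
        int dot = fileName.lastIndexOf('.');
        // Assumption: everything before the last dot is the name, the rest the extension.
        String noExt = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot + 1) : "";
        return "/error/" + noExt + "-" + now.format(STAMP) + "." + ext;
    }
}
```

Because the timestamp includes milliseconds, two failures of files with the same name are unlikely to overwrite each other in the error folder.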
The following headers are supported by this component:
Table 11. File producer headers
Header | Description |
---|---|
CamelFileName
| Specifies the name of the file to write (relative to the endpoint
directory). The name can be a String ; a
String with a File
Language or Simple expression;
or an Expression object. If it's
null then Apache Camel will auto-generate a filename
based on the message unique ID. |
CamelFileNameProduced
| The actual absolute filepath (path + name) for the output file that was written. This header is set by Camel and its purpose is providing end-users with the name of the file that was written. |
CamelOverruleFileName
| Camel 2.11: Overrules the CamelFileName header and uses this value instead (but only once, as the producer removes this header after writing the file). The value can only be a String. Notice that if the fileName option has been configured, it is still evaluated. |
Table 12. File consumer headers
Header | Description |
---|---|
CamelFileName
| Name of the consumed file as a relative file path with offset from the starting directory configured on the endpoint. |
CamelFileNameOnly
| Only the file name (the name with no leading paths). |
CamelFileAbsolute
| A boolean option specifying whether the consumed file denotes an absolute path or not. Should normally be false for relative paths. Absolute paths should normally not be used, but support was added to the move option to allow moving files to absolute paths. It can be used elsewhere as well. |
CamelFileAbsolutePath
| The absolute path to the file. For relative files this path holds the relative path instead. |
CamelFilePath
| The file path. For relative files this is the starting directory + the relative filename. For absolute files this is the absolute path. |
CamelFileRelativePath
| The relative path. |
CamelFileParent
| The parent path. |
CamelFileLength
| A long value containing the file size. |
CamelFileLastModified
| A Date value containing the last modified
timestamp of the file. |
As the file consumer is a BatchConsumer, it supports batching the files it polls. Batching means that Apache Camel adds some properties to the exchange so that you know the number of files polled and the index of the current file in that batch.
Table 13. Exchange properties used by a file consumer
Property | Description |
---|---|
CamelBatchSize
| The total number of files that were polled in this batch. |
CamelBatchIndex
| The current index of the batch. Starts from 0. |
CamelBatchComplete
| A boolean value indicating the last exchange in
the batch. Is only true for the last entry. |
This allows you, for example, to know how many files exist in this batch and, for instance, let the Aggregator aggregate this number of files.
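A small sketch of how the three batch properties relate to each other, using only the property names from the table. The class `BatchPropertiesSketch` is hypothetical and simply computes the values that would accompany the file at a given position in a batch; it does not use the Camel API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the batch properties for one file in a poll. */
final class BatchPropertiesSketch {
    private BatchPropertiesSketch() {
    }

    static Map<String, Object> propertiesFor(int index, int batchSize) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("CamelBatchSize", batchSize);   // total files in this poll
        props.put("CamelBatchIndex", index);      // zero-based position
        // Only the last entry of the batch is marked complete.
        props.put("CamelBatchComplete", index == batchSize - 1);
        return props;
    }
}
```

An aggregation step can use the size as its completion count, or wait for the exchange whose complete flag is true.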
When Apache Camel is producing files (writing files) there are a few gotchas affecting how to set a filename of your choice. By default, Apache Camel will use the message ID as the filename, and since the message ID is normally a unique generated ID, you will end up with filenames such as: ID-MACHINENAME-2443-1211718892437-1-0 . If such a filename is not desired, then you must provide a filename in the CamelFileName message header. The constant, Exchange.FILE_NAME , can also be used.
The sample code below produces files using the message ID as the filename:
from("direct:report").to("file:target/reports");
To use report.txt as the filename you have to do:
from("direct:report").setHeader(Exchange.FILE_NAME, constant("report.txt")).to("file:target/reports");
Or the same as above, but with CamelFileName :
from("direct:report").setHeader("CamelFileName", constant("report.txt")).to("file:target/reports");
And a syntax where we set the filename on the endpoint with the fileName URI option.
from("direct:report").to("file:target/reports/?fileName=report.txt");
Filename can be set either using the expression option or as a string-based File Language expression in the CamelFileName header. See the File Language for syntax and samples.
Beware if you consume files from a folder where other applications write files directly. Take a look at the different readLock options to see what suits your use cases. The best approach, however, is to write to another folder and, after the write, move the file into the drop folder. If you write files directly to the drop folder, the changed option can better detect whether a file is currently being written or copied, because it uses a file changed algorithm that checks whether the file size or modification timestamp changes over a period of time. The other read lock options rely on the Java File API, which sadly is not always very good at detecting this. You may also want to look at the doneFileName option, which uses a marker file (done) to signal when a file is done and ready to be consumed.
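The idea behind the changed read lock can be sketched as a comparison of two observations of the same file, taken one check interval apart. This is a simplified illustration of the file changed algorithm as described in the text, not Camel's implementation; `ChangedReadLockSketch` and its members are hypothetical names.

```java
import java.io.File;

/** Hypothetical illustration of a "has the file stopped changing?" check. */
final class ChangedReadLockSketch {
    private ChangedReadLockSketch() {
    }

    /** Size and last-modified timestamp of a file at one point in time. */
    static final class Snapshot {
        final long length;
        final long lastModified;

        Snapshot(long length, long lastModified) {
            this.length = length;
            this.lastModified = lastModified;
        }

        static Snapshot of(File file) {
            return new Snapshot(file.length(), file.lastModified());
        }
    }

    /**
     * Returns true when neither size nor timestamp changed between the two
     * snapshots and the file holds at least minLength bytes.
     */
    static boolean looksComplete(Snapshot earlier, Snapshot later, long minLength) {
        boolean unchanged = earlier.length == later.length
                && earlier.lastModified == later.lastModified;
        return unchanged && later.length >= minLength;
    }
}
```

A caller would take one snapshot, sleep for the check interval, take a second snapshot, and consume the file only when `looksComplete` returns true. A slow writer therefore needs a longer interval, which is the trade-off the readLockCheckInterval option describes.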
Available as of Camel 2.6
See also section writing done files below.
If you want only to consume files when a done file exists, then you can use the doneFileName option on the endpoint.
from("file:bar?doneFileName=done");
Will only consume files from the bar folder if a file named done exists in the same directory as the target files. Camel will automatically delete the done file when it's done consuming the files.
However, it's more common to have one done file per target file. This means there is a 1:1 correlation. To do this you must use dynamic placeholders in the doneFileName option. Currently Camel supports the following two dynamic tokens: file:name and file:name.noext which must be enclosed in ${ }. The consumer only supports the static part of the done file name as either prefix or suffix (not both).
from("file:bar?doneFileName=${file:name}.done");
In this example a file is polled only if a done file exists whose name is the file name followed by .done. For example:
hello.txt - is the file to be consumed
hello.txt.done - is the associated done file
You can also use a prefix for the done file, such as:
from("file:bar?doneFileName=ready-${file:name}");
hello.txt - is the file to be consumed
ready-hello.txt - is the associated done file
Available as of Camel 2.6
After you have written a file you may want to write an additional done file as a kind of marker, to indicate to others that the file is finished and has been written. To do that you can use the doneFileName option on the file producer endpoint.
.to("file:bar?doneFileName=done");
Will simply create a file named done in the same directory as the target file.
However, it's more common to have one done file per target file. This means there is a 1:1 correlation. To do this you must use dynamic placeholders in the doneFileName option. Currently Camel supports the following two dynamic tokens: file:name and file:name.noext which must be enclosed in ${ } .
.to("file:bar?doneFileName=done-${file:name}");
Will, for example, create a file named done-foo.txt in the same directory as the target file, if the target file was foo.txt .
.to("file:bar?doneFileName=${file:name}.done");
Will, for example, create a file named foo.txt.done in the same directory as the target file, if the target file was foo.txt .
.to("file:bar?doneFileName=${file:name.noext}.done");
Will, for example, create a file named foo.done in the same directory as the target file, if the target file was foo.txt .
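The placeholder substitution used in the done file examples can be sketched as a simple string replacement. This is an illustrative JDK-only sketch that assumes a file name with a single extension; `DoneFileNameSketch.resolve` is a hypothetical helper, not a Camel API.

```java
/** Hypothetical illustration of resolving a doneFileName pattern. */
final class DoneFileNameSketch {
    private DoneFileNameSketch() {
    }

    static String resolve(String doneFileName, String targetFileName) {
        int dot = targetFileName.lastIndexOf('.');
        // Assumption: the part before the last dot is the name without extension.
        String noExt = dot > 0 ? targetFileName.substring(0, dot) : targetFileName;
        // Replace the two supported dynamic tokens; a fixed name is returned unchanged.
        return doneFileName
                .replace("${file:name.noext}", noExt)
                .replace("${file:name}", targetFileName);
    }
}
```

With this sketch, the pattern `${file:name}.done` applied to `hello.txt` gives `hello.txt.done`, and a fixed pattern such as `done` stays `done`, matching the examples above.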
from("file://inputdir/?delete=true").to("file://outputdir")
from("file://inputdir/?delete=true").to("file://outputdir?overruleFile=copy-of-${file:name}")
Listen on a directory and create a message for each file dropped there. Copy the contents to the outputdir and delete the file in the inputdir .
from("file://inputdir/?recursive=true&delete=true").to("file://outputdir")
Listen on a directory and create a message for each file dropped there. Copy the contents to the outputdir and delete the file in the inputdir . Will scan recursively into sub-directories. Will lay out the files in the same directory structure in the outputdir as the inputdir , including any sub-directories.
inputdir/foo.txt
inputdir/sub/bar.txt
Will result in the following output layout:
outputdir/foo.txt
outputdir/sub/bar.txt
If you want to store all the files directly in the outputdir directory, disregarding the source directory layout (that is, to flatten out the path), you just add the flatten=true option on the file producer side:
from("file://inputdir/?recursive=true&delete=true").to("file://outputdir?flatten=true")
Will result in the following output layout:
outputdir/foo.txt
outputdir/bar.txt
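The effect of flatten on the output layout can be sketched as follows. This is a plain Java illustration of the behavior described above, assuming forward slashes as path separators; `FlattenSketch.targetPath` is a hypothetical helper, not part of Camel.

```java
/** Hypothetical illustration of where a consumed file ends up in the output directory. */
final class FlattenSketch {
    private FlattenSketch() {
    }

    static String targetPath(String outputDir, String relativeName, boolean flatten) {
        String name = relativeName;
        if (flatten) {
            // Strip any leading directories, keeping only the file name.
            name = relativeName.substring(relativeName.lastIndexOf('/') + 1);
        }
        return outputDir + "/" + name;
    }
}
```

Note that with flattening, two source files with the same name in different sub-directories map to the same target path.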
Apache Camel will by default move any processed file into a .camel subdirectory in the directory the file was consumed from.
from("file://inputdir/?recursive=true").to("file://outputdir")
Affects the layout as follows: before
inputdir/foo.txt
inputdir/sub/bar.txt
after
inputdir/.camel/foo.txt
inputdir/sub/.camel/bar.txt
outputdir/foo.txt
outputdir/sub/bar.txt
from("file://inputdir/").process(new Processor() {
    public void process(Exchange exchange) throws Exception {
        Object body = exchange.getIn().getBody();
        // do some business logic with the input body
    }
});
The body will be a File object that points to the file that was just dropped into the inputdir directory.
from("file://inputdir/").convertBodyTo(String.class).to("jms:test.queue")
By default the file endpoint sends a FileMessage which contains a File object as the body. If you send this directly to the JMS component the JMS message will only contain the File object but not the content. By converting the File to a String , the message will contain the file contents, which is probably what you want.
The route above using Spring DSL:
<route>
    <from uri="file://inputdir/"/>
    <convertBodyTo type="java.lang.String"/>
    <to uri="jms:test.queue"/>
</route>
Apache Camel is of course also able to write files, i.e. produce files. In the sample below we receive some reports on a direct endpoint that we process before they are written to a directory.
public void testToFile() throws Exception {
    MockEndpoint mock = getMockEndpoint("mock:result");
    mock.expectedMessageCount(1);
    mock.expectedFileExists("target/test-reports/report.txt");
    template.sendBody("direct:reports", "This is a great report");
    assertMockEndpointsSatisfied();
}

protected JndiRegistry createRegistry() throws Exception {
    // bind our processor in the registry with the given id
    JndiRegistry reg = super.createRegistry();
    reg.bind("processReport", new ProcessReport());
    return reg;
}

protected RouteBuilder createRouteBuilder() throws Exception {
    return new RouteBuilder() {
        public void configure() throws Exception {
            // the reports from the direct endpoint are processed by our processor
            // before they are written to files in the target/test-reports directory
            from("direct:reports").processRef("processReport").to("file://target/test-reports", "mock:result");
        }
    };
}

private static class ProcessReport implements Processor {
    public void process(Exchange exchange) throws Exception {
        String body = exchange.getIn().getBody(String.class);
        // do some business logic here
        // set the output to the file
        exchange.getOut().setBody(body);
        // set the output filename using java code logic, notice that this is done by setting
        // a special header property of the out exchange
        exchange.getOut().setHeader(Exchange.FILE_NAME, "report.txt");
    }
}
Exchange.FILE_NAME
Using a single route, it is possible to write a file to any number of subdirectories. If you have a route setup as such:
<route>
    <from uri="bean:myBean"/>
    <to uri="file:/rootDirectory"/>
</route>
You can have myBean set the header Exchange.FILE_NAME to values such as:
Exchange.FILE_NAME = hello.txt => /rootDirectory/hello.txt
Exchange.FILE_NAME = foo/bye.txt => /rootDirectory/foo/bye.txt
This allows you to have a single route to write files to multiple destinations.
In this sample we want to move consumed files to a backup folder using today's date as a sub-folder name:
from("file://inbox?move=backup/${date:now:yyyyMMdd}/${file:name}").to("...");
See File Language for more samples.
Apache Camel supports Idempotent Consumer directly within the component so it will skip already processed files. This feature can be enabled by setting the idempotent=true option.
from("file://inbox?idempotent=true").to("...");
Camel uses the absolute file name as the idempotent key, to detect duplicate files. From Camel 2.11 onwards you can customize this key by using an expression in the idempotentKey option. For example, to use both the name and the file size as the key:
<route>
    <from uri="file://inbox?idempotent=true&amp;idempotentKey=${file:name}-${file:size}"/>
    <to uri="bean:processInbox"/>
</route>
By default Apache Camel uses an in-memory store for keeping track of consumed files; it uses a least recently used cache holding up to 1000 entries. You can plug in your own implementation of this store by using the idempotentRepository option, using the # sign in the value to indicate that it refers to a bean in the Registry with the specified id .
<!-- define our store as a plain spring bean -->
<bean id="myStore" class="com.mycompany.MyIdempotentStore"/>
<route>
    <from uri="file://inbox?idempotent=true&amp;idempotentRepository=#myStore"/>
    <to uri="bean:processInbox"/>
</route>
Apache Camel will log at DEBUG level if it skips a file because it has been consumed before:
DEBUG FileConsumer is idempotent and the file has been consumed before. Will skip this file: target\idempotent\report.txt
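The in-memory, least recently used store described above can be approximated with the JDK's `LinkedHashMap` in access order. The sketch below only illustrates the idea of a bounded idempotent store; it is not Camel's MemoryMessageIdRepository, and `LruIdempotentStoreSketch` is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded store of already-seen keys with LRU eviction. */
final class LruIdempotentStoreSketch {
    private final Map<String, Boolean> cache;

    LruIdempotentStoreSketch(final int capacity) {
        // accessOrder=true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity; // evict once the capacity is exceeded
            }
        };
    }

    /** Returns true if the key is new (process the file), false if it was seen before. */
    synchronized boolean add(String key) {
        if (cache.get(key) != null) { // get() also refreshes the key's recency
            return false;
        }
        cache.put(key, Boolean.TRUE);
        return true;
    }

    synchronized boolean contains(String key) {
        return cache.containsKey(key);
    }
}
```

Because the cache is bounded, a key that has been evicted is treated as new again. For that reason a persistent repository is worth considering when a directory can hold more files than the cache can remember.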
In this section we will use the file-based idempotent repository org.apache.camel.processor.idempotent.FileIdempotentRepository instead of the default in-memory one. This repository uses a first-level cache to avoid reading the file repository, and uses the file only to store the content of that cache. As a result, the repository can survive server restarts: on startup it loads the content of the file into the first-level cache. The file structure is very simple, with each key stored on a separate line. By default, the file store has a size limit of 1 MB. When the file grows larger, Apache Camel truncates the file store and rebuilds its content by flushing the first-level cache into a fresh, empty file.
We configure our file idempotent repository in Spring XML and set up our file consumer to use it through the idempotentRepository option, using the # sign to indicate a Registry lookup:
<!-- this is our file based idempotent store configured to use the .filestore.dat as file -->
<bean id="fileStore" class="org.apache.camel.processor.idempotent.FileIdempotentRepository">
  <!-- the filename for the store -->
  <property name="fileStore" value="target/fileidempotent/.filestore.dat"/>
  <!-- the max filesize in bytes for the file. Apache Camel will truncate the file and flush the cache if the file gets bigger -->
  <property name="maxFileStoreSize" value="512000"/>
  <!-- the number of elements in our store -->
  <property name="cacheSize" value="250"/>
</bean>

<camelContext xmlns="http://camel.apache.org/schema/spring">
  <route>
    <from uri="file://target/fileidempotent/?idempotent=true&amp;idempotentRepository=#fileStore&amp;move=done/${file:name}"/>
    <to uri="mock:result"/>
  </route>
</camelContext>
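The idea of a first-level cache backed by a file with one key per line can be pictured with the following sketch, which uses only JDK classes. It is an illustration under simplifying assumptions, not the code of FileIdempotentRepository: the class name FileBackedKeyStore is hypothetical, and the size limit and the bounded cache are deliberately left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of a file-backed key store; not a Camel class.
class FileBackedKeyStore {
    private final Path store;
    private final Set<String> cache = new LinkedHashSet<>();

    FileBackedKeyStore(Path store) throws IOException {
        this.store = store;
        // On startup, load previously stored keys into the first-level cache
        if (Files.exists(store)) {
            cache.addAll(Files.readAllLines(store, StandardCharsets.UTF_8));
        }
    }

    // Returns true if the key is new. Lookups are answered from the cache only.
    synchronized boolean add(String key) throws IOException {
        if (!cache.add(key)) {
            return false;
        }
        Path parent = store.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Append the new key on its own line so it survives a restart
        Files.write(store, Collections.singletonList(key), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }

    synchronized boolean contains(String key) {
        return cache.contains(key);
    }
}
```

Creating a second FileBackedKeyStore on the same path after a restart would reload the keys written by the first, which is the property that lets a file-based repository remember consumed files across restarts.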
In this section we will use the JPA-based idempotent repository instead of the default in-memory one.
First we need a persistence-unit in META-INF/persistence.xml that uses the class org.apache.camel.processor.idempotent.jpa.MessageProcessed as the model.
<persistence-unit name="idempotentDb" transaction-type="RESOURCE_LOCAL"> <class>org.apache.camel.processor.idempotent.jpa.MessageProcessed</class> <properties> <property name="openjpa.ConnectionURL" value="jdbc:derby:target/idempotentTest;create=true"/> <property name="openjpa.ConnectionDriverName" value="org.apache.derby.jdbc.EmbeddedDriver"/> <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/> <property name="openjpa.Log" value="DefaultLevel=WARN, Tool=INFO"/> </properties> </persistence-unit>
Then we need to set up a Spring jpaTemplate in the Spring XML file:
<!-- this is standard spring JPA configuration -->
<bean id="jpaTemplate" class="org.springframework.orm.jpa.JpaTemplate">
  <property name="entityManagerFactory" ref="entityManagerFactory"/>
</bean>

<bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalEntityManagerFactoryBean">
  <!-- we use idempotentDb as the persistence unit name defined in the persistence.xml file -->
  <property name="persistenceUnitName" value="idempotentDb"/>
</bean>
And finally we can create our JPA idempotent repository in the spring XML file as well:
<!-- we define our jpa based idempotent repository we want to use in the file consumer -->
<bean id="jpaStore" class="org.apache.camel.processor.idempotent.jpa.JpaMessageIdRepository">
  <!-- Here we refer to the spring jpaTemplate -->
  <constructor-arg index="0" ref="jpaTemplate"/>
  <!-- This 2nd parameter is the name (= a category name). You can have different repositories with different names -->
  <constructor-arg index="1" value="FileConsumer"/>
</bean>
And then we just need to reference the jpaStore bean
in the file consumer endpoint, using the idempotentRepository
option and
the #
syntax:
<route>
  <from uri="file://inbox?idempotent=true&amp;idempotentRepository=#jpaStore"/>
  <to uri="bean:processInbox"/>
</route>
Apache Camel supports pluggable filtering strategies. You can configure the endpoint with such a filter to prevent certain files from being processed.
In the sample we have built our own filter that skips files starting with
skip
in the filename:
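import org.apache.camel.component.file.GenericFile;
import org.apache.camel.component.file.GenericFileFilter;

public class MyFileFilter implements GenericFileFilter {
    public boolean accept(GenericFile pathname) {
        // we don't accept any files starting with skip in the name
        return !pathname.getFileName().startsWith("skip");
    }
}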
And then we can configure our route using the filter
attribute to reference our filter (using #
notation) that we have
defined in the spring XML file:
<!-- define our filter as a plain spring bean -->
<bean id="myFilter" class="com.mycompany.MyFileFilter"/>

<route>
  <from uri="file://inbox?filter=#myFilter"/>
  <to uri="bean:processInbox"/>
</route>
The ANT path matcher is shipped out-of-the-box in the camel-spring jar, so you need to depend on camel-spring if you are using Maven. The reason is that we leverage Spring's AntPathMatcher to do the actual matching.
The file paths are matched with the following rules:
? matches one character
* matches zero or more characters
** matches zero or more directories in a path
The sample below demonstrates how to use it:
<camelContext xmlns="http://camel.apache.org/schema/spring">
  <template id="camelTemplate"/>

  <!-- use myAntFilter as filter to allow setting ANT paths for which files to scan for -->
  <endpoint id="myFileEndpoint" uri="file://target/antpathmatcher?recursive=true&amp;filter=#myAntFilter"/>

  <route>
    <from ref="myFileEndpoint"/>
    <to uri="mock:result"/>
  </route>
</camelContext>

<!-- we use the antpath file filter to use ant paths for includes and excludes -->
<bean id="myAntFilter" class="org.apache.camel.component.file.AntPathMatcherGenericFileFilter">
  <!-- include any file in the subfolder that has day in the name -->
  <property name="includes" value="**/subfolder/**/*day*"/>
  <!-- exclude all files with bad in name or .xml files. Use comma to separate multiple excludes -->
  <property name="excludes" value="**/*bad*,**/*.xml"/>
</bean>
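To get a feel for the three matching rules, the sketch below translates an Ant-style pattern into a regular expression using only the JDK. It approximates the rules as listed here and is not Spring's AntPathMatcher, which handles more cases. The class AntStyleSketch and its methods are hypothetical, and paths are assumed to use / as the separator.

```java
import java.util.regex.Pattern;

// Hypothetical helper approximating Ant-style matching; not Spring's AntPathMatcher.
class AntStyleSketch {

    // Translates an Ant-style pattern into a regular expression
    static Pattern toRegex(String antPattern) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < antPattern.length()) {
            char c = antPattern.charAt(i);
            if (antPattern.startsWith("**/", i)) {
                regex.append("(?:.*/)?"); // zero or more directories
                i += 3;
            } else if (antPattern.startsWith("**", i)) {
                regex.append(".*");       // anything, including separators
                i += 2;
            } else if (c == '*') {
                regex.append("[^/]*");    // zero or more characters within one segment
                i++;
            } else if (c == '?') {
                regex.append("[^/]");     // exactly one character
                i++;
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
                i++;
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String antPattern, String path) {
        return toRegex(antPattern).matcher(path).matches();
    }

    // Accepts a path that matches the include pattern and none of the exclude patterns
    static boolean accept(String include, String[] excludes, String path) {
        if (!matches(include, path)) {
            return false;
        }
        for (String exclude : excludes) {
            if (matches(exclude, path)) {
                return false;
            }
        }
        return true;
    }
}
```

With the include and exclude patterns from the sample above, this sketch accepts a path such as x/subfolder/monday.txt and rejects x/subfolder/bad-day.txt and x/subfolder/today.xml.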
Apache Camel supports pluggable sorting strategies. One strategy is to use the built-in java.util.Comparator in Java. You can configure the endpoint with such a comparator and have Apache Camel sort the files before they are processed.
In the sample we have built our own comparator that just sorts by file name:
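import java.util.Comparator;
import org.apache.camel.component.file.GenericFile;

public class MyFileSorter implements Comparator<GenericFile> {
    public int compare(GenericFile o1, GenericFile o2) {
        // sort by file name, ignoring case
        return o1.getFileName().compareToIgnoreCase(o2.getFileName());
    }
}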
And then we can configure our route using the sorter option to reference our sorter (mySorter) that we have defined in the Spring XML file:
<!-- define our sorter as a plain spring bean --> <bean id="mySorter" class="com.mycompany.MyFileSorter"/> <route> <from uri="file://inbox?sorter=#mySorter"/> <to uri="bean:processInbox"/> </route>
Apache Camel supports pluggable sorting strategies. Another strategy is to use the File Language to configure the sorting. The sortBy option is configured as follows:
sortBy=group 1;group 2;group 3;...
Where each group is separated by a semicolon. In simple situations you just use one group, so a simple example could be:
sortBy=file:name
This will sort by file name. You can reverse the order by prefixing reverse: to the group, so the sorting is now Z..A:
sortBy=reverse:file:name
As we have the full power of File Language we can use some of the other parameters, so if we want to sort by file size we do:
sortBy=file:length
You can make string comparison ignore case by using the ignoreCase: prefix, so if you want to sort by file name but ignore the case, do:
sortBy=ignoreCase:file:name
You can combine ignore case and reverse, however reverse must be specified first:
sortBy=reverse:ignoreCase:file:name
In the sample below we want to sort by the last modified timestamp, so we do:
sortBy=file:modified
And then we want to group by name as a second option, so that files with the same modification timestamp are sorted by name:
sortBy=file:modified;file:name
There is an issue here, can you spot it? The modified timestamp of the file is too fine-grained, as it is in milliseconds. What if we want to sort by date only and then subgroup by name? As we have the full power of the File Language, we can use its date command, which supports patterns. So this can be solved as:
sortBy=date:file:yyyyMMdd;file:name
That is pretty powerful. You can also use reverse per group, so we could reverse the file names:
sortBy=date:file:yyyyMMdd;reverse:file:name
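For readers who think in terms of java.util.Comparator, the grouping in the last expression (date first, then file name in reverse) can be sketched as a chained comparator. This only illustrates the ordering and is not how Camel evaluates sortBy. The FileInfo class is a hypothetical stand-in for a file, and the day is computed in UTC purely to keep the sketch deterministic.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

// Hypothetical illustration of "sort by day, then by reversed name"; not a Camel class.
class SortBySketch {

    // Minimal stand-in for a file: a name and a last-modified timestamp in milliseconds
    static final class FileInfo {
        final String name;
        final long lastModified;

        FileInfo(String name, long lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }
    }

    // First group: the modification day formatted as yyyyMMdd (UTC is an assumption)
    static String day(FileInfo file) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(file.lastModified));
    }

    static Comparator<FileInfo> byDayThenReverseName() {
        Comparator<FileInfo> byDay = Comparator.comparing(SortBySketch::day);
        // Second group: file name in reverse order (Z..A)
        Comparator<FileInfo> byNameReversed =
                Comparator.comparing((FileInfo f) -> f.name).reversed();
        return byDay.thenComparing(byNameReversed);
    }

    static List<String> sortedNames(List<FileInfo> files) {
        List<FileInfo> copy = new ArrayList<>(files);
        copy.sort(byDayThenReverseName());
        List<String> names = new ArrayList<>();
        for (FileInfo file : copy) {
            names.add(file.name);
        }
        return names;
    }
}
```

Given a.txt and b.txt modified on the same day and c.txt modified a day later, this comparator orders them b.txt, a.txt, c.txt: the second group only decides the order among files that fall on the same day.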
The processStrategy option can be used to plug in a custom GenericFileProcessStrategy that allows you to implement your own begin, commit and rollback logic. For instance, let's assume a system writes a file in a folder that you should consume, but you should not start consuming the file before another ready file has been written as well.
So by implementing our own GenericFileProcessStrategy we can implement this as:
In the begin() method we can test whether the special ready file exists. The begin() method returns a boolean to indicate whether we can consume the file or not.
In the commit() method we can move the actual file and also delete the ready file.
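The begin and commit logic described above can be sketched with plain java.nio.file calls. The class below deliberately does not implement Camel's GenericFileProcessStrategy interface, so it stays self-contained. The class name, the .ready suffix for the marker file, and the done directory are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-alone sketch of the begin/commit logic; not a Camel class.
class ReadyFileStrategySketch {
    private final Path doneDir;

    ReadyFileStrategySketch(Path doneDir) {
        this.doneDir = doneDir;
    }

    // Assumed convention: the ready file sits next to the data file with a ".ready" suffix
    static Path readyMarkerFor(Path file) {
        return file.resolveSibling(file.getFileName() + ".ready");
    }

    // begin: the file may be consumed only once its ready file exists
    boolean begin(Path file) {
        return Files.exists(readyMarkerFor(file));
    }

    // commit: move the actual file away and delete the ready file
    void commit(Path file) throws IOException {
        Files.createDirectories(doneDir);
        Files.move(file, doneDir.resolve(file.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        Files.deleteIfExists(readyMarkerFor(file));
    }
}
```

In a real strategy, the same checks would sit in the begin and commit methods of a GenericFileProcessStrategy implementation, with rollback left to decide what to do with the file when processing fails.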
![]() | Important |
---|---|
When using |