public abstract class FastqRecordReader extends org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MAX_READ_LENGTH: Default maximum read length, 10,000 bp. |
protected long | end: First index value beyond the slice. |
protected boolean | isCompressed: True if the underlying data is compressed. |
protected boolean | isSplittable: True if the underlying data is splittable. |
static String | MAX_READ_LENGTH_PROPERTY: Maximum read length property name. |
protected long | pos: Current position in file. |
Modifier | Constructor and Description |
---|---|
protected | FastqRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapreduce.lib.input.FileSplit split): Builds a new record reader given a config file and an input split. |
Modifier and Type | Method and Description |
---|---|
protected abstract boolean | checkBuffer(int bufferLength, org.apache.hadoop.io.Text buffer): Checks to see whether the buffer is positioned at a valid record. |
void | close(): Close this RecordReader to future operations. |
Void | getCurrentKey(): FASTQ has no keys, so we return null. |
org.apache.hadoop.io.Text | getCurrentValue(): Returns the last interleaved FASTQ record. |
float | getProgress(): How much of the input has the RecordReader consumed? |
void | initialize(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) |
protected boolean | lowLevelFastqRead(org.apache.hadoop.io.Text readName, org.apache.hadoop.io.Text value): Parses a read from an interleaved FASTQ file. |
protected String | makePositionMessage(): Produces a debugging message with the file position. |
protected abstract boolean | next(org.apache.hadoop.io.Text value): Reads from the input split. |
boolean | nextKeyValue(): Seeks ahead in our split to the next key-value pair. |
protected int | positionAtFirstRecord(org.apache.hadoop.fs.FSDataInputStream stream, org.apache.hadoop.io.compress.CompressionCodec codec): Position the input stream at the start of the first record. |
static void | setMaxReadLength(org.apache.hadoop.conf.Configuration conf, int maxReadLength): Set the maximum read length property to maxReadLength. |
public static final int DEFAULT_MAX_READ_LENGTH
Default maximum read length, 10,000 bp.
public static final String MAX_READ_LENGTH_PROPERTY
protected long end
protected long pos
protected boolean isSplittable
protected boolean isCompressed
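The two static fields above pair a property name with a default value, and setMaxReadLength writes that property into a configuration. The following is a minimal sketch of that pattern only, assuming java.util.Properties as a stand-in for org.apache.hadoop.conf.Configuration. The property key shown and the getMaxReadLength helper are hypothetical; this page does not show the real key or how the reader looks the value up.

```java
import java.util.Properties;

class MaxReadLengthSketch {
    // Mirrors the documented default of 10,000 bp.
    static final int DEFAULT_MAX_READ_LENGTH = 10000;

    // Hypothetical key for illustration; the real property name is not shown on this page.
    static final String MAX_READ_LENGTH_PROPERTY = "example.fastq.max-read-length";

    // Stores the maximum read length (in bp) under the property name.
    static void setMaxReadLength(Properties conf, int maxReadLength) {
        conf.setProperty(MAX_READ_LENGTH_PROPERTY, Integer.toString(maxReadLength));
    }

    // Hypothetical helper: reads the property, falling back to the default when unset.
    static int getMaxReadLength(Properties conf) {
        String raw = conf.getProperty(MAX_READ_LENGTH_PROPERTY);
        return raw == null ? DEFAULT_MAX_READ_LENGTH : Integer.parseInt(raw);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(getMaxReadLength(conf)); // unset, so the default applies
        setMaxReadLength(conf, 25000);
        System.out.println(getMaxReadLength(conf)); // now the configured value
    }
}
```

Against the real class, the equivalent call would be FastqRecordReader.setMaxReadLength(conf, maxReadLength) with a Hadoop Configuration, made before the job's record readers are created.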
protected FastqRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapreduce.lib.input.FileSplit split) throws IOException
Parameters:
conf - The Hadoop configuration object. Used for gaining access to the underlying file system.
split - The file split to read.
Throws:
IOException
public static void setMaxReadLength(org.apache.hadoop.conf.Configuration conf, int maxReadLength)
Set the maximum read length property to maxReadLength.
Parameters:
conf - configuration
maxReadLength - maximum read length, in base pairs (bp)
protected abstract boolean checkBuffer(int bufferLength, org.apache.hadoop.io.Text buffer)
Parameters:
bufferLength - The length of the line currently in the buffer.
buffer - A buffer containing a peek at the first line in the current stream.
protected final int positionAtFirstRecord(org.apache.hadoop.fs.FSDataInputStream stream, org.apache.hadoop.io.compress.CompressionCodec codec) throws IOException
Parameters:
stream - The stream to reposition.
Throws:
IOException
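A split can begin in the middle of a record, so the reader has to find the first real record start before parsing. The sketch below illustrates one common heuristic and is not taken from this class's source: a line starting with '@' counts as a record start only if the line two positions later starts with '+', because a quality line may also begin with '@'. It works on a list of lines rather than a byte stream, and the Predicate argument is a hypothetical stand-in for the subclass-specific checkBuffer hook.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class FirstRecordSketch {
    // Returns the index of the first line that looks like a record start, or -1 if none is found.
    static int findFirstRecord(List<String> lines, Predicate<String> checkBuffer) {
        for (int i = 0; i + 2 < lines.size(); i++) {
            String line = lines.get(i);
            // '@' alone is ambiguous (quality strings may start with it),
            // so also require the '+' separator two lines further on.
            if (line.startsWith("@")
                    && lines.get(i + 2).startsWith("+")
                    && checkBuffer.test(line)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The split starts mid-record; the third line is a quality string beginning with '@'.
        List<String> lines = Arrays.asList(
                "ACGT", "+", "@III", "@read2/1", "GGCC", "+", "IIII");
        // Stand-in check: accept only names that end in "/1" (first read of a pair).
        int index = findFirstRecord(lines, name -> name.endsWith("/1"));
        System.out.println("first record starts at line index " + index);
    }
}
```

Keeping the format-specific test behind a separate predicate matches the split between the final positionAtFirstRecord and the abstract checkBuffer: the scan is shared, while each subclass decides what a valid first line looks like.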
public final void initialize(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
Specified by:
initialize in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException
public final Void getCurrentKey()
Specified by:
getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
public final org.apache.hadoop.io.Text getCurrentValue()
Specified by:
getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
public final boolean nextKeyValue() throws IOException, InterruptedException
Specified by:
nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException
public final void close() throws IOException
Specified by:
close in interface Closeable
close in interface AutoCloseable
close in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
Throws:
IOException
public final float getProgress()
Specified by:
getProgress in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
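For a reader that tracks a current position (pos) and an exclusive end index (end), progress is typically the fraction of the split's byte range already consumed. The sketch below shows that arithmetic under stated assumptions: the start parameter is hypothetical (this page documents only pos and end), and returning 0 for an empty split is a convention chosen for the sketch, not confirmed behaviour of this class.

```java
class ProgressSketch {
    // Fraction of the range [start, end) consumed so far, clamped to [0, 1].
    static float progress(long start, long pos, long end) {
        if (end <= start) {
            return 0.0f; // empty split: nothing to measure against
        }
        float fraction = (float) (pos - start) / (float) (end - start);
        return Math.min(1.0f, Math.max(0.0f, fraction));
    }

    public static void main(String[] args) {
        // Halfway through a split covering bytes 100 (inclusive) to 200 (exclusive).
        System.out.println(progress(100L, 150L, 200L));
    }
}
```

The clamp matters because a reader usually finishes the record that straddles the split boundary, so pos can end up past end.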
protected final String makePositionMessage()
protected final boolean lowLevelFastqRead(org.apache.hadoop.io.Text readName, org.apache.hadoop.io.Text value) throws IOException
Parameters:
readName - Text record containing read name. Output parameter.
value - Text record containing full record. Output parameter.
Throws:
RuntimeException - Throws exception if FASTQ record doesn't have proper formatting (e.g., record doesn't start with @).
IOException
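To make the output-parameter contract above concrete, here is a minimal sketch of reading one four-line FASTQ record. It is an illustration, not this class's implementation: StringBuilder stands in for org.apache.hadoop.io.Text, BufferedReader stands in for the underlying stream, and the method name readRecord is hypothetical. Returning false at end of input and keeping the leading '@' in readName are assumptions of the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class FastqReadSketch {
    // Reads one record into the two output parameters; returns false at end of input.
    static boolean readRecord(BufferedReader in, StringBuilder readName, StringBuilder value)
            throws IOException {
        readName.setLength(0);
        value.setLength(0);

        String name = in.readLine();
        if (name == null) {
            return false; // no more records
        }
        if (!name.startsWith("@")) {
            throw new RuntimeException("Record does not start with @: " + name);
        }
        String sequence = in.readLine();
        String separator = in.readLine();
        String quality = in.readLine();
        if (sequence == null || separator == null || quality == null) {
            throw new RuntimeException("Truncated record: " + name);
        }
        if (!separator.startsWith("+")) {
            throw new RuntimeException("Missing + separator in record: " + name);
        }

        readName.append(name);
        value.append(name).append('\n')
             .append(sequence).append('\n')
             .append(separator).append('\n')
             .append(quality).append('\n');
        return true;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("@r1/1\nACGT\n+\nIIII\n"));
        StringBuilder readName = new StringBuilder();
        StringBuilder value = new StringBuilder();
        while (readRecord(in, readName, value)) {
            System.out.println("read name: " + readName);
        }
    }
}
```

A subclass's next(Text) could call such a routine once per read, or twice for a paired record, appending each result to the value it returns.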
protected abstract boolean next(org.apache.hadoop.io.Text value) throws IOException
Parameters:
value - Text record to write input value into.
Throws:
IOException
See Also:
lowLevelFastqRead(Text, Text)
Copyright © 2020. All rights reserved.