Package org.bdgenomics.adam.io
Class FastqRecordReader
- java.lang.Object
  - org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
    - org.bdgenomics.adam.io.FastqRecordReader
- All Implemented Interfaces:
  Closeable, AutoCloseable
public abstract class FastqRecordReader extends org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
A record reader for the interleaved FASTQ format. Reads over an input split and parses each interleaved FASTQ read pair into a single Text value. That value is then fed into the FastqConverter, which converts the single Text instance into two Alignments.
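The pairing described above can be illustrated with a small, stdlib-only sketch. It assumes something this page does not state: that a pair's Text value holds the first mate's four-line record followed by the second mate's, joined by newlines. It also uses String in place of org.apache.hadoop.io.Text. The class InterleavedPairSketch and its splitPair method are hypothetical. They only show the kind of split a consumer such as FastqConverter has to perform.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not ADAM code): splits one interleaved pair value
 * back into its two four-line FASTQ records. Assumes the value is the two
 * records joined by newlines, first mate then second mate.
 */
final class InterleavedPairSketch {

    private InterleavedPairSketch() {}

    /** Returns {firstRecord, secondRecord}; throws if the value is not 8 lines. */
    static String[] splitPair(String pairValue) {
        // String.split drops trailing empty strings, so a final newline is harmless.
        String[] lines = pairValue.split("\n");
        if (lines.length != 8) {
            throw new IllegalArgumentException(
                "Expected 8 lines for a read pair, found " + lines.length);
        }
        // Lines 0-3 are the first mate, lines 4-7 the second mate.
        String first = String.join("\n", Arrays.copyOfRange(lines, 0, 4));
        String second = String.join("\n", Arrays.copyOfRange(lines, 4, 8));
        return new String[] {first, second};
    }
}
```

For a value made of `@r1/1`, `ACGT`, `+`, `IIII` followed by `@r1/2`, `TTGCA`, `+`, `IIIII`, splitPair returns the two four-line records without their trailing newlines.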
-
Field Summary
Modifier and Type | Field | Description
static int | DEFAULT_MAX_READ_LENGTH | Default maximum read length, 10,000 bp.
protected long | end | First index value beyond the slice; that is, the slice covers the range [start, end).
protected boolean | isCompressed | True if the underlying data is compressed.
protected boolean | isSplittable | True if the underlying data is splittable.
static String | MAX_READ_LENGTH_PROPERTY | Maximum read length property name.
protected long | pos | Current position in file.
-
Constructor Summary
Modifier | Constructor | Description
protected | FastqRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapreduce.lib.input.FileSplit split) | Builds a new record reader given a Hadoop configuration and an input split.
-
Method Summary
Modifier and Type | Method | Description
protected abstract boolean | checkBuffer(int bufferLength, org.apache.hadoop.io.Text buffer) | Checks whether the buffer is positioned at a valid record.
void | close() | Close this RecordReader to future operations.
Void | getCurrentKey() | FASTQ has no keys, so we return null.
org.apache.hadoop.io.Text | getCurrentValue() | Returns the last interleaved FASTQ record.
float | getProgress() | Reports how much of the input the RecordReader has consumed.
void | initialize(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) |
protected boolean | lowLevelFastqRead(org.apache.hadoop.io.Text readName, org.apache.hadoop.io.Text value) | Parses a read from an interleaved FASTQ file.
protected String | makePositionMessage() | Produces a debugging message with the file position.
protected abstract boolean | next(org.apache.hadoop.io.Text value) | Reads from the input split.
boolean | nextKeyValue() | Seeks ahead in the split to the next key-value pair.
protected int | positionAtFirstRecord(org.apache.hadoop.fs.FSDataInputStream stream, org.apache.hadoop.io.compress.CompressionCodec codec) | Positions the input stream at the start of the first record.
static void | setMaxReadLength(org.apache.hadoop.conf.Configuration conf, int maxReadLength) | Sets the maximum read length property to maxReadLength.
-
Field Detail
-
DEFAULT_MAX_READ_LENGTH
public static final int DEFAULT_MAX_READ_LENGTH
Default maximum read length, 10,000 bp.
- See Also:
  Constant Field Values
-
MAX_READ_LENGTH_PROPERTY
public static final String MAX_READ_LENGTH_PROPERTY
Maximum read length property name.
- See Also:
  Constant Field Values
-
end
protected long end
First index value beyond the slice; that is, the slice covers the range [start, end).
-
pos
protected long pos
Current position in file.
-
isSplittable
protected boolean isSplittable
True if the underlying data is splittable.
-
isCompressed
protected boolean isCompressed
True if the underlying data is compressed.
-
Constructor Detail
-
FastqRecordReader
protected FastqRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapreduce.lib.input.FileSplit split) throws IOException
Builds a new record reader given a Hadoop configuration and an input split.
- Parameters:
  conf - The Hadoop configuration object, used to access the underlying file system.
  split - The file split to read.
- Throws:
  IOException
-
Method Detail
-
setMaxReadLength
public static void setMaxReadLength(org.apache.hadoop.conf.Configuration conf, int maxReadLength)
Sets the maximum read length property to maxReadLength.
- Parameters:
  conf - The configuration to update.
  maxReadLength - The maximum read length, in base pairs (bp).
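The following sketch shows one plausible way a maximum-read-length setting with a default could be stored and resolved. A Map<String, String> stands in for Hadoop's Configuration. EXAMPLE_PROPERTY is a made-up key, not the actual value of MAX_READ_LENGTH_PROPERTY (see Constant Field Values for that). Only the 10,000 bp default comes from this page. MaxReadLengthSketch is hypothetical.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not ADAM code) of a max-read-length setting stored as
 * a string property with a numeric default. A Map stands in for Hadoop's
 * Configuration, and the key below is invented for illustration.
 */
final class MaxReadLengthSketch {

    /** Hypothetical key; NOT the real MAX_READ_LENGTH_PROPERTY value. */
    static final String EXAMPLE_PROPERTY = "example.fastq.max.read.length";

    /** Mirrors the documented DEFAULT_MAX_READ_LENGTH of 10,000 bp. */
    static final int DEFAULT_MAX_READ_LENGTH = 10_000;

    private MaxReadLengthSketch() {}

    /** Stores the maximum read length, in bp, as a string property. */
    static void setMaxReadLength(Map<String, String> conf, int maxReadLength) {
        conf.put(EXAMPLE_PROPERTY, Integer.toString(maxReadLength));
    }

    /** Reads the property back, falling back to the default when it is unset. */
    static int getMaxReadLength(Map<String, String> conf) {
        String value = conf.get(EXAMPLE_PROPERTY);
        return value == null ? DEFAULT_MAX_READ_LENGTH : Integer.parseInt(value.trim());
    }
}
```

With a real Configuration, the analogous calls would presumably be Configuration.setInt(name, value) and Configuration.getInt(name, defaultValue). This page does not confirm how FastqRecordReader reads the property back.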
-
checkBuffer
protected abstract boolean checkBuffer(int bufferLength, org.apache.hadoop.io.Text buffer)
Checks whether the buffer is positioned at a valid record.
- Parameters:
  bufferLength - The length of the line currently in the buffer.
  buffer - A buffer containing a peek at the first line in the current stream.
- Returns:
  true if the buffer contains the first line of a properly formatted FASTQ record.
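checkBuffer is abstract, so each subclass decides what counts as a valid first line. The sketch below is a hypothetical illustration only. It accepts a line that starts with '@'. It can also require the read name to end in the mate suffix /1, which is one way an interleaved reader could make sure it starts on the first read of a pair. HeaderCheckSketch and the requireFirstMate flag are not part of ADAM, and a String replaces Text.

```java
/**
 * Hypothetical checkBuffer-style test (not ADAM code): does the peeked line
 * look like the header line of a FASTQ record?
 */
final class HeaderCheckSketch {

    private HeaderCheckSketch() {}

    static boolean looksLikeHeader(int bufferLength, String buffer, boolean requireFirstMate) {
        // Treat only the first bufferLength characters as the peeked line.
        int length = Math.min(bufferLength, buffer.length());
        // A header needs '@' plus at least one character of read name.
        if (length < 2 || buffer.charAt(0) != '@') {
            return false;
        }
        if (!requireFirstMate) {
            return true;
        }
        // Read name: text after '@' up to the first whitespace.
        String name = buffer.substring(1, length).split("\\s+", 2)[0];
        return name.endsWith("/1");
    }
}
```

Because '@' is also a legal quality character, a check like this is usually paired with a look-ahead, for example at the '+' separator line two lines below.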
-
positionAtFirstRecord
protected final int positionAtFirstRecord(org.apache.hadoop.fs.FSDataInputStream stream, org.apache.hadoop.io.compress.CompressionCodec codec) throws IOException
Position the input stream at the start of the first record.
- Parameters:
  stream - The stream to reposition.
- Throws:
  IOException
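When a split starts in the middle of a file, the reader cannot assume it begins on a record boundary. A line starting with '@' is not proof of a header either, because '@' is also a valid quality character. The sketch below shows a common heuristic under these stated assumptions:

- The text is ASCII, so character offsets equal byte offsets.
- Records are unwrapped four-line records.
- A non-initial chunk discards its possibly partial first line, as Hadoop's LineRecordReader does for non-initial splits.

A line is accepted as a header if it starts with '@' and the line two below it starts with '+'. FirstRecordSketch is hypothetical and is not ADAM's implementation. This page does not document what positionAtFirstRecord's int return value means.

```java
/**
 * Hypothetical sketch (not ADAM code): find the first FASTQ record in a chunk
 * that may start mid-record. Heuristic: a header starts with '@' and the line
 * two below it starts with '+'.
 */
final class FirstRecordSketch {

    private FirstRecordSketch() {}

    /** Returns the offset of the first record in chunk, or -1 if none is found. */
    static int offsetOfFirstRecord(String chunk, boolean atFileStart) {
        int lineStart = 0;
        if (!atFileStart) {
            // Discard the possibly partial line the chunk begins in.
            int nl = chunk.indexOf('\n');
            if (nl < 0) {
                return -1;
            }
            lineStart = nl + 1;
        }
        while (lineStart < chunk.length()) {
            if (chunk.charAt(lineStart) == '@' && thirdLineStartsWithPlus(chunk, lineStart)) {
                return lineStart;
            }
            int nl = chunk.indexOf('\n', lineStart);
            if (nl < 0) {
                break;
            }
            lineStart = nl + 1;
        }
        return -1;
    }

    /** True if the line two lines after the one at lineStart begins with '+'. */
    private static boolean thirdLineStartsWithPlus(String chunk, int lineStart) {
        int pos = lineStart;
        for (int i = 0; i < 2; i++) {
            int nl = chunk.indexOf('\n', pos);
            if (nl < 0) {
                return false;
            }
            pos = nl + 1;
        }
        return pos < chunk.length() && chunk.charAt(pos) == '+';
    }
}
```

In a chunk whose first complete line is a quality string such as `@III`, the heuristic skips it, because two lines below it is a sequence line rather than a '+' line.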
-
initialize
public final void initialize(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
- Specified by:
  initialize in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
- Throws:
  IOException
  InterruptedException
-
getCurrentKey
public final Void getCurrentKey()
FASTQ has no keys, so we return null.
- Specified by:
  getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
- Returns:
  Always returns null.
-
getCurrentValue
public final org.apache.hadoop.io.Text getCurrentValue()
Returns the last interleaved FASTQ record.
- Specified by:
  getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
- Returns:
  The text corresponding to the last read pair.
-
nextKeyValue
public final boolean nextKeyValue() throws IOException, InterruptedException
Seeks ahead in the split to the next key-value pair. Triggers the read of an interleaved FASTQ read pair and populates internal state.
- Specified by:
  nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
- Returns:
  True if reading the next read pair succeeded.
- Throws:
  IOException
  InterruptedException
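getCurrentKey, getCurrentValue, and nextKeyValue follow Hadoop's RecordReader contract. A caller invokes nextKeyValue() until it returns false and reads getCurrentValue() after each successful call. InMemoryPairReader below shows that calling pattern without Hadoop on the classpath. It is a hypothetical in-memory stand-in that uses String instead of Text, and it does not extend RecordReader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical stand-in (not ADAM or Hadoop code) for the
 * RecordReader<Void, Text> contract: nextKeyValue() advances,
 * getCurrentKey() is always null, getCurrentValue() is the last pair read.
 */
final class InMemoryPairReader {

    private final Iterator<String> pairs;
    private String current;

    InMemoryPairReader(List<String> pairs) {
        this.pairs = pairs.iterator();
    }

    /** Advances to the next pair; returns false once the input is exhausted. */
    boolean nextKeyValue() {
        if (!pairs.hasNext()) {
            return false;
        }
        current = pairs.next();
        return true;
    }

    /** FASTQ has no keys, so this always returns null. */
    Void getCurrentKey() {
        return null;
    }

    /** Returns the pair produced by the last successful nextKeyValue(). */
    String getCurrentValue() {
        return current;
    }

    /** Typical consumer loop: drain the reader and collect every value. */
    static List<String> drain(InMemoryPairReader reader) {
        List<String> out = new ArrayList<>();
        while (reader.nextKeyValue()) {
            out.add(reader.getCurrentValue());
        }
        return out;
    }
}
```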
-
close
public final void close() throws IOException
Close this RecordReader to future operations.
- Specified by:
  close in interface AutoCloseable
  close in interface Closeable
  close in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
- Throws:
  IOException
-
getProgress
public final float getProgress()
Reports how much of the input the RecordReader has consumed.
- Specified by:
  getProgress in class org.apache.hadoop.mapreduce.RecordReader<Void,org.apache.hadoop.io.Text>
- Returns:
  A value in [0.0, 1.0] giving the fraction of bytes read so far out of the total bytes to read.
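Here is one plausible reading of the getProgress contract, given the pos and end fields and a slice [start, end). The reader reports the fraction of the slice's bytes consumed, clamped to [0.0, 1.0]. Clamping matters because a reader typically finishes its last record past end. ProgressSketch is hypothetical. The actual calculation may differ, particularly for compressed input, where stream positions may not map directly onto the slice.

```java
/**
 * Hypothetical sketch (not ADAM code) of progress for a reader covering the
 * byte range [start, end) and currently at byte offset pos.
 */
final class ProgressSketch {

    private ProgressSketch() {}

    static float progress(long start, long pos, long end) {
        if (end <= start) {
            // Empty slice: nothing to measure against, so report 0.
            return 0.0f;
        }
        float fraction = (pos - start) / (float) (end - start);
        // pos can pass end while the last record is finished, so clamp.
        return Math.min(1.0f, Math.max(0.0f, fraction));
    }
}
```

For example, a reader halfway through the slice [100, 200) reports 0.5, and one that has read past byte 200 to finish its last record still reports 1.0.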
-
makePositionMessage
protected final String makePositionMessage()
Produces a debugging message with the file position.
- Returns:
  A string of the form {filename}:{index}.
-
lowLevelFastqRead
protected final boolean lowLevelFastqRead(org.apache.hadoop.io.Text readName, org.apache.hadoop.io.Text value) throws IOException
Parses a read from an interleaved FASTQ file. Only reads a single record.
- Parameters:
  readName - Text record containing the read name. Output parameter.
  value - Text record containing the full record. Output parameter.
- Returns:
  true if the read was successful (did not hit EOF).
- Throws:
  RuntimeException - If the FASTQ record is not properly formatted (e.g., the record does not start with @).
  IOException
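This stdlib-only sketch follows the single-record contract described for lowLevelFastqRead. It returns false on EOF, throws a RuntimeException if the record does not start with '@', and fills two output parameters. SingleRecordSketch is hypothetical, and StringBuilder stands in for Text. The read name is assumed to be the header line without its leading '@'. The truncation, '+' separator, and length checks are extra assumptions not stated on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;

/**
 * Hypothetical sketch (not ADAM code) of reading one four-line FASTQ record
 * into two output parameters.
 */
final class SingleRecordSketch {

    private SingleRecordSketch() {}

    static boolean readRecord(BufferedReader in, StringBuilder readName, StringBuilder value)
            throws IOException {
        String header = in.readLine();
        if (header == null) {
            return false; // clean EOF before a new record
        }
        if (header.isEmpty() || header.charAt(0) != '@') {
            throw new RuntimeException("Expected '@' at start of FASTQ record, got: " + header);
        }
        String sequence = in.readLine();
        String separator = in.readLine();
        String quality = in.readLine();
        // Assumed extra checks: complete record, '+' separator, matching lengths.
        if (sequence == null || separator == null || quality == null) {
            throw new RuntimeException("Truncated FASTQ record: " + header);
        }
        if (separator.isEmpty() || separator.charAt(0) != '+') {
            throw new RuntimeException("Expected '+' separator line, got: " + separator);
        }
        if (sequence.length() != quality.length()) {
            throw new RuntimeException("Sequence and quality lengths differ for " + header);
        }
        // Output parameters: name is the header minus '@'; value is the full record.
        readName.setLength(0);
        readName.append(header, 1, header.length());
        value.setLength(0);
        value.append(header).append('\n')
             .append(sequence).append('\n')
             .append(separator).append('\n')
             .append(quality).append('\n');
        return true;
    }
}
```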
-
next
protected abstract boolean next(org.apache.hadoop.io.Text value) throws IOException
Reads from the input split.
- Parameters:
  value - Text record to write the input value into.
- Returns:
  Whether this read was successful.
- Throws:
  IOException
- See Also:
  lowLevelFastqRead(Text, Text)
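next(Text) is abstract, and the See Also link suggests that implementations build on lowLevelFastqRead. For an interleaved reader, a plausible step after reading two records is to confirm that they are mates and then emit them as one value. The sketch below shows only that pairing step. InterleavedNextSketch, the rule for stripping /1 and /2 suffixes, and the plain concatenation of the two records are all assumptions for illustration, not ADAM's documented behavior.

```java
/**
 * Hypothetical sketch (not ADAM code) of the pairing step an interleaved
 * next(Text) might perform after reading two records.
 */
final class InterleavedNextSketch {

    private InterleavedNextSketch() {}

    /** Drops any comment after whitespace and a trailing "/1" or "/2" suffix. */
    static String baseName(String readName) {
        String name = readName.split("\\s+", 2)[0];
        if (name.endsWith("/1") || name.endsWith("/2")) {
            return name.substring(0, name.length() - 2);
        }
        return name;
    }

    /** Joins two newline-terminated records as one pair value, or throws if they are not mates. */
    static String joinPair(String firstName, String firstRecord,
                           String secondName, String secondRecord) {
        if (!baseName(firstName).equals(baseName(secondName))) {
            throw new IllegalStateException(
                "Reads are not mates: " + firstName + " vs " + secondName);
        }
        return firstRecord + secondRecord;
    }
}
```

With this rule, `r1/1` and `r1/2` are treated as mates, and so are two reads both named `r1` that carry different Illumina-style comments after a space.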